JP3852442B2

JP3852442B2 - Data encoding method and apparatus

Info

Publication number: JP3852442B2
Application number: JP2003428093A
Authority: JP
Inventors: 隆幸菅原; 順三鈴木
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2003-12-24
Filing date: 2003-12-24
Publication date: 2006-11-29
Anticipated expiration: 2019-03-12
Also published as: JP2004140867A

Description

The present invention relates to a data encoding method and apparatus that store, for example, encoded audio and video data in units as pack sequences, each to be reproduced within a predetermined time. In particular, it relates to a data encoding method and apparatus that can keep image quality stable in encoding where navigation data, such as data lengths and start addresses calculated from values corresponding to the code amounts of the encoded data, is written before the audio and video data are encoded.

In recent years, a data compression scheme for moving pictures has been internationally standardized as the MPEG (Moving Picture Experts Group) scheme. MPEG is known as a scheme that compresses video data at a variable ratio. It defines compression schemes called MPEG1 (MPEG phase 1) and MPEG2 (MPEG phase 2).

Specifically, MPEG combines several techniques. First, temporal redundancy is reduced by subtracting, from the input image signal, the image signal decoded by the motion compensator.

There are three basic prediction modes: prediction from a past picture, prediction from a future picture, and prediction from both past and future pictures. These modes can be switched for each macroblock (MB) of 16 × 16 pixels. The prediction method is determined by the picture type (Picture_Type) assigned to the input picture. The picture types are the unidirectional inter-picture predictive coded picture (P picture), the bidirectional inter-picture predictive coded picture (B picture), and the intra-picture independently coded picture (I picture). A P picture has two modes: a mode that encodes with prediction from a past picture, and a mode that encodes the macroblock independently without prediction. A B picture has four modes: prediction from a future picture, prediction from a past picture, prediction from both past and future pictures, and independent encoding without prediction. An I picture encodes all macroblocks independently. An I picture is also called an intra picture, so P pictures and B pictures can be referred to as non-intra pictures.
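The relationship between picture type and permitted macroblock modes can be summarized as a small lookup table. This is a minimal sketch; the mode labels ("forward" for prediction from a past picture, "backward" for prediction from a future picture, and so on) and the function name are illustrative and do not come from the patent text.

```python
# Macroblock prediction modes permitted for each picture type.
# Labels are illustrative: "forward" = prediction from a past picture,
# "backward" = from a future picture, "bidirectional" = from both,
# "intra" = independent coding without prediction.
ALLOWED_MODES = {
    "I": {"intra"},
    "P": {"forward", "intra"},
    "B": {"forward", "backward", "bidirectional", "intra"},
}


def is_mode_allowed(picture_type: str, mode: str) -> bool:
    """Return True if a macroblock of this picture type may use the mode."""
    return mode in ALLOWED_MODES[picture_type]
```

For example, `is_mode_allowed("P", "backward")` is false because a P picture predicts only from a past picture.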

In motion compensation, a motion vector is detected with half-pel accuracy by pattern matching the moving region for each macroblock, and prediction is performed after shifting the macroblock by the detected motion vector. A motion vector has horizontal and vertical components and is transmitted as side information of the macroblock together with the MC (Motion Compensation) mode, which indicates where the prediction comes from.
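The pattern matching step can be illustrated with an exhaustive block search that minimizes the sum of absolute differences (SAD). This is only a sketch under simplifying assumptions: it searches integer-pel positions only (half-pel accuracy would additionally require interpolating the reference picture), and the function names and the SAD criterion are choices made here, not details given in the patent.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a shifted block in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(size)
        for x in range(size)
    )


def best_motion_vector(cur, ref, bx, by, size, search):
    """Full search over +/-search pixels; returns (dx, dy, cost) with minimal SAD."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the picture.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy pictures: the current picture equals the reference shifted by (2, 1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[8 * (y + 1) + (x + 2) for x in range(8)] for y in range(8)]
vector = best_motion_vector(cur, ref, bx=2, by=2, size=4, search=2)
```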

The pictures from one I picture up to the picture preceding the next I picture are called a GOP (Group of Pictures). On storage media, one GOP generally consists of about 15 pictures.

FIG. 8 shows the basic configuration of the video encoder in an audio/video encoding apparatus to which MPEG is applied.

In FIG. 8, an input image signal is supplied to an input terminal 101 and is sent to a computing unit 102 and to a motion compensation predictor 111, described later.

The computing unit 102 obtains the difference between the image signal decoded by the motion compensation predictor 111 and the input image signal, and sends the resulting difference image signal to a DCT unit 103.

The DCT unit 103 applies an orthogonal transform to the supplied difference image signal. Here, the DCT (Discrete Cosine Transform) is an orthogonal transform obtained by discretizing, over a finite space, the integral transform whose kernel is the cosine function. In MPEG, a two-dimensional DCT is performed on each 8 × 8 DCT block obtained by dividing a macroblock into four. Because a video signal generally contains many low-frequency components and few high-frequency components, the DCT concentrates the coefficients in the low-frequency region. The data (DCT coefficients) obtained by the DCT unit 103 is sent to a quantizer 104.
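The energy compaction described here can be seen with a direct, unoptimized two-dimensional DCT. The sketch below uses the orthonormal DCT-II definition in pure Python; it is meant to show the transform itself, not the fast algorithm a real encoder would use, and the names are illustrative.

```python
import math

N = 8  # DCT block size used by MPEG


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""

    def norm(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = norm(u) * norm(v) * total
    return out


# A completely flat block has only a DC component after the transform.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block, all of the signal ends up in `coeffs[0][0]`, and every other coefficient is zero up to floating-point error.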

The quantizer 104 quantizes the DCT coefficients from the DCT unit 103. The quantization value is the product of a quantization matrix, which is an 8 × 8 set of values weighting the two-dimensional frequencies according to visual characteristics, and a quantization scale, which multiplies the whole matrix by a scalar. Each DCT coefficient is divided by its quantization value. When the encoded data is later decoded and inversely quantized by a decoder (video decoding apparatus), multiplying by the quantization value used in the video encoder yields a value that approximates the original DCT coefficient. The data quantized by the quantizer 104 is sent to a variable length coder (VLC) 105.
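As a rough illustration of the divide-then-multiply relationship, the following sketch quantizes coefficients by `matrix * scale` and reconstructs them by multiplying back. It is deliberately simplified: the actual MPEG quantization rules (separate handling of the intra DC coefficient, normalization constants, rounding rules) are not reproduced, and the 2 × 2 sample values are invented for brevity.

```python
def quantize(coeffs, matrix, scale):
    """Divide each coefficient by (matrix weight * scale) and round to an integer level."""
    return [
        [round(c / (w * scale)) for c, w in zip(c_row, w_row)]
        for c_row, w_row in zip(coeffs, matrix)
    ]


def dequantize(levels, matrix, scale):
    """Multiply each level by the same quantization value to approximate the coefficient."""
    return [
        [q * w * scale for q, w in zip(q_row, w_row)]
        for q_row, w_row in zip(levels, matrix)
    ]


# 2 x 2 excerpt of a block: larger weights are used for higher frequencies.
coeffs = [[800.0, 37.0], [-21.0, 5.0]]
matrix = [[8, 16], [16, 32]]
levels = quantize(coeffs, matrix, scale=2)
approx = dequantize(levels, matrix, scale=2)
```

The reconstructed values are close to, but not equal to, the original coefficients, and the small high-frequency coefficient is quantized to zero.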

The VLC 105 applies variable length coding to the quantized data from the quantizer 104. Of the quantized values, the direct current (DC) component is encoded using DPCM (differential pulse code modulation), a form of predictive coding. The alternating current (AC) components are scanned from low to high frequency in a so-called zigzag scan; each combination of a run of zeros and the following nonzero coefficient value is treated as one event, and Huffman coding is applied, which assigns shorter codes to events with higher probability of occurrence. The VLC 105 is also supplied with the motion vector and prediction mode information from the motion compensation predictor 111, and outputs this information as side information of the macroblock together with the variable length coded data. The data coded by the VLC 105 is sent to a buffer memory 106.
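The steps preceding the Huffman stage can be sketched as follows: DPCM on successive DC values, a zigzag scan of the block, and grouping of the AC coefficients into (zero run, level) events. The Huffman code tables themselves are omitted, and the initial DC predictor of 0 is an assumption made for the example rather than the value MPEG specifies.

```python
def zigzag_order(n=8):
    """Coordinates (row, col) of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Order by anti-diagonal; alternate the direction on each diagonal.
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def run_level_events(block):
    """(run of zeros, nonzero level) events for the AC coefficients."""
    events, run = [], 0
    for r, c in zigzag_order(len(block))[1:]:  # index 0 is the DC coefficient
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    return events  # trailing zeros would be covered by an end-of-block code


def dpcm(dc_values, predictor=0):
    """Differences between each DC value and the previous one."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][1] = 50, 1, -1, 3
events = run_level_events(block)
dc_diffs = dpcm([50, 52, 49])
```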

The buffer memory 106 temporarily stores the variable length coded data from the VLC 105. The encoded data (encoded bit stream) is then read from the buffer memory 106 at a predetermined transfer rate and output from an output terminal 113.

Information on the generated code amount for each macroblock in the output encoded data is sent to a code amount controller 112. The code amount controller 112 obtains the error code amount, that is, the difference between the generated code amount and the target code amount for each macroblock, generates a code amount control signal corresponding to this error, and feeds it back to the quantizer 104, thereby controlling the generated code amount. The code amount control signal fed back to the quantizer 104 controls the quantization scale used in the quantizer 104.
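The feedback loop can be pictured as a controller that raises the quantization scale when too many bits are produced and lowers it when too few are. The proportional rule, the gain, and the scale range of 1 to 31 below are assumptions chosen for illustration; the patent text only states that the control signal depends on the error code amount.

```python
def update_quant_scale(scale, generated_bits, target_bits, gain=0.01, lo=1, hi=31):
    """Hypothetical proportional controller for the quantization scale.

    A positive error (more bits generated than targeted) coarsens the
    quantization; a negative error refines it. The result is clamped to [lo, hi].
    """
    error = generated_bits - target_bits  # the "error code amount"
    return max(lo, min(hi, scale + gain * error))
```

For instance, with a target of 1000 bits and a current scale of 10, producing 1200 bits moves the scale to about 12, while producing 800 bits moves it to about 8.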

The image data quantized by the quantizer 104 is also sent to an inverse quantizer 107.

The inverse quantizer 107 inversely quantizes the quantized data from the quantizer 104. The DCT coefficient data obtained by this inverse quantization is sent to an inverse DCT unit 108.

The inverse DCT unit 108 applies the inverse DCT to the DCT coefficient data from the inverse quantizer 107 and sends the result to a computing unit 109.

The computing unit 109 adds the prediction difference image from the motion compensation predictor 111 to the output signal of the inverse DCT unit 108. The image signal is thereby restored.

The restored image signal is temporarily stored in an image memory 110, then read out and sent to the motion compensation predictor 111.

The image signal sent from the image memory 110 to the motion compensation predictor 111 is used to generate the reference decoded image from which the computing unit 102 calculates the difference image.

The motion compensation predictor 111 detects a motion vector from the input image signal and performs prediction after shifting the image by the detected motion vector. The prediction difference image signal obtained by this prediction is sent to the computing units 102 and 109. The motion vector detected by the motion compensation predictor 111 is sent to the VLC 105 together with the prediction mode (MC mode) information.

As noted above, the difference image signal is encoded for P pictures and B pictures; for I pictures, the input image signal is encoded as it is.

FIG. 9 shows the basic configuration of a video decoder that decodes the encoded data produced by the video encoder shown in FIG. 8.

In FIG. 9, encoded data is supplied to an input terminal 121 and sent to a variable length decoder (VLD) 122. The VLD 122 performs variable length decoding, the inverse of the variable length coding in the VLC 105 of FIG. 8. The data obtained by this decoding corresponds to the quantized data input to the VLC 105 in FIG. 8, with the motion vector and prediction mode information added. The quantized data obtained by the VLD 122 is sent to an inverse quantizer 123.

The inverse quantizer 123 inversely quantizes the quantized data from the VLD 122. The inversely quantized data corresponds to the DCT coefficient data input to the quantizer 104 in FIG. 8. The DCT coefficient data obtained by the inverse quantizer 123 is sent to an inverse DCT unit 124. The motion vector and prediction mode information is sent from the inverse quantizer 123 to a motion compensation predictor 127.

The inverse DCT unit 124 applies the inverse DCT to the DCT coefficients from the inverse quantizer 123. The resulting data corresponds to the difference image signal input to the DCT unit 103 in FIG. 8. This difference image signal is sent to a computing unit 125.

The computing unit 125 adds the prediction difference image from the motion compensation predictor 127 to the difference image signal from the inverse DCT unit 124. The decoded data, that is, the image signal, is thereby restored. The restored image signal approximately corresponds to the input image signal at the input terminal 101 in FIG. 8. The restored image signal (decoded data) is output from an output terminal 128 and, at the same time, temporarily stored in an image memory 126 and then sent to the motion compensation predictor 127.

The motion compensation predictor 127 generates a prediction difference image from the image signal supplied from the image memory 126, based on the motion vector and the prediction mode, and sends this prediction difference image to the computing unit 125.

As described above, MPEG2 specifies that a transfer start time and a reproduction time, each expressed with respect to a reference time, be set for the video data and the audio data so that they can be transferred and reproduced in synchronization. It has been pointed out, however, that although this transfer start time and reproduction start time information is sufficient for normal playback, it makes other playback processing difficult, such as special playback (fast forward, rewind, random playback) or giving the system interactivity.

For this reason, as disclosed in Japanese Patent Laid-Open No. 8-273304, there are applications in which MPEG-encoded audio and video data is stored in a video object unit as a pack sequence to be reproduced within a predetermined time, and reproduction information for reproducing this unit and search information for searching are recorded as navigation data at the head of the pack sequence.

The video object unit and the navigation data are already disclosed and described in detail in Japanese Patent Laid-Open No. 8-273304, so a detailed description is omitted here. As shown in FIG. 10, a plurality of video object units 85 make up a cell 84, a plurality of cells 84 make up a video object 83, and a plurality of video objects 83 make up a video object set 82.

The video object unit 85 is defined as a pack sequence with one navigation pack 86 at its head. The video object unit 85 also contains video packs 88, sub-picture packs 90, and audio packs 91 as defined in the MPEG standard. Video object units 85 are numbered in playback order, and the playback time of a video object unit 85 corresponds to the playback time of the video data, composed of one or more GOPs, contained in that video object unit 85.

The navigation pack 86 holds, as navigation data, reproduction control information for reproducing the video object unit 85, search information for searching, and the like. The reproduction control information is navigation data for presenting in synchronization with the playback state of the video data in the video object unit 85, that is, for changing the displayed content. In other words, the reproduction control information determines playback conditions according to the state of the presentation data and is real-time control data distributed over the data stream. The search information is navigation data for executing a search of video object units 85. That is, the search information serves fast forward and fast reverse playback as well as seamless playback, and is likewise real-time control data distributed over the data stream.

In particular, the search information for searching video object units 85 describes information for identifying start addresses within the cell 84. Taking the video object unit 85 that contains the search information as number 0, the search information lists, as addresses for forward playback in playback order (forward addresses), the numbers (start addresses) of the video object units 85 from the 1st (+1) to the 20th (+20), and the 60th (+60), 120th (+120), and 240th (+240). Similarly, taking the same video object unit 85 as number 0, the search information lists, as addresses for playback in the direction opposite to the playback order (backward addresses), the start addresses of the video object units 85 from the 1st (-1) to the 20th (-20), and the 60th (-60), 120th (-120), and 240th (-240).
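The set of relative unit numbers listed in the search information can be written out explicitly. The sketch below enumerates the forward and backward offsets named in this paragraph and filters them against the bounds of a cell; the function name and the idea of simply dropping out-of-range targets are illustrative assumptions, since the text does not say how such entries are marked.

```python
# Offsets relative to the current unit (number 0), as listed in the text.
FORWARD_OFFSETS = list(range(1, 21)) + [60, 120, 240]
BACKWARD_OFFSETS = [-n for n in FORWARD_OFFSETS]


def search_targets(current, total_units):
    """Unit numbers reachable from `current` within a cell of `total_units` units."""

    def in_cell(index):
        return 0 <= index < total_units

    forward = [current + d for d in FORWARD_OFFSETS if in_cell(current + d)]
    backward = [current + d for d in BACKWARD_OFFSETS if in_cell(current + d)]
    return forward, backward


forward, backward = search_targets(current=10, total_units=100)
```

With 100 units in the cell and the current unit at index 10, the +120 and +240 entries and every backward entry beyond -10 fall outside the cell.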

However, writing navigation data that contains the reproduction control information and search information for such video object units into the navigation pack ahead of the MPEG-encoded data requires a memory with a large storage capacity. Moreover, the navigation data can only be generated after the encoding has finished, by observing the encoding result (code amount) and calculating the required reproduction information.

Further, suppose that, as described in Japanese Patent Laid-Open No. 8-273304, a video object unit is taken as number 0 in playback order and one attempts to describe, relative to that unit, the addresses of the video object units played back at least up to 15 units before and after it, together with the addresses of the 20th, 30th, 60th, 120th, and 240th video object units in playback order. Because MPEG video data is fundamentally variable length coded, the addresses of the video object units cannot be calculated until all of the encoded video data is available, as in so-called two-pass encoding. Real-time encoding and recording of navigation data are therefore not possible.

The present invention has been made in view of the above problems. Its object is to provide a data encoding method and apparatus that, with a minimum amount of memory, make it possible to write the navigation data describing the reproduction control information and search information for a video object unit before encoding starts, and that allow real-time encoding and writing of navigation data while maintaining optimum image quality at any encoding rate.

To solve the above problems, the data encoding method according to the present invention as set forth in claim 1 is a data encoding method in which, when input data of a predetermined unit is encoded, an encoding rate is determined, the data is encoded using this encoding rate so that the occupancy of a virtual buffer corresponding to the decoding buffer does not underflow below a predetermined value at the time of decoding, and the encoded data is stored in a plurality of units, the method being characterized in that:
for each picture of the input data that is encoded as a reference picture referred to when other pictures in the MPEG GOP structure are decoded, the occupancy of the virtual buffer at the decoding time of that picture is obtained as a value that corresponds to the encoding rate and is larger than the predetermined value, and each such picture is encoded so that the occupancy of the virtual buffer becomes the occupancy thus obtained;
the encoded data is stored in the units, one unit for each portion of data to be reproduced within a predetermined time;
the addresses of a predetermined number of units reproduced temporally before and after one unit, and the end address of the first intra-picture data in the MPEG GOP structure within that one unit, are obtained on the basis of the encoding rate; and
the addresses of the predetermined number of units and the end address of the first intra-picture data in the MPEG GOP structure within that one unit are stored at the head of that one unit.

To solve the above problems, the data encoding apparatus according to the present invention as set forth in claim 2 is characterized by comprising:
encoding rate determining means for determining an encoding rate for input data of a predetermined unit;
encoding means for encoding, using the encoding rate from the encoding rate determining means, so that the occupancy of a virtual buffer corresponding to the decoding buffer does not underflow below a predetermined value at the time of decoding, the encoding means obtaining, for each picture of the input data that is encoded as a reference picture referred to when other pictures in the MPEG GOP structure are decoded, the occupancy of the virtual buffer at the decoding time of that picture as a value that corresponds to the encoding rate and is larger than the predetermined value, and encoding each such picture so that the occupancy of the virtual buffer becomes the occupancy thus obtained;
unitizing means for storing the encoded data encoded by the encoding means in units, one unit for each portion of data to be reproduced within a predetermined time;
address determining means for obtaining, on the basis of the encoding rate, the addresses of a predetermined number of units reproduced temporally before and after one unit, and the end address of the first intra-picture data in the MPEG GOP structure within that one unit; and
storage means for storing the addresses of the predetermined number of units and the end address of the first intra-picture data in the MPEG GOP structure within that one unit at the head of that one unit.

According to the data encoding method and data encoding apparatus of the present invention, the addresses of the predetermined number of units and the end address of the first intra-picture data in the MPEG GOP structure within one unit are calculated from the encoding rate and stored at the head of that unit. As a result, these addresses, which serve for example as reproduction control information for reproducing the unit and as search information for searching, can be recorded before encoding starts while using a minimum amount of memory, and real-time encoding becomes possible. In addition, because the occupancy of the virtual buffer corresponding to the decoding buffer can be controlled according to the encoding rate, setting those occupancy values to the values that statistically give the best signal quality allows encoding to be performed while maintaining optimum signal quality at any encoding rate. In particular, storing the end address of the first intra-picture data in the MPEG GOP structure at the head of the unit makes it possible to access only the first intra-picture data of each GOP and transmit it to the decoder. Ordinarily, the end position of the intra picture cannot be known without decoding the intra picture down to the VLC level or searching for the next picture header. With this data encoding method and apparatus, the data up to the end address of the first intra-picture data in the MPEG GOP structure can be transmitted to the decoder simply and quickly. Trick play of variable length coded data such as MPEG, for example at 15× speed, can thus be realized with simple and fast processing alone.
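One way to see why a known encoding rate lets unit addresses be written before encoding is the following arithmetic sketch. It assumes that every unit covers a fixed playback time and occupies a constant number of fixed-size packs; the 2048-byte pack size, the constant-size assumption, and the function names are assumptions made here for illustration, and the detailed procedure of the embodiment is not reproduced.

```python
PACK_BYTES = 2048  # assumed pack size; not stated in this passage


def packs_per_unit(rate_bps, unit_ms, pack_bytes=PACK_BYTES):
    """Number of packs needed to hold unit_ms milliseconds of data at rate_bps."""
    total_bits = rate_bps * unit_ms // 1000
    bits_per_pack = 8 * pack_bytes
    return -(-total_bits // bits_per_pack)  # ceiling division


def relative_unit_address(offset, rate_bps, unit_ms):
    """Address, in packs, of the unit `offset` units away from the current one."""
    return offset * packs_per_unit(rate_bps, unit_ms)


# Example: 8 Mbit/s stream, 0.5 s per unit.
size = packs_per_unit(8_000_000, 500)
next_unit = relative_unit_address(+1, 8_000_000, 500)
far_unit = relative_unit_address(+240, 8_000_000, 500)
back_unit = relative_unit_address(-20, 8_000_000, 500)
```

Under these assumptions, the address of a unit any number of positions ahead or behind is a simple multiple of the per-unit size, so it can be filled in without waiting for the encoder output.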

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the schematic configuration of an audio/video encoding apparatus according to a first embodiment to which the data encoding method and apparatus of the present invention are applied. FIG. 1 mainly shows the configuration of the video encoder that encodes video data; the audio encoder that encodes audio data is not shown.

In FIG. 1, an input image signal is supplied to an input terminal 1 and is sent to a computing unit 2 and a motion compensation predictor 11.

The computing unit 2 obtains the difference between the image signal decoded by the motion compensation predictor 11 and the input image signal, and sends the resulting difference image signal to a DCT unit 3. The difference image signal is encoded for P pictures and B pictures, whereas for I pictures the input image signal is encoded as it is; the following description takes the case of encoding the difference image signal as an example.

The DCT unit 3 applies an orthogonal transform to the difference image signal supplied from the computing unit 2. The data (DCT coefficients) obtained by the DCT processing in the DCT unit 3 is sent to a quantizer 4.

The quantizer 4 quantizes the DCT coefficients from the DCT unit 3 and sends the quantized data to a variable length coding (VLC) unit 5.

The VLC unit 5 variable-length-codes the quantized data from the quantizer 4. The VLC unit 5 is also supplied with the motion vector and prediction mode information from the motion compensation predictor 11, and outputs this information, together with the variable-length-coded data, as additional information for each macroblock. The data variable-length-coded by the VLC unit 5 is temporarily stored in the buffer memory 6, then read out of the buffer memory 6 at a predetermined transfer rate and sent, as video encoded data, to the unitizer 17 described later.

また、バッファメモリ６から出力されるビデオ符号化データにおけるマクロブロック毎の発生符号量は、ＶＢＶバッファ制御器４０に送信される。このＶＢＶバッファ制御器４０は、詳細については後述するが、復号時に復号バッファ占有量がオーバーフローやアンダーフローしないように、ＭＰＥＧにおいてＶＢＶバッファと呼ばれている仮想的な復号バッファを設定し、このＶＢＶバッファの占有量に基づいて実際の符号化による発生符号量を制御するものである。当該ＶＢＶバッファ制御器４０は、符号化の際の発生符号量制御のための符号量制御信号を発生し、量子化器４にフィードバックする。この量子化器４にフィードバックされる符号量制御信号は、量子化器４における量子化スケールを制御するための信号である。 The generated code amount for each macroblock in the video encoded data output from the buffer memory 6 is transmitted to the VBV buffer controller 40. As will be described in detail later, the VBV buffer controller 40 sets a virtual decoding buffer called a VBV buffer in MPEG so that the decoding buffer occupation amount does not overflow or underflow during decoding. The generated code amount by actual encoding is controlled based on the buffer occupancy. The VBV buffer controller 40 generates a code amount control signal for controlling the generated code amount at the time of encoding, and feeds it back to the quantizer 4. The code amount control signal fed back to the quantizer 4 is a signal for controlling the quantization scale in the quantizer 4.

一方、量子化された画像データは、逆量子化器７にも送られる。 On the other hand, the quantized image data is also sent to the inverse quantizer 7.

逆量子化器７では、量子化器４からの量子化データを逆量子化する。この逆量子化により得られたＤＣＴ係数データは、逆ＤＣＴ器８に送られる。 The inverse quantizer 7 inversely quantizes the quantized data from the quantizer 4. The DCT coefficient data obtained by this inverse quantization is sent to the inverse DCT unit 8.

逆ＤＣＴ器８では、逆量子化器７からのＤＣＴ係数データを逆ＤＣＴ処理した後、演算器９に送る。 In the inverse DCT unit 8, the DCT coefficient data from the inverse quantizer 7 is subjected to inverse DCT processing and then sent to the arithmetic unit 9.

The computing unit 9 adds the prediction difference image from the motion compensation predictor 11 to the signal from the inverse DCT unit 8, thereby restoring the image signal. The restored image signal is temporarily stored in the image memory 10 and then sent to the motion compensation predictor 11.
The image signal sent from the image memory 10 to the motion compensation predictor 11 is used to generate the reference decoded image from which the computing unit 2 calculates the difference image.

The motion compensation predictor 11 detects a motion vector from the input image signal, shifts the image by the amount of the detected motion vector, and then performs prediction. The prediction difference image signal obtained by this prediction is sent to the computing unit 2 and the computing unit 9.
The motion vector detected by the motion compensation predictor 11 is sent to the VLC unit 5 together with the prediction mode (MC mode) information.

ここまでの構成は前述した図８と略々同様であるが、本発明の第１の実施の形態のオーディオビデオ符号化装置では、更に以下のような構成を有している。 The configuration so far is substantially the same as that of FIG. 8 described above, but the audio video encoding device according to the first embodiment of the present invention further has the following configuration.

オーディオビデオ符号化レート決定器１３では、これから符号化しようとするオーディオ及びビデオの符号化レートが決定される。なお、このオーディオ及びビデオの符号化レートは、ユーザが決定しても、また、自動的に設定されても良い。当該オーディオビデオ符号化レート決定器１３にて決定された符号化レート情報は、ＶＢＶバッファ制御器４０及びユニットアドレス計算器１５に送られる。 The audio video encoding rate determiner 13 determines the audio and video encoding rates to be encoded. Note that the audio and video encoding rates may be determined by the user or automatically set. The coding rate information determined by the audio video coding rate determiner 13 is sent to the VBV buffer controller 40 and the unit address calculator 15.

ここで、当該第１の実施の形態のオーディオビデオ符号化装置のＶＢＶバッファ制御器４０での処理を、図２を用いて以下に説明する。 Here, the processing in the VBV buffer controller 40 of the audio video encoding apparatus of the first embodiment will be described below with reference to FIG.

In FIG. 2, the generated code amount information for each macroblock from the buffer memory 6 is input to the terminal 53 of the VBV buffer controller 40. The encoding rate information determined by the audio video encoding rate determiner 13 is input to the terminal 52, and the code amount control signal for the quantizer 4 (the signal that controls the quantization scale, that is, the quantization value) is output from the terminal 51.

端子５２に入力された符号化レート情報は、目標符号量計算器５６とＶＢＶバッファ推移観測器５７とに送られる。目標符号量計算器５６は、符号化レート情報に基づいてピクチャ単位で目標となる符号量（目標符号量）を計算する。 The coding rate information input to the terminal 52 is sent to the target code amount calculator 56 and the VBV buffer transition observer 57. The target code amount calculator 56 calculates a target code amount (target code amount) in units of pictures based on the encoding rate information.

以下に、ＶＢＶバッファ制御器４０におけるピクチャ単位での目標符号量の計算から発生符号量の制御までの流れについて説明する。 The flow from calculation of the target code amount for each picture in the VBV buffer controller 40 to control of the generated code amount will be described below.

For example, let T(U) be the target code amount of a video object unit, and let one video object unit be one GOP of 15 frames. Since one video object unit then corresponds to 0.5 seconds, the target code amount T(U) of the video object unit (the 15 pictures of one GOP) is given, for example, by equation (1), where N is the transfer rate (Mbps).

T(U) = N / 2 (Mbits) (1)
The target code amount calculator 56 obtains the target code amount of each video object unit by calculating equation (1).
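As an illustration only, equation (1) can be sketched in Python as follows. The function name and the `duration_s` parameter are assumptions introduced here. The default of 0.5 seconds follows the approximation used in the text (15 frames at 29.97 frames per second is about 0.5005 seconds).

```python
def vobu_target_mbits(rate_mbps: float, duration_s: float = 0.5) -> float:
    """Target code amount T(U) of one video object unit, in Mbits.

    rate_mbps  : transfer rate N in Mbps
    duration_s : playback time of one unit (assumed 0.5 s, as in the text)
    """
    return rate_mbps * duration_s


# Hypothetical example: a 6 Mbps stream gives a 3 Mbit budget per unit.
budget = vobu_target_mbits(6.0)
```

With N = 6 Mbps the sketch returns 3.0 Mbits, which matches T(U) = N / 2.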

Next, based on the target code amount of the video object unit determined in this way, the target code amount calculator 56 sets the target code amount of each picture (code amount allocation) in a first step, as described below.

Specifically, as the first step, the target code amount calculator 56 of the VBV buffer controller 40 allocates the target code amount of each picture in the GOP on the basis of the target code amount R for the pictures in the GOP that have not yet been encoded, including the picture to be encoded. This allocation is repeated in the encoding order of the pictures in the GOP. The target code amount of each picture is set using the following two assumptions.

As the first assumption, the product of the average quantization scale used to encode a picture and the resulting generated code amount is assumed to be constant for each picture type as long as the scene does not change. After each picture is encoded, the weighting parameters Xi, Xp, and Xb for the respective picture types (for example, weighting parameters indicating scene complexity) are updated by equations (2) to (4).

Xi = Si × Qi (2)
Xp = Sp × Qp (3)
Xb = Sb × Qb (4)
In these equations, i denotes an I picture, p a P picture, and b a B picture. Si, Sp, and Sb are the generated code amounts of the most recently encoded picture of the same picture type, and Qi, Qp, and Qb are the average quantization scales used when encoding those pictures. That is, by equations (2) to (4), each weighting parameter Xi, Xp, Xb is defined as the product of the generated code amount S and the average quantization scale Q of the preceding encoding result of the same picture type.
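The update of equations (2) to (4) can be sketched as below. This is a minimal illustration; the dictionary layout and the function name are assumptions made here, not part of the specification.

```python
def update_complexity(x: dict, picture_type: str,
                      generated_bits: float, avg_qscale: float) -> dict:
    """Return a copy of x with the entry for picture_type set to S * Q.

    x              : {"I": Xi, "P": Xp, "B": Xb}
    generated_bits : S, code amount of the picture just encoded
    avg_qscale     : Q, average quantization scale used for that picture
    """
    if picture_type not in ("I", "P", "B"):
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    updated = dict(x)
    updated[picture_type] = generated_bits * avg_qscale
    return updated


# Hypothetical numbers: a P picture that produced 80,000 bits at Q = 10.
x = update_complexity({"I": 0.0, "P": 0.0, "B": 0.0}, "P", 80_000, 10.0)
```

Only the parameter of the picture type just encoded changes, so the other two keep the values from their own most recent pictures.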

As the second assumption, taking the quantization scale Qi of the independently encoded I picture as the reference, the overall image quality is assumed to be always optimized (ideal image quality is achieved) when the ratio Kp between the I-picture quantization scale Qi and the P-picture quantization scale Qp is Kp = 1.0 and the ratio Kb between the I-picture quantization scale Qi and the B-picture quantization scale Qb is Kb = 1.4.

Under these first and second assumptions, the target code amount calculator 56 obtains the target code amount Ti of an I picture, the target code amount Tp of a P picture, and the target code amount Tb of a B picture by, for example, equations (5) to (7).

Here, Np and Nb in equations (5) to (7) are the numbers of P pictures and B pictures in the GOP that have not yet been encoded.
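For orientation, the sketch below uses the target allocation formulas of MPEG-2 Test Model 5 (TM5), which the specification cites as the source of this algorithm. It is an assumption that equations (5) to (7) take this form. The function name, argument layout, and the lower bound of bit_rate / (8 × picture_rate) come from TM5 as recalled here, not from this text.

```python
def picture_target(picture_type: str, r: float, x: dict,
                   n_p: int, n_b: int,
                   bit_rate: float, picture_rate: float,
                   kp: float = 1.0, kb: float = 1.4) -> float:
    """Target bits for the next picture (TM5 form, assumed for eqs. 5-7).

    r        : bits still available for the unencoded pictures in the GOP
    x        : complexity measures {"I": Xi, "P": Xp, "B": Xb}
    n_p, n_b : unencoded P and B pictures remaining in the GOP
    """
    floor = bit_rate / (8.0 * picture_rate)  # TM5 lower bound per picture
    if picture_type == "I":
        # Remaining pictures expressed as an equivalent number of I pictures.
        pictures = 1.0 + n_p * x["P"] / (x["I"] * kp) + n_b * x["B"] / (x["I"] * kb)
    elif picture_type == "P":
        pictures = n_p + n_b * kp * x["B"] / (kb * x["P"])
    elif picture_type == "B":
        pictures = n_b + n_p * kb * x["P"] / (kp * x["B"])
    else:
        raise ValueError("picture_type must be 'I', 'P' or 'B'")
    return max(r / pictures, floor)
```

The denominator expresses the remaining pictures as an equivalent number of pictures of the current type, so dividing R by it yields the share of the picture about to be encoded.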

That is, first, for each unencoded picture in the GOP whose picture type differs from that of the encoding-target picture, an estimate is made, under the image quality optimization condition described above, of how many times the generated code amount of the encoding-target picture that picture will generate.

Next, the number of encoding-target pictures to which the estimated code amount of all the unencoded pictures is equivalent is obtained. The target code amount of the encoding-target picture is given by dividing the target code amount R for the unencoded pictures by this number. Each time a picture is encoded, the target code amount R for the unencoded pictures in the GOP is updated according to the type of the encoded picture, as in equations (8) to (10).

R = R - Si (8)
R = R - Sp (9)
R = R - Sb (10)
Next, in the VBV buffer controller 40, the target code amount/generated code amount comparator 55 compares the target code amounts Ti, Tp, and Tb obtained for each picture in the first step of the target code amount calculator 56 with the actual generated code amount supplied from the buffer memory 6 via the terminal 53 in FIG. 2, and generates the error between the generated code amount and the target code amount of each picture. This error code amount information is sent to the feedback quantization value determiner 54.
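The bookkeeping of equations (8) to (10) and the error produced by the comparator 55 can be sketched as follows. The function names are hypothetical, and defining the error as generated minus target is an assumption, since the text does not state a sign convention.

```python
def update_remaining(r: float, generated_bits: float) -> float:
    """Equations (8)-(10): subtract the bits just spent from the GOP budget R."""
    return r - generated_bits


def code_amount_error(target_bits: float, generated_bits: float) -> float:
    """Error fed to the feedback stage (assumed sign: generated - target)."""
    return generated_bits - target_bits


# Hypothetical GOP start: 3000 kbits available, then I, P and B pictures
# that actually produced 120, 80 and 40 kbits.
r = 3000.0
for spent in (120.0, 80.0, 40.0):
    r = update_remaining(r, spent)
```

After the three pictures the remaining budget is 2760 kbits, which is what the next picture's target is divided from.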

In the feedback quantization value determiner 54, as a second step, the quantization scale is obtained by macroblock-level feedback control, based on the capacity of three virtual buffers set independently for the respective picture types, so that the actual generated code amount matches the target code amounts Ti, Tp, and Tb of each picture.

First, before the j-th macroblock is encoded, the occupancy of the virtual buffer is obtained by equations (11) to (13).

In these equations, d0^i, d0^p, and d0^b are the initial occupancies of the virtual buffers for the respective picture types, B_j is the number of bits generated from the beginning of the picture up to the j-th macroblock, and MB_cnt is the number of macroblocks in the I picture.
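The occupancy calculation can be illustrated with the form used in TM5, d_j = d0 + B_(j-1) - T × (j - 1) / MB_cnt, applied separately to each picture type. That equations (11) to (13) take exactly this form is an assumption based on the TM5 citation in the specification; the function and argument names are introduced here for illustration.

```python
def virtual_buffer_fullness(d0: float, bits_before_j: float,
                            target_bits: float, j: int, mb_cnt: int) -> float:
    """Virtual buffer occupancy before macroblock j (TM5 form, assumed).

    d0            : initial occupancy for this picture type
    bits_before_j : bits generated by macroblocks 1 .. j-1 of the picture
    target_bits   : target code amount T of the picture
    j             : 1-based index of the macroblock about to be encoded
    mb_cnt        : number of macroblocks in the picture
    """
    if not 1 <= j <= mb_cnt:
        raise ValueError("j must lie in 1..mb_cnt")
    # Bits the picture should have spent so far if the target were used evenly.
    expected = target_bits * (j - 1) / mb_cnt
    return d0 + bits_before_j - expected
```

The occupancy grows when the picture has spent more bits than an even spread of the target would allow, which later raises the quantization scale.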

Next, the quantization scale Q_j for the j-th macroblock is calculated by equation (14).

Q_j = d_j × 31 / r (14)
Here, r is a parameter that determines the response speed of the feedback and is given by equation (15).

r = 2 × bit_rate / picture_rate (15)
The algorithm described above is described in Test Model 5 (TM5), which was used in the MPEG standardization, and also appears in the Journal of the Institute of Television Engineers of Japan, vol. 49, no. 4, pp. 455-456 (1995).
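Equations (14) and (15) translate directly into the short sketch below. The function names are hypothetical. The clamp to the range 1 to 31 is an added assumption, reflecting the usual range of the MPEG quantizer scale code, and is not stated in the text.

```python
def reaction_parameter(bit_rate: float, picture_rate: float) -> float:
    """Equation (15): r = 2 * bit_rate / picture_rate."""
    return 2.0 * bit_rate / picture_rate


def quantization_scale(d_j: float, r: float) -> float:
    """Equation (14): Q_j = d_j * 31 / r (unclamped)."""
    return d_j * 31.0 / r


def clamp_qscale(q: float) -> float:
    """Assumed post-processing: keep the scale inside 1..31."""
    return max(1.0, min(31.0, q))


# Hypothetical example: 6 Mbps at 30 pictures/s, occupancy of 200,000 bits.
r = reaction_parameter(6_000_000, 30)
q = clamp_qscale(quantization_scale(200_000, r))
```

A fuller virtual buffer yields a coarser quantization scale, which reduces the bits generated by the following macroblocks.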

In the audio video encoding apparatus of the first embodiment of the present invention, when the target code amount calculator 56 of the VBV buffer controller 40 has calculated the target code amount of each picture in the first step as described above, the VBV buffer transition observer 57 predicts in advance the VBV buffer value that would result if the picture were encoded with that target code amount, and the target code amount is set on the basis of the predicted value. That is, the VBV buffer transition observer 57 monitors the VBV buffer occupancy on the basis of the generated code amount supplied from the buffer memory 6 via the terminal 53 and the encoding rate supplied from the audio video encoding rate determiner 13 via the terminal 52, and the target code amount set by the target code amount calculator 56 in the first step is set again on the basis of that occupancy.

A VBV buffer controller in general MPEG encoding controls the code amount while predicting, as shown in FIG. 3, how the occupancy of the decoding buffer (the VBV buffer in the encoding apparatus) would change if the data were decoded by a decoding apparatus. The buffer used for this prediction (the VBV buffer) is only a virtual buffer. In MPEG, however, in the case of CBR (constant bit rate), encoding must be controlled so that the occupancy neither overflows the maximum capacity (MaxValue) of the VBV buffer nor underflows the minimum capacity (0). In FIG. 3, the vertical axis represents the occupancy of the decoding buffer of this virtual decoding apparatus (that is, the VBV buffer), and the horizontal axis represents time. The slope of the line showing the change in buffer occupancy corresponds to the transfer rate, that is, the encoding rate.

FIG. 3 takes as an example the encoding of a video signal conforming to NTSC (National Television System Committee), the standard television broadcasting system, so each picture is decoded at intervals of 1/29.97 seconds. In the decoding buffer (VBV buffer), the 120 Kbits of compressed data of the I picture, which is the first picture of the video object unit, are accumulated as the initial value, and these 120 Kbits are then read out and decoded. Decoding in the VBV buffer is virtual and, in the model defined by MPEG, takes place instantaneously in zero time, so the 120 Kbits are removed from the VBV buffer (decoding buffer) in an instant. Next, over 1/29.97 seconds, the 80 Kbits of compressed data of the P picture, which is the second picture, are input to the decoding buffer (VBV buffer), and these 80 Kbits are then removed in an instant to decode the P picture. Next, over 1/29.97 seconds, the 40 Kbits of compressed data of the B picture, which is the third picture, are input to the decoding buffer (VBV buffer), and these 40 Kbits are then removed in an instant to decode the B picture. The same data input and removal is performed for each subsequent picture.
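The buffer behaviour described for FIG. 3 can be approximated by the following simulation. It is a simplified sketch: the function name and parameters are assumptions, the buffer is filled by rate / picture_rate per picture period, and each picture is removed instantaneously, as in the model described above.

```python
def simulate_vbv(picture_kbits, rate_kbps, max_kbits,
                 initial_kbits, picture_rate=29.97):
    """Return the occupancy (kbits) just after each picture is removed.

    Raises ValueError if the buffer would overflow or underflow.
    """
    fill_per_picture = rate_kbps / picture_rate
    occupancy = initial_kbits
    trace = []
    for n, size in enumerate(picture_kbits):
        if n > 0:
            # Data arrives at the transfer rate during one picture period.
            occupancy += fill_per_picture
            if occupancy > max_kbits:
                raise ValueError(f"overflow before picture {n}")
        if size > occupancy:
            raise ValueError(f"underflow at picture {n}")
        occupancy -= size  # instantaneous removal at decode time
        trace.append(occupancy)
    return trace


# Hypothetical run with the I/P/B sizes of the example (120, 80, 40 kbits);
# the rate, buffer size, and a rounded picture rate of 30 are assumptions.
trace = simulate_vbv([120, 80, 40], rate_kbps=3000, max_kbits=1000,
                     initial_kbits=120, picture_rate=30)
```

An encoder would run a check of this kind on its planned picture sizes and lower or raise the targets whenever the trace approaches 0 or the maximum capacity.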

このように、ＭＰＥＧにおける一般的なＶＢＶバッファ制御は、復号バッファ（ＶＢＶバッファ）に入力される圧縮データの転送レート、すなわち図３のグラフの直線の傾きに相当する符号化レートに依存する。 Thus, general VBV buffer control in MPEG depends on the transfer rate of compressed data input to the decoding buffer (VBV buffer), that is, the encoding rate corresponding to the slope of the straight line in the graph of FIG.

In contrast, in the audio video encoding apparatus of the first embodiment of the present invention, the VBV buffer transition observer 57 of the VBV buffer controller 40 monitors the VBV buffer occupancy on the basis of the generated code amount supplied from the buffer memory 6 via the terminal 53 and the encoding rate supplied from the audio video encoding rate determiner 13 via the terminal 52. As shown in FIG. 4, which is drawn in the same manner as FIG. 3, the target code amount set by the target code amount calculator 56 in the first step is then set so that the VBV buffer occupancy converges to predetermined values at predetermined points. At the decoding time of the first reference image, which is the first image in the video object unit to be referred to for decoding other images (that is, at the encoding time of the I picture, which is independently encoded image data), the occupancy converges to VBV value 1 in the figure. At the decoding time of the second reference image (that is, the encoding time of the first P picture), it converges to VBV value 2. At the decoding time of the third reference image (that is, the encoding time of the next P picture), it converges to VBV value 3. This processing is repeated, and at the encoding time of the last image of the video object unit (that is, the last B picture), the occupancy converges to VBV value E.

このように、ＶＢＶバッファ制御器４０において、独立符号化されるＩピクチャと、復号時に他の画像の参照画像となるＰピクチャと、ビデオオブジェクトユニットの最後のＢピクチャの目標符号量とを設定することにより、後述するユニットアドレス計算器１５において、サーチのための基準ユニット及び当該基準ユニットの少なくとも前後に再生される所定数のユニットのアドレスと、当該ユニット内のデータのうち前記独立符号化されるＩピクチャ、及び復号時に他のデータの参照データとなされるＰピクチャ、及びビデオオブジェクトユニットの最後のＢピクチャの各終了アドレスの計算が非常に容易となり、予め指定したアドレス値に簡単に制御することが可能となる。 In this way, the VBV buffer controller 40 sets the I picture that is independently encoded, the P picture that becomes a reference picture of another picture at the time of decoding, and the target code amount of the last B picture of the video object unit. Thus, in the unit address calculator 15 to be described later, the reference unit for search and the addresses of a predetermined number of units reproduced at least before and after the reference unit and the data in the unit are independently encoded. The calculation of the end addresses of the I picture, the P picture used as reference data for other data at the time of decoding, and the last B picture of the video object unit is very easy, and can be easily controlled to a predetermined address value. Is possible.

Returning to FIG. 1, the unit address calculator 15 uses VBV value 1 to VBV value E, which are the same as those described with reference to FIG. 4, together with the transfer rate information (encoding rate information), and calculates the following addresses by the equations described below, as shown in FIG. 5. It calculates the end address 1EA of the first reference image, which is the first image in the video object unit to be referred to for decoding other images (the I picture, which is independently encoded image data); the end address 2EA of the second reference image (the first P picture); the end address 3EA of the third reference image (the next P picture); the end address of each subsequent reference image in the same way; and the end address TEA of the last image (the last B picture) of the video object unit.

Here, let VideoRate (kbps) be the encoding rate of the video data and AudioRate (kbps) the encoding rate of the audio data in the encoding rate information from the audio video encoding rate determiner 13. The end address TEA of the last image (B picture) of the video object unit is then calculated by equation (16).

TEA = (VideoRate + AudioRate) × 15 / 29.97 (16)
Further, let MaxValue be the maximum capacity preset for the VBV buffer and, as shown in FIG. 5, let 1EA be the end address of the I picture in the video object unit, 2EA the end address of the first P picture in the video object unit, and 3EA the end address of the next P picture in the video object unit. These end addresses 1EA to 3EA are calculated by equations (17) to (19).

1EA = (MaxValue - VBV value 1) + AudioRate × 1 / 29.97 (17)
2EA = (MaxValue - VBV value 1) + VideoRate × 3 / 29.97 - (VBV value 2 - VBV value 1) + AudioRate × 4 / 29.97 (18)
3EA = (MaxValue - VBV value 1) + VideoRate × 6 / 29.97 - (VBV value 3 - VBV value 1) + AudioRate × 7 / 29.97 (19)
In equations (16) to (19), the unit is kbits, and it is assumed to be known that a video object unit is exactly one GOP of 15 frames and that two B pictures lie between successive I and P pictures. The audio data is assumed to have a fixed transfer rate with a fixed number of samples per unit time. If the audio data is variable-length coded, the calculation may instead take into account the code amount of the audio data at the position corresponding to the video data (that is, corresponding to the time at which the video data is output). This can be realized by providing a target code amount calculator 56 and a target code amount memory, described later, for audio.
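Equations (16) to (19) can be evaluated with a few lines of code. The sketch below is illustrative only: the function name, the returned dictionary, and all numeric values in the example are assumptions, and the picture rate is exposed as a parameter (29.97 in the text).

```python
def unit_end_addresses(video_rate, audio_rate, max_value,
                       vbv1, vbv2, vbv3, picture_rate=29.97):
    """End addresses in kbits for one 15-frame video object unit.

    video_rate, audio_rate : encoding rates in kbps
    max_value              : maximum VBV buffer capacity in kbits
    vbv1, vbv2, vbv3       : VBV values 1 to 3 (convergence targets), kbits
    """
    base = max_value - vbv1
    ea1 = base + audio_rate * 1 / picture_rate                      # eq. (17)
    ea2 = (base + video_rate * 3 / picture_rate - (vbv2 - vbv1)
           + audio_rate * 4 / picture_rate)                         # eq. (18)
    ea3 = (base + video_rate * 6 / picture_rate - (vbv3 - vbv1)
           + audio_rate * 7 / picture_rate)                         # eq. (19)
    tea = (video_rate + audio_rate) * 15 / picture_rate             # eq. (16)
    return {"1EA": ea1, "2EA": ea2, "3EA": ea3, "TEA": tea}


# Hypothetical values, with the picture rate rounded to 30 for readability.
addresses = unit_end_addresses(video_rate=6000, audio_rate=300,
                               max_value=1800, vbv1=600, vbv2=900,
                               vbv3=1000, picture_rate=30)
```

Because every input is known before the first picture is encoded, the addresses can be written into the navigation data ahead of the video and audio packs.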

上述したように、本実施の形態のオーディオビデオ符号化装置によれば、ＶＢＶバッファ制御器４０において、独立符号化されるＩピクチャと復号時に他の画像の参照画像となるＰピクチャとビデオオブジェクトユニットの最後のＢピクチャの目標符号量とを設定すると共に、それら目標符号量に合うように発生符号量を制御し、また、ユニットアドレス計算器１５において、サーチのための基準ユニット及び当該基準ユニットの少なくとも前後に再生される所定数のユニットのアドレスと当該ユニット内のデータのうち前記独立符号化されるＩピクチャ、及び復号時に他のデータの参照データとなされるＰピクチャ、ビデオオブジェクトユニットの最後のＢピクチャの各終了アドレスを計算することにより、図１０に示したようなビデオオブジェクトユニットを再生するためのサーチ情報を記録するナビゲーションデータ、すなわち、サーチのためにそのビデオオブジェクトユニットを基準として少なくとも前後に再生される所定数のユニットのアドレス（ＴＥＡ）と、独立符号化された画像（Ｉピクチャ）を構成できるデータの終了アドレス（第１のリファレンス画像の終了アドレス１ＥＡ）、及び第２，第３，・・・のリファレンス画像までの各終了アドレス（２ＥＡ，３ＥＡ，・・・）を、予め指定した値に簡単に制御することが可能となる。 As described above, according to the audio video encoding apparatus of the present embodiment, in the VBV buffer controller 40, an I picture that is independently encoded, a P picture that becomes a reference image of another image at the time of decoding, and a video object unit And the target code amount of the last B picture is set, and the generated code amount is controlled so as to match the target code amount. In the unit address calculator 15, the reference unit for search and the reference unit of the reference unit are controlled. At least the address of a predetermined number of units reproduced before and after, the I picture that is independently encoded among the data in the unit, the P picture that is used as reference data for other data at the time of decoding, the last of the video object unit By calculating each end address of the B picture, the video option as shown in FIG. Navigation data for recording search information for reproducing the object unit, that is, the addresses (TEA) of a predetermined number of units reproduced at least before and after the video object unit for the search, and independently encoded End address (end address 1EA of the first reference image) of data that can form the image (I picture), and end addresses (2EA, 3EA,... Up to the second, third,... Reference images). ) Can be easily controlled to a value designated in advance.

このユニットアドレス計算器１５にて求められた情報は、ナビゲーションデータ生成器１６に送られる。 Information obtained by the unit address calculator 15 is sent to the navigation data generator 16.

The navigation data generator 16 takes the current video object unit as number 0 in playback order. Using that unit as the reference, it calculates the addresses of the video object units played back at least up to the 15th before and after it in playback order, and of the 20th, 30th, 60th, 120th, and 240th video object units in playback order, by scalar multiplication of the address where necessary. It lays these addresses out in a predetermined order and sends them to the unitizer 17.
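The scalar multiplication mentioned above can be sketched as follows. The sketch assumes that every video object unit has the same size (the TEA value), that addresses are expressed relative to the start of the current unit, and that the 20th to 240th entries exist in both directions. None of these layout details is specified in this passage, and the names are hypothetical.

```python
# Playback-order distances covered by the navigation data (from the text).
OFFSETS = list(range(1, 16)) + [20, 30, 60, 120, 240]


def navigation_addresses(unit_size):
    """Relative addresses of neighbouring units, keyed by playback offset.

    unit_size : size of one video object unit (for example TEA), assumed
                identical for all units because the code amount is controlled.
    """
    table = {}
    for n in OFFSETS:
        table[n] = n * unit_size     # units played after the current one
        table[-n] = -n * unit_size   # units played before the current one
    return table


# Hypothetical example with a unit size of 3150 kbits.
nav = navigation_addresses(3150)
```

With constant-size units, no look-ahead buffering of encoded data is needed to fill in these entries, which is consistent with the memory saving described for this embodiment.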

ユニット化器１７では、端子１９から供給されたオーディオ符号化データと、バッファメモリ６から供給されたビデオ符号化データと、ナビゲーションデータ生成器１６から供給されたナビゲーションデータとを用いて、図１０にて説明したようなビデオオブジェクトユニットを生成し、そのユニット化された符号化データを出力する。すなわち当該ユニット化器１７では、送信されてきたナビゲーションデータをパケット化（パック化）すると共にビデオ符号化データとオーディオ符号化データなどをパケット化（パック化）し、さらにナビゲーションデータのパケット（ナビゲーションパック）を先頭に配置し、その後にビデオデータのパケット（ビデオパック）とオーディオデータのパケット（オーディオパック）などを配置して、所定の１つのビデオオブジェクトユニットを生成し、この１つのビデオオブジェクトユニットを送信する。当該１つのビデオオブジェクトユニットを送信し終わると、次のビデオオブジェクトユニットのためのナビゲーションデータを受け取って同様にユニット化する。これらのユニット化された符号化データは出力端子１８から出力される。 The unitizer 17 uses the audio encoded data supplied from the terminal 19, the video encoded data supplied from the buffer memory 6, and the navigation data supplied from the navigation data generator 16 in FIG. The video object unit as described above is generated, and the encoded data that is unitized is output. That is, the unitizer 17 packetizes (packets) the transmitted navigation data, packetizes the video encoded data and the audio encoded data, etc., and further packs the navigation data packet (navigation pack). ) At the head, and then a video data packet (video pack) and an audio data packet (audio pack) are arranged to generate a predetermined video object unit. Send. When the transmission of the one video object unit is completed, navigation data for the next video object unit is received and similarly unitized. These unitized encoded data are output from the output terminal 18.

上述したように本発明の第１の実施の形態のオーディオビデオ符号化装置においては、余分なメモリを持たずに、ビデオオブジェクトユニットを再生するための再生制御情報及びサーチをするためのサーチ情報を記述するナビゲーションデータを、符号化が開始される前に、記録することが可能となる。 As described above, in the audio video encoding apparatus according to the first embodiment of the present invention, the reproduction control information for reproducing the video object unit and the search information for performing the search are provided without an extra memory. The navigation data to be described can be recorded before encoding starts.

次に、図６には、本発明の第２の実施の形態のオーディオビデオ符号化装置の概略構成を示す。なお、この図６に示すオーディオビデオ符号化装置において、図１と同一の構成要素には同じ指示符号を付し、それらの説明は省略し、図１とは異なる構成要素についてのみ説明する。 Next, FIG. 6 shows a schematic configuration of an audio video encoding apparatus according to the second embodiment of the present invention. In the audio / video encoding apparatus shown in FIG. 6, the same components as those in FIG. 1 are denoted by the same reference numerals, description thereof will be omitted, and only components different from those in FIG. 1 will be described.

In the audio/video encoding apparatus of the second embodiment shown in FIG. 6, an invalid bit adder 22, described later, is provided between the VLC unit 5 and the buffer memory 6, and the VBV buffer controller 41 has the configuration shown in FIG. 7. In FIG. 7, components identical to those in FIG. 2 described above are given the same reference numerals and their description is omitted; only the components that differ from FIG. 2 are described.

In the VBV buffer controller 41 shown in FIG. 7, the generated code amount information supplied from the buffer memory 6 via the terminal 53 is sent to the target code amount / generated code amount comparator 55 and, at the same time, to the invalid bit calculator 68.

The target code amount calculator 56 operates as described above and obtains the target code amount of each picture based on the encoding rate information. In this second embodiment as well, the timings at which the VBV buffer transition observer 57 makes the VBV buffer occupancy converge to the aforementioned VBV value 1 through VBV value E are the encoding times of the first reference image (I picture), which is an independently encoded image, the second reference image (first P picture), the third reference image (next P picture), ..., and the final image of the video object unit (last B picture). In the following description, each image at which the VBV buffer value is made to converge to one of VBV value 1 through VBV value E is called a convergence point image.

The target code amount memory 67 temporarily stores the target code amount information supplied from the target code amount calculator 56, then reads it out and supplies it to the temporary target code amount setter 66 and the invalid bit calculator 68.

The temporary target code amount setter 66 sets, as the temporary target code amount, a value about 10% lower than the target code amount supplied from the target code amount memory 67.

Here, the target code amount of each convergence point image described above is set by the temporary target code amount setter 66 to a value about 10% lower than the target code amount calculated by the target code amount calculator 56. This temporary target code amount is sent to the target code amount / generated code amount comparator 55.

Therefore, in the case of FIG. 7, the target code amount / generated code amount comparator 55 compares the generated code amount supplied from the buffer memory 6 via the terminal 53 with the temporary target code amount set by the temporary target code amount setter 66, and generates the error code amount of the generated code amount relative to the temporary target code amount. This error code amount information is sent to the feedback quantization value determiner 54. As a result, the generated code amount is controlled toward the temporary target code amount.
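The temporary target and the error signal fed back to the quantizer control can be illustrated with a small sketch. The function names, the integer-percent margin parameter, and the sign convention of the error (generated minus temporary target) are assumptions for this example; the text only says the temporary target is about 10% below the target and that an error code amount is produced.

```python
def temporary_target(target_bits: int, margin_percent: int = 10) -> int:
    """Temporary target: the picture target lowered by roughly 10% (integer math)."""
    return target_bits * (100 - margin_percent) // 100


def error_code_amount(generated_bits: int, temporary_target_bits: int) -> int:
    """Error of the generated code amount relative to the temporary target.

    Positive means the encoder produced more bits than the temporary target
    (sign convention assumed for this sketch).
    """
    return generated_bits - temporary_target_bits


target = 400_000                          # hypothetical per-picture target, in bits
temp = temporary_target(target)           # 360_000
err = error_code_amount(350_000, temp)    # -10_000: under the temporary target
print(temp, err)
```

Aiming below the real target leaves headroom, so that the remaining shortfall can later be made up exactly with invalid bits rather than risking an overshoot.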

Meanwhile, when encoding of one picture is complete, the invalid bit calculator 68 sums the generated code amounts of the individual macroblocks previously input from the buffer memory 6, calculates the difference between the total generated code amount of the picture and the target code amount of the picture, and outputs the code amount by which the picture falls short of its target as invalid bit code amount information. This invalid bit code amount information is sent via the terminal 59 to the invalid bit adder 22 of FIG. 6.

The invalid bit adder 22 of FIG. 6 appends, to the encoded data from the VLC unit 5, invalid bits corresponding to the invalid bit code amount information from the VBV buffer controller 41. The data output from the invalid bit adder 22 is thereby controlled to match the target code amount exactly, and this encoded data is sent to the buffer memory 6.
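The shortfall calculation and the padding step can be sketched together as below. Several details are assumptions made only for illustration: the function names, the use of zero-valued bytes as the invalid bits, the byte-aligned shortfall, and the clamp to zero when a picture is already at or above its target. The text does not specify what the invalid bits look like in the bitstream.

```python
from typing import Iterable


def invalid_bit_count(macroblock_bits: Iterable[int], target_bits: int) -> int:
    """Bits by which the picture falls short of its target code amount."""
    total = sum(macroblock_bits)          # total generated code amount of the picture
    return max(target_bits - total, 0)    # clamp at zero (assumption for the sketch)


def add_invalid_bits(encoded: bytes, shortfall_bits: int) -> bytes:
    """Append padding so the picture reaches its target size.

    Assumes the shortfall is a whole number of bytes and uses zero bytes
    as the padding pattern; both are illustrative choices.
    """
    return encoded + b"\x00" * (shortfall_bits // 8)


mb_bits = [1200, 1600, 800]                     # hypothetical per-macroblock code amounts
shortfall = invalid_bit_count(mb_bits, 4000)    # 400 bits short of the target
padded = add_invalid_bits(b"\x01\x02", shortfall)
print(shortfall, len(padded))
```

Because the encoder aims at the lower temporary target, the shortfall is normally positive, and the padding brings the picture to the size assumed when the navigation data was written.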

The invalid bit code amount information from the invalid bit calculator 68 is also sent to the VBV buffer transition observer 57, which counts the invalid bit code amount as part of the code amount of the convergence point image.

Note that convergence may not be achievable with the convergence point images alone, because a picture preceding a convergence point image may generate a large code amount. Therefore, in this embodiment, even when a large code amount is to be assigned to a certain image, it is desirable to allocate the code amount for that image so that it conforms to the following expression (20).

(number of images from convergence point image n to the next convergence point image n+1) × (VideoRate / 29.97) − (VBV value n − VBV value n−1)   (20)

The audio/video encoding apparatus of the second embodiment not only provides the same effects as the audio/video encoding apparatus of the first embodiment, but can also control the code amount very precisely, without even a one-byte error, so the possibility that the contents of the navigation data contradict the actual encoded data can be made extremely low.
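As a worked illustration of expression (20), the sketch below simply evaluates it for made-up numbers. The function name, the argument names, the assumption that VideoRate is in bits per second and the VBV values are in bits, and the reading of the result as a bit budget for the pictures between two convergence points are all assumptions of this example, not statements from the text.

```python
def expression_20(num_images: int,
                  video_rate: float,
                  vbv_n: float,
                  vbv_n_minus_1: float,
                  frame_rate: float = 29.97) -> float:
    """Evaluate expression (20) exactly as written in the text."""
    bits_delivered = num_images * (video_rate / frame_rate)   # bits arriving over the span
    vbv_change = vbv_n - vbv_n_minus_1                        # required change in occupancy
    return bits_delivered - vbv_change


# Hypothetical values: 3 pictures, 6 Mbit/s video, occupancy rising by 100,000 bits.
value = expression_20(3, 6_000_000, 900_000, 800_000)
print(round(value))
```

With these example numbers, about 200,200 bits arrive per picture period, so three pictures deliver roughly 600,600 bits, and subtracting the 100,000-bit rise in occupancy leaves about 500,600 bits.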

FIG. 1 is a block diagram showing a schematic configuration of the audio/video encoding apparatus of the first embodiment of the present invention.
FIG. 2 is a block diagram showing a specific configuration of the VBV buffer controller of the audio/video encoding apparatus of the first embodiment of the present invention.
FIG. 3 is a diagram used to explain general occupancy control of the virtual decoding buffer (VBV buffer) during encoding.
FIG. 4 is a diagram used to explain the buffer control timing and convergence values of the virtual decoding buffer (VBV buffer) occupancy during encoding by the audio/video encoding apparatus of the embodiments of the present invention.
FIG. 5 is a diagram used to explain the calculations performed by the unit address calculator.
FIG. 6 is a block diagram showing a schematic configuration of the audio/video encoding apparatus of the second embodiment of the present invention.
FIG. 7 is a block diagram showing a specific configuration of the VBV buffer controller of the audio/video encoding apparatus of the second embodiment of the present invention.
FIG. 8 is a block diagram showing a schematic configuration of a conventional video encoder.
FIG. 9 is a block diagram showing a schematic configuration of a conventional video decoder.
FIG. 10 is a diagram used to explain the structure of a video object set in which video object units and navigation data are arranged.

Explanation of symbols

1 ... Image signal input terminal, 2, 9 ... Arithmetic unit, 3 ... DCT unit, 4 ... Quantizer,
5 ... VLC unit, 6 ... Buffer memory, 7 ... Inverse quantizer, 8 ... Inverse DCT unit,
10 ... Image memory, 11 ... Motion compensation predictor,
13 ... Audio/video encoding rate determiner (encoding rate determining means),
15 ... Unit address calculator (address determining means),
16 ... Navigation data generator (description means),
17 ... Unitizer (unitizing means), 18 ... Output terminal,
19 ... Encoded audio data input terminal, 22 ... Invalid bit adder,
40, 41 ... VBV buffer controller (code amount control means),
51 ... Code amount control signal output terminal, 52 ... Encoding rate information input terminal,
53 ... Generated code amount input terminal, 54 ... Feedback quantization value determiner,
55 ... Target code amount / generated code amount comparator, 56 ... Target code amount calculator (target code amount calculating means), 57 ... VBV buffer transition observer (buffer transition observing means),
66 ... Temporary target code amount setter, 67 ... Target code amount memory, 68 ... Invalid bit calculator,
59 ... Invalid bit code amount information output terminal.

Claims

A data encoding method in which, when input data of a predetermined unit is encoded, an encoding rate is determined, the input data is encoded using the encoding rate so that the buffer occupancy of a virtual buffer corresponding to a decoding buffer at the time of decoding does not underflow below a predetermined value at the decoding time, and the encoded data is stored in a plurality of units, the method characterized by:
for each picture of the input data that is encoded as a reference picture referred to when decoding other pictures in the MPEG GOP structure, obtaining the occupancy of the virtual buffer at the decoding time of that picture as a value that corresponds to the encoding rate and is larger than the predetermined value, and encoding each such picture so that the occupancy of the virtual buffer becomes the occupancy thus obtained;
storing the encoded data in the units, one unit for each portion of data to be reproduced within a predetermined time;
obtaining, based on the encoding rate, the addresses of a predetermined number of units reproduced temporally before and after one unit, and the end address of the first intra-picture data in the MPEG GOP structure within the one unit; and
storing the addresses of the predetermined number of units and the end address of the first intra-picture data in the MPEG GOP structure within the one unit at the head of the one unit.

A data encoding apparatus characterized by comprising:
encoding rate determining means for determining an encoding rate for input data of a predetermined unit;
encoding means for encoding, using the encoding rate from the encoding rate determining means, so that the buffer occupancy of a virtual buffer corresponding to a decoding buffer at the time of decoding does not underflow below a predetermined value at the decoding time, wherein, for each picture of the input data that is encoded as a reference picture referred to when decoding other pictures in the MPEG GOP structure, the encoding means obtains the occupancy of the virtual buffer at the decoding time of that picture as a value that corresponds to the encoding rate and is larger than the predetermined value, and encodes each such picture so that the occupancy of the virtual buffer becomes the occupancy thus obtained;
unitizing means for storing the encoded data encoded by the encoding means in units, one unit for each portion of data to be reproduced within a predetermined time;
address determining means for obtaining, based on the encoding rate, the addresses of a predetermined number of units reproduced temporally before and after one unit, and the end address of the first intra-picture data in the MPEG GOP structure within the one unit; and
storage means for storing the addresses of the predetermined number of units and the end address of the first intra-picture data in the MPEG GOP structure within the one unit at the head of the one unit.