CN1219403C

CN1219403C - MPEG video stream bit-rate conversion method incorporating a visual model

Info

Publication number: CN1219403C
Application number: CN 02157889
Authority: CN
Inventors: 张勇东; 曹岗; 林守勋; 李***
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2002-12-20
Filing date: 2002-12-20
Publication date: 2005-09-14
Anticipated expiration: 2022-12-20
Also published as: CN1510923A

Abstract

The present invention relates to an MPEG video stream bit-rate conversion method that incorporates a visual model, comprising the steps of: partially decoding the input bitstream; truncating the DCT coefficients by removing those above a cut-off frequency; performing rate control to re-determine the quantization factor of each macroblock; and re-encoding. The invention uses the Fovea vision model in the conversion, which improves conversion efficiency, produces a low-bit-rate stream with comparatively good subjective quality, and further reduces the amount of computation.

Description

MPEG video stream bit-rate conversion method incorporating a visual model

Technical field

The present invention relates to a bit-rate conversion method for MPEG video streams.

Background technology

With the development of video compression and Internet technology, a variety of networked multimedia services have emerged, such as multipoint videoconferencing, video on demand, and digital television. To support these services, a video server must adapt to heterogeneous clients and transmission channels, and therefore needs the ability to convert video streams. Stream conversion includes syntax conversion, (spatial and temporal) resolution conversion, bit-rate conversion, and so on. The present invention addresses bit-rate conversion, that is, converting an existing video stream into a lower-bit-rate stream that fits the actual bandwidth limit of the transmission channel.

Many video stream conversion methods exist at present; they can be grouped into three architectures: (1) cascaded pixel-domain conversion; (2) fast cascaded pixel-domain conversion; (3) DCT (discrete cosine transform) domain conversion. Cascaded pixel-domain conversion requires complete decoding followed by complete re-encoding, so its computational cost is high and conversion is very slow. DCT-domain conversion operates directly in the DCT domain and needs no DCT/IDCT, so its computational cost is very small; however, its flexibility is limited, it is hard to implement when motion vectors must be changed, and it is not easy to extend. Fast cascaded pixel-domain conversion is a simplified version of cascaded pixel-domain conversion: because no motion estimation is needed, it is markedly faster than cascaded pixel-domain conversion, but because DCT/IDCT must still be performed, it is slower than DCT-domain conversion.

Existing video stream conversion methods do not make good use of the characteristics of the human visual system (HVS). As a result, the converted low-bit-rate stream does not match HVS characteristics well, its subjective quality is poor, and conversion efficiency is low.

Summary of the invention

The object of the invention is to provide a fast MPEG video stream bit-rate conversion method that is consistent with HVS characteristics, so that video streams of better subjective quality can be transmitted in heterogeneous network environments.

To achieve this object, an MPEG video stream bit-rate conversion method incorporating a visual model comprises the steps of:

partially decoding the input bitstream;

truncating the DCT coefficients, removing the coefficients above the cut-off frequency;

performing rate control to re-determine the quantization factor of each macroblock;

re-encoding.

The invention uses the Fovea vision model in the conversion, which improves conversion efficiency, produces a low-bit-rate stream with comparatively good subjective quality, and further reduces the amount of computation.

Description of drawings

Fig. 1 is a schematic diagram of the structure of the invention;

Fig. 2 shows the multi-resolution frequency-band representation of an 8 × 8 DCT coefficient block.

Embodiment

To aid understanding of the invention, the Fovea vision model is described first. Studies of the HVS show that the human eye samples visual information non-uniformly. In general, when the eye views an image there is a fixation point, called the Fovea point, at which perceived sharpness is highest. Perceived sharpness falls off rapidly with distance from this point. Based on this property, a Fovea vision model applicable to image and video coding has been proposed: given a Fovea point, the cut-off frequency f_c(x, y) (the highest frequency the eye can perceive) at any point (x, y) in the image is determined by:

f_{c} (x, y) = \min \{ \frac{i}{8} : d \geq B [i, V], 1 \leq i \leq 8, i \in Z^{+} \}

d = (x - x_{f})^{2} + (y - y_{f})^{2}

B [i, V] = \min \{ r^{2} : [f_{c} (r, V) \times 8] = i, r \in Z^{+} \}

f_{c} (r, V) = \frac{1}{1 + K \arctan (\frac{r - R}{V})}

where (x_f, y_f) are the coordinates of the Fovea point in the image, V is the distance from the viewpoint to the image, the model parameter K = 13.75, and R is the radius of the circular region centered on the Fovea point; this region is coded at the highest perceived sharpness (i.e., f_c = 1.0). Visual information at frequencies above the cut-off frequency f_c(x, y) cannot be perceived by the human eye.

A frame is thus divided into 8 regions. The cut-off frequency is the same within each region and differs between regions; the cut-off frequency takes the values:

\frac{i}{8} (1 \leq i \leq 8, i \in Z^{+}) .
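The model above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes that the bracket [·] in the definition of B[i, V] denotes rounding up, that r, R and V are all measured in pixels, and that evaluating the quantized cut-off directly from the distance is equivalent to the B[i, V] table lookup (which holds because f_c decreases monotonically with r). The function names are hypothetical.

```python
import math

K = 13.75  # model parameter given in the text


def cutoff_continuous(r, R, V):
    """Continuous cut-off frequency f_c(r, V); 1.0 inside the central circle."""
    if r <= R:
        return 1.0
    return 1.0 / (1.0 + K * math.atan((r - R) / V))


def cutoff_band(x, y, xf, yf, R, V):
    """Index i (1..8) such that the cut-off frequency at (x, y) is i/8.

    Assumption: [.] in the text is read as a ceiling, so i = ceil(8 * f_c).
    """
    r = math.hypot(x - xf, y - yf)
    i = math.ceil(8.0 * cutoff_continuous(r, R, V))
    return min(8, max(1, i))
```

For example, with the Fovea point at (100, 100), R = 32 and V = 500, the point (100, 100) falls in region 8 (cut-off 8/8), while a point 900 pixels away falls in region 1 (cut-off 1/8).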

Fig. 1 shows the structure of the invention. The abbreviations in the figure are: VLD, variable-length decoding; VLC, variable-length coding; DCT, discrete cosine transform; IDCT, inverse discrete cosine transform; Q, quantization; IQ, inverse quantization; MV, motion vector; MC, motion compensation; FM, frame memory. Because the fast cascaded pixel-domain architecture requires relatively little computation, is flexible, and is easy to extend, the invention is based on this architecture, with corresponding improvements made according to the Fovea vision model. The invention consists mainly of the following parts:

● Partial decoding

The input MPEG video stream with bit rate R_1 undergoes variable-length decoding (VLD), followed by inverse quantization (IQ1) according to the quantization factor information in the bitstream, yielding the DCT coefficients of each 8 × 8 block.

● DCT coefficient truncation

According to the Fovea vision model, coefficients in an 8 × 8 DCT block that lie above the cut-off frequency cannot be perceived by the viewer; removing them does not affect subjective visual quality and effectively improves conversion efficiency. The DCT coefficient truncation module is added for this purpose.

An 8 × 8 block can be treated approximately as having a single cut-off frequency; the center point of the block is generally taken as representative, and the block's cut-off frequency f_c is computed from its coordinates. An 8 × 8 DCT coefficient block can be divided into 8 frequency bands, forming a multi-resolution representation, as shown in Fig. 2. For any band m, its frequency f(m) is:

f (m) = \frac{m}{8} (1 \leq m \leq 8, m \in Z^{+}) .

The DCT coefficient truncation method based on the Fovea vision model can then be stated as follows: given a Fovea point, for an 8 × 8 DCT block with cut-off frequency f_c and a DCT coefficient F(u, v) belonging to band m:

F (u, v) = \begin{cases} F (u, v), & f (m) \leq f_{c} \\ 0, & f (m) > f_{c} \end{cases}
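The truncation rule can be illustrated with a short Python sketch. The band layout of Fig. 2 is not reproduced in the text, so the sketch assumes the common L-shaped multi-resolution layout in which coefficient (u, v) belongs to band m = max(u, v) + 1; the helper names `band_of` and `truncate_block` are hypothetical.

```python
def band_of(u, v):
    # Assumption: band m (1..8) is the L-shaped set with max(u, v) == m - 1.
    return max(u, v) + 1


def truncate_block(F, t):
    """Zero every coefficient of the 8x8 block F whose band frequency m/8
    exceeds the block's cut-off frequency t/8 (t in 1..8)."""
    return [
        [F[u][v] if band_of(u, v) <= t else 0 for v in range(8)]
        for u in range(8)
    ]
```

Under this assumed layout, a block with cut-off 3/8 keeps only its 3 × 3 low-frequency corner, and a block with cut-off 8/8 is left unchanged.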

● Rate control

To reduce the bit rate of the MPEG video stream from R_1 to R_2, the rate control module re-determines the quantization factor of each macroblock, and the DCT coefficients are requantized accordingly. The invention modifies the original MPEG TM5 rate control method according to the Fovea vision model, forming a new Fovea-based rate control method whose main steps are as follows:

(1) Picture-level target bit allocation

The method is identical to that of TM5 and is not described further.

(2) Macroblock-level target bit allocation

Suppose the number of coded bits for a picture is R, the picture contains M macroblocks, and each macroblock contains N 8 × 8 blocks. The original TM5 method distributes the target bits evenly among the macroblocks; that is, for any macroblock k, the target number of bits allocated to it is

r^{(k)} = \frac{R}{M} .

In the improved method, the target bits of a macroblock are allocated according to its cut-off frequencies (the higher the cut-off frequencies in a macroblock, the more target bits it receives), that is:

r^{(k)} = \frac{\sum_{j = 1}^{N} {({f_{c}}^{(k)} (j))}^{2}}{\sum_{i = 1}^{M \times N} {(f_{c} (i))}^{2}} R

where \sum_{j = 1}^{N} {({f_{c}}^{(k)} (j))}^{2} is the sum of the squared cut-off frequencies of the N 8 × 8 blocks in macroblock k, and \sum_{i = 1}^{M \times N} {(f_{c} (i))}^{2} is the sum of the squared cut-off frequencies of all 8 × 8 blocks in the picture.
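As a rough illustration of this allocation rule, the following Python sketch distributes a picture's bit budget in proportion to each macroblock's sum of squared cut-off frequencies. The function name and the list-of-lists input layout are assumptions made for the example.

```python
def allocate_bits(picture_bits, mb_cutoffs):
    """Split picture_bits (R) among macroblocks.

    mb_cutoffs[k] holds the cut-off frequencies of the N 8x8 blocks of
    macroblock k; each macroblock's share is its sum of squared cut-offs
    divided by the sum over the whole picture.
    """
    weights = [sum(fc * fc for fc in blocks) for blocks in mb_cutoffs]
    total = sum(weights)
    return [picture_bits * w / total for w in weights]
```

With two macroblocks of four blocks each, one at cut-off 1.0 and one at 0.5, the weights are 4 and 1, so a 10000-bit budget splits into 8000 and 2000 bits. If every block has the same cut-off frequency, the rule reduces to the even split R/M of the original TM5 method.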

(3) Rate control

The reference quantization factor Q_i of each macroblock is determined from the fullness of the virtual buffer (VBV). The method used here is identical to that of TM5 and is not described further.

(4) Adaptive quantization

In the TM5 method, the final quantization factor of a macroblock is determined adaptively from its spatial activity. The spatial activity of a macroblock is the minimum of the spatial activities of all 8 × 8 blocks in the macroblock, and the spatial activity of an 8 × 8 block is determined by the variance V of the pixel values within the block, that is:

V = \frac{1}{64} \sum_{i = 1}^{64} {(p_{i} - p_{mean})}^{2},

where

p_{mean} = \frac{1}{64} \sum_{i = 1}^{64} p_{i}

and p_i is the luminance value of the i-th pixel in the block. Such information is not available in the compressed domain, so the invention proposes the following method of computing the DCT block spatial activity V_DCT:

V_{DCT} = \frac{1}{N} \sum_{i = 1}^{N} {| F_{i} |}^{2}

where N is the number of AC coefficients in the DCT block that lie below the block's cut-off frequency, and F_i is the value of the i-th of these N coefficients.
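The V_DCT computation can be sketched as follows. As before, the band layout is an assumption (coefficient (u, v) belongs to band max(u, v) + 1), "below the cut-off frequency" is read as "not removed by the truncation step", and a block with no qualifying AC coefficient is given an activity of 0; none of these choices is stated in the text.

```python
def dct_block_activity(F, t):
    """Mean squared magnitude of the AC coefficients of the 8x8 DCT block F
    that survive truncation at cut-off t/8 (t in 1..8)."""
    ac = [
        F[u][v]
        for u in range(8)
        for v in range(8)
        # Skip the DC term; keep bands up to t (assumed band = max(u, v) + 1).
        if (u, v) != (0, 0) and max(u, v) + 1 <= t
    ]
    if not ac:
        return 0.0  # assumption: only the DC coefficient remains
    return sum(abs(a) ** 2 for a in ac) / len(ac)
```

Because only coefficients that are already available after inverse quantization are used, no pixel-domain reconstruction is needed to estimate the activity.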

From the spatial activities of all 8 × 8 DCT blocks in a macroblock, the normalized spatial activity NV_i of the macroblock is determined. The final quantization factor mq_i of the macroblock is then:

mq_{i} = Q_{i} \times NV_{i}
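The text does not spell out how the activity is normalized. The sketch below assumes the normalization used by TM5, which maps the activity into the range 0.5 to 2 relative to the average activity of the previous picture, and assumes that the result is clipped to the MPEG-2 quantizer range 1 to 31. Both the function names and these two choices are illustrative assumptions.

```python
def normalized_activity(act, avg_act):
    # TM5-style normalisation (assumed here): the result lies in [0.5, 2.0].
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


def final_quantizer(q_ref, act, avg_act):
    """mq = Q * NV, rounded and clipped to 1..31 (clipping is an assumption)."""
    mq = q_ref * normalized_activity(act, avg_act)
    return min(31, max(1, round(mq)))
```

A macroblock whose activity equals the average keeps its reference quantizer; flat macroblocks are quantized more finely and busy macroblocks more coarsely.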

● Re-encoding

The coefficients of all DCT blocks in each macroblock are requantized (Q2) using the macroblock's final quantization factor mq_i and then variable-length coded (VLC), producing an MPEG video stream with bit rate R_2.

● Error drift compensation

The process described above already accomplishes the conversion of the MPEG video stream. However, requantizing the DCT coefficients (Q2) causes a mismatch between the reference pictures at the encoder and the decoder. This leads to error drift, which degrades the picture quality of the converted stream. An error drift compensation module is therefore needed to avoid it.

The difference between the DCT coefficients before requantization and those after requantization is transformed by the IDCT into the pixel domain and stored in the frame memory. Motion compensation (MC) is then performed in the pixel domain using the motion vector (MV) information obtained from partial decoding, the resulting prediction is converted back to DCT coefficients by the DCT, and these are fed back and added to the residual DCT coefficients of the original predicted frame, thereby achieving error drift compensation.
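The effect of this feedback loop can be seen in a deliberately simplified Python toy. It assumes zero motion and tracks a single coefficient per frame, so the IDCT, motion compensation and DCT stages collapse to the identity; it is meant only to show why feeding the requantization error back keeps the drift bounded, not to model the full module of Fig. 1. All names are hypothetical.

```python
import math


def requantize(value, step):
    # Uniform quantisation followed by reconstruction (round half up).
    return step * math.floor(value / step + 0.5)


def transcode(residuals, step, compensate):
    """Return the decoder-side sum of the requantized residuals."""
    err = 0.0    # requantization error held in the frame memory
    total = 0.0  # what the decoder accumulates over the predicted frames
    for e in residuals:
        # Add the previous frame's error to the incoming residual.
        corrected = e + err if compensate else e
        q = requantize(corrected, step)
        if compensate:
            err = corrected - q  # coefficient before Q2 minus after Q2
        total += q
    return total
```

With residuals [3, 3, 3, 3] and a step of 8, the uncompensated path decodes to 0 although the true sum is 12, and the gap grows with every frame. With compensation the decoded sum stays within half a quantization step of the true value.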

Because the IDCT and DCT must be performed, the amount of computation is greater than that of DCT-domain conversion. According to the Fovea vision model, however, only part of the DCT coefficients need to be computed; on this basis the invention proposes a fast DCT/IDCT computation method that significantly reduces the DCT/IDCT workload. The original DCT and IDCT formulas are, respectively:

F (u, v) = \frac{1}{4} C (u) C (v) \sum_{i = 0}^{7} \sum_{j = 0}^{7} f (i, j) \times \cos \frac{\pi u (2 i + 1)}{16} \cos \frac{\pi v (2 j + 1)}{16}

f (i, j) = \frac{1}{4} \sum_{u = 0}^{7} \sum_{v = 0}^{7} C (u) C (v) F (u, v) \times \cos \frac{\pi u (2 i + 1)}{16} \cos \frac{\pi v (2 j + 1)}{16}

If the cut-off frequency of an 8 × 8 block is

\frac{t}{8} (1 \leq t \leq 8, t \in Z^{+})

then all high-frequency DCT coefficients in the block above the cut-off frequency cannot be perceived by the human eye and can be disregarded, i.e., set to 0. Therefore, when the DCT/IDCT is applied to this block, only the DCT coefficients below the cut-off frequency are computed, and the DCT and IDCT formulas are reduced accordingly; for the IDCT:

f (i, j) = \frac{1}{4} \sum_{u = 0}^{t} \sum_{v = 0}^{t} C (u) C (v) F (u, v) \times \cos \frac{\pi u (2 i + 1)}{16} \cos \frac{\pi v (2 j + 1)}{16}
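The saving can be checked with a direct (non-optimized) Python implementation of the IDCT in which the upper summation limit is a parameter. The sketch follows the formula as written, summing u and v from 0 to t, clips the limit to 7, and uses the usual definition C(0) = 1/√2, C(k) = 1 otherwise, which the text does not restate. The function name is hypothetical, and a practical implementation would use a fast separable transform rather than four nested loops.

```python
import math


def c(k):
    # Standard DCT normalisation factor (assumed; not restated in the text).
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(F, limit=7):
    """Direct 8x8 IDCT that sums only over u, v in 0..limit."""
    limit = min(limit, 7)
    out = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            s = 0.0
            for u in range(limit + 1):
                for v in range(limit + 1):
                    s += (c(u) * c(v) * F[u][v]
                          * math.cos(math.pi * u * (2 * i + 1) / 16)
                          * math.cos(math.pi * v * (2 * j + 1) / 16))
            out[i][j] = s / 4.0
    return out
```

When every coefficient with an index above the limit is zero, `idct_8x8(F, limit)` returns the same pixels as the full transform while evaluating (limit + 1)² terms per pixel instead of 64.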

Finally, it should be noted that in the invention the Fovea point can be selected interactively by the user with a mouse.

Claims

1. An MPEG video stream bit-rate conversion method incorporating a visual model, comprising the steps of:

partially decoding the input bitstream;

performing rate control to re-determine the quantization factor of each macroblock;

re-encoding.

2. The method according to claim 1, characterized in that said partial decoding comprises the steps of:

performing variable-length decoding on the input video stream;

performing inverse quantization according to the quantization factors in the bitstream.

3. The method according to claim 1, characterized in that said rate control comprises the steps of:

picture-level target bit allocation;

macroblock-level target bit allocation, performed according to the magnitude of the cut-off frequency;

determining the reference quantization factor Q_i of each macroblock according to the fullness of the virtual buffer;

adaptive quantization.

4. The method according to claim 1, characterized in that said re-encoding comprises the steps of:

quantizing the coefficients of all DCT blocks in each macroblock according to the final quantization factor of that macroblock;

performing variable-length coding.

5. The method according to claim 1, characterized by further comprising an error drift compensation step of:

applying the IDCT to the difference between the DCT coefficients before requantization and the DCT coefficients after requantization;

performing motion compensation in the pixel domain according to the motion vector information obtained from partial decoding;

converting the resulting prediction to DCT coefficients with the DCT, and feeding them back to be added to the residual DCT coefficients of the original predicted frame.

6. The method according to claim 5, characterized in that the DCT/IDCT computation formula is as follows:

f (i, j) = \frac{1}{4} \sum_{u = 0}^{t} \sum_{v = 0}^{t} C (u) C (v) F (u, v) \times \cos \frac{\pi u (2 i + 1)}{16} \cos \frac{\pi v (2 j + 1)}{16}