CN114745549A - Video coding method and system based on region of interest - Google Patents

Video coding method and system based on region of interest

Info

Publication number
CN114745549A
CN114745549A (application CN202210350595.8A)
Authority
CN
China
Prior art keywords
frame
prediction
original image
region
satd
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210350595.8A
Other languages
Chinese (zh)
Other versions
CN114745549B (en)
Inventor
毕江
王立冬
金强
肖春艳
韩强
樊思津
张文东
周骋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Radio And Television Station
Sumavision Technologies Co Ltd
Original Assignee
Beijing Radio And Television Station
Sumavision Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Radio And Television Station, Sumavision Technologies Co Ltd filed Critical Beijing Radio And Television Station
Priority to CN202210350595.8A priority Critical patent/CN114745549B/en
Publication of CN114745549A publication Critical patent/CN114745549A/en
Application granted granted Critical
Publication of CN114745549B publication Critical patent/CN114745549B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146Data rate or code amount at the encoder output
    • H04N19/147Data rate or code amount at the encoder output according to rate distortion criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a region-of-interest based video coding method and system. A down-sampling module down-samples the original image to obtain a low-resolution image. A preliminary-selection prediction module divides the low-resolution image into a number of macroblocks and performs intra-frame or inter-frame prediction on the macroblocks inside the region of interest to obtain the best preliminary prediction mode of the current macroblock. An encoding module sets coding units in the original image and encodes each coding unit; during encoding it obtains, from the result of the best preliminary prediction mode of the current macroblock, the optimal prediction angle Dir_best or the best predicted motion vector required for actual coding, together with the corresponding rate-distortion optimization (RDO) value.

Description

Video coding method and system based on region of interest
Technical Field
The invention relates to video coding technology, and in particular to bit-rate control when video is encoded according to a region of interest.
Background
The purpose of video coding is to remove redundant information from the video and reduce the amount of data; compression coding currently generally adopts a hybrid framework of prediction, transformation, and quantization.
Prediction uses the information of known pixels to predict the current pixel and falls into two categories: intra-frame prediction and inter-frame prediction. Intra-frame prediction exploits the spatial correlation between pixels within the same frame; for example, the pixel values (predicted values) of the current block are projected from the reconstructed pixels of blocks neighbouring the current coding unit. Inter-frame prediction exploits the temporal correlation between different frames; for example, the motion trajectory between the current coding block and the corresponding block in a reference picture is tracked, the current coding unit is predicted from temporally adjacent reference blocks, and motion-estimation precision is improved by means such as interpolation.
The predicted values are subtracted point by point from the pixel values of the original video image, the residual is transform-coded, and a cosine transform further concentrates the energy in the low-frequency region.
Quantization is the only step that introduces quality loss, so the encoder must balance the choice of the quantization parameter (QP): the larger the QP, the more high-frequency signal is lost and the image becomes blurred and loses texture detail; the smaller the QP, the larger the residual coefficients that are retained, which may exceed the nominal bandwidth of the target bitrate.
Ultra-high definition, high dynamic range, wide colour gamut, and high-frame-rate playback place higher demands on coding performance and quality. Because the amount of information presented by the terminal is larger than before, viewers tend to be more sensitive to flat, low-frequency areas or eye-catching regions of a scene, such as facial expressions, rolling caption text, and station logos in dramas and galas; regions that change drastically in the spatio-temporal domain, such as fast-moving objects or decorations with complex textures, are often ignored by the human eye. Accurately understanding image content, and coding accordingly, is therefore a key link for improving quality and balancing bitrate allocation during encoding.
Traditional video coding must traverse many combinations of coding tools, such as coding-unit partition strategies of different sizes and different prediction methods, measure the coding loss by comparing the rate-distortion cost of each combination, and then choose the best coding mode. This is the most time-consuming link in encoding, yet it still cannot guarantee the best coding quality. For example, describing the importance of a local region with conventional complexity indices such as SATD (the sum of absolute values of the Hadamard-transformed residual) or SAD (the sum of absolute differences) tends to allocate more bits to regions the human eye is not sensitive to, and consumes more computing resources, resulting in poor real-time performance.
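As an illustration of these two block metrics (not of the patented method itself), the sketch below computes SAD and a 4x4-Hadamard SATD for a residual block; the DC-style predictor is only an example input.

```python
# Minimal sketch of the SAD and SATD block metrics mentioned above,
# computed on 4x4 blocks with an (unnormalized) Hadamard transform.
import numpy as np

H4 = np.array([[1,  1,  1,  1],
               [1, -1,  1, -1],
               [1,  1, -1, -1],
               [1, -1, -1,  1]])

def sad(orig: np.ndarray, pred: np.ndarray) -> int:
    """Sum of absolute differences between an original and a predicted block."""
    return int(np.abs(orig.astype(int) - pred.astype(int)).sum())

def satd_4x4(orig: np.ndarray, pred: np.ndarray) -> int:
    """Sum of absolute values of the 4x4 Hadamard transform of the residual."""
    residual = orig.astype(int) - pred.astype(int)
    transformed = H4 @ residual @ H4      # transform rows and columns
    return int(np.abs(transformed).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.integers(0, 256, (4, 4))
    flat_pred = np.full((4, 4), int(block.mean()))   # DC-style prediction, for illustration
    print("SAD :", sad(block, flat_pred))
    print("SATD:", satd_4x4(block, flat_pred))
```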
In view of these problems in the prior art, an object of the present invention is to provide a region-of-interest based video coding decision method and system that reduce the computational resources required for encoding and improve real-time performance while providing users with a better viewing experience.
Disclosure of Invention
In order to solve the above problem, a first technical solution is a region-of-interest based video coding method, including,
an information reading step of sequentially acquiring the original image data of each frame of a video and the pixel position information of the regions of interest in the original image, wherein the original image contains at least one region of interest;
a down-sampling step, namely down-sampling the original image to obtain a low-resolution image;
a preliminary-selection prediction step of dividing the low-resolution image into a number of macroblocks, performing intra-frame prediction on the macroblocks located in the region of interest, traversing the prediction angles supported by the coding standard, calculating the distortion (SATD) between the projection-reconstructed intra-prediction pixels and the pixels of the low-resolution image, and obtaining the minimum distortion SATD_best and the corresponding prediction angle Dir_best;
for the original image of an I frame, taking the prediction angle Dir_best corresponding to the minimum distortion SATD_best as the result of the best preliminary prediction mode of the current macroblock;
for the original image of a P frame or B frame, searching the coordinates of the area corresponding to the region of interest in the adjacent frame, calculating the motion vector corresponding to the change of the centre-of-gravity position of the region of interest between the current frame and the reference frame, searching with this motion vector as the starting vector, calculating the SATD at different offsets in turn, and determining the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter; and comparing the minimum distortion SATD_best with the minimum inter-prediction distortion SATD_inter and selecting the prediction result with the smaller distortion as the result of the best preliminary prediction mode of the current macroblock;
an encoding step of setting coding units in the original image and encoding each coding unit, wherein during encoding, for the original image of an I frame, a set of prediction reference angles is constructed according to the prediction angle Dir_best, the angles in the set are traversed, and the rate-distortion optimization (RDO) values corresponding to the angles are compared to obtain the optimal prediction angle Dir_best required for actual coding;
for the original image of a P frame or B frame, according to the selection result of the preliminary-selection prediction step: if intra-frame prediction was selected, the optimal prediction angle Dir_best required for actual coding is obtained in the same way as for the I frame; if inter-frame prediction was selected, the motion vector obtained in the preliminary-selection prediction step is stretched according to the scaling ratio, and the RDO values of the different stretched motion vectors within the same search range are compared to obtain the best predicted motion vector required for actual coding and the corresponding RDO value.
Therefore, during encoding the invention can perform prediction only for the macroblocks of the region of interest according to the region's pixel position information, and, when encoding the coding units, calculate from the obtained result of the best preliminary prediction mode the optimal prediction angle Dir_best and corresponding RDO value, or the best predicted motion vector and corresponding RDO value, required for actual coding, which improves the real-time performance of encoding. In addition, different coding strategies do not need to be selected for regions of interest with different characteristics, so the method is particularly suitable for video coding when regions of interest with different characteristics are mixed in the same frame, providing users with a better viewing experience while improving overall efficiency.
Preferably, in the encoding step, a reference quantization parameter QP_base is allocated to the original image, the sum of the distortion (SATD) values of the different regions of interest in the original image is counted, a local target bitrate is allocated to the regions of interest according to the ratio of their area to that of the original image, the sum of the SATD values is used as the input of the rate-control algorithm, and a quantization parameter QP is allocated to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
Because the region of interest is smaller than the whole video image, when bits are allocated to the coding units in the region of interest a certain offset is applied relative to the quantization parameter QP of the current image; that is, bitrate resources are moderately tilted towards the region of interest, which improves the effective use of the bitrate and makes the bitrate allocation of the region of interest more reasonable.
Preferably, the original image includes data of the three channels Y, U and V, and in the down-sampling step only the data of the Y component of the original image is down-sampled to obtain the low-resolution image.
Because only the data of the Y component is down-sampled to obtain the low-resolution image, calculation time and computation are saved.
Preferably, in the down-sampling step, pixels closest to the edge of the low-resolution image are sequentially copied, and then pixels extending outward around the low-resolution image are added.
Since the pixels closest to the edge of the low-resolution image are copied in sequence and then added with the pixels which are expanded outwards at the periphery of the low-resolution image, the search area can contain the boundary of the low-resolution image.
Preferably, in the encoding step, for the original image of an I frame the coding units are divided down to the minimum size, a set of prediction reference angles is constructed for the macroblocks at the different division levels, the angles in the set are traversed, and the corresponding RDO values are compared to obtain the optimal prediction angle for the actual coding of each level.
Because the I frame serves as a reference frame when P frames and B frames are coded, dividing the coding units to the minimum size preserves the detail information of the image and improves the coding quality of the whole video sequence.
Preferably, the region of interest is a face region. In the encoding step, when the current coding unit of the original image of a P frame or B frame contains a face region, it is determined whether the coding unit contains an edge between the face and the background or of the facial features; if so, the coding unit is divided to the minimum size; if not, one level of division is tried, and the coding unit is divided by one level when the sum of the RDO values of all the sub-units after division is smaller than the RDO value corresponding to the best prediction mode without division.
In this way the high-frequency detail of the face region and of its edges with the background or facial features is preserved, improving the viewing experience while making more reasonable use of the bitrate.
Preferably, the region of interest is a subtitle region. In the encoding step, when the current coding unit of the original image of a P frame or B frame contains a subtitle region, it is determined whether the coding unit contains the boundary between the subtitle and the background region; if so, the coding unit is divided to the minimum size; if not, the coding unit is not divided.
Since the movement of subtitles is usually a rigid horizontal or vertical shift, the calculation time of the coarse-selection process can be saved and the real-time performance of encoding is improved.
Preferably, the region of interest is a fixed-identification region. In the encoding step, when the current coding unit of the original image of a P frame or B frame contains a fixed identification, it is determined whether the coding unit contains the edge of the fixed identification; if so, the coding unit is divided to the minimum size; if not, the coding unit is not divided.
Because fixed identifications such as station logos usually stay at a fixed position in the video picture, the calculation time of the coarse-selection process can be saved while the viewing experience is improved, and the real-time performance of encoding is improved.
A second technical solution is a region-of-interest based video coding system, comprising,
an information reading module 100, which sequentially acquires the original image data of each frame of a video and the pixel position information of the regions of interest in the original image, wherein the original image contains at least one region of interest;
a down-sampling module 200, which down-samples the original image to obtain a low resolution image;
the initial selection prediction module 300 divides the low-resolution image into a plurality of macro blocks, performs intra-frame prediction on the macro blocks in the region of interest, traverses the prediction angle supported in the coding standard, calculates the distortion SATD value of the intra-frame prediction pixel after projection reconstruction and the pixel of the low-resolution image, and obtains the inter-frame minimum distortion SATD valuebestAnd pair ofCorresponding predicted angle Dirbest
for the original image of an I frame, the prediction angle Dir_best corresponding to the minimum distortion SATD_best is taken as the result of the best preliminary prediction mode of the current macroblock;
for the original image of a P frame or B frame, the coordinates of the area corresponding to the region of interest in the adjacent frame are searched, the motion vector corresponding to the change of the centre-of-gravity position of the region of interest between the current frame and the reference frame is calculated, the search starts from this motion vector, the SATD at different offsets is calculated in turn, and the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter are determined; the minimum distortion SATD_best is then compared with the minimum inter-prediction distortion SATD_inter, and the prediction result with the smaller distortion is selected as the result of the best preliminary prediction mode of the current macroblock;
an encoding module 400, configured to set coding units in the original image and encode each coding unit, wherein during encoding, for the original image of an I frame, a set of prediction reference angles is constructed according to the prediction angle Dir_best, the angles in the set are traversed, and the RDO values corresponding to the angles are compared to obtain the optimal prediction angle Dir_best required for actual coding;
for the original image of a P frame or B frame, according to the selection result of the preliminary-selection prediction: if intra-frame prediction was selected, the optimal prediction angle Dir_best required for actual coding is obtained in the same way as for the I frame; if inter-frame prediction was selected, the motion vector obtained in the preliminary-selection prediction is stretched according to the scaling ratio, and the RDO values of the different stretched motion vectors within the same search range are compared to obtain the best predicted motion vector required for actual coding and the corresponding RDO value.
The technical effect is the same as that of the first technical scheme.
Preferably, the encoding module 400 allocates a reference quantization parameter QP_base to the original image, counts the sum of the distortion (SATD) values of the different regions of interest in the original image, allocates a local target bitrate to the regions of interest according to the ratio of their area to that of the original image, uses the sum of the SATD values as the input of the rate-control algorithm, and allocates a quantization parameter QP to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
The technical effect is the same as that of the first technical scheme.
Drawings
FIG. 1 is a diagram illustrating an embodiment of a region of interest based video encoding system;
fig. 2 is an explanatory diagram of downsampling an original image;
fig. 3 is an explanatory diagram of dividing a low resolution image into macroblocks;
FIG. 4 is a flowchart of setting an optimal prediction mode when the region of interest is a face region;
FIG. 5 is a flowchart illustrating setting an optimal prediction mode when the region of interest is a subtitle region;
fig. 6 is a flowchart of setting the optimal prediction mode when the region of interest is the logo region.
Detailed Description
In the following detailed description of the preferred embodiments of the invention, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration, specific features of the invention, such that the advantages and features of the invention may be more readily understood and appreciated. The following description is an embodiment of the claimed invention, and other embodiments related to the claims not specifically described also fall within the scope of the claims.
First, a video coding decision based on a region of interest will be explained.
In the first step, each original frame of the video is down-sampled by a factor of 1/s to obtain a low-resolution image; for example, only the Y component of the original image is down-sampled, to save calculation time.
In the second step, the edges of the low-resolution image are extended outward by s pixels to obtain an extended low-resolution image. Extending the pixels allows the search to reach the boundary of the low-resolution image and enlarges the search range.
In the third step, the low-resolution-image portion of the extended low-resolution image is divided into a number of macroblocks.
In addition, in each original frame, faces, subtitles, station logos, and the like are recognized as regions of interest by a system such as a neural network, and the position information of every pixel of each region of interest is obtained. Faces, subtitles, and station logos are all regions of interest, but their texture features and motion features differ from one another.
In the fourth step, intra-frame prediction is performed on the macroblocks located in the region of interest to obtain the best coarse pre-selection result. This result is later used, during coding, to calculate the RDO value of each coding unit and the corresponding optimal prediction angle and optimal motion vector of the actual coding process.
In the fifth step, the original image is divided into coding units, for example coding units of 64x64 pixels, and encoded. During encoding, if the original image is an I frame, the coding units are divided down to the minimum size, for example 8x8 pixels, so as to retain the detail information of the image and improve the coding quality of the whole video sequence.
For the original image of a P frame or B frame, when a coding unit belongs to the face region, it is determined whether it contains an edge between the face and the background or of the facial features; if it does, the coding unit is subdivided to the minimum size, for example 8x8 pixels. Otherwise it is divided by one level; for example, if the maximum coding unit is 64x64 pixels, it is divided into four sub-units of 32x32 pixels.
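A hedged sketch of this split rule follows. `contains_face_edge` and `rdo_cost` are hypothetical callables standing in for the encoder's own edge test and rate-distortion evaluation, and the size constants are the example values from the text; the RDO comparison for the non-edge case follows the rule stated in the disclosure above.

```python
# Hypothetical illustration of the face-region split decision described above.
MIN_CU, MAX_CU = 8, 64   # example sizes from the text

def partition_face_cu(cu, contains_face_edge, rdo_cost):
    """Return the list of leaf CU rectangles chosen for one coding unit.

    cu                 -- (x, y, size) of the current coding unit
    contains_face_edge -- callable(cu) -> True if the CU covers a face/background
                          or facial-feature edge
    rdo_cost           -- callable(cu) -> rate-distortion cost of coding `cu` whole
    """
    x, y, size = cu
    if contains_face_edge(cu):
        # Edge regions keep high-frequency detail: split straight to the minimum size.
        n = size // MIN_CU
        return [(x + i * MIN_CU, y + j * MIN_CU, MIN_CU)
                for i in range(n) for j in range(n)]
    if size == MIN_CU:
        return [cu]
    # Otherwise try one split level and keep it only if it lowers the total RDO cost.
    half = size // 2
    children = [(x, y, half), (x + half, y, half),
                (x, y + half, half), (x + half, y + half, half)]
    if sum(rdo_cost(c) for c in children) < rdo_cost(cu):
        return children
    return [cu]
```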
For a coding unit containing a subtitle region, it is determined whether it lies on the boundary between the subtitle and the background region; if so, it is divided to the minimum size of 8x8 pixels to keep the edge region sharp. If not, since the motion of subtitles in the image is a regular vertical or horizontal shift, a rigid motion that does not cause large deformation, the coding unit is not divided further.
In the sixth step, the RDO value of the coding unit and the corresponding optimal prediction angle and optimal motion vector of the actual coding process are calculated according to the best coarse pre-selection result obtained during the preliminary-selection prediction.
If the decision of the preliminary-selection prediction for the current coding unit is intra-frame prediction, the decision follows the prediction-direction decision process used for I-frame images in the preliminary-selection prediction and the best RDO value is calculated; if it is inter-frame prediction, the motion vector obtained by coarse selection is first stretched, motion estimation is then carried out, for example within a rectangular search range of 8x4 pixels, and finally the best predicted motion vector and the corresponding RDO value are determined.
If the current frame is a P frame or B frame and the coding unit belongs to the logo region, the coding units at edge positions are likewise divided to the minimum size of 8x8 pixels; otherwise the coding unit is not divided. For the coding-mode decision, similarly, if the best preliminary prediction mode is intra-frame prediction, the method used in coarse selection is applied, for example a reference angle set is constructed and the best projection angle is determined after traversal; if the preliminary selection yields inter-frame prediction, the motion vector of the preliminary prediction is stretched and used directly as the prediction vector of the current coding unit, and the coding mode is determined by comparing RDO values to decide whether the residual is retained.
In the seventh step, during encoding a reference quantization parameter QP_base is allocated to the original picture, the sum of the distortion (SATD) values of the different regions of interest in the original image is counted, a local target bitrate is allocated to the regions of interest according to the ratio of their area to that of the original image, the sum of the SATD values is used as the input of the rate-control algorithm, and a quantization parameter QP is allocated to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
Since the region of interest (ROI) is small relative to the size of the video image, when allocating bits to the coding units in these regions a strategy of applying a certain offset to the reference QP of the current image is adopted, so that bitrate resources are moderately tilted towards the ROI. The viewing experience of the whole video is further improved without a noticeable increase in bitrate.
When allocating the offset for a region of interest, the offset range must be limited to avoid extreme values; based on the change of the quantization parameter observed when the compression ratio doubles under different metrics, the offset range is set to ±3.
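The exact QP-allocation formula is given as an image in the original publication and is not reproduced here; the sketch below only illustrates clip3-style clamping of a per-ROI QP offset around the reference QP_base, assuming the ±3 range mentioned above. The function and parameter names are illustrative.

```python
# Illustrative only: the patent's full QP formula is an image and is not reproduced.
# This shows clip3-style clamping of a per-ROI QP offset around the frame QP_base.
def clip3(x, lo, hi):
    """Limit x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))

def roi_qp(qp_base: int, qp_offset: float, max_offset: int = 3) -> int:
    """Assumed behaviour: offset produced by rate control, clamped to +/- max_offset."""
    return qp_base + clip3(round(qp_offset), -max_offset, max_offset)

print(roi_qp(32, -5.2))  # -> 29: offset clamped to -3
print(roi_qp(32, 1.4))   # -> 33
```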
Down-sampling may, for example, be performed at a ratio of 1/2, 1/4, or 1/8 in both the horizontal and vertical directions of the image, as needed.
At a sampling ratio of 1/16, i.e. when each frame is down-sampled to 1/4 of the original size in both the vertical and horizontal directions, the resulting image has 1/16 of the original resolution. The invention uses a Gaussian filter function to perform the down-sampling and obtain the low-resolution image.
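A minimal sketch of Gaussian-filtered down-sampling of the Y plane, with the ROI coordinates scaled along with the image; the patent does not specify the kernel, so the sigma below and the use of scipy.ndimage.gaussian_filter are illustrative assumptions.

```python
# Gaussian blur followed by decimation of the luma plane (illustrative parameters).
import numpy as np
from scipy.ndimage import gaussian_filter

def downsample_y(y_plane: np.ndarray, factor: int = 4) -> np.ndarray:
    """Blur then decimate the Y plane by `factor` in both directions
    (factor=4 gives a 1/16-resolution image, as in the example above)."""
    blurred = gaussian_filter(y_plane.astype(np.float32), sigma=factor / 2.0)
    return blurred[::factor, ::factor].astype(np.uint8)

y = np.random.default_rng(0).integers(0, 256, (1080, 1920), dtype=np.uint8)
low_res = downsample_y(y)             # 270 x 480
roi = (400, 600, 64, 64)              # x, y, w, h in the original image (example values)
roi_low = tuple(v // 4 for v in roi)  # ROI coordinates scaled with the image
```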
In the invention, the preliminary-selection prediction is carried out on the low-resolution image, which speeds up the coarse-selection prediction and saves time.
Before coarse-selection prediction, the low-resolution image is extended outward by 16 pixels at the top, bottom, left, and right; that is, the pixels closest to the image edge are copied 16 times in sequence to obtain the extended low-resolution image. This allows the prediction process to reference data outside the down-sampled image and balances the speed of the preliminary-selection prediction against its accuracy.
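A minimal sketch of this border extension, using numpy "edge" padding to replicate the pixels nearest each border 16 times.

```python
# Replicate the edge pixels of the low-resolution image outward by `pad` samples
# so that motion search can reference positions outside the down-sampled picture.
import numpy as np

def extend_borders(low_res: np.ndarray, pad: int = 16) -> np.ndarray:
    """Replicate edge pixels `pad` times on every side (numpy 'edge' padding)."""
    return np.pad(low_res, pad_width=pad, mode="edge")

img = np.arange(12, dtype=np.uint8).reshape(3, 4)
print(extend_borders(img, pad=2).shape)  # (7, 8)
```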
Meanwhile, the coordinates of the region of interest are scaled along with the low-resolution image, so that the position of the region of interest relative to the image does not change.
Fig. 1 is a diagram illustrating an embodiment of a region-of-interest based video coding system. As shown in fig. 1, the present embodiment includes a region-of-interest recognition device 90, an information reading module 100, a down-sampling module 200, a preliminary-selection prediction module 300, and an encoding module 400.
The region-of-interest identifying device 90 detects a region of interest in the video image, and obtains position information of each pixel of the region of interest in each frame image. In this embodiment, the region-of-interest recognition apparatus 90 includes a face recognition module 91, a subtitle recognition module 92, and a logo recognition module 93, and the face pixel position, the subtitle pixel position, and the logo pixel position information recognized by each module are respectively read by the information reading module 100 together with the video image. That is, the information reading module 100 sequentially reads the original image information and the pixel position information of the identified region of interest frame by frame.
The down-sampling module 200 down-samples the original image to obtain a low resolution image. In this embodiment, the original image includes Y, U, V three-channel data, and the downsampling module 200 downsamples the Y component data in the original image to obtain a low resolution image, sequentially copies the pixels closest to the edge of the low resolution image, and adds the pixels extending outward around the low resolution image to obtain an extended low resolution image.
Fig. 2 illustrates the generation process from the original image to the low-resolution image and the extended low-resolution image. As the image is down-sampled, the pixel positions of the regions of interest are adjusted accordingly, so that the position of each region of interest in the original image remains consistent with its position in the low-resolution image. The face 21, the subtitle 22, and the station logo 23 are each recognized as regions of interest by the region-of-interest recognition device 90.
The preliminary-selection prediction module 300 divides the low-resolution image into 8x8-pixel macroblocks (see macroblock 11 in fig. 3), performs intra-frame prediction on the macroblocks inside the region of interest, traverses the prediction angles supported by the coding standard, calculates the distortion (SATD) between the projection-reconstructed intra-prediction pixels and the pixels of the low-resolution image, and obtains the minimum distortion SATD_best and the corresponding prediction angle Dir_best; that is, by traversing the prediction angles, the best prediction direction and its distortion are obtained (steps S10, S11, S20, S21, S30, S31 in figs. 4 to 6).
It is then judged whether the current frame is a P frame or a B frame (steps S12, S22, S32 in figs. 4 to 6). If not, the picture is the original image of an I frame and the best prediction mode is intra-frame prediction (steps S13, S23, S33 in figs. 4 to 6): after traversing all possible prediction angles, the minimum distortion SATD_best and its corresponding prediction angle Dir_best are determined, and the prediction angle Dir_best corresponding to the minimum distortion SATD_best is taken as the result of the best preliminary prediction mode of the current macroblock.
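A simplified stand-in for this angle traversal (an assumption, not the full angle set of the coding standard): only vertical, horizontal, and DC predictions are built from the macroblock's neighbouring pixels and scored with an 8x8 SATD, and the minimum is kept. The helper names are illustrative.

```python
# Traverse a few candidate intra modes and keep the one with the smallest SATD.
import numpy as np
from scipy.linalg import hadamard

H8 = hadamard(8)

def satd(block, pred):
    t = H8 @ (block.astype(int) - pred.astype(int)) @ H8
    return int(np.abs(t).sum())

def best_intra_mode(block, top_row, left_col):
    """Return (mode_name, SATD_best) for an 8x8 macroblock."""
    candidates = {
        "vertical":   np.tile(top_row, (8, 1)),            # copy the row above downward
        "horizontal": np.tile(left_col[:, None], (1, 8)),   # copy the left column rightward
        "dc":         np.full((8, 8), (int(top_row.mean()) + int(left_col.mean())) // 2),
    }
    return min(((name, satd(block, pred)) for name, pred in candidates.items()),
               key=lambda item: item[1])

rng = np.random.default_rng(1)
mb = rng.integers(0, 256, (8, 8))
top, left = rng.integers(0, 256, 8), rng.integers(0, 256, 8)
print(best_intra_mode(mb, top, left))
```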
In the case of a P frame or B frame, the macroblock can be predicted both intra-frame and inter-frame. When the current macroblock belongs to the face region, because the motion of people in a video scene is not fixed, the best coarse-selection prediction mode of the current macroblock may be either intra-frame or inter-frame prediction, so the best method of each kind is evaluated.
First, inter-frame prediction is carried out: according to the position of the face region in the P frame or B frame, the coordinates of the corresponding face region in the adjacent frame are found, the motion vector corresponding to the change of the centre-of-gravity position of the face rectangles in the two frames is calculated (step S14 in fig. 4), motion estimation starts from this vector using a conventional motion-estimation algorithm (step S15 in fig. 4), the search is carried out within a rectangular window of fixed size 16x8 pixels (S16), the SATD values at the different motion-vector offsets are compared in turn, and the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter are determined.
Then intra-frame prediction is carried out in the same way as for an I frame, and after traversing all possible prediction angles the minimum distortion SATD_intra and the corresponding prediction angle Dir_intra are determined; SATD_intra and SATD_inter are then compared, and the prediction mode with the smaller distortion is selected as the best coarse prediction mode of the current macroblock (step S18 in fig. 4). A sketch of the inter-frame part of this preliminary prediction is given below.
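In the sketch, the start vector is taken from the displacement of the ROI centre of gravity between the reference and current frames, and a full search over a fixed 16x8 window keeps the offset with the smallest block cost; `block_cost` stands in for the SATD computation shown earlier and all names are illustrative, not the patent's.

```python
# Hedged sketch of inter-frame preliminary prediction for a face macroblock.
import numpy as np

def centroid(mask: np.ndarray):
    """Centre of gravity (x, y) of a boolean ROI mask."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())

def preselect_mv(cur_block, ref_frame, block_xy, roi_cur, roi_ref,
                 block_cost, search_w=16, search_h=8):
    """Return (MV_best, SATD_inter) for one macroblock.

    cur_block          -- pixels of the current macroblock
    ref_frame          -- full low-resolution reference frame
    block_xy           -- (x, y) position of the macroblock in the current frame
    roi_cur, roi_ref   -- boolean ROI masks in the current and reference frames
    block_cost         -- callable(block_a, block_b) -> distortion (e.g. SATD)
    """
    bx, by = block_xy
    bh, bw = cur_block.shape
    (rx, ry), (cx, cy) = centroid(roi_ref), centroid(roi_cur)
    # MV points from the current block into the reference frame.
    start = (round(rx - cx), round(ry - cy))
    best_mv, best_cost = start, float("inf")
    for dy in range(-search_h // 2, search_h // 2 + 1):
        for dx in range(-search_w // 2, search_w // 2 + 1):
            mvx, mvy = start[0] + dx, start[1] + dy
            x, y = bx + mvx, by + mvy
            if 0 <= x and 0 <= y and x + bw <= ref_frame.shape[1] and y + bh <= ref_frame.shape[0]:
                cost = block_cost(cur_block, ref_frame[y:y + bh, x:x + bw])
                if cost < best_cost:
                    best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```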
If the current macroblock belongs to the subtitle region, it can be assumed in advance, from prior knowledge about subtitles, that their motion is a rigid horizontal or vertical shift. In this case, for the image of an I frame the same prediction process as for the face region is used: all prediction angles are traversed, and the prediction angle Dir_best corresponding to the minimum distortion SATD_best is taken as the result of the best preliminary prediction mode of the current macroblock (step S23 in fig. 5).
For the image of a P frame or B frame, it is first judged whether the current frame is the one nearest to the I frame. If it is, the same method as for the face region is used to calculate the centre-of-gravity motion vector of the subtitle region between the I frame and the P or B frame (step S26 in fig. 5); this vector is used as the starting motion vector of the current macroblock, a smaller search range of 8x4 pixels is selected (step S27 in fig. 5), and motion estimation is started (step S28 in fig. 5) to obtain and record the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter (step S29 in fig. 5). Considering that the best motion vectors of different macroblocks differ because of pixel noise and other factors, the MV_best values of the subtitle macroblocks in the current frame are averaged and stored.
If the current frame is not the one nearest to the I frame, the previously calculated best motion-vector predictor MV_best is used as the base vector, and the corresponding best prediction vector is obtained by stretching it according to the temporal distance (step S25 in fig. 5), which saves calculation time in the coarse-selection process.
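A hypothetical helper for that reuse step: the motion vector found against the nearest I frame is stretched linearly with the temporal distance instead of re-running the search. The names and the linear-scaling assumption are illustrative.

```python
# Stretch a stored motion vector according to the temporal distance to the current frame.
def scale_mv(mv_best, dist_found, dist_current):
    """Scale (mvx, mvy) found at temporal distance `dist_found` to `dist_current`."""
    s = dist_current / dist_found
    return (round(mv_best[0] * s), round(mv_best[1] * s))

print(scale_mv((6, -2), dist_found=1, dist_current=3))  # -> (18, -6)
```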
When the current macroblock belongs to the station-logo region, the coarse-selection process can be simplified further. A station logo is generally fixed at a set position in the video picture, and its position can be considered unchanged over a long video sequence; for the I frame, the minimum distortion SATD_best and the corresponding prediction angle Dir_best are obtained with the same prediction procedure as for faces and subtitles.
For a P frame or B frame, because the position of the station logo is fixed, the starting motion vector for motion estimation is set directly to (0, 0) (step S34 in fig. 6), a search range of 2x2 pixels is set (step S35 in fig. 6), motion estimation is started within this smaller range (step S36 in fig. 6), and finally the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter are determined (step S37 in fig. 6).
The encoding module 400 encodes each coding unit, and in the present embodiment, encodes according to HEVC and AVS2 standards.
In the encoding process, parameters are set as follows.
For the original image of an I frame, a set of prediction reference angles is constructed according to the prediction angle Dir_best, the angles in the set are traversed, the RDO values corresponding to the angles are compared, and the optimal prediction angle Dir_best required for actual coding is obtained.
For the original image of a P frame or B frame, according to the selection result of the preliminary-selection prediction step: if intra-frame prediction was selected, the optimal prediction angle Dir_best required for actual coding is obtained in the same way as for the I frame; if inter-frame prediction was selected, the motion vector obtained in the preliminary-selection prediction step is stretched according to the scaling ratio, and the RDO values of the different stretched motion vectors within the same search range are compared to obtain the best predicted motion vector required for actual coding and the corresponding RDO value.
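Both decisions reduce to comparing rate-distortion costs of the form J = D + λ·R over a candidate set (prediction angles from the reference-angle set, or stretched motion vectors). The sketch below shows that selection loop; `distortion` (e.g. the SSD of the reconstruction) and `rate_bits` are placeholders for the encoder's own measurements.

```python
# Generic RDO selection: keep the candidate minimising J = D + lambda * R.
def rdo_select(candidates, distortion, rate_bits, lam):
    """Return (best_candidate, best_cost) over the candidate set."""
    best, best_cost = None, float("inf")
    for cand in candidates:
        cost = distortion(cand) + lam * rate_bits(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```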
During encoding, the encoding module 400 allocates a reference quantization parameter QP_base to the original image, counts the sum of the distortion (SATD) values of the different regions of interest in the original image, allocates a local target bitrate to the regions of interest according to the ratio of their area to that of the original image, uses the sum of the SATD values as the input of the rate-control algorithm, and allocates a quantization parameter QP to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
In the invention, coarse-selection prediction is performed on down-sampled video frames, which speeds up the coarse-selection prediction; when the region of interest is a face, only the edges of the face and the regions containing the facial features are divided into smaller blocks, which reduces the amount of computation during encoding and ensures the real-time performance of the system; and fine adjustment of the bitrate makes the bitrate allocation of the region of interest more reasonable.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.

Claims (10)

1. A region-of-interest based video coding method is characterized by comprising,
an information reading step of sequentially acquiring the original image data of each frame of a video and the pixel position information of the regions of interest in the original image, wherein the original image contains at least one region of interest;
a down-sampling step, namely down-sampling the original image to obtain a low-resolution image;
a preliminary-selection prediction step of dividing the low-resolution image into a number of macroblocks, performing intra-frame prediction on the macroblocks located in the region of interest, traversing the prediction angles supported by the coding standard, calculating the distortion (SATD) between the projection-reconstructed intra-prediction pixels and the pixels of the low-resolution image, and obtaining the minimum distortion SATD_best and the corresponding prediction angle Dir_best;
for the original image of an I frame, taking the prediction angle Dir_best corresponding to the minimum distortion SATD_best as the result of the best preliminary prediction mode of the current macroblock;
for the original image of a P frame or B frame, searching the coordinates of the area corresponding to the region of interest in the adjacent frame, calculating the motion vector corresponding to the change of the centre-of-gravity position of the region of interest between the current frame and the reference frame, searching with this motion vector as the starting vector, calculating the SATD at different offsets in turn, and determining the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter; and comparing the minimum distortion SATD_best with the minimum inter-prediction distortion SATD_inter and selecting the prediction result with the smaller distortion as the result of the best preliminary prediction mode of the current macroblock;
an encoding step of setting coding units in the original image and encoding each coding unit, wherein during encoding, for the original image of an I frame, a set of prediction reference angles is constructed according to the prediction angle Dir_best, the angles in the set are traversed, and the rate-distortion optimization (RDO) values corresponding to the angles are compared to obtain the optimal prediction angle Dir_best required for actual coding;
for the original image of a P frame or B frame, according to the selection result of the preliminary-selection prediction step: if intra-frame prediction was selected, the optimal prediction angle Dir_best required for actual coding is obtained in the same way as for the I frame; if inter-frame prediction was selected, the motion vector obtained in the preliminary-selection prediction step is stretched according to the scaling ratio, and the RDO values of the different stretched motion vectors within the same search range are compared to obtain the best predicted motion vector required for actual coding and the corresponding RDO value.
2. The region-of-interest-based video coding method according to claim 1, wherein: in the encoding step, a reference quantization parameter QP_base is allocated to the original image, the sum of the distortion (SATD) values of the different regions of interest in the original image is counted, a local target bitrate is allocated to the regions of interest according to the ratio of their area to that of the original image, the sum of the SATD values is used as the input of the rate-control algorithm, and a quantization parameter QP is allocated to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
3. A region-of-interest based video coding method according to claim 1 or 2, characterized in that: the original image comprises Y, U, V data of three channels, and in the down-sampling step, the data of Y component in the original image is down-sampled to obtain a low-resolution image.
4. The region-of-interest-based video coding method according to claim 3, wherein: in the down-sampling step, pixels closest to the edge of the low-resolution image are sequentially copied and then added with outward-extended pixels around the low-resolution image.
5. The region-of-interest-based video coding method according to claim 4, wherein: in the encoding step, for the original image of an I frame, the coding units are divided down to the minimum size, a set of prediction reference angles is constructed for the macroblocks at the different division levels, the angles in the set are traversed, and the corresponding rate-distortion optimization RDO values are compared to obtain the optimal prediction angle for the actual coding of each level.
6. The region-of-interest-based video coding method according to claim 5, wherein: the region of interest is a face region, and in the encoding step, when the current coding unit of the original image of a P frame or B frame contains a face region, it is judged whether the coding unit contains an edge between the face and the background or of the facial features; if so, the coding unit is divided to the minimum size; if not, one level of division is tried, and the coding unit is divided by one level when the sum of the rate-distortion optimization RDO values of all the sub-units after division is smaller than the RDO value corresponding to the best prediction mode without division.
7. The region-of-interest-based video coding method according to claim 5, wherein: the region of interest is a subtitle region, and in the encoding step, when the current coding unit of the original image of a P frame or B frame contains a subtitle region, it is judged whether the coding unit contains the boundary between the subtitle and the background region; if so, the coding unit is divided to the minimum size; if not, the coding unit is not divided.
8. The region-of-interest-based video coding method according to claim 5, wherein: the region of interest is a fixed-identification region, and in the encoding step, when the current coding unit of the original image of a P frame or B frame contains a fixed identification, it is judged whether the coding unit contains the edge of the fixed identification; if so, the coding unit is divided to the minimum size; if not, the coding unit is not divided.
9. A region-of-interest based video coding system, comprising,
an information reading module (100) for sequentially acquiring the original image data of each frame of a video and the pixel position information of the regions of interest in the original image, wherein the original image contains at least one region of interest;
a down-sampling module (200) for down-sampling the original image to obtain a low-resolution image;
a preliminary-selection prediction module (300) which divides the low-resolution image into a number of macroblocks, performs intra-frame prediction on the macroblocks located in the region of interest, traverses the prediction angles supported by the coding standard, calculates the distortion (SATD) between the projection-reconstructed intra-prediction pixels and the pixels of the low-resolution image, and obtains the minimum distortion SATD_best and the corresponding prediction angle Dir_best;
for the original image of an I frame, the prediction angle Dir_best corresponding to the minimum distortion SATD_best is taken as the result of the best preliminary prediction mode of the current macroblock;
for the original image of a P frame or B frame, the coordinates of the area corresponding to the region of interest in the adjacent frame are searched, the motion vector corresponding to the change of the centre-of-gravity position of the region of interest between the current frame and the reference frame is calculated, the search starts from this motion vector, the SATD at different offsets is calculated in turn, and the best motion-vector predictor MV_best and the minimum inter-prediction distortion SATD_inter are determined; the minimum distortion SATD_best is then compared with the minimum inter-prediction distortion SATD_inter, and the prediction result with the smaller distortion is selected as the result of the best preliminary prediction mode of the current macroblock;
an encoding module (400) for setting coding units in the original image and encoding each coding unit, wherein during encoding, for the original image of an I frame, a set of prediction reference angles is constructed according to the prediction angle Dir_best, the angles in the set are traversed, and the RDO values corresponding to the angles are compared to obtain the optimal prediction angle Dir_best required for actual coding;
for the original image of a P frame or B frame, according to the selection result of the preliminary-selection prediction step: if intra-frame prediction was selected, the optimal prediction angle Dir_best required for actual coding is obtained in the same way as for the I frame; if inter-frame prediction was selected, the motion vector obtained in the preliminary-selection prediction step is stretched according to the scaling ratio, and the RDO values of the different stretched motion vectors within the same search range are compared to obtain the best predicted motion vector required for actual coding and the corresponding RDO value.
10. The region-of-interest based video coding system of claim 9, wherein: the encoding module (400) allocates a reference quantization parameter QP_base to the original image, counts the sum of the distortion (SATD) values of the different regions of interest in the original image, allocates a local target bitrate to the regions of interest according to the ratio of their area to that of the original image, uses the sum of the SATD values as the input of the rate-control algorithm, and allocates a quantization parameter QP to each region of interest according to the local target bitrate,
[QP allocation formula omitted: presented as an image in the original publication]
where clip3(x, min, max) limits x to the range (min, max).
CN202210350595.8A 2022-04-02 2022-04-02 Video coding method and system based on region of interest Active CN114745549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210350595.8A CN114745549B (en) 2022-04-02 2022-04-02 Video coding method and system based on region of interest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210350595.8A CN114745549B (en) 2022-04-02 2022-04-02 Video coding method and system based on region of interest

Publications (2)

Publication Number Publication Date
CN114745549A true CN114745549A (en) 2022-07-12
CN114745549B CN114745549B (en) 2023-03-17

Family

ID=82278276

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210350595.8A Active CN114745549B (en) 2022-04-02 2022-04-02 Video coding method and system based on region of interest

Country Status (1)

Country Link
CN (1) CN114745549B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116506628A (en) * 2023-06-27 2023-07-28 苇创微电子(上海)有限公司 Pixel block-based coding predictor method, coding system and coding device
CN116828154A (en) * 2023-07-14 2023-09-29 湖南中医药大学第一附属医院((中医临床研究所)) Remote video monitoring system
CN116962685A (en) * 2023-09-21 2023-10-27 杭州爱芯元智科技有限公司 Video encoding method, video encoding device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006059848A1 (en) * 2004-12-03 2006-06-08 Samsung Electronics Co., Ltd. Method and apparatus for multi-layered video encoding and decoding
CN101163241A (en) * 2007-09-06 2008-04-16 武汉大学 Video sequence coding/decoding structure
CN101282479A (en) * 2008-05-06 2008-10-08 武汉大学 Method for encoding and decoding airspace with adjustable resolution based on interesting area
CN101572810A (en) * 2008-04-29 2009-11-04 合肥坤安电子科技有限公司 Video encoding method based on interested regions
CN102510496A (en) * 2011-10-14 2012-06-20 北京工业大学 Quick size reduction transcoding method based on region of interest
US20170041605A1 (en) * 2015-08-04 2017-02-09 Fujitsu Limited Video encoding device and video encoding method
US20170085892A1 (en) * 2015-01-20 2017-03-23 Beijing University Of Technology Visual perception characteristics-combining hierarchical video coding method
CN113079376A (en) * 2021-04-02 2021-07-06 北京数码视讯软件技术发展有限公司 Video coding method and device for static area

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006059848A1 (en) * 2004-12-03 2006-06-08 Samsung Electronics Co., Ltd. Method and apparatus for multi-layered video encoding and decoding
CN101163241A (en) * 2007-09-06 2008-04-16 武汉大学 Video sequence coding/decoding structure
CN101572810A (en) * 2008-04-29 2009-11-04 合肥坤安电子科技有限公司 Video encoding method based on interested regions
CN101282479A (en) * 2008-05-06 2008-10-08 武汉大学 Method for encoding and decoding airspace with adjustable resolution based on interesting area
CN102510496A (en) * 2011-10-14 2012-06-20 北京工业大学 Quick size reduction transcoding method based on region of interest
US20170085892A1 (en) * 2015-01-20 2017-03-23 Beijing University Of Technology Visual perception characteristics-combining hierarchical video coding method
US20170041605A1 (en) * 2015-08-04 2017-02-09 Fujitsu Limited Video encoding device and video encoding method
CN113079376A (en) * 2021-04-02 2021-07-06 北京数码视讯软件技术发展有限公司 Video coding method and device for static area

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116506628A (en) * 2023-06-27 2023-07-28 苇创微电子(上海)有限公司 Pixel block-based coding predictor method, coding system and coding device
CN116506628B (en) * 2023-06-27 2023-10-24 苇创微电子(上海)有限公司 Pixel block-based coding predictor method, coding system and coding device
CN116828154A (en) * 2023-07-14 2023-09-29 湖南中医药大学第一附属医院((中医临床研究所)) Remote video monitoring system
CN116828154B (en) * 2023-07-14 2024-04-02 湖南中医药大学第一附属医院((中医临床研究所)) Remote video monitoring system
CN116962685A (en) * 2023-09-21 2023-10-27 杭州爱芯元智科技有限公司 Video encoding method, video encoding device, electronic equipment and storage medium
CN116962685B (en) * 2023-09-21 2024-01-30 杭州爱芯元智科技有限公司 Video encoding method, video encoding device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114745549B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
CN114745549B (en) Video coding method and system based on region of interest
CA2408364C (en) Method for encoding and decoding video information, a motion compensated video encoder and a corresponding decoder
US8817871B2 (en) Adaptive search range method for motion estimation and disparity estimation
JP2006519565A (en) Video encoding
CN101378504B (en) Method for estimating block matching motion of H.264 encode
JP2006519564A (en) Video encoding
EP1419650A2 (en) method and apparatus for motion estimation between video frames
US11425413B2 (en) Encoder, decoder, encoding method, decoding method, and recording medium
EP1461959A2 (en) Sharpness enhancement in post-processing of digital video signals using coding information and local spatial features
US20130235935A1 (en) Preprocessing method before image compression, adaptive motion estimation for improvement of image compression rate, and method of providing image data for each image type
US20220030241A1 (en) Encoder, decoder, encoding method, and decoding method
CN113079376B (en) Video coding method and device for static area
US20070258521A1 (en) Method for motion search between video frames with multiple blocks
US20230308649A1 (en) Encoder, decoder, encoding method, and decoding method
Paul et al. Pattern-based video coding with dynamic background modeling
CN112954365A (en) HEVC interframe motion estimation pixel search improvement method
Eisips et al. Global motion estimation for image sequence coding applications
Yang et al. Fast depth map coding based on virtual view quality
CN106878753B (en) 3D video residual coding mode selection method using texture smoothing information
JP2883592B2 (en) Moving picture decoding apparatus and moving picture decoding method
CN117294861B (en) Coding block dividing method based on inter-frame prediction and coder
US11716470B2 (en) Encoder, decoder, encoding method, and decoding method
JPH0993537A (en) Digital video signal recording and reproducing device and digital video signal coding method
JP2883585B2 (en) Moving picture coding apparatus and moving picture coding method
Hammani et al. Fast Depth Map Intra Mode Prediction Based on Self-organizing Map

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant