CN105959700B

CN105959700B - Video image coding method, device, storage medium and terminal equipment

Info

Publication number: CN105959700B
Application number: CN201610379923.1A
Authority: CN
Inventors: 罗斌姬; 王浦林; 刘海军; 王诗涛
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2016-05-31
Filing date: 2016-05-31
Publication date: 2020-04-14
Anticipated expiration: 2036-05-31
Also published as: CN105959700A

Abstract

The invention provides a video image coding method, which comprises the following steps: receiving a video image to be coded, identifying scene complexity of the video image, determining resolution corresponding to the scene complexity of the video image, switching the resolution of a current encoder to the resolution corresponding to the scene complexity of the video image by switching coding header information of the encoder when the resolution of the current encoder needs to be adjusted, and coding the video image according to the switched resolution. In the process, the encoder does not need to be restarted, so that the free switching of the resolution can be realized only by encoding a P frame or a B frame without encoding an I frame, and the encoding efficiency is greatly improved. In addition, a video image coding device is also provided.

Description

Video image coding method, device, storage medium and terminal equipment

Technical Field

The present invention relates to the field of video processing, and in particular, to a method, an apparatus, a storage medium, and a terminal device for video image encoding.

Background

With the development of the internet, watching video has become increasingly popular, and people's requirements for the fluency and definition of video keep rising. In general, however, the bitrate of a video is fixed, so the bitrate allocated to each frame is basically fixed. The image quality is therefore relatively good in static and small-motion scenes, but if the video suddenly enters a scene with violent motion and complex textures, the quantization parameter becomes large, the image quality becomes poor, and severe mosaic artifacts may even appear. In addition, when the network bandwidth changes, for example when the network condition becomes worse, the bitrate allocated to each frame becomes lower, so that even if the video image is in a still or small-motion scene, the image quality becomes worse due to the reduced bitrate.

If the resolution of the video is lowered by one level when the video quality is poor, the bitrate allocated to each pixel increases, and the video can be encoded with a lower quantization parameter. Adjusting the resolution is therefore very important to the video encoding process. However, the conventional method for adjusting the resolution requires restarting the encoder and then encoding an I frame to perform the switch. The encoding efficiency of an I frame is extremely low, which makes the image quality of that frame very poor and degrades the viewing experience.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method and an apparatus for encoding a video image with high encoding efficiency.

A method of video image encoding, the method comprising:

receiving a video image to be encoded;

identifying scene complexity of the video image;

determining a resolution corresponding to a scene complexity of the video image;

when the resolution of the current encoder needs to be adjusted, the resolution of the current encoder is switched to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder;

and encoding the video image according to the switched resolution.

A device for video image encoding, the device comprising:

the receiving module is used for receiving a video image to be coded;

the identification module is used for identifying scene complexity of the video image;

a determining module for determining a resolution corresponding to a scene complexity of the video image;

the switching module is used for switching the resolution of the current encoder to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder when the resolution of the current encoder needs to be adjusted;

and the coding module is used for coding the video image according to the switched resolution.

According to the method and the device for encoding the video image, the video image to be encoded is received, the scene complexity of the video image is identified, the resolution corresponding to the scene complexity of the video image is determined, when the resolution of the current encoder needs to be adjusted, the resolution of the current encoder is switched to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder, and the video image is encoded according to the switched resolution. According to the method and the device, the scene complexity of the video image is identified, the resolution corresponding to the scene complexity is determined, when the resolution of the current encoder needs to be adjusted, the resolution of the current encoder can be switched to the resolution corresponding to the scene complexity by changing the encoding header information, the encoder does not need to be restarted in the process, so that the I frame does not need to be encoded, the resolution can be freely switched by encoding the P frame or the B frame, and the encoding efficiency is greatly improved.

Drawings

FIG. 1 is a diagram of an exemplary embodiment of a video encoding method;

FIG. 2 is a diagram illustrating the structure of the encoding side according to an embodiment;

FIG. 3 is a schematic diagram illustrating the structure of the encoding end in another embodiment;

FIG. 4 is a flow diagram of a method for video image encoding according to one embodiment;

FIG. 5 is a flow diagram of a method for resolution switching in one embodiment;

FIG. 6 is a flow diagram of a method for encoding video images according to a switched resolution in one embodiment;

FIG. 7a is a diagram illustrating downsampling of a reference frame, according to one embodiment;

FIG. 7b is a diagram illustrating upsampling of a reference frame in one embodiment;

FIG. 8 is a diagram illustrating edge-filling of a sampled reference frame, according to an embodiment;

FIG. 9 is a flowchart of a method for video image encoding according to another embodiment;

FIG. 10 is a diagram illustrating the encoding and decoding process in one embodiment;

FIG. 11 is a block diagram of an apparatus for encoding video pictures according to an embodiment;

FIG. 12 is a block diagram of an apparatus for encoding video pictures according to another embodiment;

FIG. 13 is a block diagram of the structure of an encoding module in one embodiment;

FIG. 14 is a block diagram of an apparatus for encoding video images according to still another embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, in one embodiment, the method for encoding video images can be applied to the application environment shown in fig. 1, in which an encoding end 102 and a decoding end 104 are connected through a network. The encoding end 102 is configured to receive a video image to be encoded, identify the scene complexity of the video image, and determine the resolution corresponding to that scene complexity. When it detects that the current resolution needs to be adjusted, it switches the resolution of the current encoder to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder, encodes the video image according to the switched resolution, and then sends the encoded video image and the corresponding encoding header information to the decoding end 104 through the network. The decoding end 104 is configured to receive the encoding header information and the encoded video image sent by the encoding end, and to decode the video image according to the resolution in the encoding header information. The encoding end 102 may be a terminal or a server. When the encoding end 102 is a terminal, it can switch the resolution of the current encoder in real time during video recording, according to scene changes, by switching the encoding header information of the encoder; it then encodes the video image according to the switched resolution and sends it to the decoding end 104, and the decoding end 104 decodes the video image at the same resolution as the encoding end.
For example, when the encoding end and the decoding end perform a video call, on one hand, the encoding end can automatically switch resolutions to perform encoding according to scene changes when acquiring videos, and on the other hand, the decoding end decodes and plays the video image with the resolution consistent with that of the encoding end. When the encoding end 102 is a server, the received video image is encoded, the resolution is dynamically adjusted according to the scene complexity of the video image to encode the corresponding video image, the encoded video image is sent to the decoding end 104, and the decoding end 104 receives the encoding header information and the encoded video image sent by the server and decodes the video image with the resolution consistent with that during encoding.

As shown in fig. 2, in an embodiment, the composition structure of the encoding end 102 when it is a terminal is shown in fig. 2, and includes a processor, an internal memory, a non-volatile storage medium, a network interface, a video capture device, a display screen, and an input device, which are connected by a system bus. The non-volatile storage medium of the encoding end 102 stores an operating system and a video image encoding apparatus, where the video image encoding apparatus is used to implement a video image encoding method. The processor provides computing and control capability and supports the operation of the whole encoding end. The internal memory of the encoding end provides an environment for running the video image encoding apparatus stored in the non-volatile storage medium, and stores computer-readable instructions which, when executed by the processor, cause the processor to perform a method for encoding video images. The network interface is used for connecting to a network for communication, such as transmitting encoded video images to a decoding end. The video capture device is used for capturing video, such as recording video. The display screen of the encoding end can be a liquid crystal display screen or an electronic ink display screen, and the input device can be a touch layer covering the display screen, a key, a track ball or a touch pad arranged on the housing of the electronic equipment, or an external keyboard, touch pad or mouse. The encoding end can be a mobile phone, a tablet computer, a personal digital assistant or a wearable device. Those skilled in the art will appreciate that the structure shown in fig. 2 is a block diagram of only the portion of the structure relevant to the present disclosure and does not limit the encoding end to which the present disclosure applies; a particular encoding end may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.

As shown in fig. 3, in an embodiment, the composition structure of the encoding end 102 when it is a server is shown in fig. 3, and includes a processor, a non-volatile storage medium, a memory, and a network interface, which are connected through a system bus. The non-volatile storage medium stores an operating system, a database and a video image encoding apparatus. The database is used for storing data, such as video image data to be encoded. The video image encoding apparatus is used for encoding video images, and the processor of the encoding end provides computing and control capability and supports the operation of the whole encoding end. The network interface of the encoding end is used for communicating with an external decoding end through a network connection, such as sending encoded video images to the decoding end. The encoding end may be an independent server or a server cluster composed of a plurality of servers. Those skilled in the art will appreciate that the structure shown in fig. 3 is a block diagram of only the portion of the structure relevant to the present disclosure and does not limit the encoding end to which the present disclosure applies; a particular encoding end may include more or fewer components than those shown in the drawings, combine certain components, or have a different arrangement of components.

As shown in fig. 4, in an embodiment, a method for encoding a video image is provided, where the method is applicable to both a terminal and a server, and includes:

step 402, receiving a video image to be encoded.

A video is composed of pictures, frame by frame, and its data volume is large, so the video needs to be encoded before transmission. Video encoding is in effect the encoding of video images: because video image data is strongly correlated, encoding uses compression techniques to remove the redundant information in the data. Video encoding is implemented by an encoder, so before transmission the video is first encoded by the encoder. Specifically, the encoder receives a video image to be encoded, and then encodes the video image according to preset encoding parameters.

At step 404, scene complexity of the video image is identified.

In this embodiment, the scene complexity of the video refers to the pixel variation of the current video image relative to the previous video image. Since a video is composed of consecutive frame-by-frame pictures, the larger the difference between two adjacent pictures, the greater the scene complexity; conversely, the smaller the difference, the lower the scene complexity. After the encoder receives a video image to be encoded, the scene of the video image needs to be analyzed and identified, that is, the current motion scene is judged by calculating the scene complexity of the video image. In one embodiment, motion scenes may be simply divided into still, small motion, large motion, and violent motion. At a fixed bitrate, if a large-motion or violent-motion scene occurs, the picture quality drops and playback stutters, so the resolution needs to be lowered by one level to improve the picture quality; if the picture later returns to a still or small-motion scene, the resolution can be restored to its original setting. Specifically, the scene complexity may be calculated by a number of methods. In one embodiment, a prediction block is obtained by intra-frame or inter-frame prediction, and the residual SAD (Sum of Absolute Differences) between the actual block and the prediction block is then calculated. A larger residual means that a larger bitrate is required for encoding, which indicates a more complex scene; conversely, a smaller residual indicates lower complexity.
In another embodiment, for a more detailed scene complexity calculation, an average motion vector is calculated in addition to the residual SAD between the actual block and the prediction block. During the calculation, each block finds its best matching block in the reference frame, and the offset between that block and the current block is its motion vector. The average motion vector is the average of the motion vectors of all blocks: the larger the average motion vector, the faster the motion, and the smaller it is, the slower the motion. For example, when the average motion vector is 0, the picture is still. Finally, a value representing the scene complexity is obtained by weighting the calculated residual SAD and the average motion vector; the larger the value, the greater the scene complexity, and vice versa.
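The weighted combination of residual SAD and average motion vector described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of the mean SAD per block, the use of the vector length as the motion measure, and the weights w_sad and w_mv are all assumptions, since the text does not fix any of them.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]  # a 2-D block of luma samples


def block_sad(actual: Block, predicted: Block) -> int:
    """Sum of absolute differences between an actual block and its prediction."""
    return sum(
        abs(a - p)
        for row_a, row_p in zip(actual, predicted)
        for a, p in zip(row_a, row_p)
    )


def mean_motion_magnitude(motion_vectors: Sequence[Tuple[int, int]]) -> float:
    """Average length of the per-block motion vectors (0.0 for a still picture)."""
    if not motion_vectors:
        return 0.0
    total = sum((dx * dx + dy * dy) ** 0.5 for dx, dy in motion_vectors)
    return total / len(motion_vectors)


def scene_complexity(
    sads: Sequence[int],
    motion_vectors: Sequence[Tuple[int, int]],
    w_sad: float = 1.0,  # hypothetical weight, not given in the text
    w_mv: float = 1.0,   # hypothetical weight, not given in the text
) -> float:
    """Weighted sum of the mean block SAD and the mean motion magnitude."""
    mean_sad = sum(sads) / len(sads) if sads else 0.0
    return w_sad * mean_sad + w_mv * mean_motion_magnitude(motion_vectors)
```

A larger returned value stands for a more complex scene; in practice the two weights would have to be tuned so that the SAD term and the motion term are on comparable scales.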

At step 406, a resolution corresponding to the scene complexity of the video image is determined.

In this embodiment, a correspondence between the scene complexity and the resolution of the video image is established in advance, and after the scene complexity of the video image is calculated, the resolution corresponding to the scene complexity of the current video image is determined from this correspondence. For example, three resolutions are established in advance and are classified by size as a low-grade resolution, a medium-grade resolution and a high-grade resolution. When the scene complexity of the video image is greater than a preset first threshold, the corresponding resolution is the low-grade resolution; when the scene complexity is smaller than a preset second threshold, the corresponding resolution is the high-grade resolution; when the scene complexity is greater than the second threshold and smaller than the first threshold, the corresponding resolution is the medium-grade resolution, wherein the second threshold is smaller than the first threshold. This is consistent with the principle that a more complex scene is encoded at a lower resolution.
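A minimal sketch of such a threshold lookup is given below. It assumes that lower complexity maps to higher resolution, as in the two-resolution embodiment of this description; the function name and the handling of values exactly equal to a threshold (treated as medium) are assumptions, because the text leaves both open.

```python
def select_resolution_tier(
    complexity: float, first_threshold: float, second_threshold: float
) -> str:
    """Map a scene-complexity value to a resolution tier.

    The second threshold must be smaller than the first, as in the text.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be smaller than first_threshold")
    if complexity > first_threshold:
        return "low"      # complex scene: spend the bitrate on fewer pixels
    if complexity < second_threshold:
        return "high"     # still or small-motion scene: favor sharpness
    return "medium"       # in between (boundary values included by assumption)
```

For example, with a first threshold of 60 and a second threshold of 20, a complexity of 90 selects the low tier and a complexity of 10 selects the high tier.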

And step 408, when the resolution of the current encoder needs to be adjusted, switching the resolution of the current encoder to a resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder.

Specifically, the encoding header information, i.e., header information of the encoder, is used to set general parameters in various encoding processes, including a video Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), Slice header information (Slice header), and the like. The sequence parameter set SPS is used to describe parameter information for encoding the entire sequence, including resolution information for encoding. After the resolution corresponding to the scene complexity of the video image is determined, whether the resolution of the current encoder is consistent with the resolution corresponding to the scene complexity of the video image is judged, if not, the resolution of the current encoder is not suitable for the current scene complexity, and the resolution of the current encoder needs to be adjusted. If so, no adjustment is required. When the resolution of the current encoder needs to be adjusted, the resolution of the current encoder is switched to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder. Specifically, the encoder initializes a plurality of encoding header information, each of which includes a resolution, and different encoding header information includes different resolutions. For example, three encoding header information are initialized in the encoder, each encoding header information includes a resolution, and the three encoding header information respectively includes three different resolutions, which are divided into a low-level resolution, a medium-level resolution, and a high-level resolution according to the resolution. Assume that the three coded header information are 1, 2, and 3, respectively. The header information 1 includes a low resolution, the header information 2 includes a medium resolution, and the header information 3 includes a high resolution. 
If it is determined that the resolution corresponding to the scene complexity of the video image is a low resolution and the current resolution is a medium resolution or a high resolution, it is necessary to switch the current header information 2 or 3 to the header information 1 including the low resolution.

And step 410, coding the video image according to the switched resolution.

In this embodiment, after the resolution of the current encoder is switched to the resolution corresponding to the scene complexity of the video image, in order to enable normal encoding, the reference frame needs to be up-sampled or down-sampled according to the resolution in the encoding header information, and then the video image is encoded according to the sampled reference frame. Specifically, the image resolution is the number of pixels included in a unit inch. Furthermore, the resolution may also be expressed in terms of the length and width of the picture, i.e. the size of the picture. The process of changing from high resolution to low resolution is called downsampling, and the process of changing from low resolution to high resolution is called upsampling. The down-sampling process is to obtain a reduced image by extracting part of pixel points in the original image. In the up-sampling process, missing pixel points need to be supplemented by methods such as interpolation, and a larger image is obtained. The reference frame refers to a frame which is required to be referred to by IPB coding. In the process of coding video images, I frames are intra-frame image data compression and are independent frames, reference is made to block coding in the image, and no reference frame is needed. P frames are coded with reference to a previous I frame or P frame, the number of reference frames being at most 2, both forward. B frames are coded with reference to preceding and following I or P frames, one frame before and after, or just forward or backward (three options). That is, the reference frame refers to a reference frame of a P frame or a B frame. Most of the process of encoding video pictures is P frames, so the corresponding up-sampling or down-sampling of reference frames is mainly for encoding P frames. 
Compared with the conventional approach, in which the resolution is switched by encoding an I frame, switching the resolution by encoding a P frame or a B frame greatly improves the encoding efficiency.

In this embodiment, a video image to be encoded is received, scene complexity of the video image is identified, a resolution corresponding to the scene complexity of the video image is determined, when the resolution of a current encoder needs to be adjusted, the resolution of the current encoder is switched to the resolution corresponding to the scene complexity of the video image by switching encoding header information of the encoder, and the video image is encoded according to the switched resolution. According to the method, the scene complexity of the video image is identified, the resolution corresponding to the scene complexity is determined, when the resolution of the current encoder needs to be adjusted, the current resolution can be switched to the resolution corresponding to the scene complexity by changing the encoding header information, the encoder does not need to be restarted in the process, so that the resolution can be freely switched by encoding only P frames or B frames without encoding I frames, and the encoding efficiency is greatly improved.

In one embodiment, the step of determining a resolution corresponding to the scene complexity of the video image comprises: and determining the resolution corresponding to the scene complexity of the video image according to the pre-established corresponding relation between the scene complexity and the resolution of the video image.

In the present embodiment, the correspondence between the scene complexity and the resolution of the video image is set in advance, and the resolution corresponding to the scene complexity of the video image is determined from the calculated scene complexity. For example, two resolutions, a low resolution and a high resolution, are established in advance. When the scene of a video image is still or has small motion, the image is encoded at high resolution to improve its sharpness. When the scene of the video image has large or violent motion, the image is encoded at low resolution to preserve picture quality. Specifically, a threshold is set in advance for the scene complexity of the video. The scene complexity of the video image is calculated and compared with the preset threshold. If it is greater than the threshold, the scene of the current video image is a large-motion or violent-motion scene, and the corresponding resolution is the low resolution; if not, the scene of the current video image is a still or small-motion scene, and the corresponding resolution is the high resolution.

In one embodiment, before the step of receiving a video image to be encoded, the method further comprises: initializing a plurality of pieces of encoding header information, wherein different encoding header information contains different resolutions. In step 408, switching the resolution of the current encoder to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder comprises: switching the encoding header information of the current encoder to the encoding header information whose contained resolution is consistent with the resolution corresponding to the scene complexity of the video image.

In this embodiment, the plurality of resolutions is established by having the encoder initialize a plurality of pieces of encoding header information. One piece of encoding header information contains one resolution, and different pieces contain different resolutions. When the resolution of the current encoder needs to be adjusted, the current encoding header information only needs to be switched to the encoding header information whose resolution is consistent with the resolution corresponding to the scene complexity of the video image. When the encoder initializes encoding header information, it sets the parameters used in the encoding process, such as the resolution and the number of reference frames. The encoder needs to pass some sequence-level, picture-level and coded-slice-level information to the decoder through the encoding header information, and one important item of this information is the resolution. To encode at different resolutions dynamically, the encoding header information has to be switched dynamically. One piece of encoding header information generally contains only one resolution, so several pieces are initialized to represent several resolutions. Specifically, assume that the encoder initializes two pieces of encoding header information, one containing a resolution of 640x480 and the other a resolution of 480x360. The encoding header information containing 640x480 is used as the initial default. When the video scene has large or violent motion, the current resolution of 640x480 needs to be switched to 480x360, and the current encoding header information then only needs to be switched to the encoding header information containing the 480x360 resolution.
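The 640x480 / 480x360 example can be modelled with a small sketch in which the encoder keeps a table of pre-initialized headers and changes only the index of the active one. The class and method names here are hypothetical and stand in for whatever a real encoder would use; nothing is restarted or re-created on a switch.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class CodingHeader:
    """One piece of encoding header information holding a single resolution."""
    header_id: int
    width: int
    height: int


class HeaderSwitchingEncoder:
    """Toy encoder state: all headers are created once, at initialization."""

    def __init__(self, resolutions: Sequence[Tuple[int, int]]) -> None:
        self.headers: Dict[int, CodingHeader] = {
            i: CodingHeader(i, w, h) for i, (w, h) in enumerate(resolutions)
        }
        self.active_id = 0  # the first header is the initial default

    @property
    def resolution(self) -> Tuple[int, int]:
        header = self.headers[self.active_id]
        return (header.width, header.height)

    def switch_to(self, target: Tuple[int, int]) -> bool:
        """Select the header whose resolution equals target.

        Returns True if the active header changed, False if no adjustment
        was needed.
        """
        if self.resolution == target:
            return False
        for header in self.headers.values():
            if (header.width, header.height) == target:
                self.active_id = header.header_id
                return True
        raise ValueError(f"no initialized header for resolution {target}")
```

An encoder created with `HeaderSwitchingEncoder([(640, 480), (480, 360)])` starts at 640x480, and `switch_to((480, 360))` changes only `active_id`.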

For ease of description, the implementation is described in detail below using the H.264/H.265 encoding protocol. The header information of the H.264/H.265 encoding protocol mainly includes the SPS (sequence parameter set), the PPS (picture parameter set), and the slice header (coded slice header information). The SPS describes parameter information for encoding the entire sequence, including the resolution used for encoding. Each SPS has an ID number, which may be denoted as sps_seq_parameter_set_id. The PPS describes parameter information of a picture, and each PPS has an ID number, which may be denoted as pps_pic_parameter_set_id. One PPS corresponds to one SPS, and the PPS carries a reference (pps_seq_parameter_set_id in H.265) that indicates which SPS the current PPS is associated with. Typically, there is only one SPS and one PPS in one piece of encoding header information. Therefore, to switch the resolution dynamically, several pieces of resolution information need to be saved in advance, that is, several SPSs need to be initialized, together with a corresponding number of PPSs.

As shown in fig. 5, in one embodiment, the encoder initializes two SPSs and two PPSs, i.e., sets MAX_SPS_COUNT to 2 and MAX_PPS_COUNT to 2. When the encoder is initialized, two SPSs are established, referred to as SPS[0] and SPS[1], where 0 and 1 are the ID numbers of the SPSs, namely sps_seq_parameter_set_id of SPS[0] is 0 and sps_seq_parameter_set_id of SPS[1] is 1. The resolution can be represented by the length and width of a picture, and the following are set: SPS[0]->pic_width_in_luma_samples = W1 and SPS[0]->pic_height_in_luma_samples = H1; SPS[1]->pic_width_in_luma_samples = W2 and SPS[1]->pic_height_in_luma_samples = H2. Here pic_width_in_luma_samples indicates the width of the encoded picture and pic_height_in_luma_samples indicates its height (length), so the resolution of SPS[0] may be represented as W1xH1 and the resolution of SPS[1] as W2xH2. Two PPSs are then set, denoted PPS[0] and PPS[1], where 0 and 1 are the ID numbers of the PPSs, i.e., pps_pic_parameter_set_id of PPS[0] is 0 and pps_pic_parameter_set_id of PPS[1] is 1. Each PPS is associated with an SPS: PPS[0] with SPS[0] and PPS[1] with SPS[1]. Finally, two slices (coded slices) are created, denoted SLICE[0] and SLICE[1], where 0 and 1 are the ID numbers of the slices, i.e., slice_pic_parameter_set_id of SLICE[0] is 0 and slice_pic_parameter_set_id of SLICE[1] is 1. Each coded slice is associated with a PPS: SLICE[0] with PPS[0] and SLICE[1] with PPS[1].
In the initial state, slice_pic_parameter_set_id is 0, that is, the initial slice uses the coding information of PPS[0], and PPS[0] points to SPS[0], so the initial encoding resolution is W1xH1. To switch the resolution from W1xH1 to W2xH2, the slice header only needs to set slice_pic_parameter_set_id to 1, which points to PPS[1]; PPS[1] points to SPS[1], whose resolution is W2xH2.
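The slice-to-PPS-to-SPS lookup chain can be illustrated with a small data model. This is a sketch only: the Python class names and the concrete values 640x480 and 480x360 standing in for W1xH1 and W2xH2 are assumptions, and the model omits every other field that real parameter sets carry.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SPS:
    sps_seq_parameter_set_id: int
    pic_width_in_luma_samples: int
    pic_height_in_luma_samples: int


@dataclass
class PPS:
    pps_pic_parameter_set_id: int
    sps_id: int  # ID of the SPS this PPS is associated with


@dataclass
class SliceHeader:
    slice_pic_parameter_set_id: int  # ID of the PPS this slice uses


def active_resolution(
    slice_header: SliceHeader,
    pps_table: Dict[int, PPS],
    sps_table: Dict[int, SPS],
) -> Tuple[int, int]:
    """Follow slice header -> PPS -> SPS and return (width, height)."""
    pps = pps_table[slice_header.slice_pic_parameter_set_id]
    sps = sps_table[pps.sps_id]
    return (sps.pic_width_in_luma_samples, sps.pic_height_in_luma_samples)


# Initialization: two SPSs and two PPSs, PPS[i] associated with SPS[i].
# 640x480 and 480x360 are illustrative stand-ins for W1xH1 and W2xH2.
SPS_TABLE = {0: SPS(0, 640, 480), 1: SPS(1, 480, 360)}
PPS_TABLE = {0: PPS(0, sps_id=0), 1: PPS(1, sps_id=1)}

current_slice = SliceHeader(slice_pic_parameter_set_id=0)
initial = active_resolution(current_slice, PPS_TABLE, SPS_TABLE)

# Switching the resolution touches only the slice header's PPS reference.
current_slice.slice_pic_parameter_set_id = 1
switched = active_resolution(current_slice, PPS_TABLE, SPS_TABLE)
```

Because both parameter sets already exist and have been sent, the switch is a one-field change in the slice header rather than an encoder restart.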

As shown in fig. 6, in one embodiment, the step of encoding the video image according to the switched resolution includes:

and step 410a, performing corresponding up-sampling or down-sampling on the reference frame according to the switched resolution.

In this embodiment, after the resolution is switched to the resolution corresponding to the scene complexity of the video image by changing the encoding header information, the reference frame of the video image needs to be correspondingly up-sampled or down-sampled so that encoding can proceed normally. When switching from high resolution to low resolution, the reference frame needs to be down-sampled, and when switching from low resolution to high resolution, the reference frame needs to be up-sampled. Specifically, if the reference frame is down-sampled, as shown in fig. 7a, a bilinear filtering method is used to down-sample the YUV components of the reference frame (light gray large image), where the YUV components are the Y component (luminance), the U component (chrominance), and the V component (chrominance), and each frame is composed of these three components. The down-sampled reference frame (dark gray small image) is saved. In addition, since the resolution has changed and the pixel values at the edges have changed with it, the edges of the sampled reference frame need to be padded again, and the padding can be done by copying the value of the nearest pixel. If the reference frame is up-sampled, as shown in fig. 7b, a bilinear filtering method is used to up-sample the YUV components of the reference frame (light gray small image), and the up-sampled reference frame (dark gray large image) is saved. Similarly, since the resolution has changed and the pixel values at the edges have changed with it, the upper, lower, left, and right edges of the up-sampled reference frame need to be padded again, and the padding can likewise be done by copying the value of the nearest pixel.

Fig. 8 is a schematic diagram of edge padding performed on the up-sampled reference frame in one embodiment, in which an edge is added in each of the upper, lower, left, and right directions of the reference frame. In fig. 8, the dark gray portion is the original content of the up-sampled reference frame and the light gray portion is the added edge. The padding copies the value of the nearest pixel: taking the edge above the image as an example, each pixel in that edge equals the topmost value of the original (dark gray) portion of the reference frame in the same column.
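The two operations described above, bilinear resampling of one component plane and edge padding by copying the nearest pixel, can be sketched as follows. This is an illustrative sketch only: the function names, the sample-centre alignment, the rounding, and the padding width are assumptions, and a real codec would use whatever filter taps and alignment the encoder and decoder have agreed on.

```python
def resample_bilinear(plane, out_w, out_h):
    """Resize a 2D list of samples to out_w x out_h with bilinear filtering."""
    in_h, in_w = len(plane), len(plane[0])
    out = []
    for y in range(out_h):
        # Map the output sample centre back into input coordinates (assumed
        # alignment), clamped so that no read falls outside the plane.
        fy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Blend horizontally on the two rows, then vertically.
            top = plane[y0][x0] * (1 - wx) + plane[y0][x1] * wx
            bottom = plane[y1][x0] * (1 - wx) + plane[y1][x1] * wx
            row.append(int(round(top * (1 - wy) + bottom * wy)))
        out.append(row)
    return out


def pad_edges(plane, pad):
    """Add `pad` samples on every side, each copied from the nearest sample."""
    # Left and right edges repeat the first and last sample of each row.
    rows = [[row[0]] * pad + list(row) + [row[-1]] * pad for row in plane]
    # Top and bottom edges repeat the first and last (already widened) row.
    top = [list(rows[0]) for _ in range(pad)]
    bottom = [list(rows[-1]) for _ in range(pad)]
    return top + rows + bottom


# Up-sample a 2x2 plane to 4x2, then add a one-sample edge.
small = [[0, 100], [0, 100]]
large = resample_bilinear(small, 4, 2)
padded = pad_edges(large, 1)
```

The same two functions would be applied to each of the Y, U, and V planes in turn, with the target size of each plane chosen by the caller.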

Step 410b, encoding the video image according to the sampled reference frame.

Specifically, after the reference frame has been sampled and edge-padded according to the switched resolution, the processed reference frame is used to encode the video image, and different encoding methods such as predictive encoding and transform encoding can be adopted according to different requirements. The encoded video image is then sent to a decoder, so that the decoder decodes the video image according to the resolution in the encoding header information.

As shown in fig. 9, in an embodiment, the method for encoding a video image further includes:

Step 412, transmitting the encoded video image and the corresponding encoding header information to a decoder, so that the decoder decodes the video image according to the resolution in the corresponding encoding header information.

In this embodiment, since the reference frame is modified at the encoding end, the decoding end must modify it in the same way, otherwise decoding errors will occur. After the encoder encodes the video image, the encoded video image and the corresponding encoding header information are transmitted to the decoder, and when the decoder detects a change in the resolution information, it up-samples or down-samples the reference frame accordingly, using the same bilinear filtering algorithm as the encoding end. Specifically, the encoder first sends the plurality of initialized encoding header information to the decoder and then sends the encoded video image; the decoder obtains, from the header of the received video stream, the encoding header information that the video image points to, and then decodes the video image according to the resolution determined from that encoding header information.
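The decoder-side behaviour can be sketched as a small piece of state that compares the resolution selected by each slice header with the resolution of the stored reference frame. The class and method names below are hypothetical, and the resampling routine is passed in as a parameter on the assumption that the decoder is given the same bilinear routine the encoder uses.

```python
class DecoderState:
    """Hypothetical tracker for the reference picture and its stored size."""

    def __init__(self, resolution_by_pps_id, resample):
        # The initialized headers arrive before any picture, so the mapping
        # from PPS ID to (width, height) is known up front.
        self.resolution_by_pps_id = dict(resolution_by_pps_id)
        self.resample = resample  # assumed to match the encoder's filter
        self.reference = None
        self.reference_size = None

    def prepare_reference(self, slice_pic_parameter_set_id):
        """Return the picture size for this slice, resampling the reference
        frame first if the resolution differs from the stored one."""
        size = self.resolution_by_pps_id[slice_pic_parameter_set_id]
        if self.reference is not None and size != self.reference_size:
            width, height = size
            self.reference = self.resample(self.reference, width, height)
            self.reference_size = size
        return size

    def store_decoded(self, picture, size):
        """Keep the newly decoded picture as the reference for the next one."""
        self.reference = picture
        self.reference_size = size
```

Because the check is driven purely by the ID read from the slice header, the decoder in this sketch needs no extra signalling to learn that a switch has happened.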

In order to make better use of the bandwidth, as shown in fig. 10, in one embodiment, the encoding header information of the encoder is set according to the calculation result of the control module 1002. Specifically, the control module 1002 sets a set of encoding parameters according to conditions such as the network conditions and the hardware capabilities of the client, and then transmits the set encoding parameters to the encoder 1004, which initializes the encoding header information according to the received encoding parameters. For example, if the resolution set by the control module 1002 is 640x480, the encoder 1004 initializes encoding header information containing the 640x480 resolution. In addition, for more dynamic resolution switching, the control module 1002 may set multiple sets of encoding parameters according to conditions such as the current network conditions and the hardware capabilities of the client, so that the encoder 1004 may initialize multiple sets of encoding header information according to the received sets of encoding parameters. The encoder 1004 transmits the encoding header information and the encoded video image to the decoder 1006 through a network, and the decoder 1006 obtains, from the header of the received video stream, the encoding header information that the video image points to and decodes the video image according to the resolution determined from that encoding header information.
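One way a control module might derive several sets of encoding parameters is sketched below. Everything numeric here is an assumption for illustration: the text above gives only 640x480 as an example resolution, and the bandwidth thresholds, the other resolutions, the use of a maximum picture height as the hardware-capability measure, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingParams:
    width: int
    height: int


# Hypothetical ladder: (minimum bandwidth in kbit/s, parameters), best first.
LADDER = [
    (2000, EncodingParams(1280, 720)),
    (800, EncodingParams(640, 480)),
    (0, EncodingParams(320, 240)),
]


def select_parameter_sets(bandwidth_kbps, max_height, count=2):
    """Return up to `count` parameter sets the network and client can handle."""
    candidates = [
        params
        for min_kbps, params in LADDER
        if bandwidth_kbps >= min_kbps and params.height <= max_height
    ]
    return candidates[:count]


# With about 1 Mbit/s available, two sets are handed to the encoder.
parameter_sets = select_parameter_sets(1000, max_height=720)
```

The encoder would then initialize one set of encoding header information per returned entry, so that later switches stay within what the control module judged feasible.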

As shown in fig. 11, in one embodiment, an apparatus for encoding a video image is provided, the apparatus comprising:

a receiving module 1102, configured to receive a video image to be encoded.

An identifying module 1104, configured to identify scene complexity of the video image.

A determining module 1106, configured to determine a resolution corresponding to the scene complexity of the video image.

A switching module 1108, configured to switch, when the resolution of the current encoder needs to be adjusted, the resolution of the current encoder to a resolution corresponding to the scene complexity of the video image by switching encoding header information of the encoder.

An encoding module 1110, configured to encode the video image according to the switched resolution.

In one embodiment, the determining module is further configured to determine a resolution corresponding to the scene complexity of the video image according to a pre-established correspondence between the scene complexity and the resolution of the video image.
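A pre-established correspondence of this kind can be held as a simple threshold table. The sketch below assumes that scene complexity is available as a numeric score between 0 and 1 and that more complex scenes map to lower resolutions; the thresholds, the resolutions, and the function name are hypothetical and are not taken from the embodiment.

```python
import bisect

# Hypothetical table: scores below 0.3 use the first resolution, scores from
# 0.3 up to (but not including) 0.7 the second, and anything higher the third.
COMPLEXITY_BOUNDS = [0.3, 0.7]
RESOLUTIONS = [(1280, 720), (960, 540), (640, 360)]


def resolution_for_complexity(score):
    """Look up the resolution assigned to a scene-complexity score."""
    return RESOLUTIONS[bisect.bisect_right(COMPLEXITY_BOUNDS, score)]
```

Keeping the correspondence in a table means the set of possible resolutions is known in advance, which is what allows one set of encoding header information to be initialized for each of them.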

As shown in fig. 12, in an embodiment, the apparatus for encoding a video image further includes:

an initializing module 1101, configured to initialize a plurality of encoding header information, where different encoding header information includes different resolutions; the switching module 1108 is further configured to switch, when the resolution of the current encoder needs to be adjusted, the encoding header information of the current encoder to encoding header information that includes a resolution that is consistent with a resolution corresponding to the scene complexity of the video image.

As shown in fig. 13, in one embodiment, the encoding module 1110 includes:

the sampling module 1110a is configured to perform corresponding upsampling or downsampling on the reference frame according to the switched resolution, where if the switched resolution is higher than the resolution before the switching, the reference frame is upsampled, and if the switched resolution is lower than the resolution before the switching, the reference frame is downsampled.

A video image encoding module 1110b, configured to encode the video image according to the sampled reference frame.

As shown in fig. 14, in an embodiment, the apparatus for encoding a video image further includes:

the transmission module 1111 is configured to transmit the encoded video image and the corresponding encoding header information to a decoder, so that the decoder decodes the video image according to the resolution in the corresponding encoding header information.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

The above-mentioned embodiments express only several embodiments of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method of video image encoding, the method comprising:

receiving a video image to be encoded;

identifying scene complexity of the video image;

determining a resolution corresponding to the scene complexity of the video image;

when the resolution of the current encoder needs to be adjusted, switching the resolution of the current encoder to the resolution corresponding to the scene complexity of the video image by switching the encoding header information of the encoder;

encoding the video image according to the switched resolution;

the encoding the video image according to the switched resolution includes:

performing corresponding up-sampling or down-sampling on the reference frame according to the switched resolution, wherein if the switched resolution is higher than the resolution before switching, the reference frame is up-sampled, and if the switched resolution is lower than the resolution before switching, the reference frame is down-sampled;

and coding the video image according to the sampled reference frame.

2. The method of claim 1, wherein the step of determining a resolution corresponding to the scene complexity of the video image comprises:

and determining the resolution corresponding to the scene complexity of the video image according to the pre-established corresponding relation between the scene complexity and the resolution of the video image.

3. The method of claim 1, further comprising, prior to the step of receiving a video image to be encoded: initializing a plurality of encoding header information, wherein the resolutions contained in different encoding header information are different;

the step of switching the resolution of the current encoder to a resolution corresponding to the scene complexity of the video image by switching encoding header information of the encoder includes:

and switching the encoding header information of the current encoder to encoding header information that contains a resolution consistent with the resolution corresponding to the scene complexity of the video image.

4. The method of claim 1, further comprising:

and transmitting the coded video image and the corresponding coding header information to a decoder, so that the decoder decodes the video image according to the resolution in the corresponding coding header information.

5. An apparatus for video image encoding, the apparatus comprising:

the receiving module is used for receiving a video image to be coded;

the coding module is used for coding the video image according to the switched resolution;

the encoding module includes:

the sampling module is used for performing corresponding up-sampling or down-sampling on the reference frame according to the switched resolution, wherein if the switched resolution is higher than the resolution before switching, the up-sampling is performed on the reference frame, and if the switched resolution is lower than the resolution before switching, the down-sampling is performed on the reference frame;

and the video image coding module is used for coding the video image according to the sampled reference frame.

6. The apparatus of claim 5, wherein the determining module is further configured to determine a resolution corresponding to the scene complexity of the video image according to a pre-established correspondence between the scene complexity and the resolution of the video image.

7. The apparatus of claim 5, further comprising:

an initialization module, configured to initialize a plurality of encoding header information, wherein different encoding header information comprises different resolutions;

the switching module is further configured to switch, when the resolution of the current encoder needs to be adjusted, the encoding header information of the current encoder to encoding header information that includes resolution that is consistent with resolution corresponding to scene complexity of the video image.

8. The apparatus of claim 5, further comprising:

and the transmission module is used for transmitting the coded video image and the corresponding coding header information to a decoder so that the decoder decodes the video image according to the resolution in the corresponding coding header information.

9. A storage medium on which a computer program is stored which, when being executed by a processor, is adapted to carry out the method of video image encoding according to any one of claims 1 to 4.

10. A terminal device for video image encoding, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the program, implements the method of video image encoding as claimed in any one of claims 1 to 4.