CN112468808A - I frame target bandwidth allocation method and device based on reinforcement learning - Google Patents


Info

Publication number
CN112468808A
Authority
CN
China
Prior art keywords
frame
target bandwidth
current
reinforcement learning
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011354798.1A
Other languages
Chinese (zh)
Other versions
CN112468808B (en)
Inventor
王妙辉 (Wang Miaohui)
黄丽蓉 (Huang Lirong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202011354798.1A
Publication of CN112468808A
Application granted
Publication of CN112468808B
Status: Active
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/115Selection of the code volume for a coding unit prior to coding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/156Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides an I-frame target bandwidth allocation method and device based on reinforcement learning, comprising the following steps: S1, inputting the video sequence into the HM coding system; S2, after the HM coding system allocates the target bandwidth to the GOP, calling the reinforcement learning neural network to allocate a target bandwidth to the current I frame; S3, the HM coding system uses the allocated target bandwidth to code the current I frame data, continues coding the remaining frames in the GOP to obtain the completed GOP data, and inputs the completed GOP data into a buffer; and S4, judging whether the video sequence has been fully coded; if not, acquiring the next GOP data and returning to S2. The invention has the beneficial effects that, by continuously perceiving the environment state, the method can select the optimal target bandwidth for the current video sequence, helping to obtain better video quality and a smaller code rate error.

Description

I frame target bandwidth allocation method and device based on reinforcement learning
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for allocating I-frame target bandwidths based on reinforcement learning.
Background
The goal of a rate control algorithm is to provide a high-quality compressed sequence under a given bandwidth or storage budget, which is crucial for maintaining the quality of video applications, especially for systems with high real-time requirements. In video coding, balancing the code rate and the distortion of video frames is a key issue for rate control. In the prior art, a mathematical model is established from experimental data and research experience, and bandwidth allocation, quantization and parameter adjustment are performed accordingly.
The rate control algorithm of H.265/HEVC still employs the traditional two-step approach: target bandwidth allocation followed by quantization parameter determination. The key to picture-level target bandwidth allocation is accounting for the rate-distortion interdependence among video frames, and the allocated bandwidth weight is closely related to the target code rate, the video content characteristics and the temporal prediction structure.
In HEVC, target bandwidth allocation is divided into the GOP level, the picture level and the CTU level. At the GOP level there are I, P and B frame types: the I frame is the first frame of each GOP and is an independent frame carrying all information, while P frames and B frames must be predicted from other frames. When a video sequence contains drastic motion changes and fast scene changes, the inter-frame correlation between two I frames drops significantly, so more bandwidth must be spent on encoding. The existing picture-level target bandwidth allocation strategy assigns weights to pictures according to the target code rate, content characteristics and temporal prediction structure, with no targeted design for the above conditions, so effective handling cannot be guaranteed. A reinforcement learning-based method can optimize the target bandwidth allocation process end to end and further improve performance. We therefore adopt reinforcement learning in the hope of obtaining a more reasonable I-frame target bandwidth allocation strategy.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the defects of the prior art, an I-frame target bandwidth allocation method and device based on reinforcement learning are provided, with the purpose of optimizing picture-level target bandwidth allocation in the rate control process, thereby reducing distortion and improving video quality.
In order to solve the technical problems, the invention adopts the technical scheme that: an I-frame target bandwidth allocation method based on reinforcement learning comprises the following steps:
S1, inputting the video sequence into the HM coding system;
S2, after the HM coding system allocates the target bandwidth to the GOP, calling the reinforcement learning neural network to allocate a target bandwidth to the current I frame;
S3, the HM coding system uses the allocated target bandwidth to code the current I frame data, continues coding the remaining frames in the GOP to obtain the completed GOP data, and inputs the completed GOP data into a buffer;
and S4, judging whether the video sequence has been fully coded; if not, acquiring the next GOP data and returning to S2.
Further, before step S2, the method further includes establishing a training model:
S21, selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM encoding system, and recording the encoding information of each video;
and S22, inputting the encoding information into a reinforcement learning neural network for reinforcement learning.
Further, in step S22, an A2C (Advantage Actor-Critic) neural network is used for reinforcement learning.
Further, after step S21, the method further includes obtaining supplementary encoding information:
S211, obtaining the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image, with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
S212, generating a two-dimensional Gaussian distribution matrix according to σ1; the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
acquiring the edge features of the current I frame by calculating a pixel gradient matrix Gxy, where the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
S213, obtaining the color features of the current I frame through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
S214, packing the texture features, edge features and color features of the current I frame into the supplementary coding information of the current I frame, and inputting the supplementary coding information into the reinforcement learning neural network for reinforcement learning.
Further, after step S2, the method further includes evaluating the I-frame target bandwidth allocated by the action network using a reward calculation formula, in combination with the distortion degree of the encoded current frame and the distortion-degree history of the encoded frames; the reward calculation formula for evaluating bandwidth allocation is:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
The invention also relates to an I frame target bandwidth allocation device based on reinforcement learning, which comprises a transmission module, an allocation module, a calling module, a coding module and a judgment module,
the transmission module is used for inputting the video sequence into the HM coding system;
the distribution module is used for allocating a target bandwidth to the GOP;
the calling module is used for calling the reinforcement learning neural network to allocate a target bandwidth for the current I frame;
the encoding module is used for encoding the current I frame data with the allocated target bandwidth and continuing to encode the remaining frames in the GOP to obtain the completed GOP data;
the transmission module is also used for inputting the completed GOP data into a buffer area;
the judging module is used for judging whether the video sequence has been fully coded.
Further, the device comprises a learning module, wherein the learning module is used for selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM (HEVC test model) encoding system, recording the encoding information of each video, and inputting the encoding information into the reinforcement learning neural network for reinforcement learning.
Further, the learning module is also used for performing reinforcement learning by using an A2C neural network.
Further, the device includes an obtaining module, wherein the obtaining module is configured to obtain supplementary coding information, which includes the texture features, edge features and color features of the current I frame, specifically:
acquiring the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image (the value of σ governs which image contour and detail features are retained), with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
generating a two-dimensional Gaussian distribution matrix according to σ1, where the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
acquiring the edge features of the current I frame by calculating a pixel gradient matrix Gxy, where the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
obtaining the color features of the current I frame through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
the obtaining module packs the texture features, edge features and color features of the current I frame into the supplementary coding information of the current I frame.
Further, the learning module is further configured to evaluate the I-frame target bandwidth allocated by the action network using a reward calculation formula, in combination with the distortion degree of the encoded current frame and the distortion-degree history of the encoded frames; the reward calculation formula for evaluating bandwidth allocation is:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
The invention has the beneficial effects that: by continuously perceiving the environment state, the method can select the optimal target bandwidth for the current video sequence, helping to obtain better video quality and a smaller code rate error.
Drawings
The specific process and structure of the present invention are detailed below with reference to the accompanying drawings:
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of a reinforcement learning neural network structure according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions of "first", "second", etc. in the invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features concerned. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions of the various embodiments may be combined with each other, provided such combinations can be realized by a person skilled in the art; when a combination is contradictory or cannot be realized, it should be considered non-existent and outside the protection scope of the present invention.
Referring to fig. 1, a method for allocating I-frame target bandwidth based on reinforcement learning includes:
s1, inputting the video sequence into the HM coding system;
s2, after the HM coding system allocates the target bandwidth to the GOP, calling the reinforcement learning neural network to allocate the target bandwidth to the current I frame;
In order to give the reinforcement learning neural network a preliminary target bandwidth allocation capability, a training model needs to be established for it:
S21, selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM encoding system, and recording the encoding information of each video.
In this embodiment, there are 5 selected video resolutions: 352 × 288, 720 × 480, 1280 × 720, 1920 × 1080 and 3840 × 2160.
There are 3 kinds of selected video content features: (1) simple background, small picture color changes, simple foreground textures and contours, and uniform, smooth motion; (2) complex background, rich picture colors, foreground containing the textures and contours of various objects, and slow motion with object rotation; (3) complex background, complicated picture colors, numerous texture and contour details, and violent motion or fast scene switching.
There are 3 kinds of selected video durations: within 10 seconds; 10-30 seconds; and 30-60 seconds.
According to these differences, at least 10 videos in each category are selected as training data and 2 as test data, giving 450 training samples and 90 test samples. The training data set is encoded using the same quantization parameter for every frame, with the QP taking integer values from 20 to 44, and the actual encoding information is recorded.
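For concreteness, the category arithmetic above can be written out in a few lines of Python; the labels and helper names below are illustrative placeholders, not part of the patent.

from itertools import product

# Category grid implied by the embodiment: 5 resolutions x 3 content types
# x 3 duration ranges = 45 categories; 10 training + 2 test videos each.
resolutions = ["352x288", "720x480", "1280x720", "1920x1080", "3840x2160"]
content_types = ["simple", "moderate", "complex"]   # paraphrased labels
durations = ["<=10s", "10-30s", "30-60s"]

categories = list(product(resolutions, content_types, durations))
assert len(categories) == 45
n_train = 10 * len(categories)   # 450 training videos
n_test = 2 * len(categories)     # 90 test videos
qp_values = range(20, 45)        # integer QPs 20..44, fixed per encoding pass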
In order to better embody the inter-frame relevance of the I frame, the content features of the current I frame can be extracted, wherein the content features comprise texture features, contour features and color features, and are used as supplementary coding information of the training set.
Acquiring the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image (the value of σ governs which image contour and detail features are retained), with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
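As an illustrative sketch only: the fused formula itself is published as an image, so its exact form is not recoverable from this text. The snippet below combines standard difference-of-Gaussians bands using the constants quoted above; the linear fusion at the end is an assumption, not the patented formula.

import numpy as np
from scipy.ndimage import gaussian_filter

SIGMAS = (0.54, 0.87, 1.19)      # sigma1..sigma3 from the text
W, A, B = 0.284, 0.75, 0.66      # w, a, b from the text

def texture_feature(gray: np.ndarray) -> np.ndarray:
    """Hypothetical multi-scale difference-of-Gaussians fusion of a grayscale frame."""
    g1, g2, g3 = (gaussian_filter(gray.astype(np.float64), s) for s in SIGMAS)
    dog_fine = g1 - g2     # fine-scale DoG band
    dog_coarse = g2 - g3   # coarse-scale DoG band
    return W * (A * dog_fine + B * dog_coarse)   # assumed linear fusion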
S212, generating a two-dimensional Gaussian distribution matrix according to σ1; the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
The edge features of the current I frame are acquired by calculating a pixel gradient matrix Gxy, where the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
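A similarly hedged sketch of the two preceding steps: a 2-D Gaussian kernel generated from σ1, and a Sobel-based gradient matrix Gxy. The published Gaussian matrix also involves w1-w3 and the gradient formula involves c = 2 in ways not recoverable from this text, so the kernel normalization and the weighted magnitude below are assumptions.

import numpy as np
from scipy.ndimage import convolve

def gaussian_kernel(size: int = 5, sigma: float = 0.54) -> np.ndarray:
    """2-D Gaussian distribution matrix from sigma1 (w1-w3 weighting omitted)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)   # x right, y down, matching the text's convention
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)

def edge_feature(gray: np.ndarray, c: float = 2.0) -> np.ndarray:
    """Hypothetical pixel gradient matrix Gxy from Sobel responses."""
    gx = convolve(gray.astype(np.float64), SOBEL_X)      # horizontal gradient
    gy = convolve(gray.astype(np.float64), SOBEL_X.T)    # vertical gradient
    return np.sqrt(gx ** 2 + c * gy ** 2)                # assumed role of c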
The color features of the current I frame are obtained through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
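A hedged sketch of the color feature: per-channel gray-level histograms give h_{i,j} as described, while the d-power aggregation is an assumption standing in for the formula shown only as an image.

import numpy as np

def color_feature(image: np.ndarray, n: int = 256, d: float = 1.33) -> np.ndarray:
    """Hypothetical per-channel color descriptor; image is H x W x C, uint8."""
    feats = []
    for i in range(image.shape[2]):
        hist, _ = np.histogram(image[..., i], bins=n, range=(0, n))
        h = hist / hist.sum()               # h_{i,j}: probability of gray value j in channel i
        feats.append(float(np.power(h, d).sum()))   # assumed d-power aggregation
    return np.asarray(feats)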
The texture features, edge features and color features of the current I frame are then packed into the supplementary coding information of the current I frame.
And S22, inputting the coding information and the supplementary coding information into the reinforcement learning neural network for reinforcement learning.
An A2C (Advantage Actor-Critic) neural network is used for reinforcement learning; the reinforcement learning neural network comprises an action network and an evaluation network, whose structure is shown in FIG. 2.
The action network takes as input the target bandwidth of the GOP containing the current I frame, the texture, contour and color features of the current I frame, and the texture, contour and color features of the previous I frame.
The action network outputs the target bandwidth of the current I frame.
In the action network, the reinforcement learning neural network intelligently combines information such as the historical coding information, the degree of correlation between the features of the current I frame and the previous I frame, the target bandwidth of the current GOP and the frame-level target bandwidth to decide the target bandwidth of the current I frame.
In order to evaluate the target bandwidth of the current I frame output by the action network, the evaluation network of the reinforcement learning neural network takes this target bandwidth as input and outputs an evaluation value for the action network.
In the evaluation network, the reinforcement learning neural network intelligently combines the distortion degree of the encoded current frame with the distortion-degree history of the encoded frames, and evaluates the I-frame target bandwidth allocated by the action network using a reward calculation formula. Meanwhile, the evaluation network back-propagates the computed gradients and updates the network parameters.
The reward calculation formula for evaluating bandwidth allocation is as follows:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
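Since the reward formula is published only as an image, the sketch below merely combines the named quantities (per-frame PSNR Qi, consumed bandwidth Ri, a sliding window standing in for Bi, and the Lagrangian factor λ) into one plausible quality-versus-rate-error trade-off; the exact functional form is an assumption.

def reward(psnr, bits, target_bits, lam=1.0, window=4):
    """Plausible reward over a sliding window of encoded frames.

    psnr, bits: per-frame PSNR (Qi) and consumed bandwidth (Ri) histories;
    window stands in for Bi; lam is the Lagrangian factor lambda.
    """
    q = psnr[-window:]
    r = bits[-window:]
    quality = sum(q) / len(q)                                # average recent quality
    rate_error = abs(sum(r) - target_bits) / max(target_bits, 1)
    return quality - lam * rate_error                        # assumed trade-off shape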
Feature sampling is performed on the video I frames in the data set according to the feature extraction method above, and the features together with the required information are input into the action network. When the evaluation network performs its evaluation, the encoding information in the data set is used as part of the historical information for rate-distortion performance evaluation, so that continuous reinforcement of the reinforcement learning neural network can be realized.
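A minimal PyTorch sketch of such an action/evaluation (actor/critic) pair and one A2C-style update; the layer sizes, state dimension, Gaussian exploration policy and optimizer settings are all illustrative assumptions, since the patent discloses the structure only at the level of FIG. 2.

import torch
import torch.nn as nn

class ActionNet(nn.Module):              # actor: state -> I-frame bandwidth ratio
    def __init__(self, state_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1), nn.Sigmoid())
    def forward(self, s):
        return self.body(s)              # mean share of the GOP budget for the I frame

class EvalNet(nn.Module):                # critic: state -> scalar value estimate
    def __init__(self, state_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))
    def forward(self, s):
        return self.body(s)

state_dim = 8                            # e.g. GOP budget + feature statistics (assumed)
actor, critic = ActionNet(state_dim), EvalNet(state_dim)
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-4)

# One A2C-style update with dummy tensors standing in for real encoder states.
s, s_next = torch.randn(1, state_dim), torch.randn(1, state_dim)
r, gamma = torch.tensor([[30.0]]), 0.99              # reward, e.g. from the sketch above

dist = torch.distributions.Normal(actor(s), 0.1)     # fixed exploration std (assumption)
action = dist.sample()                               # sampled bandwidth ratio
advantage = r + gamma * critic(s_next).detach() - critic(s)

actor_loss = -(dist.log_prob(action) * advantage.detach()).mean()
critic_loss = advantage.pow(2).mean()                # critic regresses the TD target
opt.zero_grad()
(actor_loss + critic_loss).backward()                # gradients back-propagate, as described
opt.step()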
S3, using the I-frame target bandwidth output by the reinforcement learning neural network in the subsequent target bandwidth allocation and quantization parameter decisions, encoding the current I frame data, continuing to encode the remaining frames in the GOP to obtain the completed GOP data, and inputting the completed GOP data into a buffer;
and S4, judging whether the video sequence has been fully coded; if not, acquiring the next GOP data and returning to S2, looping in this way until the entire video sequence has been encoded.
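Steps S1-S4 amount to the following control loop, written against a hypothetical wrapper around the HM encoder; `hm`, `agent` and all their methods are placeholders for illustration rather than real HM or patent APIs.

def encode_sequence(hm, agent, video):
    """Control loop for steps S1-S4; `hm` and `agent` are hypothetical wrappers."""
    hm.load(video)                                   # S1: feed the sequence to HM
    while not hm.sequence_done():                    # S4: stop once fully encoded
        gop = hm.next_gop()
        gop_budget = hm.allocate_gop_bandwidth(gop)          # S2: GOP-level budget
        i_budget = agent.allocate_i_frame(gop, gop_budget)   # S2: RL picks the I-frame share
        hm.encode_i_frame(gop, i_budget)                     # S3: encode the I frame
        hm.encode_remaining_frames(gop)                      # S3: finish the GOP
        hm.push_to_buffer(gop)                               # S3: buffer the completed GOP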
From the above description, the beneficial effects of the present invention are: by extracting the texture, contour and color features of the I frame, correlating the content features of the video foreground and analyzing the image complexity of the I frame, the method assists more accurate bandwidth allocation; and by continuously perceiving the environment state, it selects the optimal target bandwidth for the current video sequence, helping to obtain better video quality and smaller code rate errors.
The invention also relates to an I frame target bandwidth allocation device based on reinforcement learning, which comprises a transmission module, an allocation module, a calling module, a coding module and a judgment module,
the transmission module is used for inputting the video sequence into the HM coding system;
the distribution module is used for allocating a target bandwidth to the GOP;
the calling module is used for calling the reinforcement learning neural network to allocate a target bandwidth for the current I frame;
the encoding module is used for encoding the current I frame data with the allocated target bandwidth and continuing to encode the remaining frames in the GOP to obtain the completed GOP data;
the transmission module is also used for inputting the completed GOP data into a buffer area;
the judging module is used for judging whether the video sequence has been fully coded.
In order to give the reinforcement learning neural network an initial target bandwidth allocation capability, the device further comprises a learning module, wherein the learning module is used for selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM encoding system, recording the encoding information of each video, and inputting the encoding information into the reinforcement learning neural network for reinforcement learning.
In this embodiment, there are 5 selected video resolutions: 352 × 288, 720 × 480, 1280 × 720, 1920 × 1080 and 3840 × 2160.
There are 3 kinds of selected video content features: (1) simple background, small picture color changes, simple foreground textures and contours, and uniform, smooth motion; (2) complex background, rich picture colors, foreground containing the textures and contours of various objects, and slow motion with object rotation; (3) complex background, complicated picture colors, numerous texture and contour details, and violent motion or fast scene switching.
There are 3 kinds of selected video durations: within 10 seconds; 10-30 seconds; and 30-60 seconds.
According to these differences, at least 10 videos in each category are selected as training data and 2 as test data, giving 450 training samples and 90 test samples. The training data set is encoded using the same quantization parameter for every frame, with the QP taking integer values from 20 to 44, and the actual encoding information is recorded.
In order to better embody the inter-frame relevance of the I frame, the device further includes an obtaining module, which can extract the content features of the current I frame, including texture features, contour features and color features, to serve as supplementary encoding information for the training set, specifically:
acquiring the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image (the value of σ governs which image contour and detail features are retained), with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
generating a two-dimensional Gaussian distribution matrix according to σ1, where the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
acquiring the edge features of the current I frame by calculating a pixel gradient matrix Gxy, where the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
obtaining the color features of the current I frame through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
Finally, the obtaining module packs the texture features, edge features and color features of the current I frame into the supplementary coding information of the current I frame.
In order to ensure the learning effect of the reinforcement learning neural network, the learning module adopts the A2C neural network to carry out reinforcement learning.
The reinforcement learning neural network comprises an action network and an evaluation network.
The action network takes as input the target bandwidth of the GOP containing the current I frame, the texture, contour and color features of the current I frame, and the texture, contour and color features of the previous I frame.
The action network outputs the target bandwidth of the current I frame.
In the action network, the reinforcement learning neural network intelligently combines information such as the historical coding information, the degree of correlation between the features of the current I frame and the previous I frame, the target bandwidth of the current GOP and the frame-level target bandwidth to decide the target bandwidth of the current I frame.
In order to evaluate the target bandwidth of the current I frame output by the action network, the learning module is further configured to evaluate the I-frame target bandwidth allocated by the action network using a reward calculation formula, in combination with the distortion degree of the encoded current frame and the distortion-degree history of the encoded frames; the evaluation network back-propagates the computed gradients and updates the network parameters.
The reward calculation formula for evaluating bandwidth allocation is as follows:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
Feature sampling is performed on the video I frames in the data set according to the feature extraction method above, and the features together with the required information are input into the action network. When the evaluation network performs its evaluation, the encoding information in the data set is used as part of the historical information for rate-distortion performance evaluation, so that continuous reinforcement of the reinforcement learning neural network can be realized.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. An I-frame target bandwidth allocation method based on reinforcement learning comprises the following steps:
S1, inputting the video sequence into the HM coding system;
S2, after the HM coding system allocates the target bandwidth to the GOP, calling the reinforcement learning neural network to allocate a target bandwidth to the current I frame;
S3, the HM coding system uses the allocated target bandwidth to code the current I frame data, continues coding the remaining frames in the GOP to obtain the completed GOP data, and inputs the completed GOP data into a buffer;
and S4, judging whether the video sequence has been fully coded; if not, acquiring the next GOP data and returning to S2.
2. The reinforcement learning-based I-frame target bandwidth allocation method of claim 1, wherein: before step S2, the method further includes establishing a training model:
S21, selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM encoding system, and recording the encoding information of each video;
and S22, inputting the encoding information into a reinforcement learning neural network for reinforcement learning.
3. The reinforcement learning-based I-frame target bandwidth allocation method of claim 2, wherein: in step S22, reinforcement learning is performed using the A2C neural network.
4. The reinforcement learning-based I-frame target bandwidth allocation method according to claim 3, wherein: in step S21, the method further includes obtaining the supplemental encoding information:
S211, obtaining the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image, with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
S212, generating a two-dimensional Gaussian distribution matrix according to σ1; the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
acquiring the edge features of the current I frame by calculating a pixel gradient matrix Gxy, wherein the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
S213, obtaining the color features of the current I frame through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
S214, packing the texture features, edge features and color features of the current I frame into the supplementary coding information of the current I frame, and inputting the supplementary coding information into the reinforcement learning neural network for reinforcement learning.
5. The reinforcement learning-based I-frame target bandwidth allocation method according to claim 4, wherein: after step S2, the method further includes evaluating the I-frame target bandwidth allocated by the action network using a reward calculation formula, in combination with the distortion degree of the encoded current frame and the distortion-degree history of the encoded frames; the reward calculation formula for evaluating bandwidth allocation is:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
6. An I-frame target bandwidth allocation apparatus based on reinforcement learning, characterized in that: comprises a transmission module, a distribution module, a calling module, a coding module and a judgment module,
the transmission module is used for inputting the video sequence into the HM coding system;
the distribution module is used for allocating a target bandwidth to the GOP;
the calling module is used for calling the reinforcement learning neural network to allocate a target bandwidth for the current I frame;
the encoding module is used for encoding the current I frame data with the allocated target bandwidth and continuing to encode the remaining frames in the GOP to obtain the completed GOP data;
the transmission module is also used for inputting the completed GOP data into a buffer area;
the judging module is used for judging whether the video sequence has been fully coded.
7. The reinforcement learning-based I-frame target bandwidth allocation apparatus of claim 6, wherein: the apparatus further comprises a learning module, wherein the learning module is used for selecting at least two videos differing in resolution, at least two videos differing in content and at least two videos differing in duration, performing bandwidth allocation and quantization parameter selection for the H.265/HEVC encoding flow according to the HM encoding system, recording the encoding information of each video, and inputting the encoding information into the reinforcement learning neural network for reinforcement learning.
8. The reinforcement learning-based I-frame target bandwidth allocation apparatus of claim 7, wherein: the learning module is also used for performing reinforcement learning by adopting an A2C neural network.
9. The reinforcement learning-based I-frame target bandwidth allocation apparatus of claim 8, wherein: the apparatus further comprises an obtaining module, wherein the obtaining module is configured to obtain supplementary encoding information, which includes the texture features, edge features and color features of the current I frame, specifically:
acquiring the texture features of the current I frame through a multi-scale Gaussian difference fusion formula:
[Formula shown as an image in the original: the multi-scale Gaussian difference fusion formula]
where (x, y) are the spatial coordinates and σ determines the degree of smoothing of the image, with σ1 = 0.54, σ2 = 0.87, σ3 = 1.19,
w is the weight of the Gaussian difference term, w = 0.284,
a and b are parameters of the Gaussian difference, with a = 0.75 and b = 0.66;
generating a two-dimensional Gaussian distribution matrix according to σ1; the calculation formula is:
[Formula shown as an image in the original: the two-dimensional Gaussian distribution matrix]
where x and y are the dimensions of the Gaussian kernel, and w1, w2, w3 are three parameters related to human visual characteristics, with w1 = 0.536, w2 = 0.277, w3 = 0.187;
acquiring the edge features of the current I frame by calculating a pixel gradient matrix Gxy, wherein the calculation formula of the pixel gradient matrix is:
[Formula shown as an image in the original: the pixel gradient matrix Gxy built from Sobel responses]
where I is the grayscale image matrix, S is the Sobel operator and c = 2; the origin of the image matrix coordinate system is at the upper-left corner, with the positive x direction running from left to right and the positive y direction from top to bottom;
obtaining the color features of the current I frame through a color feature extraction formula:
[Formula shown as an image in the original: the color feature extraction formula]
where h_{i,j} represents the probability that a pixel with gray value j occurs in the i-th color channel component, n represents the number of image gray levels, and d = 1.33;
the obtaining module packs the texture features, edge features and color features of the current I frame into the supplementary encoding information of the current I frame.
10. The reinforcement learning-based I-frame target bandwidth allocation apparatus of claim 9, wherein: the learning module is further configured to evaluate the I-frame target bandwidth allocated by the action network using a reward calculation formula, in combination with the distortion degree of the encoded current frame and the distortion-degree history of the encoded frames; the reward calculation formula for evaluating bandwidth allocation is:
[Formula shown as an image in the original: the reward calculation formula]
where i is the frame index, N represents the number of encoded frames, Qi denotes the PSNR value of an image, a = 2, Bi denotes the sliding window size, Ri represents the amount of encoding bandwidth consumed, and λ is the Lagrangian optimization factor.
CN202011354798.1A 2020-11-26 2020-11-26 I frame target bandwidth allocation method and device based on reinforcement learning Active CN112468808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011354798.1A CN112468808B (en) 2020-11-26 2020-11-26 I frame target bandwidth allocation method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011354798.1A CN112468808B (en) 2020-11-26 2020-11-26 I frame target bandwidth allocation method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112468808A (en) 2021-03-09
CN112468808B CN112468808B (en) 2022-08-12

Family

ID=74809592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011354798.1A Active CN112468808B (en) 2020-11-26 2020-11-26 I frame target bandwidth allocation method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112468808B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109743778A (en) * 2019-01-14 2019-05-10 长沙学院 A kind of resource allocation optimization method and system based on intensified learning
US20200344472A1 (en) * 2019-04-23 2020-10-29 National Chiao Tung University Reinforcement learning method for video encoder
CN111031387A (en) * 2019-11-21 2020-04-17 南京大学 Method for controlling video coding flow rate of monitoring video sending end
CN111294595A (en) * 2020-02-04 2020-06-16 清华大学深圳国际研究生院 Video coding intra-frame code rate control method based on deep reinforcement learning
CN111405327A (en) * 2020-04-03 2020-07-10 广州市百果园信息技术有限公司 Network bandwidth prediction model training method, video data playing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MINGLIANG ZHOU et al., "Rate Control Method Based on Deep Reinforcement Learning for Dynamic Video Sequences in HEVC," IEEE Transactions on Multimedia *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116208788A (en) * 2023-05-04 2023-06-02 海马云(天津)信息技术有限公司 Method and device for providing network application service, server equipment and storage medium
CN116208788B (en) * 2023-05-04 2023-07-21 海马云(天津)信息技术有限公司 Method and device for providing network application service, server equipment and storage medium
CN117196999A (en) * 2023-11-06 2023-12-08 浙江芯劢微电子股份有限公司 Self-adaptive video stream image edge enhancement method and system
CN117196999B (en) * 2023-11-06 2024-03-12 浙江芯劢微电子股份有限公司 Self-adaptive video stream image edge enhancement method and system

Also Published As

Publication number Publication date
CN112468808B (en) 2022-08-12

Similar Documents

Publication Publication Date Title
CN111432207B (en) Perceptual high-definition video coding method based on salient target detection and salient guidance
Tang Spatiotemporal visual considerations for video coding
CN110087087B (en) VVC inter-frame coding unit prediction mode early decision and block division early termination method
US20200329233A1 (en) Hyperdata Compression: Accelerating Encoding for Improved Communication, Distribution & Delivery of Personalized Content
CN112399176B (en) Video coding method and device, computer equipment and storage medium
CN108495135B (en) Quick coding method for screen content video coding
CN103188493B (en) Image encoding apparatus and image encoding method
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN112468808B (en) I frame target bandwidth allocation method and device based on reinforcement learning
CN108063944B (en) Perception code rate control method based on visual saliency
CN107371022B (en) Inter-frame coding unit rapid dividing method applied to HEVC medical image lossless coding
CN101710993A (en) Block-based self-adaptive super-resolution video processing method and system
CN104539962A (en) Layered video coding method fused with visual perception features
CN111083477B (en) HEVC (high efficiency video coding) optimization algorithm based on visual saliency
CN108347612A (en) A kind of monitored video compression and reconstructing method of view-based access control model attention mechanism
Liu et al. End-to-end neural video coding using a compound spatiotemporal representation
CN112291562A (en) Fast CU partition and intra mode decision method for H.266/VVC
CN111447452B (en) Data coding method and system
CN108513132B (en) Video quality evaluation method and device
CN115941943A (en) HEVC video coding method
Wang et al. Perceptually quasi-lossless compression of screen content data via visibility modeling and deep forecasting
Wang et al. Semantic-aware video compression for automotive cameras
CN106686383A (en) Depth map intra-frame coding method capable of preserving edge of depth map
EP1802127B1 (en) Method for performing motion estimation
CN111723735B (en) Pseudo high bit rate HEVC video detection method based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant