CN110381313B - Video compression sensing reconstruction method based on LSTM network and image group quality blind evaluation - Google Patents


Info

Publication number
CN110381313B
CN110381313B (application CN201910610758.XA)
Authority
CN
China
Prior art keywords
image group
frame
lstm network
reconstruction
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910610758.XA
Other languages
Chinese (zh)
Other versions
CN110381313A (en)
Inventor
刘浩
魏冬
周健
田伟
陈根龙
黄荣
孙韶媛
李德敏
周武能
魏国林
廖荣生
黄震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN201910610758.XA priority Critical patent/CN110381313B/en
Publication of CN110381313A publication Critical patent/CN110381313A/en
Application granted granted Critical
Publication of CN110381313B publication Critical patent/CN110381313B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • H04N19/126Details of normalisation or weighting functions, e.g. normalisation matrices or variable uniform quantisers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a video compressive sensing reconstruction method based on an LSTM network and blind quality evaluation of image groups. A reconstruction end receives a frame observation vector code stream and combines it into consecutive image-group observation vectors. For each image-group observation vector it performs LSTM-network-based multi-frame joint iterative reconstruction to obtain the corresponding reconstructed image group, outputs the final reconstructed frames one by one, and decides whether to update the parameter set of the LSTM network according to whether successive reconstructions have stopped by reaching the maximum iteration count. The method combines sparse prior modeling with a data-driven mechanism and helps improve the quality of the reconstructed video.

Description

Video compression sensing reconstruction method based on LSTM network and image group quality blind evaluation
Technical Field
The invention relates to the technical field of video compressive sensing reconstruction, and in particular to a video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation.
Background
The rise of compressive sensing provides a novel signal acquisition and recovery mechanism. According to compressive sensing theory, the original signal only needs to be projected onto a random basis to obtain a small number of measurements, and any signal with a sparse or nearly sparse representation in some transform domain can be recovered from those measurements. In compressed sensing video communication, the measuring end and the reconstruction end are extremely asymmetric: the measuring end is a cyber-physical system characterized by limited physical and computing resources and cooperative signal acquisition and transmission, while the resource-rich reconstruction end must recover the original signal without a feedback channel.
Video compressive sensing generally adopts a communication architecture of "independent measurement of each frame, multi-frame joint reconstruction", which shifts the computational complexity from the measuring end to the reconstruction end; the extremely simple measuring-end design is well suited to resource-limited visual sensors in a sensing network. The measuring end observes and encodes each frame of the video independently with the same observation matrix, generates consecutive frame observation vectors, and sends them out as a code stream. After receiving the code stream, the reconstruction end combines it into consecutive image-group observation vectors; multi-frame joint reconstruction exploits spatio-temporal redundancy to different degrees, so reconstruction speed and quality vary accordingly.
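The per-frame measurement stage described above (every frame projected by one shared observation matrix) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the sizes N, M, L are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64   # pixels per vectorized frame (illustrative)
M = 16   # measurements per frame (illustrative)
L = 3    # frames per image group (illustrative)

# One Gaussian observation matrix shared by every frame.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

def measure_frame(frame_vec: np.ndarray) -> np.ndarray:
    """Independent per-frame measurement: y = Phi @ x."""
    return Phi @ frame_vec

frames = [rng.standard_normal(N) for _ in range(L)]
code_stream = [measure_frame(x) for x in frames]  # sent to the reconstruction end
```

Because every frame uses the same Phi, the measuring end stores only one matrix and performs one matrix-vector product per frame.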
The original image signal is unavailable during the reconstruction process of video compressed sensing, so reconstruction performance can hardly be evaluated against the original images. Blind video quality evaluation trains on image samples from typical image databases and builds a blind evaluation model of video feature changes through supervised pattern recognition and statistical regression, so the quality of multiple frames can be assessed without the original images. Video-BLIINDS and VIIDEO, proposed by Bovik et al., are two typical blind video quality evaluation criteria: Video-BLIINDS is a frequency-domain statistical model based on spatio-temporal natural scenes, while VIIDEO is a statistical model based on the distribution of differences between consecutive frames. Blind quality evaluation of the reconstructed image group can extract intrinsic features of the multi-frame images and helps recover the structural information of the video signal.
Deep learning has shown promising performance in machine vision and image recovery tasks, and compressed sensing deep learning can fully exploit the resources of the reconstruction end to better reconstruct dynamically changing video signals. Long short-term memory (LSTM) networks perform attention-model-based long-sequence modeling and can express more complex spatio-temporal information; an LSTM-based deep learning mechanism helps recover the detailed information of video signals.
Disclosure of Invention
The invention aims to solve the technical problem of providing a video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation, which combines sparse prior modeling with a data-driven mechanism and helps improve the quality of the reconstructed video.
The technical scheme adopted by the invention to solve the above technical problem is as follows: a video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation is provided, comprising the following steps:
(1) a reconstruction end receives a frame observation vector code stream and combines the frame observation vector code stream to form a continuous image group observation vector;
(2) use the 1st image-group observation vector GMV_1 and its reconstructed image group to train the parameter set of the LSTM network;
(3) for the n-th image-group observation vector GMV_n (n ≥ 2), perform LSTM-network-based multi-frame joint iterative reconstruction; the stopping condition is that the iteration count reaches the maximum value K, or the residual l2-norm ||R_{n,j}||_2 falls below the threshold resMin, or the image-group blind quality Q^b_n exceeds the threshold qMax, thereby completing the recovery of the n-th image group, and the reconstructed frame F_n in the n-th image group is taken as the final n-th reconstructed frame; after α consecutive image groups have been recovered, if each of them finally stopped because the iteration count reached the maximum value K, go to step (4); otherwise subsequent multi-frame joint iterative reconstruction keeps the current parameter set §* of the LSTM network, and jump to step (5);
(4) the reconstruction end trains the LSTM network with the reconstructed image group G_n of the n-th image-group observation vector GMV_n;
(5) if image-group observation vectors remain to be reconstructed, return to step (3) and continue to recover the image groups one by one; otherwise output the remaining reconstructed frames F_{n+1}, …, F_{n+L-1} as the final (n+1)-th, …, (n+L-1)-th reconstructed frames, completing the video reconstruction.
Each image-group observation vector in step (1) contains L frame observation vectors, wherein L ≥ 2, and each frame observation vector contains M measured values.
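The combination step can be sketched as follows: the L received frame observation vectors (each of length M) are stacked as the L columns of one image-group observation matrix, so that column i is the i-th frame observation vector. The sizes are illustrative assumptions.

```python
import numpy as np

M, L = 16, 3  # illustrative sizes

# Stand-ins for L consecutive frame observation vectors taken from the code stream.
frame_vectors = [np.full(M, float(i)) for i in range(L)]

# Stack them as the L columns of GMV_n, so GMV_n[:, i] is the i-th frame vector.
GMV_n = np.column_stack(frame_vectors)
```

This mirrors the notation GMV_n(:,i) used throughout the description for the i-th column.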
For the 1st image-group observation vector GMV_1 in step (2), the reconstruction end recovers the 1st reconstructed image group G_1 = {F_1, F_2, …, F_L} frame by frame with an image reconstruction algorithm, then uses (G_1, GMV_1) as a reference data pair to train the parameter set §_1 of the LSTM network, obtaining the current parameter set §* = §_1 of the LSTM network.
The LSTM-network-based multi-frame joint iterative reconstruction in step (3) initializes the i-th frame residual vector R_{n,j}(:,i) one by one from the frame observation vectors GMV_n(:,i) and takes the initialized residual vectors R_{n,j}(:,i) as the input of the LSTM network. A conversion matrix U maps the LSTM network output h_{n,j}(:,i) ∈ R^{ncell} of the i-th frame image in the j-th iteration to the base vector z_{n,j}(:,i) = U·h_{n,j}(:,i), where ncell is the number of LSTM network neurons. The base vector z_{n,j}(:,i) is then fed to the softmax layer, yielding the non-zero probability of each element of the i-th frame sparse vector; the element with the highest probability is selected and added to the support set of the frame sparse vectors. Finally, the frame sparse vectors {S_{n,j}(:,i)}_{i=1,2,…,L} of the j-th iteration are found one by one by least-squares estimation.
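The greedy loop of this step (score the residuals, grow a shared support set, refit all frame sparse vectors by least squares) can be sketched as follows. The trained LSTM plus softmax is replaced here by a simple correlation-based scoring proxy, so this is only a structural sketch of the joint greedy recovery, not the patent's learned model; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, L = 16, 32, 3                      # illustrative sizes
A = rng.standard_normal((M, N)) / np.sqrt(M)   # stand-in observation matrix

def correlation_proxy(R):
    """Stand-in for the trained LSTM + softmax: scores each of the N sparse
    coefficients by its summed correlation with the L residual vectors."""
    scores = np.abs(A.T @ R).sum(axis=1)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def joint_greedy_reconstruct(GMV, n_iter):
    """Joint greedy recovery: all L frame sparse vectors share one support
    set, refit by least squares after every support extension."""
    support, S = [], np.zeros((N, L))
    for _ in range(n_iter):
        R = GMV - A @ S                  # per-frame residual vectors
        support = sorted(set(support) | {int(np.argmax(correlation_proxy(R)))})
        S = np.zeros((N, L))
        S[support, :] = np.linalg.lstsq(A[:, support], GMV, rcond=None)[0]
    return S

# Jointly sparse ground truth: non-zeros at the same rows in every column.
S_true = np.zeros((N, L))
S_true[[3, 10, 25], :] = rng.standard_normal((3, L))
GMV = A @ S_true
S_hat = joint_greedy_reconstruct(GMV, n_iter=6)
```

After the least-squares refit, the residual of each frame is orthogonal to the selected columns, so an already-selected element is never re-scored highest while the residual is non-zero.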
In step (3), the reconstruction end weights each coefficient of the residual vector computed after repeated iterations by the probability that the coefficient is zero, obtaining a weighted residual minimization problem, which is solved by the Split Bregman iteration algorithm.
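A hedged sketch of such a solver follows: it applies the standard Split Bregman scheme to an l1-regularized, weighted least-squares problem, with the weight vector standing in for the zero-probabilities mentioned above. The parameters lam and mu and the weights are illustrative assumptions; the patent does not specify them.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding (the shrinkage operator used by Split Bregman)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman_weighted(A, y, w, lam=0.05, mu=1.0, n_iter=100):
    """Split Bregman iteration for the weighted residual problem
        min_s  lam * ||s||_1 + 0.5 * ||diag(w) (A s - y)||_2^2,
    where w down-weights residual coefficients judged likely to be zero."""
    M, N = A.shape
    WA = w[:, None] * A
    Wy = w * y
    H = WA.T @ WA + mu * np.eye(N)       # normal equations for the s-update
    d = np.zeros(N)
    b = np.zeros(N)
    s = np.zeros(N)
    for _ in range(n_iter):
        s = np.linalg.solve(H, WA.T @ Wy + mu * (d - b))
        d = shrink(s + b, lam / mu)      # l1 proximal step
        b = b + s - d                    # Bregman variable update
    return s

# tiny noiseless demo with uniform weights
rng = np.random.default_rng(4)
A = rng.standard_normal((12, 20))
s_true = np.zeros(20)
s_true[[2, 7]] = [1.5, -2.0]
y = A @ s_true
s_hat = split_bregman_weighted(A, y, w=np.ones(12))
```

The splitting variable d decouples the l1 term from the quadratic term, so each sub-update has a closed form.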
The image-group blind quality in step (3) is evaluated by the Video-BLIINDS or VIIDEO criterion.
In step (4), the reconstruction end recovers the n-th reconstructed image group G_n = {F_n, F_{n+1}, …, F_{n+L-1}} frame by frame with an image reconstruction algorithm, uses (G_n, GMV_n) as a reference data pair to train the parameter set §_n of the LSTM network, and updates the current parameter set §* = §_n of the LSTM network.
When the LSTM network is trained, the reconstructed image group G_n for training is first sparsely coded with the LSTM network, which sparsely represents the given data to obtain a coefficient matrix; the coefficient matrix is then fixed, and each atom of the LSTM network is updated in turn so that it represents the reconstructed image group G_n for training more closely.
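The alternating structure of this training (sparse coding with the model fixed, then atom-by-atom updates with the coefficients fixed) can be sketched with a plain dictionary standing in for the network's "atoms". This is only an illustration of the alternation, in the spirit of K-SVD-style dictionary learning, not the patent's LSTM training; sizes and sparsity level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def sparse_code(D, Y, k):
    """Coding step: least squares followed by per-column hard thresholding
    (stand-in for sparse coding driven by the LSTM network)."""
    X = np.linalg.lstsq(D, Y, rcond=None)[0]
    for j in range(X.shape[1]):
        small = np.argsort(np.abs(X[:, j]))[:-k]  # all but k largest
        X[small, j] = 0.0
    return X

def update_atoms(D, Y, X):
    """Update step: refit each atom in turn with the coefficients fixed,
    so it represents the training data more closely."""
    for a in range(D.shape[1]):
        nz = np.abs(X[a, :]) > 0
        if not nz.any():
            continue
        # residual with atom a removed, restricted to columns that use it
        E = Y[:, nz] - D @ X[:, nz] + np.outer(D[:, a], X[a, nz])
        atom = E @ X[a, nz]
        D[:, a] = atom / (np.linalg.norm(atom) + 1e-12)
    return D

# Alternate the two steps (illustrative sizes).
Y = rng.standard_normal((16, 8))   # stand-in for the training image-group data
D = rng.standard_normal((16, 24))
D /= np.linalg.norm(D, axis=0)     # unit-norm atoms
for _ in range(5):
    X = sparse_code(D, Y, k=3)
    D = update_atoms(D, Y, X)
```

Each pass keeps one factor fixed while improving the other, which is the alternation the paragraph above describes.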
Advantageous effects
Due to the adoption of the above technical scheme, compared with the prior art, the invention has the following advantages and positive effects: the invention considers the spatio-temporal sparse characteristics of consecutive multi-frame images and provides a video compressive sensing reconstruction method that integrates sparse prior modeling and data-memory driving. The method can consider a large number of frames simultaneously, needs no linearity assumption on object motion, comprehensively reflects object motion information, and helps recover the structure and detail information of multi-frame images as a whole, thereby improving the quality of the reconstructed video.
Drawings
FIG. 1 is a timing diagram of image group observation vectors versus frame observation vectors;
FIG. 2 is a general flow chart of the video compressed sensing reconstruction method;
fig. 3 is a flow chart of multi-frame joint iterative reconstruction based on an LSTM network.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustration only and are not intended to limit the scope of the invention. Furthermore, it should be understood that, after reading the teaching of the invention, those skilled in the art may make various changes or modifications to the invention, and such equivalents likewise fall within the scope defined by the appended claims.
In compressed sensing video communication, the measuring end independently measures each frame of the original video (or of block-partitioned sub-videos) frame by frame and sends out a frame observation vector code stream. The reconstruction end receives the code stream and combines it into consecutive image-group observation vectors. Each image-group observation vector contains L frame observation vectors, i denotes the frame index within the same image group (1 ≤ i ≤ L), and each frame observation vector contains M measured values. GMV_n ∈ R^{M×L} denotes the n-th image-group observation vector (n ≥ 1); the L frame observation vectors are arranged as the L columns of GMV_n, where GMV_n(:,i) denotes the i-th frame observation vector. Based on the continuous frame observation vector code stream, Fig. 1 shows the timing relationship of the image-group observation vectors, where the total number of frames per image-group observation vector is L = 3. The reconstruction end performs long short-term memory (LSTM) based multi-frame joint iterative reconstruction on each image-group observation vector and, because the adjacent image groups of a video are strongly correlated, adaptively decides whether to update the current parameter set of the LSTM network according to the recent behavior of the iterative reconstruction.
In the reconstructed video, each reconstructed image group contains L reconstructed frames. G_n = {F_n, F_{n+1}, …, F_{n+L-1}} denotes the n-th reconstructed image group, Q^b_n denotes the image-group blind quality of G_n, and R_{n,j} denotes the residual vector of the n-th image group in the j-th iteration. F_n denotes the n-th reconstructed frame and contains N pixels. S_{n,j} denotes the image-group sparse vector of the reconstructed image group G_n in the j-th iteration. Based on the LSTM network and image-group blind quality evaluation, Fig. 2 shows the general flow chart of the video compressed sensing reconstruction method, which mainly includes the following steps:
in the first step, in the initialization operation, the sequence number n is 1. Because the original frame can not be obtained, the reconstruction end adopts the 1 st image group observation vector GMV1The recovered reconstructed image set trains LSTM network parameters. For GMV1The 1 st reconstruction image group G is restored by the reconstruction end frame by adopting the image reconstruction algorithm with the total variation minimization1={F1,F2,...,FLWill then be (G)1,GMV1) Training the LSTM network as a reference data pair to obtain its parameter set §1As the current set of parameters for the LSTM network §*=§1
In the second step, for the n-th image-group observation vector GMV_n, the reconstruction end performs LSTM-network-based multi-frame joint iterative reconstruction: in each iteration it computes a residual, weights each residual coefficient by the probability that it is zero to form a weighted residual minimization problem, and solves that problem with the Split Bregman iteration algorithm. The stopping condition of the multi-frame joint iterative reconstruction is that the iteration count reaches the maximum value K, or the residual l2-norm ||R_{n,j}||_2 falls below the threshold resMin, or the image-group blind quality Q^b_n exceeds the threshold qMax. The image-group blind quality is evaluated by the Video-BLIINDS or VIIDEO criterion; larger values mean better quality. In cooperation with the newly introduced qMax, K may be chosen larger and resMin smaller. The reconstruction end completes the recovery of the image-group observation vector GMV_n, obtaining the reconstructed frames {F_{n+i-1} = Ψ·S_{n,j}(:,i)}_{i=1,2,…,L} one by one and hence the reconstructed image group G_n = {F_n, F_{n+1}, …, F_{n+L-1}}; it outputs the reconstructed frame F_n as the final n-th reconstructed frame, while the remaining reconstructed frames F_{n+1}, …, F_{n+L-1} serve as the initial states of the same-time-sequence frames of the subsequent image group. Adjacent image groups have approximately the same multi-frame joint sparse characteristics, so the parameter set of the LSTM network generally need not be updated across the recovery of many image groups. When the recovery of α consecutive image groups has finished, if stopping because the iteration count reached the maximum value persisted for those α image groups, continue with the third step; otherwise subsequent image-group observation vectors keep the current parameter set §* of the LSTM network, and jump to the fourth step.
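The three-way stopping condition of this step can be written as a small predicate; the numeric thresholds here are illustrative placeholders, since the patent leaves K, resMin and qMax as tunable parameters.

```python
def should_stop(j, res_norm, blind_quality, K=50, resMin=1e-3, qMax=60.0):
    """Stopping condition of the multi-frame joint iterative reconstruction:
    the iteration count reaches K, or the residual l2-norm falls below
    resMin, or the image-group blind quality exceeds qMax."""
    return j >= K or res_norm < resMin or blind_quality > qMax
```

Because the quality test can fire before the residual test, a larger K and smaller resMin become affordable, as the description notes.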
In the multi-frame joint iterative reconstruction of this step, L frame sparse vectors are jointly recovered from one image-group observation vector, and L reconstructed frames are output in frame order at the frame rate to form a reconstructed image group. For video reconstruction, Φ ∈ R^{M×N} is a Gaussian random matrix, Ψ ∈ R^{N×N} is a dual-tree wavelet transform basis, and the observation matrix is A = Φ·Ψ. For any image-group observation vector, the flow chart of the LSTM-network-based multi-frame joint iterative reconstruction is shown in Fig. 3. The i-th frame residual vector of the image-group residual vector R_{n,j} is R_{n,j}(:,i) = GMV_n(:,i) − A_j·S_{n,j}(:,i), with frame index i = 1, 2, …, L, where A_j is the matrix containing only those columns of A that correspond to the support-set elements of S_{n,j}, and S_{n,j}(:,i) is the i-th frame sparse vector of the image-group sparse vector S_{n,j}. L is the total number of frames in an image-group observation vector, i.e. the number of columns of S_{n,j}. The joint sparse dependency of the multiple frame observation vectors evolves gradually and must be obtained dynamically by computing conditional probabilities from the residuals. The method infers these probabilities with a data-driven LSTM network and completes the least-squares estimation from GMV_n(:,i) and A_j. Suppose the columns of S_{n,j} are jointly sparse, i.e. the non-zero elements of each column appear at the same positions as in the other columns; this means the frame sparse vectors share the same support set. The method initializes the i-th frame residual vector R_{n,j}(:,i) one by one from the i-th frame observation vector GMV_n(:,i) and uses these residual vectors as the input of the LSTM network; the LSTM network output h_{n,j}(:,i) of the i-th frame in the j-th iteration is then mapped by the conversion matrix U to the base vector z_{n,j}(:,i) = U·h_{n,j}(:,i). The base vector z_{n,j}(:,i) is fed to the softmax layer, whose output is interpreted as conditional probabilities, giving the non-zero probability of each element of the i-th frame sparse vector; the element with the highest probability is selected and added to the support set of the frame sparse vectors. The i-th frame sparse vector is then found by least-squares estimation, and the new i-th frame residual vector is computed at the same time as the input of the LSTM network in the next iteration.
In the third step, the reconstruction end trains the LSTM network parameters with the reconstructed image group of the n-th image-group observation vector GMV_n. The reconstruction end recovers the n-th reconstructed image group G_n = {F_n, F_{n+1}, …, F_{n+L-1}} frame by frame with the total-variation-minimization image reconstruction algorithm, retrains the parameter set §_n of the LSTM network with (G_n, GMV_n) as a reference data pair, and updates the current parameter set §* = §_n of the LSTM network.
In the third step, if stopping because the iteration count reached the maximum value persisted for α image groups, the training and updating of the LSTM network parameters is started. The reconstructed image group G_n and its corresponding image-group observation vector GMV_n form the reference data pair (G_n, GMV_n). To solve for the LSTM network parameters, the cross-entropy cost function between the conditional probabilities given by the LSTM network and the known probabilities of the reference data pair must be minimized. The alternating training process first uses the LSTM network to sparsely code the reconstructed image group G_n for training, i.e. the LSTM network is fixed and used to sparsely represent the given data (representing the image-group observation vector GMV_n as closely as possible with as few coefficients as possible), obtaining a coefficient matrix; the coefficient matrix is then fixed, and each atom of the LSTM network (each column of the LSTM network) is updated in turn so that it represents the reconstructed image group G_n for training more closely. The training of the LSTM network parameters is usually repeated only at longer sequence-number intervals; the smaller the value of α, the more stable the reconstructed-video quality, but the more computing resources are consumed. The updated LSTM network parameter set is used to recover the subsequent image groups.
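The cross-entropy minimization mentioned above can be illustrated on the smallest possible piece of the model: one gradient step on the conversion matrix U, pushing the softmax of z = U·h toward the known support position from the reference data pair. This is a toy stand-in for the full LSTM training (h is treated as fixed, and the sizes and learning rate are assumptions).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy_grad_step(U, h, target_idx, lr=0.05):
    """One gradient step on the cross-entropy between softmax(U @ h) and
    the known support position target_idx from the reference data pair."""
    z = U @ h
    p = softmax(z)
    g = p.copy()
    g[target_idx] -= 1.0                  # dL/dz = p - one_hot(target)
    U = U - lr * np.outer(g, h)           # chain rule: dL/dU = g h^T
    return U, -np.log(p[target_idx])      # updated U and current loss

rng = np.random.default_rng(3)
U = rng.standard_normal((8, 4)) * 0.1
h = rng.standard_normal(4)
losses = []
for _ in range(20):
    U, loss = cross_entropy_grad_step(U, h, target_idx=2)
    losses.append(loss)
```

Because the loss is convex in U for a fixed h, the recorded losses shrink monotonically for a small enough step size.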
In the fourth step, if image-group observation vectors remain to be reconstructed at the reconstruction end, set n = n + 1, jump back to the second step, and repeat the process to continue recovering the subsequent image groups; otherwise the reconstruction end outputs the remaining reconstructed frames F_{n+1}, …, F_{n+L-1} of G_n as the final (n+1)-th, …, (n+L-1)-th reconstructed frames, completing the video reconstruction.
The invention, considering the spatio-temporal sparse characteristics of consecutive multi-frame images, provides a video compressive sensing reconstruction method that integrates sparse prior modeling and data-memory driving. The method can consider a large number of frames simultaneously, needs no linearity assumption on object motion, comprehensively reflects object motion information, and helps recover the structure and detail information of multi-frame images as a whole, thereby improving the quality of the reconstructed video.

Claims (7)

1. A video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation, characterized by comprising the following steps:
(1) a reconstruction end receives a frame observation vector code stream and combines the frame observation vector code stream to form a continuous image group observation vector;
(2) use the 1st image-group observation vector GMV_1 and its reconstructed image group to train the parameter set of the LSTM network;
(3) for the n-th image-group observation vector GMV_n (n ≥ 2), perform LSTM-network-based multi-frame joint iterative reconstruction; the stopping condition is that the iteration count reaches the maximum value K, or the residual l2-norm ||R_{n,j}||_2 falls below the threshold resMin, or the image-group blind quality Q^b_n exceeds the threshold qMax, thereby completing the recovery of the n-th reconstructed image group, and the reconstructed frame F_n in the n-th reconstructed image group is taken as the final n-th reconstructed frame; after α consecutive image groups have been recovered, if each of them finally stopped because the iteration count reached the maximum value K, go to step (4); otherwise subsequent multi-frame joint iterative reconstruction keeps the current parameter set §* of the LSTM network, and jump to step (5); wherein the LSTM-network-based multi-frame joint iterative reconstruction initializes the i-th frame residual vector R_{n,j}(:,i) one by one from the frame observation vectors GMV_n(:,i) and takes the initialized residual vectors R_{n,j}(:,i) as the input of the LSTM network; a conversion matrix U maps the LSTM network output h_{n,j}(:,i) ∈ R^{ncell} of the i-th frame image in the j-th iteration to the base vector z_{n,j}(:,i) = U·h_{n,j}(:,i), where ncell is the number of LSTM network neurons; the base vector z_{n,j}(:,i) is then fed to the softmax layer, yielding the non-zero probability of each element of the i-th frame sparse vector, and the element with the highest probability is selected and added to the support set of the frame sparse vectors; finally, the frame sparse vectors {S_{n,j}(:,i)}_{i=1,2,…,L} of the j-th iteration are found one by one by least-squares estimation;
(4) the reconstruction end trains the LSTM network with the reconstructed image group G_n of the n-th image-group observation vector GMV_n;
(5) if image-group observation vectors remain to be reconstructed, return to step (3) and continue to recover the image groups one by one; otherwise output the remaining reconstructed frames F_{n+1}, …, F_{n+L-1} as the final (n+1)-th, …, (n+L-1)-th reconstructed frames, completing the video reconstruction.
2. The video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation according to claim 1, characterized in that each image-group observation vector in step (1) contains L frame observation vectors, wherein L ≥ 2, and each frame observation vector contains M measured values.
3. The method according to claim 1, characterized in that, for the 1st image-group observation vector GMV_1 in step (2), the reconstruction end recovers the 1st reconstructed image group G_1 = {F_1, F_2, …, F_L} frame by frame with an image reconstruction algorithm, then uses (G_1, GMV_1) as a reference data pair to train the parameter set §_1 of the LSTM network, obtaining the current parameter set §* = §_1 of the LSTM network.
4. The video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation according to claim 1, characterized in that, in step (3), the reconstruction end weights each coefficient of the residual vector computed after repeated iterations by the probability that the coefficient is zero, obtaining a weighted residual minimization problem, which is solved by the Split Bregman iteration algorithm.
5. The video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation according to claim 1, characterized in that the image-group blind quality in step (3) is evaluated by the Video-BLIINDS or VIIDEO criterion.
6. The video compressive sensing reconstruction method based on an LSTM network and image-group blind quality evaluation according to claim 1, characterized in that, in step (4), the reconstruction end recovers the n-th reconstructed image group G_n = {F_n, F_{n+1}, …, F_{n+L-1}} frame by frame with an image reconstruction algorithm, uses (G_n, GMV_n) as a reference data pair to train the parameter set §_n of the LSTM network, and updates the current parameter set §* = §_n of the LSTM network.
7. The video compressed sensing reconstruction method based on the LSTM network and image group quality blind evaluation as claimed in claim 1, wherein, when the LSTM network is trained, the LSTM network is used to sparsely code the reconstructed image group Gn used for training, sparsely representing the given data to obtain a coefficient matrix; the coefficient matrix is then fixed, and each atom of the LSTM network is updated in turn so that each atom represents the training reconstructed image group Gn more closely.
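The alternation in claim 7 between sparse coding (with atoms fixed) and atom-by-atom updates (with coefficients fixed) mirrors classical dictionary learning. A minimal K-SVD-style sketch under that reading follows; the LSTM-specific details are not reproduced, and the greedy top-k coding rule, dimensions, and names are all illustrative assumptions:

```python
import numpy as np

def sparse_code(D, X, k):
    """Sparse coding step: with the atoms of D fixed, keep for each column
    of X only the k most correlated atoms, giving a coefficient matrix."""
    A = D.T @ X
    mask = np.zeros_like(A, dtype=bool)
    idx = np.argsort(-np.abs(A), axis=0)[:k]   # indices of the k largest |coefficients|
    np.put_along_axis(mask, idx, True, axis=0)
    return A * mask

def update_atoms(D, X, A):
    """Atom update step: with the coefficient matrix A fixed, refit each
    atom in turn against the residual it is responsible for."""
    for j in range(D.shape[1]):
        users = A[j] != 0
        if not users.any():
            continue                            # atom unused this round; leave it
        residual = X[:, users] - D @ A[:, users] + np.outer(D[:, j], A[j, users])
        atom = residual @ A[j, users]
        D[:, j] = atom / (np.linalg.norm(atom) + 1e-12)
    return D

rng = np.random.default_rng(1)
X = rng.standard_normal((16, 40))              # columns: vectorized patches of the training image group
D = rng.standard_normal((16, 24))              # initial dictionary of 24 atoms
D /= np.linalg.norm(D, axis=0)
for _ in range(5):                             # alternate the two steps of the claim
    A = sparse_code(D, X, k=3)
    D = update_atoms(D, X, A)
```

Each pass first represents the training data sparsely, then nudges every atom toward the data it currently helps represent, which is the alternation the claim describes.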
CN201910610758.XA 2019-07-08 2019-07-08 Video compression sensing reconstruction method based on LSTM network and image group quality blind evaluation Active CN110381313B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910610758.XA CN110381313B (en) 2019-07-08 2019-07-08 Video compression sensing reconstruction method based on LSTM network and image group quality blind evaluation

Publications (2)

Publication Number Publication Date
CN110381313A CN110381313A (en) 2019-10-25
CN110381313B true CN110381313B (en) 2021-08-31

Family

ID=68252277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910610758.XA Active CN110381313B (en) 2019-07-08 2019-07-08 Video compression sensing reconstruction method based on LSTM network and image group quality blind evaluation

Country Status (1)

Country Link
CN (1) CN110381313B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085102B (en) * 2020-09-10 2023-03-10 西安电子科技大学 No-reference video quality evaluation method based on three-dimensional space-time characteristic decomposition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102630011A (en) * 2012-03-31 2012-08-08 浙江师范大学 Compressive perceptual coding and decoding method and system in video sensor network
CN107154064A (en) * 2017-05-04 2017-09-12 西安电子科技大学 Natural image compressed sensing method for reconstructing based on depth sparse coding
CN107317583A (en) * 2017-05-18 2017-11-03 湖北工业大学 Variable step size distributed compression based on Recognition with Recurrent Neural Network perceives method for reconstructing
WO2019081937A1 (en) * 2017-10-26 2019-05-02 Gb Gas Holdings Limited Determining operating state from complex sensor data

Also Published As

Publication number Publication date
CN110381313A (en) 2019-10-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant