WO2018120723A1 - Video compressed sensing reconstruction method, system, electronic device and storage medium


Info

Publication number
WO2018120723A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
video
feature
layer
input
Prior art date
Application number
PCT/CN2017/091311
Other languages
English (en)
French (fr)
Inventor
王健宗 (Wang Jianzong)
肖京 (Xiao Jing)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority to KR1020187017256A (KR102247907B1)
Priority to SG11201808823PA
Priority to AU2017389534A (AU2017389534A1)
Priority to EP17885721.5A (EP3410714A4)
Priority to JP2018530728A (JP6570155B2)
Priority to US16/084,234 (US10630995B2)
Publication of WO2018120723A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 ... using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/30 ... using hierarchical techniques, e.g. scalability
    • H04N19/10 ... using adaptive coding
    • H04N19/134 ... characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 ... incoming video signal characteristics or properties
    • H04N19/14 ... coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N19/169 ... characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 ... the unit being an image region, e.g. an object
    • H04N19/172 ... the region being a picture, frame or field
    • H04N19/176 ... the region being a block, e.g. a macroblock
    • H04N19/42 ... characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/85 ... using pre-processing or post-processing specially adapted for video compression
    • H04N19/89 ... involving methods or arrangements for detection of transmission errors at the decoder

Definitions

  • The present invention relates to the field of computer technologies, and in particular, to a video compressed sensing reconstruction method, system, electronic device and storage medium.
  • Existing time-domain video compressed sensing algorithms are generally burdened by high computational complexity; rendering reconstructed video frames in particular is extremely slow, and even parallel acceleration with a graphics processing unit (GPU) does not significantly alleviate this problem.
  • Moreover, the reconstruction quality of these algorithms is usually low. How to render reconstructed video frames at high speed and with high quality has therefore become a technical problem to be solved.
  • The main object of the present invention is to provide a video compressed sensing reconstruction method, system, electronic device and storage medium, which aim to render reconstructed video frames at high speed and with high quality.
  • A first aspect of the present application provides a video compressed sensing reconstruction method, where the method includes the following steps:
  • the multiple feature abstraction hidden layers of the video frame reconstruction model reconstruct the input frame fragments into frame fragment blocks according to the established nonlinear mapping; the frame fragment block output layer of the video frame reconstruction model outputs the reconstructed frame fragment blocks, and the reconstructed video is generated based on the reconstructed frame fragment blocks.
  • a second aspect of the present application provides a video compressed sensing reconstruction system, where the video compressed sensing reconstruction system includes:
  • An extracting module configured to extract, according to a predetermined extraction rule, a frame fragment of the compressed video frame after receiving the compressed video frame to be reconstructed;
  • a feature abstraction module, configured to input the extracted frame fragments into the frame fragment input layer of the pre-trained video frame reconstruction model, perform feature abstraction on the input frame fragments through the multiple feature abstraction hidden layers of the video frame reconstruction model, and establish a nonlinear mapping from frame fragments to frame fragment blocks;
  • a reconstruction module, configured to reconstruct, through the multiple feature abstraction hidden layers of the video frame reconstruction model, the input frame fragments into frame fragment blocks according to the established nonlinear mapping, output the reconstructed frame fragment blocks via the frame fragment block output layer of the video frame reconstruction model, and generate the reconstructed video based on the reconstructed frame fragment blocks.
  • a third aspect of the present application provides an electronic device, including a processing device and a storage device.
  • the storage device stores a video compressed sensing reconstruction program comprising at least one computer readable instruction, the at least one computer readable instruction being executable by the processing device to:
  • reconstruct, through the multiple feature abstraction hidden layers of the video frame reconstruction model, the input frame fragments into frame fragment blocks according to the established nonlinear mapping, output the reconstructed frame fragment blocks via the frame fragment block output layer of the video frame reconstruction model, and generate the reconstructed video based on the reconstructed frame fragment blocks.
  • a fourth aspect of the present application provides a computer readable storage medium having stored thereon at least one computer readable instruction executable by a processing device to:
  • reconstruct, through the multiple feature abstraction hidden layers of the video frame reconstruction model, the input frame fragments into frame fragment blocks according to the established nonlinear mapping, output the reconstructed frame fragment blocks via the frame fragment block output layer of the video frame reconstruction model, and generate the reconstructed video based on the reconstructed frame fragment blocks.
  • According to the video compressed sensing reconstruction method, system, electronic device and storage medium, frame fragments of the compressed video frame to be reconstructed are extracted according to a predetermined extraction rule; the feature abstraction hidden layers of a pre-trained video frame reconstruction model perform feature abstraction on the frame fragments, establish a nonlinear mapping from frame fragments to frame fragment blocks, and reconstruct the input frame fragments into frame fragment blocks according to that mapping.
  • Because the reconstruction operates on frame fragments rather than directly on large compressed video frames, computational complexity is reduced and video frame reconstruction is accelerated; and because the multiple feature abstraction hidden layers of the pre-trained video frame reconstruction model perform feature abstraction on each frame fragment and reconstruct the frame fragments into frame fragment blocks for output, every detail feature of the compressed video frame can be extracted effectively, improving the quality of video frame reconstruction.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of the video compressed sensing reconstruction method of the present invention;
  • FIG. 2 is a schematic flowchart of a first embodiment of the video compressed sensing reconstruction method of the present invention;
  • FIG. 3 is a schematic flowchart of a second embodiment of the video compressed sensing reconstruction method of the present invention;
  • FIG. 4 is a schematic structural diagram of the video frame reconstruction model in an embodiment of the video compressed sensing reconstruction method of the present invention;
  • FIG. 5 is a schematic diagram of the functional modules of a first embodiment of the video compressed sensing reconstruction system of the present invention;
  • FIG. 6 is a schematic diagram of the functional modules of a second embodiment of the video compressed sensing reconstruction system of the present invention.
  • As shown in FIG. 1, FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of the video compressed sensing reconstruction method of the present invention.
  • The electronic device 1 is an apparatus capable of automatically performing numerical calculation and/or information processing according to instructions that are set or stored in advance.
  • The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 includes a storage device 11, a processing device 12, and the like.
  • the processing device 12 is for supporting the operation of the electronic device 1, which may include one or more microprocessors, digital processors, and the like.
  • the storage device 11 is for storing various data and computer readable instructions, which may include one or more non-volatile memories such as a ROM, an EPROM or a Flash Memory.
  • The storage device 11 stores a video compressed sensing reconstruction program comprising at least one computer readable instruction, the at least one computer readable instruction being executable by the processing device 12 to implement the video compressed sensing reconstruction method of the embodiments of the present application.
  • The present invention provides a video compressed sensing reconstruction method.
  • FIG. 2 is a schematic flowchart of the first embodiment of the video compressed sensing reconstruction method of the present invention.
  • In this embodiment, the video compressed sensing reconstruction method includes:
  • Step S10: after receiving the compressed video frame to be reconstructed, extract frame fragments of the compressed video frame according to a predetermined extraction rule.
  • In this embodiment, after the compressed video frame to be reconstructed is received, it is not reconstructed directly; instead, frame fragments of the compressed video frame are first extracted according to a predetermined extraction rule.
  • the predetermined extraction rule may be to extract frame fragments of the compressed video frame according to different characteristics such as color, content, format, and area size, which is not limited herein.
  • the predetermined extraction rule is: performing block division on the compressed video frame to be reconstructed, and dividing the compressed video frame to be reconstructed into several frame fragments.
  • block division is performed on various types of compressed video frames such as JPEG, PNG, etc.
  • the compressed video frame is divided into N*M (for example, 32*32) frame fragments, and N and M are positive integers.
  • The compressed video frame may be divided equally into frame fragments of the same size, or divided into frame fragments of different sizes according to a certain ratio or at random; this is not limited herein.
  • the frame fragment may be a square, a rectangle, or the like having a regular shape, or may be an irregularly shaped piece, which is not limited herein.
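The block-division extraction rule above can be sketched as follows. This is a minimal illustration assuming equal-size square fragments and a single-channel frame; the function name and fragment size are illustrative, not taken from the patent.

```python
import numpy as np

def extract_frame_fragments(frame, frag_h=32, frag_w=32):
    """Divide a compressed video frame (an H x W array) into equal-size
    frame fragments, collected in row-major order."""
    h, w = frame.shape[:2]
    fragments = []
    for top in range(0, h - frag_h + 1, frag_h):
        for left in range(0, w - frag_w + 1, frag_w):
            fragments.append(frame[top:top + frag_h, left:left + frag_w])
    return fragments

frame = np.zeros((96, 64))              # toy single-channel frame
frags = extract_frame_fragments(frame)  # 3 x 2 grid of 32x32 fragments
```

Each fragment is then reconstructed independently, which is what keeps the per-unit computation small compared with processing the whole frame.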
  • Step S20: input the extracted frame fragments into the frame fragment input layer of the pre-trained video frame reconstruction model, and perform feature abstraction on the input frame fragments through the multiple feature abstraction hidden layers of the video frame reconstruction model, establishing a nonlinear mapping from frame fragments to frame fragment blocks.
  • the frame fragments of the compressed video frame may be processed correspondingly by using a pre-trained video frame reconstruction model.
  • The video frame reconstruction model may be established and trained each time video compressed sensing reconstruction is performed, or it may be a model created and trained in advance that is called directly each time video compressed sensing reconstruction is performed; this is not limited herein.
  • In this embodiment, the video frame reconstruction model may include a frame fragment input layer, a frame fragment block output layer, and multiple feature abstraction hidden layers. After the frame fragments of the compressed video frame are extracted, the extracted frame fragments are input into the frame fragment input layer of the video frame reconstruction model, and the feature abstraction hidden layers of the video frame reconstruction model perform feature abstraction on the input frame fragments, establishing a nonlinear mapping from frame fragments to frame fragment blocks that links each frame fragment to the finally reconstructed frame fragment block.
  • Step S30: the multiple feature abstraction hidden layers of the video frame reconstruction model reconstruct the input frame fragments into frame fragment blocks according to the established nonlinear mapping; the frame fragment block output layer of the video frame reconstruction model outputs the reconstructed frame fragment blocks, and the reconstructed video is generated based on the reconstructed frame fragment blocks.
  • In this embodiment, the multiple feature abstraction hidden layers of the video frame reconstruction model reconstruct the input frame fragments into final frame fragment blocks according to the established nonlinear mapping, that is, the mapping between the feature abstraction of each frame fragment and the finally reconstructed frame fragment block. The reconstructed frame fragment blocks are output via the frame fragment block output layer of the video frame reconstruction model, and the reconstructed video is generated from them, for example by splicing and combining the reconstructed frame fragment blocks, completing the rendering and reconstruction of the compressed video frame.
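Generating the reconstructed video from the output frame fragment blocks amounts to splicing them back together. A toy sketch follows; the row-major layout is an assumption, since the patent leaves the exact splicing scheme open.

```python
import numpy as np

def assemble_frame(blocks, rows, cols):
    """Splice reconstructed frame fragment blocks (equal-size 2-D arrays
    in row-major order) back into a single frame."""
    strips = [np.hstack(blocks[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(strips)

blocks = [np.full((8, 8), i) for i in range(6)]   # six 8x8 toy blocks
frame = assemble_frame(blocks, rows=3, cols=2)    # 24 x 16 frame
```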
  • In this embodiment, frame fragments of the compressed video frame to be reconstructed are extracted according to a predetermined extraction rule; the feature abstraction hidden layers of the pre-trained video frame reconstruction model perform feature abstraction on the frame fragments, establish a nonlinear mapping from frame fragments to frame fragment blocks, and reconstruct the input frame fragments into frame fragment blocks according to that mapping. Because the compressed video frame to be reconstructed is extracted into frame fragments and the reconstruction operates on those fragments rather than directly on the larger compressed video frame, computational complexity is reduced and the speed of video frame reconstruction is improved.
  • A second embodiment of the present invention provides a video compressed sensing reconstruction method.
  • Based on the first embodiment, the method further includes:
  • Step S40: create and train a video frame reconstruction model, where the video frame reconstruction model includes at least one frame fragment input layer, at least one frame fragment block output layer, and multiple feature abstraction hidden layers.
  • In this embodiment, before video frame reconstruction is performed, a video frame reconstruction model needs to be created and trained; the video frame reconstruction model includes at least one frame fragment input layer, at least one frame fragment block output layer, and multiple feature abstraction hidden layers.
  • the step of generating the training data and the test data includes:
  • Acquire a preset number (for example, 100) of videos of different kinds of natural scenes, and convert each acquired video to a grayscale color space; the accumulated data size of all the acquired videos needs to meet a preset value (for example, 10K).
  • Each acquired video is divided into video blocks, where w_b is the width of video block b, h_b is the height of video block b, and d_b is the length of video block b (that is, its number of video frames); each video block is x_i ∈ ℝ^(w_b×h_b×d_b), i ∈ N, where N is a positive integer not less than 1.
  • All compressed videos are divided into a first data set and a second data set according to a preset ratio X:Y (for example, 7:3), where X > 0 and Y > 0, and the number of videos in the first data set is greater than the number in the second; the first data set is used as the training set and the second data set as the test set.
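The X:Y split can be sketched as follows. This is a minimal illustration: the shuffle and seed are assumptions, since the patent only fixes the ratio and which set is larger.

```python
import random

def split_dataset(videos, x=7, y=3, seed=0):
    """Split videos into a training set and a test set by the preset
    ratio X:Y, the training set being the larger of the two."""
    order = list(videos)
    random.Random(seed).shuffle(order)
    n_train = len(order) * x // (x + y)
    return order[:n_train], order[n_train:]

train, test = split_dataset(range(100))  # 70 training / 30 test videos
```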
  • the training process of the video frame reconstruction model is as follows:
  • During training, the batch size of the input video frames may be set to 200, and the total number of training iterations may be set to 10×10^6; each input video frame is normalized so that its values have a mean of 0 and a standard deviation of 1.
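The per-frame normalization to zero mean and unit standard deviation can be sketched as follows; the epsilon guard against a constant frame is an added safeguard, not part of the patent.

```python
import numpy as np

def normalize_frame(frame, eps=1e-8):
    """Normalize one input video frame to mean 0 and standard deviation 1."""
    return (frame - frame.mean()) / (frame.std() + eps)

z = normalize_frame(np.array([[1.0, 2.0], [3.0, 4.0]]))
```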
  • The neuron weights of each feature abstraction hidden layer are randomly initialized, with the random values drawn from a uniform distribution whose range depends on s, where s is the number of neurons in the previous feature abstraction hidden layer.
  • the Stochastic Gradient Descent (SGD) algorithm is used to optimize the parameters in the video frame reconstruction model.
  • The stochastic gradient descent algorithm is suitable for optimization processes with many control variables and a complex controlled system, for which an accurate mathematical model cannot be established.
  • The initial learning rate may be set to 0.001, and the learning rate is reduced to one tenth of its previous value every 3×10^6 iterations.
  • The momentum term of the stochastic gradient descent algorithm may be set to 0.9, and the gradient may be clipped during stochastic gradient descent.
  • E(x) = f(x) + r(x)
  • f(x) is the loss function, used to evaluate the model training loss; it may be an arbitrary differentiable function.
  • r(x) is a regularization constraint factor, used to constrain the model according to the probability distribution of the model parameters. Common choices are the L1 norm constraint (corresponding to a Laplacian prior on the parameters) and the L2 norm constraint (corresponding to a Gaussian prior). Here the weight-update gradient is clipped using the L2 norm so that the gradient always stays within a certain range, which prevents gradient explosion from affecting the convergence of the model; the gradient clipping threshold may be set to 10.
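The training settings above (momentum 0.9, initial learning rate 0.001 decayed to one tenth on a schedule, L2 gradient clipping at 10) can be sketched in a toy loop. The loop length and decay interval are compressed stand-ins for the patent's 10×10^6 and 3×10^6 iterations.

```python
import numpy as np

def clip_by_l2_norm(grad, threshold=10.0):
    """Rescale the gradient when its L2 norm exceeds the clipping threshold."""
    norm = np.linalg.norm(grad)
    return grad * (threshold / norm) if norm > threshold else grad

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9):
    """One SGD update with a momentum term of 0.9 and a clipped gradient."""
    velocity = momentum * velocity - lr * clip_by_l2_norm(grad)
    return w + velocity, velocity

lr, w, v = 0.001, np.zeros(4), np.zeros(4)
for it in range(1, 10):                  # toy loop; patent trains for 10e6 iterations
    grad = np.full(4, 100.0)             # toy gradient, norm 200 -> clipped to 10
    w, v = sgd_momentum_step(w, grad, v, lr)
    if it % 3 == 0:                      # stand-in for "every 3e6 iterations"
        lr *= 0.1                        # learning rate drops to one tenth
```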
  • FIG. 4 is a schematic structural diagram of the video frame reconstruction model in an embodiment of the video compressed sensing reconstruction method of the present invention.
  • As shown in FIG. 4, the video frame reconstruction model includes a frame fragment input layer, a frame fragment block output layer, and k feature abstraction hidden layers (k is a natural number greater than 1). Each feature abstraction hidden layer computes
  • h_k = σ_k(W_k · h_{k-1} + b_k)
  • where h_k is the activation value vector of the k-th feature abstraction hidden layer, L_k is the number of neurons in that layer, σ_k is its activation function, b_k is its neuron offset vector, W_k is its weight matrix, and h_{k-1} is its input vector; θ denotes the parameter set of activation value vectors, neuron counts, activation functions, offset vectors and weight matrices of all layers.
  • y_i is the frame fragment input through the frame fragment input layer, and f(y_i; θ) is the feature abstraction performed on the input frame fragments by the multiple feature abstraction hidden layers, which establishes the nonlinear mapping from frame fragments to frame fragment blocks.
  • The frame fragment input layer receives the frame fragment input, which passes through the frame feature abstraction of the k feature abstraction hidden layers and finally feeds the frame fragment block output layer; the dimension of the frame fragment block output layer equals the total size of the finally reconstructed video block, namely w_m × h_m × d_m.
  • The model is trained with the error back propagation (BP) algorithm, and the optimization objective is the mean squared error (MSE) between the output frame fragment blocks and the ground-truth video blocks.
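The MSE objective used with back propagation is simply:

```python
import numpy as np

def mse_loss(predicted_block, target_block):
    """Mean squared error between a reconstructed frame fragment block
    and the ground-truth video block."""
    diff = np.asarray(predicted_block) - np.asarray(target_block)
    return float(np.mean(diff ** 2))

loss = mse_loss(np.ones((8, 8, 16)), np.zeros((8, 8, 16)))  # 1.0
```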
  • In this embodiment, the frame fragment input layer dimension of the video frame reconstruction model may be set to 8×8, and the frame fragment block output layer dimension may be set to 8×8×16.
  • The video frame reconstruction model may have 7 feature abstraction hidden layers, with the hidden layer dimensions set to 128, 256, 384, 512, 512, 4096 and 2048, respectively.
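Putting the stated dimensions together, the network can be sketched as a plain fully connected stack mapping an 8×8 fragment to an 8×8×16 block. The ReLU activation and the 1/sqrt(s) uniform initialization range are assumptions; the patent does not fix either.

```python
import numpy as np

# 8x8 fragment in, seven hidden layers, 8x8x16 frame fragment block out
LAYER_DIMS = [8 * 8, 128, 256, 384, 512, 512, 4096, 2048, 8 * 8 * 16]

def init_model(seed=0):
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(LAYER_DIMS[:-1], LAYER_DIMS[1:]):
        bound = 1.0 / np.sqrt(fan_in)          # assumed uniform-init range
        params.append((rng.uniform(-bound, bound, (fan_in, fan_out)),
                       np.zeros(fan_out)))
    return params

def forward(params, y):
    """f(y; theta): map one 8x8 frame fragment to an 8x8x16 block."""
    h = np.asarray(y).reshape(-1)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)             # assumed ReLU on hidden layers
    return h.reshape(8, 8, 16)

block = forward(init_model(), np.zeros((8, 8)))
```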
  • The present invention further provides a video compressed sensing reconstruction system.
  • FIG. 5 is a schematic diagram of the functional modules of the first embodiment of the video compressed sensing reconstruction system of the present invention.
  • In this embodiment, the video compressed sensing reconstruction system includes:
  • the extracting module 01 is configured to: after receiving the compressed video frame to be reconstructed, extract frame fragments of the compressed video frame according to a predetermined extraction rule;
  • In this embodiment, after the compressed video frame to be reconstructed is received, it is not reconstructed directly; instead, frame fragments of the compressed video frame are first extracted according to a predetermined extraction rule.
  • the predetermined extraction rule may be to extract frame fragments of the compressed video frame according to different characteristics such as color, content, format, and area size, which is not limited herein.
  • the predetermined extraction rule is: performing block division on the compressed video frame to be reconstructed, and dividing the compressed video frame to be reconstructed into several frame fragments.
  • block division is performed on various types of compressed video frames such as JPEG, PNG, etc.
  • the compressed video frame is divided into N*M (for example, 32*32) frame fragments, and N and M are positive integers.
  • The compressed video frame may be divided equally into frame fragments of the same size, or divided into frame fragments of different sizes according to a certain ratio or at random; this is not limited herein.
  • the frame fragment may be a square, a rectangle, or the like having a regular shape, or may be an irregularly shaped piece, which is not limited herein.
  • The feature abstraction module 02 is configured to input the extracted frame fragments into the frame fragment input layer of the pre-trained video frame reconstruction model, perform feature abstraction on the input frame fragments through the multiple feature abstraction hidden layers of the video frame reconstruction model, and establish a nonlinear mapping from frame fragments to frame fragment blocks;
  • the frame fragments of the compressed video frame may be processed correspondingly by using a pre-trained video frame reconstruction model.
  • The video frame reconstruction model may be established and trained each time video compressed sensing reconstruction is performed, or it may be a model created and trained in advance that is called directly each time video compressed sensing reconstruction is performed; this is not limited herein.
  • In this embodiment, the video frame reconstruction model may include a frame fragment input layer, a frame fragment block output layer, and multiple feature abstraction hidden layers. After the frame fragments of the compressed video frame are extracted, the extracted frame fragments are input into the frame fragment input layer of the video frame reconstruction model, and the feature abstraction hidden layers of the video frame reconstruction model perform feature abstraction on the input frame fragments, establishing a nonlinear mapping from frame fragments to frame fragment blocks that links each frame fragment to the finally reconstructed frame fragment block.
  • The reconstruction module 03 is configured to reconstruct, through the multiple feature abstraction hidden layers of the video frame reconstruction model, the input frame fragments into frame fragment blocks according to the established nonlinear mapping, output the reconstructed frame fragment blocks via the frame fragment block output layer of the video frame reconstruction model, and generate the reconstructed video based on the reconstructed frame fragment blocks.
  • In this embodiment, the multiple feature abstraction hidden layers of the video frame reconstruction model reconstruct the input frame fragments into final frame fragment blocks according to the established nonlinear mapping, that is, the mapping between the feature abstraction of each frame fragment and the finally reconstructed frame fragment block. The reconstructed frame fragment blocks are output via the frame fragment block output layer of the video frame reconstruction model, and the reconstructed video is generated from them, for example by splicing and combining the reconstructed frame fragment blocks, completing the rendering and reconstruction of the compressed video frame.
  • In this embodiment, frame fragments of the compressed video frame to be reconstructed are extracted according to a predetermined extraction rule; the feature abstraction hidden layers of the pre-trained video frame reconstruction model perform feature abstraction on the frame fragments, establish a nonlinear mapping from frame fragments to frame fragment blocks, and reconstruct the input frame fragments into frame fragment blocks according to that mapping for output. Because the compressed video frame to be reconstructed is extracted into frame fragments and the reconstruction operates on those fragments rather than directly on the larger compressed video frame, computational complexity is reduced and the speed of video frame reconstruction is improved.
  • The second embodiment of the present invention provides a video compressed sensing reconstruction system. Based on the foregoing embodiment, the system further includes:
  • a creation module 04, configured to create and train a video frame reconstruction model, the video frame reconstruction model including at least one frame fragment input layer, at least one frame fragment block output layer, and multiple feature abstraction hidden layers.
  • In this embodiment, before video frame reconstruction is performed, a video frame reconstruction model needs to be created and trained; the video frame reconstruction model includes at least one frame fragment input layer, at least one frame fragment block output layer, and multiple feature abstraction hidden layers.
  • The creation module 04 further includes a generating unit for generating training data and test data, the generating unit being configured to: acquire a preset number (for example, 100) of videos of different kinds of natural scenes, and convert each acquired video to a grayscale color space; the accumulated data size of all the acquired videos needs to meet a preset value (for example, 10K).
  • each converted video is compressed by a measurement matrix of predefined size w_m×h_m×d_m (for example, w_m=8, h_m=8, d_m=16), where w_b is the width of video block b, h_b is the height of video block b, and d_b is the length of video block b (i.e., the number of video frames); each video block is x_i∈R^(w_b×h_b×d_b), i∈N, with N a positive integer not less than 1, and the compressed video frame is y_i∈R^(w_b×h_b), where y_i=φ_b·x_i and φ_b is the measurement matrix.
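The measurement step above can be sketched as follows; this is a minimal illustration, and the Gaussian measurement weights `phi_b`, the block sizes, and the temporal-weighted-sum form of y_i = φ_b·x_i are assumptions of the sketch rather than the patent's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed block geometry: w_b x h_b spatial size, d_b frames per block.
w_b, h_b, d_b = 8, 8, 16

# One video block x_i in R^(w_b x h_b x d_b).
x_i = rng.standard_normal((w_b, h_b, d_b))

# Measurement weights phi_b collapse the temporal axis: one compressed
# frame y_i in R^(w_b x h_b) is a weighted sum of the d_b frames.
phi_b = rng.standard_normal(d_b)

# y_i = phi_b * x_i (temporal compressive measurement).
y_i = np.tensordot(x_i, phi_b, axes=([2], [0]))

assert y_i.shape == (w_b, h_b)
```

The d_b:1 temporal collapse matches the dimensions stated in the text (a w_b×h_b×d_b block compressed to a w_b×h_b frame).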
  • all compressed videos are divided into a first data set and a second data set according to a preset ratio X:Y (for example, 7:3), where the number of videos in the first data set is greater than that in the second data set; the first data set is used as the training set and the second data set as the test set, with X greater than 0 and Y greater than 0.
  • the training process of the video frame reconstruction model is as follows:
  • the batch size of the input video frames can be set to 200, the total number of training iterations can be set to 10×10^6, and the size differences between input video frames are normalized to a range with mean 0 and standard deviation 1.
  • the neuron weights of each feature-abstraction hidden layer are randomly initialized with values drawn from a uniform distribution whose range is determined by the variable s, where s is the number of neurons in the previous feature-abstraction hidden layer.
  • the Stochastic Gradient Descent (SGD) algorithm is used to optimize the parameters in the video frame reconstruction model.
  • the stochastic gradient descent algorithm is suitable for optimal control processes with many control variables and complex controlled systems for which an accurate mathematical model cannot be established.
  • the initial learning rate can be set to 0.001, and the learning rate is reduced to one tenth of its previous value every 3×10^6 iterations.
  • the momentum term (Momentum) of the stochastic gradient descent algorithm can be set to 0.9, and the gradients can also be clipped during stochastic gradient descent; suppose the objective function to be solved is E(x)=f(x)+r(x), where:
  • f(x) is the loss function, used to evaluate the model training loss, and is an arbitrary differentiable convex function
  • r(x) is a regularization constraint factor used to constrain the model; depending on the probability distribution of the model parameters, r(x) generally takes the form of an L1-norm constraint (model parameters following a Laplacian distribution) or an L2-norm constraint (model parameters following a Gaussian distribution). The weight-update gradients are clipped using the L2 norm so that they always remain within a certain range, which prevents the gradient explosion phenomenon from affecting the convergence of the model; the gradient clipping threshold can be set to 10.
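The optimizer settings described above (momentum 0.9, learning rate 0.001 dropping tenfold on a schedule, L2-norm gradient clipping at threshold 10) can be sketched as follows; the function names and the single-weight-vector demonstration are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

def clip_by_l2_norm(grad, threshold=10.0):
    """Scale the gradient so its L2 norm never exceeds `threshold`."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, clip=10.0):
    """One SGD update with momentum and L2-norm gradient clipping."""
    grad = clip_by_l2_norm(grad, clip)
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def learning_rate(step, base_lr=1e-3, drop_every=3_000_000):
    """Divide the learning rate by 10 every `drop_every` iterations."""
    return base_lr * (0.1 ** (step // drop_every))

# Tiny demonstration on a single weight vector.
w = np.zeros(4)
v = np.zeros(4)
huge_grad = np.array([30.0, 0.0, 0.0, 0.0])   # norm 30 > threshold 10
w, v = sgd_momentum_step(w, huge_grad, v, lr=learning_rate(0))
```

Clipping rescales the oversized gradient to norm 10 before the momentum update, which is what keeps a single bad batch from destabilizing training.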
  • the video frame reconstruction model includes a frame fragment input layer, a frame fragment block output layer, and k feature-abstraction hidden layers (k is a natural number greater than 1); each feature-abstraction hidden layer is given by the formula h^k(y)=θ(b^k+w^k·y), where:
  • h^k(y)∈R^(L_k) is the activation vector of the feature-abstraction hidden layer, L_k is the number of neurons in the k-th feature-abstraction hidden layer, θ(b^k+w^k·y) is the activation function with θ(x)=max(x,0), b^k∈R^(L_k) is the neuron bias vector of the layer, w^k∈R^(L_k×L_(k-1)) is the weight matrix, and y∈R^(L_(k-1)) is the layer's input vector; the video frame reconstruction model f(y_i;ω) is trained from these quantities, where ω is the parameter set comprising the activation vectors, neuron counts, activation function, bias vectors, and weight matrices, y_i is the frame fragment input through the frame fragment input layer, and f(y_i;ω) is the non-linear mapping from frame fragments to frame fragment blocks established by performing feature abstraction, through the multiple feature-abstraction hidden layers, on the fragments input through the frame fragment input layer.
  • the frame fragment input layer receives the input frame fragments, which pass through the frame feature abstraction of the K feature-abstraction hidden layers and are finally fed into the frame fragment block output layer; the dimension of the frame fragment block output layer is identical to the total size of the finally reconstructed video block, namely w_m×h_m×d_m.
  • the parameters are updated using the Error Back Propagation (BP) algorithm, with MSE (Mean Squared Error) as the optimization function.
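A minimal sketch of the MSE objective over a batch of reconstructed fragment blocks follows; the 1/N batch normalization is an assumption, since the text does not spell the formula out:

```python
import numpy as np

def mse_loss(reconstructed, target):
    """L(w) = (1/N) * sum_i ||f(y_i; w) - x_i||_2^2 over a batch of N items."""
    diff = reconstructed - target
    return float(np.mean(np.sum(diff.reshape(diff.shape[0], -1) ** 2, axis=1)))

# Two toy "fragment blocks" of 4 values each.
target = np.zeros((2, 4))
pred = np.ones((2, 4))
loss = mse_loss(pred, target)   # each row contributes ||1,1,1,1||^2 = 4
```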
  • the frame fragment input layer dimension of the video frame reconstruction model may be set to 8 ⁇ 8, and the frame fragment block output layer dimension of the video frame reconstruction model may be set to 8 ⁇ 8 ⁇ 16.
  • the feature abstract hidden layer of the video frame reconstruction model can be set to 7 layers, and each feature abstract hidden layer dimension can be set to 128, 256, 384, 512, 512, 4096, 2048, respectively.
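The layer sizes above (8×8 = 64 inputs, hidden widths 128 through 2048, 8×8×16 = 1024 outputs) can be assembled into a forward pass like the following sketch; the uniform initialization range [-1/√s, 1/√s] and the linear output layer are assumptions standing in for details the text leaves open:

```python
import numpy as np

rng = np.random.default_rng(1)

# Layer widths from the text: 64 inputs, seven hidden layers, 1024 outputs.
dims = [64, 128, 256, 384, 512, 512, 4096, 2048, 1024]

def init_layer(fan_in, fan_out):
    # Assumed uniform range driven by fan-in s, as in the initialization text.
    bound = 1.0 / np.sqrt(fan_in)
    w = rng.uniform(-bound, bound, size=(fan_out, fan_in))
    b = np.zeros(fan_out)
    return w, b

layers = [init_layer(s, t) for s, t in zip(dims[:-1], dims[1:])]

def forward(y):
    """h^k(y) = max(0, b^k + w^k y) through the hidden layers; the final
    (output) layer is left linear so it can produce arbitrary pixel values."""
    h = y
    for i, (w, b) in enumerate(layers):
        h = b + w @ h
        if i < len(layers) - 1:          # theta(x) = max(x, 0) on hidden layers
            h = np.maximum(h, 0.0)
    return h

fragment = rng.standard_normal(64)       # one 8x8 frame fragment
block = forward(fragment)                # reconstructed 8x8x16 fragment block
```

Reshaping `block` to (8, 8, 16) yields the fragment block in the output-layer dimensions stated above.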
  • the foregoing extraction module 01, feature abstraction module 02, reconstruction module 03, creation module 04, and the like may be embedded in, or independent of, the electronic device in hardware form, or may be stored in the storage device of the electronic device in software form, so that the processing device can invoke and execute the operations corresponding to each module.
  • the processing device can be a central processing unit (CPU), a microprocessor, a single chip microcomputer, or the like.
  • the present invention also provides a computer readable storage medium storing a video compressed sensing reconstruction system, the video compressed sensing reconstruction system being executable by at least one processor to cause the at least one processor to perform the steps of the video compressed sensing reconstruction method in the foregoing embodiment; the specific implementation of steps S10, S20, and S30 of the method is as described above and is not repeated here.
  • the foregoing embodiment method can be implemented by means of software plus a necessary general-purpose hardware platform, and can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The present invention discloses a video compressed sensing reconstruction method, system, electronic device, and storage medium. The method includes: B. after a compressed video frame to be reconstructed is received, extracting frame fragments of the compressed video frame according to a predetermined extraction rule; C. inputting the extracted frame fragments into the frame fragment input layer of a pre-trained video frame reconstruction model, where multiple feature-abstraction hidden layers of the video frame reconstruction model perform feature abstraction on the input frame fragments and establish a non-linear mapping from frame fragments to frame fragment blocks; D. reconstructing, by the multiple feature-abstraction hidden layers according to the established non-linear mapping, the input frame fragments into frame fragment blocks, outputting the reconstructed frame fragment blocks through the frame fragment block output layer of the model, and generating a reconstructed video based on the reconstructed frame fragment blocks. The present invention can render and reconstruct video frames at high speed and with high quality.

Description

视频压缩感知重构方法、系统、电子装置及存储介质
优先权申明
本申请基于巴黎公约申明享有2016年12月30日递交的申请号为CN201611260793.6、名称为“视频压缩感知重构方法及装置”中国专利申请的优先权,该中国专利申请的整体内容以参考的方式结合在本申请中。
技术领域
本发明涉及计算机技术领域，尤其涉及一种视频压缩感知重构方法、系统、电子装置及存储介质。
背景技术
现有的基于时间域的视频压缩感知算法普遍对计算复杂度非常敏感,特别是在渲染重构视频帧时处理速度极慢,即便使用图形处理器(Graphics Processing Unit,GPU)进行并行加速也无法显著改善这个问题。虽然,目前也有算法能够较快的完成视频块的感知重建,但是重建的质量通常较低。因此,如何高速且高质量的渲染重构视频帧已经成为一种亟待解决的技术问题。
发明内容
本发明的主要目的在于提供一种视频压缩感知重构方法、系统、电子装置及存储介质，旨在高速且高质量的渲染重构视频帧。
本申请第一方面提供一种视频压缩感知重构方法,所述方法包括以下步骤:
B、在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
C、将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
D、由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
本申请第二方面提供一种视频压缩感知重构系统，所述视频压缩感知重构系统包括：
提取模块,用于在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
特征抽象模块,用于将提取的帧碎片输入经预先训练的视频帧重 构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
重构模块,用于由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
本申请第三方面提供一种电子装置,包括处理设备及存储设备;该存储设备中存储有视频压缩感知重构程序,其包括至少一个计算机可读指令,该至少一个计算机可读指令可被所述处理设备执行,以实现以下操作:
在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
本申请第四方面提供一种计算机可读存储介质,其上存储有至少一个可被处理设备执行以实现以下操作的计算机可读指令:
在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
本发明提出的视频压缩感知重构方法、***、电子装置及存储介质,通过预先确定的提取规则提取出待重构的压缩视频帧的帧碎片;由经预先训练的视频帧重构模型的多个特征抽象隐含层对该帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射,并根据所述非线性映射将输入的帧碎片重构为帧碎片块之后输出。由于是将待重构的压缩视频帧提取为帧碎片后,针对帧碎片来进行重构,而不是 直接对较大的压缩视频帧进行处理,降低了计算复杂度,提高了视频帧重构的速度;而且,通过预先训练的视频帧重构模型的多个特征抽象隐含层对每一帧碎片进行特征抽象,并将帧碎片重构为帧碎片块进行输出,能有效地提取压缩视频帧的每一细节特征,提高了视频帧重构的质量。
附图说明
图1为本发明实现视频压缩感知重构方法的较佳实施例的应用环境示意图;
图2为本发明视频压缩感知重构方法第一实施例的流程示意图;
图3为本发明视频压缩感知重构方法第二实施例的流程示意图;
图4为本发明视频压缩感知重构方法一实施例中视频帧重构模型的结构示意图;
图5为本发明视频压缩感知重构系统第一实施例的功能模块示意图；
图6为本发明视频压缩感知重构系统第二实施例的功能模块示意图。
本发明目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
具体实施方式
为了使本发明所要解决的技术问题、技术方案及有益效果更加清楚、明白,以下结合附图和实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。
参阅图1所示,是本发明实现视频压缩感知重构方法的较佳实施例的应用环境示意图。所述电子装置1是一种能够按照事先设定或者存储的指令,自动进行数值计算和/或信息处理的设备。所述电子装置1可以是计算机、也可以是单个网络服务器、多个网络服务器组成的服务器组或者基于云计算的由大量主机或者网络服务器构成的云,其中云计算是分布式计算的一种,由一群松散耦合的计算机集组成的一个超级虚拟计算机。
在本实施例中,电子装置1包括存储设备11及处理设备12等。处理设备12用于支撑电子装置1的运行,其可以包括一个或者多个微处理器、数字处理器等。存储设备11用于存储各种数据及计算机可读指令,其可以包括一个或者多个非易失性存储器,如ROM、EPROM或Flash Memory(快闪存储器)等。在一实施例中,存储设备 11中存储有视频压缩感知重构程序,其包括至少一个存储在存储设备11中的计算机可读指令,该至少一个计算机可读指令可被处理器设备12执行,以实现本申请各实施例的视频压缩感知重构方法。
本发明提供一种视频压缩感知重构方法。
参照图2,图2为本发明视频压缩感知重构方法第一实施例的流程示意图。
在第一实施例中,该视频压缩感知重构方法包括:
步骤S10,在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片。
本实施例中,接收到待重构的压缩视频帧后,并不直接对所述压缩视频帧进行渲染重构,而是先对所述压缩视频帧按照预先确定的提取规则进行帧碎片的提取。该预先确定的提取规则可以是根据颜色、内容、格式、面积大小等不同特征对所述压缩视频帧进行帧碎片的提取,在此不做限定。
在一种可选的实施方式中,所述预先确定的提取规则为:对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。例如,对如JPEG、PNG等各种类型的压缩视频帧进行块分割,将所述压缩视频帧分成N*M(例如,32*32)的帧碎片,N和M为正整数。其中,对所述压缩视频帧进行块分割时,可以将所述压缩视频帧等分成各个相同大小的帧碎片,也可以将所述压缩视频帧按一定比例或随机分成不同大小的帧碎片,在此不做限定。帧碎片既可以是形状规则的正方形、长方形等,也可以是形状不规则的碎片,在此不做限定。
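The block-segmentation rule described above (splitting a compressed video frame into N×M fragments, e.g. 32×32) can be sketched as follows; the equal-size, row-major split is just one of the strategies the text allows, chosen here for illustration:

```python
import numpy as np

def extract_fragments(frame, n=32, m=32):
    """Split a compressed video frame into equal n x m frame fragments.

    Assumes the frame height/width are multiples of n and m; the text also
    permits unequal or randomly sized fragments, which this sketch omits.
    """
    h, w = frame.shape
    return [frame[r:r + n, c:c + m]
            for r in range(0, h, n)
            for c in range(0, w, m)]

# A toy 64x64 "compressed frame" split into four 32x32 fragments.
frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
fragments = extract_fragments(frame)
```

Each fragment then goes through the frame fragment input layer individually, which is what keeps the per-item computation small.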
步骤S20,将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射。
提取出所述压缩视频帧的帧碎片之后,可利用经预先训练好的视频帧重构模型对该帧碎片进行相应的处理。其中,该视频帧重构模型可以是在每一次进行视频压缩感知重构时进行建立并训练,也可以是预先创建并训练好的模型,每一次进行视频压缩感知重构时直接调用该模型即可,在此不做限定。
例如,本实施例中,所述视频帧重构模型可包括帧碎片输入层、帧碎片块输出层和多个特征抽象隐含层,在提取出所述压缩视频帧的帧碎片之后,将提取的帧碎片输入该视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射,从而将每一帧碎片与最终重构的帧碎片块形成联系。
步骤S30,由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射即每一帧碎片经特征抽象后与最终重构的帧碎片块之间的映射关系,将输入的帧碎片重构为最终的帧碎片块,并经由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频,如对重构的帧碎片块进行拼接、组合等方式最终生成重构的视频,完成所述压缩视频帧的渲染重构。
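The splicing/combining of reconstructed fragment blocks into the final video, described above, can be sketched as follows; the 2×2 spatial tiling, the block values, and the grid-based helper are illustrative assumptions:

```python
import numpy as np

def splice_blocks(blocks, grid_rows, grid_cols):
    """Splice reconstructed fragment blocks (each h x w x d) back into the
    full video volume by tiling them on a spatial grid; a minimal stand-in
    for the splicing/combining step of the reconstruction."""
    rows = [np.concatenate(blocks[r * grid_cols:(r + 1) * grid_cols], axis=1)
            for r in range(grid_rows)]
    return np.concatenate(rows, axis=0)

# Four 8x8x16 fragment blocks tiled 2x2 -> one 16x16x16 video volume.
blocks = [np.full((8, 8, 16), float(i)) for i in range(4)]
video = splice_blocks(blocks, 2, 2)
```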
本实施例通过预先确定的提取规则提取出待重构的压缩视频帧的帧碎片;由经预先训练的视频帧重构模型的多个特征抽象隐含层对该帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射,并根据所述非线性映射将输入的帧碎片重构为帧碎片块之后输出。由于是将待重构的压缩视频帧提取为帧碎片后,针对帧碎片来进行重构,而不是直接对较大的压缩视频帧进行处理,降低了计算复杂度,提高了视频帧重构的速度;而且,通过预先训练的视频帧重构模型的多个特征抽象隐含层对每一帧碎片进行特征抽象,并将帧碎片重构为帧碎片块进行输出,能有效地提取压缩视频帧的每一细节特征,提高了视频帧重构的质量。
如图3所示,本发明第二实施例提出一种视频压缩感知重构方法,在上述实施例的基础上,在上述步骤S10之前还包括:
步骤S40,创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。
本实施例中,在进行视频帧重构之前,还需创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。在对视频帧重构模型进行训练之前,还包括训练数据和测试数据的生成步骤,该训练数据和测试数据的生成步骤包括:
获取预设数量(例如,100个)的不同种类的自然场景下的视频,并将获取的各个视频转换到灰度颜色空间。其中,获取的所有视频的数据大小累计和需满足预设值(例如,10K)。
将转换后的各个视频通过预定义尺寸为w_m×h_m×d_m（例如，w_m=8，h_m=8，d_m=16）的度量转换矩阵进行压缩。例如，w_b为具有预设数量视频的视频块b的宽度，h_b为视频块b的高度，d_b为视频块b的长度（即视频帧的数量），每个视频块为x_i∈R^(w_b×h_b×d_b)，i∈N，N为不小于1的正整数，压缩后的视频帧y_i∈R^(w_b×h_b)，其中，y_i=φ_b·x_i，φ_b为度量转换矩阵。
将所有压缩后的视频按照预设比例如X:Y(例如,7:3)的比例分成第一数据集和第二数据集,其中,第一数据集中的视频数量大于第二数据集中的视频数量,将第一数据集作为训练集,第二数据集作为测试集,其中,X大于0,Y大于0。
在一种实施方式中,该视频帧重构模型的训练过程如下:
在训练该视频帧重构模型时，可以将输入的视频帧的batch size（批量尺寸）设置成200，总的训练次数可以设为10×10^6个迭代次数，输入的每张视频帧之间的大小差值被规约化到均值为0、标准差为1的范围。在训练的起始阶段，每一特征抽象隐含层的神经元权值被随机初始化，随机值来自一个范围由变量s确定的均匀分布，变量s为先前特征抽象隐含层的神经元数目。
在训练过程中，采用随机梯度下降（SGD，Stochastic Gradient Descent）算法来对该视频帧重构模型中的各个参数进行优化。随机梯度下降算法适用于控制变量较多、受控系统比较复杂、无法建立准确数学模型的最优化控制过程。本实施例中，起始学习率可以设置为0.001，每隔3×10^6次迭代学习率会变为原来的十分之一。随机梯度下降算法的冲量项（Momentum）可以设置为0.9，在随机梯度下降的同时还可以对梯度进行裁剪，假设需要求解的目标函数为：E(x)=f(x)+r(x)，其中f(x)为损失函数，用来评价模型训练损失，是任意的可微凸函数，r(x)为规范化约束因子，用来对模型进行限制，根据模型参数的概率分布不同，r(x)一般有L1范式约束（模型参数服从拉普拉斯分布）和L2范式约束（模型参数服从高斯分布），通过使用L2范式对权值更新梯度进行裁剪以确保梯度始终处于一定范围之内，这样可以防止梯度爆炸现象影响模型的收敛，梯度裁剪的阈值可以被设定为10。
进一步地,在其他实施例中,如图4所示,图4为本发明视频压缩感知重构方法一实施例中视频帧重构模型的结构示意图。该视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层(k为大于1的自然数),每一个特征抽象隐含层有如下公式:
h^k(y)=θ(b^k+w^k·y)，
其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(*)也即θ(b^k+w^k·y)为激活函数，其表达式为θ(x)=max(x,0)，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量。
基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(yi;ω),其中,ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合,yi为经所述帧碎片输入层输入的帧碎片,f(yi;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象,建立起来的帧碎片到帧碎片块之间的非线性映射。
参照图4，该帧碎片输入层接收帧碎片的输入，经过K层特征抽象隐含层的帧特征抽象，最后输入到该帧碎片块输出层，该帧碎片块输出层的维度与最终重构的视频块的总尺寸一致，均为w_m×h_m×d_m。为了训练该视频帧重构模型，需要根据输入参数不断调整模型的权值和偏置项。假定把模型的所有参数构成的集合表示为L(ω)，使用误差反向传播（Error Back Propagation，BP）算法对参数进行更新，优化函数为MSE（Mean Squared Error，平均平方和错误），则有：
L(ω)=(1/N)·Σ_{i=1}^{N}‖f(y_i;ω)−x_i‖²
在一个优选的实施方式中,该视频帧重构模型的帧碎片输入层维度可以设为8×8,该视频帧重构模型的帧碎片块输出层维度可以设为8×8×16,该视频帧重构模型的特征抽象隐含层可以设为7层,各个特征抽象隐含层维度可以分别设为128,256,384,512,512,4096,2048。
本发明进一步提供一种视频压缩感知重构系统。
参照图5，图5为本发明视频压缩感知重构系统第一实施例的功能模块示意图。
在第一实施例中，该视频压缩感知重构系统包括：
提取模块01,用于在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
本实施例中,接收到待重构的压缩视频帧后,并不直接对所述压缩视频帧进行渲染重构,而是先对所述压缩视频帧按照预先确定的提取规则进行帧碎片的提取。该预先确定的提取规则可以是根据颜色、内容、格式、面积大小等不同特征对所述压缩视频帧进行帧碎片的提取,在此不做限定。
在一种可选的实施方式中,所述预先确定的提取规则为:对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。例如,对如JPEG、PNG等各种类型的压缩视频帧进行块分割,将所述压缩视频帧分成N*M(例如,32*32)的帧碎片,N和M为正整数。其中,对所述压缩视频帧进行块分割时,可以将所述压缩视频帧等分成各个相同大小的帧碎片,也可以将所述压缩视频帧按一定比例或随机分成不同大小的帧碎片,在此不做限定。帧碎片既可以是形状规则的正方形、长方形等,也可以是形状不规则的碎片,在此不做限定。
特征抽象模块02,用于将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
提取出所述压缩视频帧的帧碎片之后,可利用经预先训练好的视频帧重构模型对该帧碎片进行相应的处理。其中,该视频帧重构模型可以是在每一次进行视频压缩感知重构时进行建立并训练,也可以是预先创建并训练好的模型,每一次进行视频压缩感知重构时直接调用该模型即可,在此不做限定。
例如,本实施例中,所述视频帧重构模型可包括帧碎片输入层、帧碎片块输出层和多个特征抽象隐含层,在提取出所述压缩视频帧的帧碎片之后,将提取的帧碎片输入该视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射,从而将每一帧碎片与最终重构的帧碎片块形成联系。
重构模块03,用于由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射即每一帧碎片经特征抽象后与最终重构的帧碎片块之间的映射关系,将输入的帧碎片重构为最终的帧碎片块,并经由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频,如对重构的帧碎片块进行拼接、组合等方式最终生成重构的视频,完成所述压缩视频帧的渲染重构。
本实施例通过预先确定的提取规则提取出待重构的压缩视频帧的帧碎片;由经预先训练的视频帧重构模型的多个特征抽象隐含层对该帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射, 并根据所述非线性映射将输入的帧碎片重构为帧碎片块之后输出。由于是将待重构的压缩视频帧提取为帧碎片后,针对帧碎片来进行重构,而不是直接对较大的压缩视频帧进行处理,降低了计算复杂度,提高了视频帧重构的速度;而且,通过预先训练的视频帧重构模型的多个特征抽象隐含层对每一帧碎片进行特征抽象,并将帧碎片重构为帧碎片块进行输出,能有效地提取压缩视频帧的每一细节特征,提高了视频帧重构的质量。
如图6所示，本发明第二实施例提出一种视频压缩感知重构系统，在上述实施例的基础上，还包括：
创建模块04,用于创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。
本实施例中,在进行视频帧重构之前,还需创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。所述创建模块04还包括用于生成训练数据和测试数据的生成单元,所述生成单元用于:获取预设数量(例如,100个)的不同种类的自然场景下的视频,并将获取的各个视频转换到灰度颜色空间。其中,获取的所有视频的数据大小累计和需满足预设值(例如,10K)。
将转换后的各个视频通过预定义尺寸为w_m×h_m×d_m（例如，w_m=8，h_m=8，d_m=16）的度量转换矩阵进行压缩。例如，w_b为具有预设数量视频的视频块b的宽度，h_b为视频块b的高度，d_b为视频块b的长度（即视频帧的数量），每个视频块为x_i∈R^(w_b×h_b×d_b)，i∈N，N为不小于1的正整数，压缩后的视频帧y_i∈R^(w_b×h_b)，其中，y_i=φ_b·x_i，φ_b为度量转换矩阵。
将所有压缩后的视频按照预设比例如X:Y(例如,7:3)的比例分成第一数据集和第二数据集,其中,第一数据集中的视频数量大于第二数据集中的视频数量,将第一数据集作为训练集,第二数据集作为测试集,其中,X大于0,Y大于0。
在一种实施方式中,该视频帧重构模型的训练过程如下:
在训练该视频帧重构模型时，可以将输入的视频帧的batch size（批量尺寸）设置成200，总的训练次数可以设为10×10^6个迭代次数，输入的每张视频帧之间的大小差值被规约化到均值为0、标准差为1的范围。在训练的起始阶段，每一特征抽象隐含层的神经元权值被随机初始化，随机值来自一个范围由变量s确定的均匀分布，变量s为先前特征抽象隐含层的神经元数目。
在训练过程中，采用随机梯度下降（SGD，Stochastic Gradient Descent）算法来对该视频帧重构模型中的各个参数进行优化。随机梯度下降算法适用于控制变量较多、受控系统比较复杂、无法建立准确数学模型的最优化控制过程。本实施例中，起始学习率可以设置为0.001，每隔3×10^6次迭代学习率会变为原来的十分之一。随机梯度下降算法的冲量项（Momentum）可以设置为0.9，在随机梯度下降的同时还可以对梯度进行裁剪，假设需要求解的目标函数为：E(x)=f(x)+r(x)，其中f(x)为损失函数，用来评价模型训练损失，是任意的可微凸函数，r(x)为规范化约束因子，用来对模型进行限制，根据模型参数的概率分布不同，r(x)一般有L1范式约束（模型参数服从拉普拉斯分布）和L2范式约束（模型参数服从高斯分布），通过使用L2范式对权值更新梯度进行裁剪以确保梯度始终处于一定范围之内，这样可以防止梯度爆炸现象影响模型的收敛，梯度裁剪的阈值可以被设定为10。
进一步地,在其他实施例中,该视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层(k为大于1的自然数),每一个特征抽象隐含层有如下公式:
h^k(y)=θ(b^k+w^k·y)，
其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(*)也即θ(b^k+w^k·y)为激活函数，其表达式为θ(x)=max(x,0)，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量。
基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(yi;ω),其中,ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合,yi为经所述帧碎片输入层输入的帧碎片,f(yi;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象,建立起来的帧碎片到帧碎片块之间的非线性映射。
参照图4，该帧碎片输入层接收帧碎片的输入，经过K层特征抽象隐含层的帧特征抽象，最后输入到该帧碎片块输出层，该帧碎片块输出层的维度与最终重构的视频块的总尺寸一致，均为w_m×h_m×d_m。为了训练该视频帧重构模型，需要根据输入参数不断调整模型的权值和偏置项。假定把模型的所有参数构成的集合表示为L(ω)，使用误差反向传播（Error Back Propagation，BP）算法对参数进行更新，优化函数为MSE（Mean Squared Error，平均平方和错误），则有：
L(ω)=(1/N)·Σ_{i=1}^{N}‖f(y_i;ω)−x_i‖²
在一个优选的实施方式中,该视频帧重构模型的帧碎片输入层维度可以设为8×8,该视频帧重构模型的帧碎片块输出层维度可以设为8×8×16,该视频帧重构模型的特征抽象隐含层可以设为7层,各个特征抽象隐含层维度可以分别设为128,256,384,512,512,4096,2048。
在硬件实现上,以上提取模块01、特征抽象模块02、重构模块03、创建模块04等可以以硬件形式内嵌于或独立于电子装置中,也可以以软件形式存储于电子装置的存储设备中,以便于处理设备调用执行以上各个模块对应的操作。该处理设备可以为中央处理单元(CPU)、微处理器、单片机等。
此外，本发明还提供一种计算机可读存储介质，所述计算机可读存储介质存储有视频压缩感知重构系统，所述视频压缩感知重构系统可被至少一个处理器执行，以使所述至少一个处理器执行如上述实施例中的视频压缩感知重构方法的步骤，该视频压缩感知重构方法的步骤S10、S20、S30等具体实施过程如上文所述，在此不再赘述。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件来实现,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。
以上参照附图说明了本发明的优选实施例,并非因此局限本发明的权利范围。上述本发明实施例序号仅仅为了描述,不代表实施例的 优劣。另外,虽然在流程图中示出了逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。
本领域技术人员不脱离本发明的范围和实质,可以有多种变型方案实现本发明,比如作为一个实施例的特征可用于另一实施例而得到又一实施例。凡在运用本发明的技术构思之内所作的任何修改、等同替换和改进,均应在本发明的权利范围之内。

Claims (20)

  1. 一种视频压缩感知重构方法,其特征在于,所述方法包括以下步骤:
    B、在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
    C、将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
    D、由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
  2. 如权利要求1所述的视频压缩感知重构方法,其特征在于,所述步骤B之前还包括:
    A、创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。
  3. 如权利要求1所述的视频压缩感知重构方法,其特征在于,所述视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层,k为大于1的自然数,每一个特征抽象隐含层包含如下公式:
    h^k(y)=θ(b^k+w^k·y)，
    其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(b^k+w^k·y)为激活函数，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量；基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(y_i;ω)，其中，ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合，y_i为经所述帧碎片输入层输入的帧碎片，f(y_i;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象，建立起来的帧碎片到帧碎片块之间的非线性映射。
  4. 如权利要求2所述的视频压缩感知重构方法,其特征在于, 所述视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层,k为大于1的自然数,每一个特征抽象隐含层包含如下公式:
    h^k(y)=θ(b^k+w^k·y)，
    其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(b^k+w^k·y)为激活函数，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量；基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(y_i;ω)，其中，ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合，y_i为经所述帧碎片输入层输入的帧碎片，f(y_i;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象，建立起来的帧碎片到帧碎片块之间的非线性映射。
  5. 如权利要求1所述的视频压缩感知重构方法,其特征在于,所述预先确定的提取规则为:
    对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。
  6. 如权利要求2所述的视频压缩感知重构方法,其特征在于,所述预先确定的提取规则为:
    对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。
  7. 如权利要求2所述的视频压缩感知重构方法,其特征在于,所述步骤A还包括训练数据和测试数据的生成步骤,该训练数据和测试数据的生成步骤包括:
    获取预设数量的不同种类的自然场景下的视频,并将获取的各个视频转换到灰度颜色空间;
    将转换后的各个视频通过预定义的度量转换矩阵进行压缩;
    将所有压缩后的视频按照预设比例分成第一数据集和第二数据集,将第一数据集作为训练集,第二数据集作为测试集。
  8. 一种视频压缩感知重构系统，其特征在于，所述视频压缩感知重构系统包括：
    提取模块,用于在收到待重构的压缩视频帧后,根据预先确定的 提取规则提取出所述压缩视频帧的帧碎片;
    特征抽象模块,用于将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
    重构模块,用于由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
  9. 如权利要求8所述的视频压缩感知重构系统，其特征在于，还包括：
    创建模块,用于创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。
  10. 如权利要求8所述的视频压缩感知重构系统，其特征在于，所述视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层，k为大于1的自然数，每一个特征抽象隐含层包含如下公式：
    h^k(y)=θ(b^k+w^k·y)，
    其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(b^k+w^k·y)为激活函数，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量；基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(y_i;ω)，其中，ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合，y_i为经所述帧碎片输入层输入的帧碎片，f(y_i;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象，建立起来的帧碎片到帧碎片块之间的非线性映射。
  11. 如权利要求9所述的视频压缩感知重构系统，其特征在于，所述视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层，k为大于1的自然数，每一个特征抽象隐含层包含如下公式：
    h^k(y)=θ(b^k+w^k·y)，
    其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(b^k+w^k·y)为激活函数，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量；基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(y_i;ω)，其中，ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合，y_i为经所述帧碎片输入层输入的帧碎片，f(y_i;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象，建立起来的帧碎片到帧碎片块之间的非线性映射。
  12. 如权利要求8所述的视频压缩感知重构系统，其特征在于，所述预先确定的提取规则为：
    对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。
  13. 如权利要求9所述的视频压缩感知重构系统，其特征在于，所述预先确定的提取规则为：
    对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。
  14. 如权利要求9所述的视频压缩感知重构系统，其特征在于，所述创建模块还包括用于生成训练数据和测试数据的生成单元，所述生成单元用于：
    获取预设数量的不同种类的自然场景下的视频,并将获取的各个视频转换到灰度颜色空间;将转换后的各个视频通过预定义的度量转换矩阵进行压缩;将所有压缩后的视频按照预设比例分成第一数据集和第二数据集,将第一数据集作为训练集,第二数据集作为测试集。
  15. 一种电子装置,其特征在于,包括处理设备及存储设备;该存储设备中存储有视频压缩感知重构程序,其包括至少一个计算机可读指令,该至少一个计算机可读指令可被所述处理设备执行,以实现以下操作:
    在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
    将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
    由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
  16. 如权利要求15所述的电子装置,其特征在于,所述至少一个计算机可读指令还可被所述处理设备执行,以实现以下操作:
    创建并训练视频帧重构模型,所述视频帧重构模型包括至少一个帧碎片输入层、至少一个帧碎片块输出层和多个特征抽象隐含层。
  17. 如权利要求15所述的电子装置,其特征在于,所述视频帧重构模型包括一个帧碎片输入层、一个帧碎片块输出层和k个特征抽象隐含层,k为大于1的自然数,每一个特征抽象隐含层包含如下公式:
    h^k(y)=θ(b^k+w^k·y)，
    其中，h^k(y)∈R^(L_k)为该特征抽象隐含层激活值向量，L_k为第k层特征抽象隐含层的神经元数目，θ(b^k+w^k·y)为激活函数，b^k∈R^(L_k)为该特征抽象隐含层神经元偏置向量，w^k∈R^(L_k×L_(k-1))为权值矩阵，y∈R^(L_(k-1))为该特征抽象隐含层输入向量；基于所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵训练得到视频帧重构模型f(y_i;ω)，其中，ω是所述特征抽象隐含层激活值向量、神经元数目、激活函数、神经元偏置向量与权值矩阵的参数集合，y_i为经所述帧碎片输入层输入的帧碎片，f(y_i;ω)为由多个特征抽象隐含层对经所述帧碎片输入层输入的帧碎片进行特征抽象，建立起来的帧碎片到帧碎片块之间的非线性映射。
  18. 如权利要求15所述的电子装置,其特征在于,所述预先确定的提取规则为:
    对待重构的压缩视频帧进行块分割,将所述待重构的压缩视频帧分成若干帧碎片。
  19. 如权利要求16所述的电子装置,其特征在于,所述至少一个计算机可读指令还可被所述处理设备执行,以实现以下操作:
    获取预设数量的不同种类的自然场景下的视频,并将获取的各个视频转换到灰度颜色空间;
    将转换后的各个视频通过预定义的度量转换矩阵进行压缩;
    将所有压缩后的视频按照预设比例分成第一数据集和第二数据 集,将第一数据集作为训练集,第二数据集作为测试集。
  20. 一种计算机可读存储介质,其上存储有至少一个可被处理设备执行以实现以下操作的计算机可读指令:
    在收到待重构的压缩视频帧后,根据预先确定的提取规则提取出所述压缩视频帧的帧碎片;
    将提取的帧碎片输入经预先训练的视频帧重构模型的帧碎片输入层,由所述视频帧重构模型的多个特征抽象隐含层对输入的帧碎片进行特征抽象,建立帧碎片到帧碎片块之间的非线性映射;
    由所述视频帧重构模型的多个特征抽象隐含层根据建立的所述非线性映射将输入的帧碎片重构为帧碎片块,并由所述视频帧重构模型的帧碎片块输出层输出重构的帧碎片块,基于重构的帧碎片块生成重构的视频。
PCT/CN2017/091311 2016-12-30 2017-06-30 视频压缩感知重构方法、***、电子装置及存储介质 WO2018120723A1 (zh)

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR1020187017256A KR102247907B1 (ko) 2016-12-30 2017-06-30 비디오 압축 감지 재구성 방법, 시스템, 전자장치 및 저장매체
SG11201808823PA SG11201808823PA (en) 2016-12-30 2017-06-30 Video compressed sensing reconstruction method, system, electronic device, and storage medium
AU2017389534A AU2017389534A1 (en) 2016-12-30 2017-06-30 Video compressed sensing reconstruction method, system, electronic device, and storage medium
EP17885721.5A EP3410714A4 (en) 2016-12-30 2017-06-30 METHOD AND SYSTEM FOR RECONSTRUCTING A VIDEO COMPRESSION SENSING AND ELECTRONIC DEVICE AND STORAGE MEDIUM
JP2018530728A JP6570155B2 (ja) 2016-12-30 2017-06-30 圧縮センシングによる映像再構成方法、システム、電子装置及び記憶媒体
US16/084,234 US10630995B2 (en) 2016-12-30 2017-06-30 Video compressed sensing reconstruction method, system, electronic device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611260793.6A CN106686385B (zh) 2016-12-30 2016-12-30 视频压缩感知重构方法及装置
CN201611260793.6 2016-12-30

Publications (1)

Publication Number Publication Date
WO2018120723A1 true WO2018120723A1 (zh) 2018-07-05

Family

ID=58848741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091311 WO2018120723A1 (zh) 2016-12-30 2017-06-30 视频压缩感知重构方法、***、电子装置及存储介质

Country Status (9)

Country Link
US (1) US10630995B2 (zh)
EP (1) EP3410714A4 (zh)
JP (1) JP6570155B2 (zh)
KR (1) KR102247907B1 (zh)
CN (1) CN106686385B (zh)
AU (1) AU2017389534A1 (zh)
SG (1) SG11201808823PA (zh)
TW (1) TWI664853B (zh)
WO (1) WO2018120723A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106686385B (zh) 2016-12-30 2018-09-25 平安科技(深圳)有限公司 视频压缩感知重构方法及装置
CN109451314B (zh) * 2018-04-23 2021-06-08 杭州电子科技大学 一种基于图模型的图像压缩感知方法
CN108986043B (zh) * 2018-06-26 2021-11-05 衡阳师范学院 一种基于自适应的块压缩感知图像重构方法
CN110704681B (zh) 2019-09-26 2023-03-24 三星电子(中国)研发中心 一种生成视频的方法及***
CN113382247B (zh) * 2021-06-09 2022-10-18 西安电子科技大学 基于间隔观测的视频压缩感知***及方法、设备及存储介质
CN113992920A (zh) * 2021-10-25 2022-01-28 北京大学深圳研究生院 一种基于深度展开网络的视频压缩感知重建方法

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6577700B1 (en) * 2001-06-22 2003-06-10 Liang-Shih Fan Neural network based multi-criteria optimization image reconstruction technique for imaging two- and three-phase flow systems using electrical capacitance tomography
CN104978612A (zh) * 2015-01-27 2015-10-14 厦门大学 基于ahp-rbf的分布式大数据***风险预测方法
CN105163121A (zh) * 2015-08-24 2015-12-16 西安电子科技大学 基于深度自编码网络的大压缩比卫星遥感图像压缩方法
CN105405054A (zh) * 2015-12-11 2016-03-16 平安科技(深圳)有限公司 基于理赔照片深度学习实现保险理赔反欺诈的方法及服务器
CN105868769A (zh) * 2015-01-23 2016-08-17 阿里巴巴集团控股有限公司 图像中的人脸关键点定位方法及装置
CN106204447A (zh) * 2016-06-30 2016-12-07 北京大学 基于总变差分和卷积神经网络的超分辨率重建方法
CN106686385A (zh) * 2016-12-30 2017-05-17 平安科技(深圳)有限公司 视频压缩感知重构方法及装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1317673C (zh) 2004-03-18 2007-05-23 致伸科技股份有限公司 利用神经网络分辨影像中文字与图形的***及其方法
WO2011156250A1 (en) * 2010-06-07 2011-12-15 Thomson Licensing Learned transform and compressive sensing for video coding
WO2014194482A1 (zh) * 2013-06-05 2014-12-11 中国科学院微电子研究所 基于压缩感知理论的气体识别方法
TW201520905A (zh) 2013-11-28 2015-06-01 Nat Univ Chin Yi Technology 字元影像辨識方法與辨識裝置
US20160050440A1 (en) * 2014-08-15 2016-02-18 Ying Liu Low-complexity depth map encoder with quad-tree partitioned compressed sensing
CN105992009A (zh) * 2015-02-05 2016-10-05 袁琳琳 基于运动补偿和分块的视频压缩感知的处理方法
GB2539845B (en) 2015-02-19 2017-07-12 Magic Pony Tech Ltd Offline training of hierarchical algorithms
EP3271863B1 (en) * 2015-03-20 2021-07-28 Fraunhofer Gesellschaft zur Förderung der angewandten Forschung E.V. Relevance score assignment for artificial neural network
US20160358075A1 (en) * 2015-06-08 2016-12-08 The Regents Of The University Of Michigan System for implementing a sparse coding algorithm
CN105740950B (zh) 2016-01-19 2019-03-29 南京邮电大学 基于滑齿法的神经网络的模板匹配方法
US10499056B2 (en) * 2016-03-09 2019-12-03 Sony Corporation System and method for video processing based on quantization parameter

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6577700B1 (en) * 2001-06-22 2003-06-10 Liang-Shih Fan Neural network based multi-criteria optimization image reconstruction technique for imaging two- and three-phase flow systems using electrical capacitance tomography
CN105868769A (zh) * 2015-01-23 2016-08-17 阿里巴巴集团控股有限公司 图像中的人脸关键点定位方法及装置
CN104978612A (zh) * 2015-01-27 2015-10-14 厦门大学 基于ahp-rbf的分布式大数据***风险预测方法
CN105163121A (zh) * 2015-08-24 2015-12-16 西安电子科技大学 基于深度自编码网络的大压缩比卫星遥感图像压缩方法
CN105405054A (zh) * 2015-12-11 2016-03-16 平安科技(深圳)有限公司 基于理赔照片深度学习实现保险理赔反欺诈的方法及服务器
CN106204447A (zh) * 2016-06-30 2016-12-07 北京大学 基于总变差分和卷积神经网络的超分辨率重建方法
CN106686385A (zh) * 2016-12-30 2017-05-17 平安科技(深圳)有限公司 视频压缩感知重构方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3410714A4 *

Also Published As

Publication number Publication date
TWI664853B (zh) 2019-07-01
JP2019511850A (ja) 2019-04-25
EP3410714A1 (en) 2018-12-05
KR20190019894A (ko) 2019-02-27
JP6570155B2 (ja) 2019-09-04
CN106686385B (zh) 2018-09-25
AU2017389534A1 (en) 2018-10-04
CN106686385A (zh) 2017-05-17
US10630995B2 (en) 2020-04-21
SG11201808823PA (en) 2018-11-29
TW201841504A (zh) 2018-11-16
KR102247907B1 (ko) 2021-05-04
US20190075309A1 (en) 2019-03-07
EP3410714A4 (en) 2019-11-06

Similar Documents

Publication Publication Date Title
WO2018120723A1 (zh) 视频压缩感知重构方法、***、电子装置及存储介质
CN112101190B (zh) 一种遥感图像分类方法、存储介质及计算设备
CN108182394B (zh) 卷积神经网络的训练方法、人脸识别方法及装置
CN110462639B (zh) 信息处理设备、信息处理方法及计算机可读存储介质
WO2021115356A1 (zh) 自适应窗宽窗位调节方法、装置、计算机***及存储介质
CN106897746B (zh) 数据分类模型训练方法和装置
CN112418292B (zh) 一种图像质量评价的方法、装置、计算机设备及存储介质
CN107292352B (zh) 基于卷积神经网络的图像分类方法和装置
US11514694B2 (en) Teaching GAN (generative adversarial networks) to generate per-pixel annotation
US10832034B2 (en) Facial image generating method, facial image generating apparatus, and facial image generating device
JP2015215876A (ja) ライブネス検査方法と装置、及び映像処理方法と装置
CN110598717B (zh) 图像特征的提取方法、装置及电子设备
WO2021042857A1 (zh) 图像分割模型的处理方法和处理装置
CN111832437A (zh) 建筑图纸识别方法、电子设备及相关产品
CN113505797B (zh) 模型训练方法、装置、计算机设备和存储介质
CN111105017A (zh) 神经网络量化方法、装置及电子设备
CN111126347B (zh) 人眼状态识别方法、装置、终端及可读存储介质
CN111914908A (zh) 一种图像识别模型训练方法、图像识别方法及相关设备
US20230021551A1 (en) Using training images and scaled training images to train an image segmentation model
CN112801107A (zh) 一种图像分割方法和电子设备
CN114299304A (zh) 一种图像处理方法及相关设备
KR20200131663A (ko) 영상 처리 장치 및 그 동작방법
CN113011532A (zh) 分类模型训练方法、装置、计算设备及存储介质
CN111445383A (zh) 影像参数的调节方法、装置及***
CN110659561A (zh) 互联网暴恐视频识别模型的优化方法及装置

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018530728

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20187017256

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2017885721

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017885721

Country of ref document: EP

Effective date: 20180829

ENP Entry into the national phase

Ref document number: 2017389534

Country of ref document: AU

Date of ref document: 20170630

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17885721

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE