WO2024027635A1 - Video transmission method, electronic device and computer storage medium - Google Patents

Video transmission method, electronic device and computer storage medium

Info

Publication number
WO2024027635A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
decoding
encoding
layer
initial
Prior art date
Application number
PCT/CN2023/110191
Other languages
English (en)
French (fr)
Inventor
刘伟伟
徐科
孔德辉
任聪
杨维
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2024027635A1

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 - Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Definitions

  • the present disclosure relates to the technical field of image communication, and specifically to a video transmission method, an electronic device and a computer storage medium.
  • the operation method of traditional video encoding and decoding technology first transforms the prediction residual data, then quantizes the transformed data, and finally performs inverse quantization and inverse transformation on the data.
  • this approach has many disadvantages, such as an insufficient degree of data sparsification, a large amount of calculation, and long coding times.
  • the present disclosure provides a video transmission method, an electronic device, and a computer storage medium.
  • a video transmission method includes: using the final encoding network in a pre-trained autoencoding network to encode the prediction residual data of a video frame to be transmitted to obtain video feature data, where the self-encoding network includes the final encoding network and a final decoding network; and transmitting the video feature data to a decoding end device.
  • a video transmission method includes: receiving video feature data transmitted by an encoding end device, where the video feature data is obtained by the encoding end device encoding the prediction residual data of a video frame to be transmitted using the final encoding network in a pre-trained autoencoding network; decoding the video feature data using the final decoding network in the autoencoding network to obtain decoded prediction residual data; and determining a post-transmission video frame based on the decoded prediction residual data.
  • an electronic device includes: one or more processors; and a storage device having one or more programs stored thereon; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the video transmission method described above.
  • a computer storage medium is provided with a computer program stored thereon, where the video transmission method described above is implemented when the program is executed.
  • Figure 1 is a schematic flowchart of a video transmission method applied to an encoding end device according to an embodiment of the present disclosure
  • Figure 2 is a first schematic flowchart of obtaining the autoencoding network through training according to an embodiment of the present disclosure
  • Figure 3 is a second schematic flowchart of obtaining the autoencoding network through training according to an embodiment of the present disclosure
  • Figure 4 is a schematic flowchart of a video transmission method applied to a decoding end device according to an embodiment of the present disclosure
  • Figure 5 is a schematic diagram of the principle of a video transmission method according to an embodiment of the present disclosure.
  • Figure 6 is a schematic diagram of an autoencoding network according to an embodiment of the present disclosure.
  • Figure 7 is a schematic diagram of another autoencoding network according to an embodiment of the present disclosure.
  • Embodiments described herein may be described with reference to plan and/or cross-sectional views by way of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances, and the embodiments are not limited to those shown in the drawings but include modifications of configurations formed on the basis of the manufacturing process. The regions illustrated in the figures are therefore schematic in nature, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements but are not intended to be limiting.
  • in order to transmit high-resolution video using traditional video coding and decoding technology, the encoding end device usually first transforms and quantizes the video frame, and the decoding end device then performs inverse quantization and inverse transformation on the data.
  • this process has many disadvantages, such as the contradiction between reducing the code rate and improving accuracy, a large amount of calculation, and long encoding and decoding times.
  • discrete cosine transformation, sine transformation and the inverse transformations are often performed through orthogonal matrices. This in itself does not cause a loss of data accuracy, but its effect on improving the energy concentration of the data is limited; that is, there is still room for improvement in data sparsity.
  • quantization is a process of mapping continuous values to a limited number of discrete amplitudes. This process can improve the degree of data compression, but an excessive quantization amplitude leads to increased data loss, so there is a contradiction between reducing the code rate and improving accuracy.
  • embodiments of the present disclosure propose that the prediction residual data of sample video frames can first be used to train an autoencoding network, including a final encoding network and a final decoding network, that meets the code rate and accuracy requirements; the encoding end device and the decoding end device then use the final encoding network and the final decoding network, respectively, to process the video data.
  • compared with traditional video coding and decoding technology, this has the advantages of reducing the code rate while improving accuracy, requiring a small amount of calculation, and taking little encoding and decoding time.
  • an embodiment of the present disclosure provides a video transmission method, which may include the following steps S11 and S12.
  • step S11 the final encoding network in the pre-trained autoencoding network is used to encode the prediction residual data of the video frame to be transmitted to obtain the video feature data.
  • the autoencoding network includes the final encoding network and the final decoding network.
  • step S12 the video feature data is transmitted to the decoding end device.
  • the prediction residual data is the difference between the actual value and the predicted value.
  • the actual value refers to the actual pixel value of the video frame to be transmitted
  • the predicted value refers to the predicted pixel value obtained by predicting the pixel values of the video frame to be transmitted from already-encoded video frames.
  • determining the prediction residual data of a video frame belongs to the existing technology and will not be described again here.
  • steps S11 and S12 are executed by the encoding end device.
  • the step of pre-training to obtain the self-encoding network can be executed by the encoding end device, by the decoding end device, or by a device other than the encoding end device and the decoding end device.
  • the encoding-side device holds at least the final encoding network in the self-encoding network
  • the decoding-side device holds at least the final decoding network in the self-encoding network.
  • after the video feature data is transmitted to the decoding end device, the decoding end device uses the final decoding network in the self-encoding network to decode the video feature data to obtain the decoded prediction residual data.
  • the post-transmission video frame is determined based on the decoded prediction residual data.
  • the post-transmission video frame refers to the video frame restored by the decoding end device, which is identical to the video frame to be transmitted or differs from it only very slightly. That is to say, the encoding end device transmits the encoded video frame to the decoding end device instead of transmitting the video frame to be transmitted directly.
  • the decoding end device obtains the restored video frame, i.e., the post-transmission video frame, by decoding the video feature data, rather than receiving the video frame to be transmitted directly from the encoding end device.
  • the encoding end device encodes the prediction residual data of the video frame to be transmitted using the final encoding network in the pre-trained self-encoding network to obtain the video feature data, and transmits the video feature data to the decoding end device, so that the decoding end device can use the final decoding network in the self-encoding network to decode the video feature data and obtain the post-transmission video frame.
  • compared with traditional video coding and decoding technology, this video transmission method can both increase the video transmission bit rate and improve the accuracy of video frame encoding and decoding, and it also has the advantages of a small amount of calculation, low computational complexity and short encoding and decoding times.
  • the initial network can be initialized and adjusted to obtain an intermediate network, and then the prediction residual data of sample video frames can be used to train the intermediate network to obtain an autoencoding network.
  • the method further includes steps S21 to S22 of training (ie, the above-mentioned pre-training) to obtain the autoencoding network.
  • step S21 the initial network is adjusted according to the transmission scene parameters of the sample video frame to obtain an intermediate network.
  • step S22 the intermediate network is trained according to the prediction residual data of the sample video frame to obtain the autoencoding network; wherein the transmission scene parameters of the sample video frame are the same as the transmission scene parameters of the video frame to be transmitted.
  • different autoencoding networks can be trained according to different transmission scene parameters. That is to say, the autoencoding network used when transmitting video frames with certain transmission scene parameters is trained on sample video frames that have those same transmission scene parameters, while video frames to be transmitted that have different transmission scene parameters use different autoencoding networks.
  • the intermediate network is obtained by optimizing the initial network based on the transmission scene parameters of the sample video frame.
  • the autoencoding network is obtained by optimizing the intermediate network based on the prediction residual data of the sample video frame.
  • the initial network, the intermediate network and the autoencoding network can each include two parts: a network used for encoding processing and a network used for decoding processing.
  • the initial network includes an initial encoding network and an initial decoding network
  • the intermediate network includes an intermediate encoding network and an intermediate decoding network.
  • training the intermediate network based on the prediction residual data of the sample video frame to obtain the autoencoding network may include the following steps S221 to S223.
  • step S221 the prediction residual data of the sample video frame is input into the intermediate coding network to obtain intermediate feature data output by the intermediate coding network.
  • step S222 the intermediate feature data is input into the intermediate decoding network to obtain intermediate prediction residual data output by the intermediate decoding network.
  • step S223 when the prediction residual data of the sample video frame, the intermediate feature data and the intermediate prediction residual data do not meet the preset optimization stop conditions, the parameters of the intermediate network are adjusted until the prediction residual data of the sample video frame and the adjusted intermediate feature data and intermediate prediction residual data meet the preset optimization stop conditions.
  • if the preset optimization stop conditions are already met the first time the sample prediction residual data is encoded and decoded, step S223 does not need to be performed, and the intermediate network can be used directly as the autoencoding network.
  • when the prediction residual data of the sample video frame, the intermediate feature data and the intermediate prediction residual data do not meet the preset optimization stop conditions, the parameters of the intermediate network are adjusted, and steps S221 and S222 are performed again: the prediction residual data of the sample video frame is input into the parameter-adjusted intermediate coding network to obtain the intermediate feature data it outputs, and that intermediate feature data is input into the parameter-adjusted intermediate decoding network to obtain the intermediate prediction residual data it outputs.
  • it is then judged again whether the preset optimization stop conditions are met. If they are, the adjustment stops and the current intermediate network is determined to be the autoencoding network; if not, the parameters of the intermediate network continue to be adjusted and steps S221 and S222 are performed again, until the prediction residual data of the sample video frame and the adjusted intermediate feature data and intermediate prediction residual data meet the preset optimization stop conditions.
  • Meeting the preset optimization stop conditions can involve two aspects.
  • one is that the intermediate encoding network has reduced the code rate to a sufficiently low level, which can be reflected in the dimension of the output data of the intermediate encoding network being smaller than the preset dimension threshold.
  • the other is that the intermediate decoding network is able to restore the video frame to be transmitted to a sufficient degree, i.e., the restoration degree of the intermediate decoding network is high enough, which can be reflected in the average variance between the prediction residual data of the sample video frame and the intermediate prediction residual data being smaller than the preset variance threshold.
  • the average variance between the prediction residual data of the sample video frame and the intermediate prediction residual data is used as the loss function; the smaller the value of the loss function, the higher the degree of restoration, and when the loss function is already smaller than the preset variance threshold, the degree of restoration is high enough.
  • the prediction residual data of the sample video frame, the intermediate feature data and the intermediate prediction residual data meeting the preset optimization stop conditions may include: the data dimension of the intermediate feature data being smaller than the preset dimension threshold, and the average variance between the prediction residual data of the sample video frame and the intermediate prediction residual data being smaller than the preset variance threshold.
  • the intermediate feature data is usually in the form of a feature vector, and its data dimension refers to the number of elements in the vector.
  • when the preset optimization stop conditions are met, the self-encoding network is able to compress the code rate of the video frame to be transmitted to a sufficiently low level, and can also recover from the intermediate feature data prediction residual data whose degree of restoration (relative to the prediction residual data of the video frame to be transmitted) is sufficiently high.
  • the initial encoding network includes an initial encoding input layer, an initial encoding output layer, and a plurality of initial encoding intermediate layers
  • the initial decoding network includes an initial decoding input layer, a plurality of initial decoding intermediate layers, and an initial decoding output layer
  • the intermediate coding network includes an intermediate coding input layer, an intermediate coding output layer and a plurality of intermediate coding intermediate layers
  • the intermediate decoding network includes an intermediate decoding input layer, an intermediate decoding output layer and a plurality of intermediate decoding intermediate layers.
  • the final encoding network includes a final encoding input layer, a final encoding output layer and a plurality of final encoding intermediate layers
  • the final decoding network includes a final decoding input layer, a final decoding output layer and a plurality of final decoding intermediate layers.
  • the types of the initial encoding intermediate layers, the initial decoding intermediate layers, the intermediate encoding intermediate layers, the intermediate decoding intermediate layers, the final encoding intermediate layers and the final decoding intermediate layers include the following three kinds: convolution layers, pooling layers and activation layers.
  • the type of the initial encoding output layer, the type of the initial decoding input layer, the type of the intermediate encoding output layer, the type of the intermediate decoding input layer, the type of the final encoding output layer and the type of the final decoding input layer are all fully connected layers.
  • the type of the initial encoding input layer, the type of the initial decoding output layer, the type of the intermediate encoding input layer, the type of the intermediate decoding output layer, the type of the final encoding input layer and the type of the final decoding output layer are all the same.
  • adjusting the parameters of the intermediate network may include the following steps: adjusting the number of the intermediate coding intermediate layers and the number of the intermediate decoding intermediate layers, and adjusting the number of neurons in the intermediate coding intermediate layer and the number of neurons in the intermediate decoding intermediate layer.
  • the intermediate network includes an intermediate encoding network and an intermediate decoding network.
  • the intermediate encoding network includes multiple intermediate encoding intermediate layers.
  • the intermediate decoding network also includes multiple intermediate decoding intermediate layers.
  • the types of the intermediate encoding intermediate layers and of the intermediate decoding intermediate layers include convolutional layers, pooling layers and activation layers.
  • the scene category identifier is used to characterize the application scenario of the video frame.
  • the application scenario may include environmental monitoring, video browsing services, video communication, etc.
  • Transmission bandwidth refers to the rated bandwidth when transmitting video frames. A mapping relationship between transmission scene parameters and the parameters of the initial network can be pre-configured, and the initial network can be adjusted according to different transmission scene parameters to obtain different intermediate networks.
  • the parameters of the initial network refer to the number of initial encoding intermediate layers, the number of initial decoding intermediate layers, and the dimensions of the output data of the initial encoding output layer.
  • the dimension of the output data of the initial coding output layer refers to the dimension of the initial feature data output by the initial coding output layer.
  • the traditional video encoding and decoding process includes multiple sub-processes.
  • the embodiments of the present disclosure replace the entire traditional video encoding and decoding process by using an auto-encoding network, which is also beneficial to simplifying the deployment process of hardware and software.
  • traditional video coding and decoding technology, that is, traditional transformation processing, quantization processing, inverse quantization processing and inverse transformation processing, can also be used to process the prediction residual data of the sample video frame to obtain restored prediction residual data.
  • the coding rate of the autoencoding network (i.e., the dimension of the data output by the final encoding network) can then be compared with the coding rate of traditional video encoding and decoding technology (i.e., the dimension of the data after transformation and quantization), the PSNR values of the sample video frames restored by each can be compared, and the parameters of the autoencoding network can be further adjusted according to the comparison results.
  • step S31 video feature data transmitted by the encoding end device is received, where the video feature data is obtained by the encoding end device encoding the prediction residual data of the video frame to be transmitted using the final encoding network in the pre-trained self-encoding network.
  • step S32 the final decoding network in the auto-encoding network is used to decode the video feature data to obtain decoded prediction residual data.
  • step S33 the post-transmission video frame is determined according to the decoded prediction residual data.
  • the decoding end device receives the video feature data transmitted by the encoding end device, where the video feature data is obtained by the encoding end device encoding the prediction residual data of the video frame to be transmitted using the final encoding network in the pre-trained auto-encoding network; the decoding end device then uses the final decoding network in the auto-encoding network to decode the video feature data and obtain the decoded prediction residual data.
  • compared with traditional video coding and decoding technology, this video transmission method can both increase the video transmission bit rate and improve video frame encoding and decoding accuracy, and it also has the advantages of a small amount of calculation, low computational complexity and short encoding and decoding times.
  • the final encoding network in the auto-encoding network is equivalent to a transformation/quantization network: by inputting the prediction residual data blocks of the video frame to be transmitted into this transformation/quantization network, the encoding end device can output an entropy-coded stream (i.e., the video feature data).
  • the final decoding network in the autoencoding network is equivalent to an inverse transformation/inverse quantization network: by inputting the entropy-coded stream into this network, the decoding end device can output the decoded data (i.e., the decoded prediction residual data).
  • Figure 6 is a schematic diagram of an auto-encoding network according to an embodiment of the present disclosure, in which the input layer of the final encoding network (encoder) receives the prediction residual data of the video frame to be transmitted, denoted x1, x2, ..., x6.
  • the hidden layers include the final encoding intermediate layers, the final encoding output layer, the final decoding input layer and the final decoding intermediate layers; in Figure 6 the final encoding output layer and the final decoding input layer are shown as the same layer, and a1, a2 and a3 represent the video feature data.
  • the output layer of the final decoding network (decoder) outputs the decoded prediction residual data.
  • the convolutional layers, pooling layers and activation layers included in the final encoding intermediate layers and in the final decoding intermediate layers are not shown in the figure.
  • Figure 7 is a schematic diagram of another autoencoding network according to an embodiment of the present disclosure.
  • the figure shows the input layer (Input) of the final encoding network, the output layer (Output) of the final decoding network, the convolutional layers (Conv), pooling layers (Pool) and activation layers (Active) of the final encoding network, the convolutional layers (Conv) and activation layers (Active) of the final decoding network, the sampling layers (Upsampling), and the final encoding output layer and final decoding input layer (FC, fully connected layers).
  • the intermediate layers of the final encoding network and of the final decoding network each include N unit structures.
  • the autoencoding network in the embodiments of the present disclosure can also be replaced by another deep network such as a GAN (Generative Adversarial Network), which likewise uses a deep network to replace the entire transformation, quantization, inverse quantization and inverse transformation portion of the video encoding and decoding process; the method of training such a deep network can also be the same as the method of training the autoencoder network.
  • whether an autoencoding network or another deep network is used, it needs to support the basic operations of deep learning, for example:
  • Convolution
  • Deconvolution
  • ReLU (activation)
  • Sigmoid (the prediction and output function of logistic regression)
  • Full-Connection (fully connected)
  • Reshape (a function that re-adjusts the numbers of rows, columns and dimensions of a matrix)
  • embodiments of the present disclosure also provide an electronic device, which may include: one or more processors; and a storage device on which one or more programs are stored; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the video transmission method described above.
  • embodiments of the present disclosure also provide a computer storage medium on which a computer program is stored, wherein when the program is executed, the video transmission method as described above is implemented.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
  • Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic, illustrative sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video transmission method, the method including: using a final encoding network in a pre-trained autoencoding network to encode prediction residual data of a video frame to be transmitted to obtain video feature data, the autoencoding network including the final encoding network and a final decoding network; and transmitting the video feature data to a decoding end device. This can both increase the video transmission bit rate and improve the accuracy of video frame encoding and decoding, and it also has the advantages of a small amount of calculation, low computational complexity and short encoding and decoding times. The present disclosure further provides an electronic device and a computer storage medium.

Description

Video transmission method, electronic device and computer storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application CN202210914809.X, filed on August 1, 2022 and entitled "Video transmission method, electronic device and computer storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of image communication, and specifically to a video transmission method, an electronic device, and a computer storage medium.
Background
With the improvement of video technology and the performance of display hardware, high resolution has become the development direction of future video technology. In order to transmit high-resolution video efficiently and quickly, traditional video coding and decoding technology usually performs efficient intra-frame and inter-frame prediction on the video data to obtain prediction residual data. To reduce the amount of data to be encoded, i.e., to reduce data redundancy, the prediction residual data is further subjected to a discrete cosine or sine transform to improve the energy distribution of the data and thereby achieve data sparsification; the transformed data is then quantized to effectively reduce the value space of the data and obtain a better compression effect; finally, the data is inversely quantized and inversely transformed to restore it.
However, this operating approach of traditional video coding and decoding technology, in which the prediction residual data is first transformed, the transformed data is then quantized, and the data is finally inversely quantized and inversely transformed, has many drawbacks, such as an insufficient degree of data sparsification, a large amount of calculation, and long encoding times.
Summary
In view of the above shortcomings in the existing technology, the present disclosure provides a video transmission method, an electronic device, and a computer storage medium.
In one aspect of the present disclosure, a video transmission method is provided, the method including: using a final encoding network in a pre-trained autoencoding network to encode prediction residual data of a video frame to be transmitted to obtain video feature data, the autoencoding network including the final encoding network and a final decoding network; and transmitting the video feature data to a decoding end device.
In another aspect of the present disclosure, a video transmission method is provided, the method including: receiving video feature data transmitted by an encoding end device, where the video feature data is obtained by the encoding end device encoding prediction residual data of a video frame to be transmitted using a final encoding network in a pre-trained autoencoding network; decoding the video feature data using a final decoding network in the autoencoding network to obtain decoded prediction residual data; and determining a post-transmission video frame based on the decoded prediction residual data.
In yet another aspect of the present disclosure, an electronic device is provided, including: one or more processors; and a storage device on which one or more programs are stored; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the video transmission method described above.
In a further aspect of the present disclosure, a computer storage medium is provided, on which a computer program is stored, where the video transmission method described above is implemented when the program is executed.
Brief Description of the Drawings
Figure 1 is a schematic flowchart of a video transmission method applied to an encoding end device according to an embodiment of the present disclosure;
Figure 2 is a first schematic flowchart of training to obtain the autoencoding network according to an embodiment of the present disclosure;
Figure 3 is a second schematic flowchart of training to obtain the autoencoding network according to an embodiment of the present disclosure;
Figure 4 is a schematic flowchart of a video transmission method applied to a decoding end device according to an embodiment of the present disclosure;
Figure 5 is a schematic diagram of the principle of a video transmission method according to an embodiment of the present disclosure;
Figure 6 is a schematic diagram of an autoencoding network according to an embodiment of the present disclosure;
Figure 7 is a schematic diagram of another autoencoding network according to an embodiment of the present disclosure.
Detailed Description
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but the example embodiments may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the scope of the present disclosure to those skilled in the art.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that when the terms "comprising" and/or "made of" are used in this specification, they specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
The embodiments described herein may be described with reference to plan and/or cross-sectional views by way of idealized schematic illustrations of the present disclosure. Accordingly, the example illustrations may be modified in accordance with manufacturing techniques and/or tolerances, and the embodiments are not limited to those shown in the drawings, but include modifications in configuration formed on the basis of the manufacturing process. The regions illustrated in the figures are therefore schematic in nature, and the shapes of the regions shown in the figures illustrate specific shapes of regions of elements but are not intended to be limiting.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In order to transmit high-resolution video, traditional video coding and decoding technology usually has the encoding end device first transform and quantize the video frame, and then has the decoding end device inversely quantize and inversely transform the data. This process, however, has many drawbacks, such as the contradiction between reducing the code rate and improving accuracy, a large amount of calculation, and long encoding and decoding times.
Specifically, in the above transformation and quantization process, the discrete cosine transform, the sine transform and the inverse transforms are often performed through orthogonal matrices. This in itself does not cause any loss of data accuracy, but its effect on improving the energy concentration of the data is limited; that is, there is still room to improve the sparsity of the data. Quantization is a process of mapping continuous values to a limited number of discrete amplitudes. This process can increase the degree of data compression, but an excessively large quantization amplitude increases the data loss, so there is a contradiction between reducing the code rate and improving accuracy. At present, rate-distortion-optimized quantization is used to resolve this contradiction, and the coding performance can indeed be improved; however, this kind of quantization needs to traverse multiple candidate quantization values and rate-distortion costs, which requires a large amount of calculation, is not conducive to fast hardware implementation, and leads to problems such as prolonged encoding time.
In view of this, the embodiments of the present disclosure propose that the prediction residual data of sample video frames can first be used to train an autoencoding network, including a final encoding network and a final decoding network, that meets the code rate and accuracy requirements; the encoding end device and the decoding end device then use the final encoding network and the final decoding network, respectively, to process the video data. Compared with traditional video coding and decoding technology, this has the advantages of reducing the code rate while improving accuracy, requiring a small amount of calculation, and taking little encoding and decoding time.
Accordingly, as shown in Figure 1, an embodiment of the present disclosure provides a video transmission method, which may include the following steps S11 and S12.
In step S11, a final encoding network in a pre-trained autoencoding network is used to encode the prediction residual data of a video frame to be transmitted to obtain video feature data, the autoencoding network including the final encoding network and a final decoding network.
In step S12, the video feature data is transmitted to a decoding end device.
The prediction residual data is the difference between an actual value and a predicted value. The actual value refers to the actual pixel value of the video frame to be transmitted, and the predicted value refers to the predicted pixel value obtained by predicting the pixel values of the video frame to be transmitted from already-encoded video frames. Determining the prediction residual data of a video frame belongs to the existing technology and is not described again here.
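The following minimal sketch illustrates the relationship between the actual value, the predicted value and the prediction residual; the pixel values and the 2x2 block size are hypothetical, and a lossless round trip is assumed purely for illustration.

```python
import numpy as np

# Hypothetical 2x2 pixel block; real blocks are larger (e.g. 8x8 or 16x16).
actual    = np.array([[52, 55], [61, 59]], dtype=np.int16)  # actual pixel values
predicted = np.array([[50, 54], [60, 60]], dtype=np.int16)  # intra/inter prediction

residual = actual - predicted  # prediction residual data fed to the final encoding network

# ... encode with the final encoding network, transmit, decode ...
decoded_residual = residual    # assume an ideal, lossless round trip here

restored = predicted + decoded_residual  # block of the post-transmission video frame
assert np.array_equal(restored, actual)
```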
In the embodiments of the present disclosure, steps S11 and S12 are executed by the encoding end device, while the step of pre-training to obtain the autoencoding network may be executed by the encoding end device, by the decoding end device, or by a device other than the encoding end device and the decoding end device. In any case, after the autoencoding network is trained, the encoding end device holds at least the final encoding network of the autoencoding network, and the decoding end device holds at least the final decoding network.
After the video feature data is transmitted to the decoding end device, the decoding end device uses the final decoding network in the autoencoding network to decode the video feature data to obtain decoded prediction residual data, and determines the post-transmission video frame based on the decoded prediction residual data. The post-transmission video frame refers to the video frame restored by the decoding end device, which is identical to the video frame to be transmitted or differs from it only very slightly. That is to say, the encoding end device transmits an encoded form of the video frame to be transmitted rather than transmitting the frame itself directly; the decoding end device obtains the restored frame, i.e., the post-transmission video frame, by decoding the video feature data, rather than receiving the video frame to be transmitted directly from the encoding end device.
As can be seen from steps S11 and S12 above, with the video transmission method provided by the embodiments of the present disclosure, the encoding end device uses the final encoding network in the pre-trained autoencoding network to encode the prediction residual data of the video frame to be transmitted into video feature data, and transmits the video feature data to the decoding end device, so that the decoding end device can use the final decoding network in the autoencoding network to decode the video feature data and obtain the post-transmission video frame. Compared with traditional video coding and decoding technology, this video transmission approach can both increase the video transmission bit rate and improve the accuracy of video frame encoding and decoding, and it also has the advantages of a small amount of calculation, low computational complexity, and short encoding and decoding times.
In the process of pre-training to obtain the autoencoding network, an initial network may first be initialized and adjusted to obtain an intermediate network, and the prediction residual data of sample video frames may then be used to train the intermediate network to obtain the autoencoding network. Accordingly, in some embodiments, as shown in Figure 2, the method further includes steps S21 and S22 of training (i.e., the above pre-training) to obtain the autoencoding network.
In step S21, the initial network is adjusted according to the transmission scene parameters of the sample video frames to obtain an intermediate network.
In step S22, the intermediate network is trained according to the prediction residual data of the sample video frames to obtain the autoencoding network, where the transmission scene parameters of the sample video frames are the same as the transmission scene parameters of the video frame to be transmitted.
In the embodiments of the present disclosure, different autoencoding networks can be trained for different transmission scene parameters. That is to say, the autoencoding network used when transmitting video frames with certain transmission scene parameters is trained on sample video frames that have those same transmission scene parameters, while video frames to be transmitted that have different transmission scene parameters use different autoencoding networks.
The intermediate network is obtained by optimizing the initial network according to the transmission scene parameters of the sample video frames, and the autoencoding network is obtained by optimizing the intermediate network according to the prediction residual data of the sample video frames. The initial network, the intermediate network and the autoencoding network may each include two parts: a network used for encoding and a network used for decoding. Accordingly, in some embodiments, the initial network includes an initial encoding network and an initial decoding network, and the intermediate network includes an intermediate encoding network and an intermediate decoding network. As shown in Figure 3, training the intermediate network according to the prediction residual data of the sample video frames to obtain the autoencoding network (i.e., step S22) may include the following steps S221 to S223.
In step S221, the prediction residual data of the sample video frames is input into the intermediate encoding network to obtain the intermediate feature data output by the intermediate encoding network.
In step S222, the intermediate feature data is input into the intermediate decoding network to obtain the intermediate prediction residual data output by the intermediate decoding network.
In step S223, when the prediction residual data of the sample video frames, the intermediate feature data and the intermediate prediction residual data do not meet the preset optimization stop conditions, the parameters of the intermediate network are adjusted until the prediction residual data of the sample video frames and the intermediate feature data and intermediate prediction residual data obtained after adjustment meet the preset optimization stop conditions.
It should be understood that if the prediction residual data, the intermediate feature data and the intermediate prediction residual data already meet the preset optimization stop conditions the first time the prediction residual data of the sample video frames is passed through the intermediate encoding network and the resulting intermediate feature data is passed through the intermediate decoding network, step S223 need not be performed and the intermediate network can be used directly as the autoencoding network.
When the prediction residual data of the sample video frames, the intermediate feature data and the intermediate prediction residual data do not meet the preset optimization stop conditions, the parameters of the intermediate network are adjusted, and steps S221 and S222 are performed again: the prediction residual data of the sample video frames is input into the parameter-adjusted intermediate encoding network to obtain the intermediate feature data it outputs, and that intermediate feature data is input into the parameter-adjusted intermediate decoding network to obtain the intermediate prediction residual data it outputs. It is then judged again whether the prediction residual data of the sample video frames and the intermediate feature data and intermediate prediction residual data obtained after parameter adjustment meet the preset optimization stop conditions. If they do, the adjustment of the intermediate network's parameters can stop and the current intermediate network is determined to be the autoencoding network; if not, the parameters of the intermediate network continue to be adjusted and steps S221 and S222 are performed again, until the prediction residual data of the sample video frames and the intermediate feature data and intermediate prediction residual data obtained after adjustment meet the preset optimization stop conditions.
Meeting the preset optimization stop conditions can involve two aspects. One is that the intermediate encoding network has reduced the code rate to a sufficiently low level, which can be reflected in the dimension of the output data of the intermediate encoding network being smaller than a preset dimension threshold. The other is that the intermediate decoding network is able to restore the video frame to be transmitted to a sufficient degree, i.e., the restoration degree of the intermediate decoding network is high enough, which can be reflected in the average variance between the prediction residual data of the sample video frames and the intermediate prediction residual data being smaller than a preset variance threshold. In this case, the average variance between the prediction residual data of the sample video frames and the intermediate prediction residual data is used as the loss function: the smaller the value of the loss function, the higher the degree of restoration, and when the loss function is already smaller than the preset variance threshold, the degree of restoration is high enough.
Accordingly, in some embodiments, the prediction residual data of the sample video frames, the intermediate feature data and the intermediate prediction residual data meeting the preset optimization stop conditions (as described in step S223) may include: the data dimension of the intermediate feature data being smaller than the preset dimension threshold, and the average variance between the prediction residual data of the sample video frames and the intermediate prediction residual data being smaller than the preset variance threshold.
The intermediate feature data usually takes the form of a feature vector, whose data dimension is the number of elements in the vector. When the preset optimization stop conditions are met, the autoencoding network is able to compress the code rate of the video frame to be transmitted to a sufficiently low level, and is also able to recover from the intermediate feature data prediction residual data whose degree of restoration (relative to the prediction residual data of the video frame to be transmitted) is sufficiently high.
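The sketch below is one possible reading of steps S221 to S223, not the disclosed implementation: an 8x8 residual block is flattened into 64 values, the average variance is computed with a mean-squared-error loss, and the feature dimension is halved whenever the stop conditions are not met. The network shape, thresholds, learning rate and epoch count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_intermediate_net(feat_dim: int) -> nn.Sequential:
    # intermediate encoding network followed by intermediate decoding network
    return nn.Sequential(
        nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, feat_dim),  # encoder
        nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 64),  # decoder
    )

def train_autoencoding_network(residual_batches, dim_threshold=32,
                               var_threshold=1e-3, epochs=50):
    feat_dim = 64
    while feat_dim >= 2:
        net = make_intermediate_net(feat_dim)  # rebuilt with adjusted parameters
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        loss_fn = nn.MSELoss()                 # average variance as the loss function
        loss = None
        for _ in range(epochs):
            for x in residual_batches:         # x: (batch, 64) sample residual blocks
                loss = loss_fn(net(x), x)      # sample residuals vs. intermediate residuals
                opt.zero_grad()
                loss.backward()
                opt.step()
        # stop conditions: feature dimension low enough AND restoration good enough
        if feat_dim < dim_threshold and loss is not None and loss.item() < var_threshold:
            return net                         # the intermediate network becomes the autoencoding network
        feat_dim //= 2                         # adjust the parameters and train again
    raise RuntimeError("optimization stop conditions were not met")
```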
In some embodiments, the initial encoding network includes an initial encoding input layer, a plurality of initial encoding intermediate layers and an initial encoding output layer, and the initial decoding network includes an initial decoding input layer, a plurality of initial decoding intermediate layers and an initial decoding output layer; the intermediate encoding network includes an intermediate encoding input layer, a plurality of intermediate encoding intermediate layers and an intermediate encoding output layer, and the intermediate decoding network includes an intermediate decoding input layer, a plurality of intermediate decoding intermediate layers and an intermediate decoding output layer; the final encoding network includes a final encoding input layer, a plurality of final encoding intermediate layers and a final encoding output layer, and the final decoding network includes a final decoding input layer, a plurality of final decoding intermediate layers and a final decoding output layer.
In some embodiments, the types of the initial encoding intermediate layers, the initial decoding intermediate layers, the intermediate encoding intermediate layers, the intermediate decoding intermediate layers, the final encoding intermediate layers and the final decoding intermediate layers include the following three kinds: convolution layers, pooling layers and activation layers.
In some embodiments, the initial encoding output layer, the initial decoding input layer, the intermediate encoding output layer, the intermediate decoding input layer, the final encoding output layer and the final decoding input layer are all fully connected layers.
In some embodiments, the types of the initial encoding input layer, the initial decoding output layer, the intermediate encoding input layer, the intermediate decoding output layer, the final encoding input layer and the final decoding output layer are all the same.
Accordingly, in some embodiments, adjusting the parameters of the intermediate network (as described in step S223) may include the following steps: adjusting the number of intermediate encoding intermediate layers and the number of intermediate decoding intermediate layers, and adjusting the number of neurons in the intermediate encoding intermediate layers and the number of neurons in the intermediate decoding intermediate layers.
As described above, the intermediate network includes the intermediate encoding network and the intermediate decoding network; the intermediate encoding network includes a plurality of intermediate encoding intermediate layers, the intermediate decoding network likewise includes a plurality of intermediate decoding intermediate layers, and the types of both kinds of intermediate layers include convolution layers, pooling layers and activation layers. When adjusting the parameters of the intermediate network, the numbers of intermediate encoding intermediate layers and of intermediate decoding intermediate layers can therefore be adjusted, i.e., it is determined how many convolution layers, pooling layers and activation layers the intermediate encoding network specifically includes, and how many the intermediate decoding network specifically includes. In addition, where each layer includes a plurality of neurons, the number of neurons in each intermediate encoding intermediate layer and in each intermediate decoding intermediate layer can also be adjusted.
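One way to make these adjustable quantities explicit is a small parameter record, as in the sketch below; the field names and the doubling rule are illustrative choices, not terms or rules from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class IntermediateNetParams:
    enc_mid_layers: int  # number of intermediate encoding intermediate layers
    dec_mid_layers: int  # number of intermediate decoding intermediate layers
    enc_neurons: int     # neurons per intermediate encoding intermediate layer
    dec_neurons: int     # neurons per intermediate decoding intermediate layer

def grow(p: IntermediateNetParams) -> IntermediateNetParams:
    # one possible adjustment step: add depth and width when restoration is poor
    return IntermediateNetParams(p.enc_mid_layers + 1, p.dec_mid_layers + 1,
                                 p.enc_neurons * 2, p.dec_neurons * 2)
```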
In some embodiments, the transmission scene parameters include a scene category identifier and a transmission bandwidth, and adjusting the initial network according to the transmission scene parameters of the sample video frames to obtain the intermediate network (i.e., step S21) may include the following steps: adjusting, according to the scene category identifier and the transmission bandwidth, the number of initial encoding intermediate layers and the number of initial decoding intermediate layers, and adjusting the dimension of the output data of the initial encoding output layer.
The scene category identifier characterizes the application scenario of the video frames; application scenarios may include environmental monitoring, video browsing services, video communication, and so on. The transmission bandwidth refers to the rated bandwidth when transmitting the video frames. A mapping relationship between transmission scene parameters and the parameters of the initial network can be pre-configured, so that the initial network is adjusted according to different transmission scene parameters to obtain different intermediate networks. The parameters of the initial network are the number of initial encoding intermediate layers, the number of initial decoding intermediate layers and the dimension of the output data of the initial encoding output layer, where the dimension of the output data of the initial encoding output layer is the dimension of the initial feature data output by that layer.
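A pre-configured mapping of this kind could look like the following sketch; every scene identifier, bandwidth tier and network parameter shown here is a hypothetical example rather than a value from the disclosure.

```python
# (scene category identifier, bandwidth tier) -> initial network parameters
INITIAL_NET_CONFIG = {
    ("monitoring",    "low"):  {"enc_mid_layers": 6, "dec_mid_layers": 6, "enc_out_dim": 32},
    ("browsing",      "mid"):  {"enc_mid_layers": 4, "dec_mid_layers": 4, "enc_out_dim": 64},
    ("communication", "high"): {"enc_mid_layers": 3, "dec_mid_layers": 3, "enc_out_dim": 128},
}

def adjust_initial_network(scene_id: str, bandwidth_mbps: float) -> dict:
    # bucket the rated bandwidth into a tier, then look up the pre-configured parameters
    tier = "low" if bandwidth_mbps < 2 else ("mid" if bandwidth_mbps < 10 else "high")
    return INITIAL_NET_CONFIG[(scene_id, tier)]
```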
The traditional video encoding and decoding process includes multiple sub-processes. By using an autoencoding network to replace the entire traditional video encoding and decoding process, the embodiments of the present disclosure also help to simplify the deployment of hardware and software.
Further, after the autoencoding network is trained, traditional video coding and decoding technology, i.e., traditional transformation, quantization, inverse quantization and inverse transformation, can also be used to process the prediction residual data of the sample video frames to obtain restored prediction residual data. The coding rate of the autoencoding network (i.e., the dimension of the data output by the final encoding network) can be compared with the coding rate of the traditional video coding and decoding technology (i.e., the dimension of the data after transformation and quantization), and the PSNR (Peak Signal to Noise Ratio) values of the sample video frames restored by the autoencoding network can be compared with the PSNR values of the sample video frames restored by the traditional technology; the parameters of the autoencoding network are then further adjusted according to the comparison results.
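The PSNR side of that comparison can be computed with the standard definition for 8-bit frames, as sketched below; frame_nn and frame_trad stand for frames restored by the autoencoding network and by the traditional pipeline and are placeholders, not outputs defined in the disclosure.

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    # peak signal-to-noise ratio between a reference frame and a restored frame
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak * peak / mse)

# e.g. compare psnr(original_frame, frame_nn) with psnr(original_frame, frame_trad),
# alongside comparing the feature dimension with the post-transform/quantization dimension.
```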
As shown in Figure 4, when the video transmission method according to an embodiment of the present disclosure is applied to a decoding end device, it may include the following steps S31 to S33.
In step S31, video feature data transmitted by an encoding end device is received, where the video feature data is obtained by the encoding end device encoding the prediction residual data of a video frame to be transmitted using the final encoding network in a pre-trained autoencoding network.
In step S32, the final decoding network in the autoencoding network is used to decode the video feature data to obtain decoded prediction residual data.
In step S33, the post-transmission video frame is determined according to the decoded prediction residual data.
As can be seen from steps S31 to S33 above, with the video transmission method according to the embodiments of the present disclosure, the decoding end device receives the video feature data transmitted by the encoding end device, where the video feature data is obtained by the encoding end device encoding the prediction residual data of the video frame to be transmitted using the final encoding network in the pre-trained autoencoding network, and the decoding end device uses the final decoding network in the autoencoding network to decode the video feature data and obtain the decoded prediction residual data. Compared with traditional video coding and decoding technology, this video transmission approach can both increase the video transmission bit rate and improve the accuracy of video frame encoding and decoding, and it also has the advantages of a small amount of calculation, low computational complexity, and short encoding and decoding times.
As shown in Figure 5, in the video transmission method according to the embodiments of the present disclosure, the final encoding network in the autoencoding network is equivalent to a transformation/quantization network: by inputting the prediction residual data blocks of the video frame to be transmitted into this transformation/quantization network, the encoding end device can output an entropy-coded stream (i.e., the video feature data). The final decoding network in the autoencoding network is equivalent to an inverse transformation/inverse quantization network: by inputting the entropy-coded stream transmitted by the encoding end device into this inverse transformation/inverse quantization network, the decoding end device can output the decoded data (i.e., the decoded prediction residual data).
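Sketched as code, and assuming hypothetical send/receive transport stubs plus encoder and decoder callables taken from a trained autoencoding network, the Figure 5 data flow is:

```python
def encoder_side(final_encoder, residual_block, send):
    features = final_encoder(residual_block)    # the transformation/quantization network
    send(features)                              # entropy-coded stream to the decoding end

def decoder_side(final_decoder, receive, predicted_block):
    features = receive()                        # entropy-coded stream from the encoding end
    decoded_residual = final_decoder(features)  # inverse transformation/inverse quantization
    return predicted_block + decoded_residual   # restored (post-transmission) frame block
```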
Figure 6 is a schematic diagram of an autoencoding network according to an embodiment of the present disclosure, in which the input layer of the final encoding network (encoder) receives the prediction residual data of the video frame to be transmitted, denoted x1, x2, ..., x6. The hidden layers of the autoencoding network include the final encoding intermediate layers, the final encoding output layer, the final decoding input layer and the final decoding intermediate layers; in Figure 6 the final encoding output layer and the final decoding input layer are shown as a single layer, with a1, a2 and a3 denoting the video feature data. The output layer of the final decoding network (decoder) outputs the decoded prediction residual data. The convolution, pooling and activation layers included in the final encoding intermediate layers and in the final decoding intermediate layers are not shown in the figure.
Figure 7 is a schematic diagram of another autoencoding network according to an embodiment of the present disclosure. The figure shows the input layer (Input) of the final encoding network, the output layer (Output) of the final decoding network, the convolution layers (Conv), pooling layers (Pool) and activation layers (Active) of the final encoding network, the convolution layers (Conv) and activation layers (Active) of the final decoding network, the sampling layers (Upsampling), and the final encoding output layer and final decoding input layer (FC, fully connected layers). The intermediate layers of the final encoding network and of the final decoding network each include N unit structures.
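A PyTorch sketch of a Figure 7-style topology follows. Only the kinds of layers (Conv/Pool/Active unit structures, Upsampling, and the fully connected bottleneck) follow the figure; the block size, channel count, feature dimension and the number N of unit structures are illustrative choices.

```python
import torch
import torch.nn as nn

class Fig7StyleAutoencoder(nn.Module):
    def __init__(self, block=8, n_units=2, channels=16, feat_dim=32):
        super().__init__()
        enc = []
        for i in range(n_units):  # N encoder unit structures
            enc += [nn.Conv2d(1 if i == 0 else channels, channels, 3, padding=1),  # Conv
                    nn.MaxPool2d(2),  # Pool
                    nn.ReLU()]        # Active
        self.enc_mid = nn.Sequential(*enc)
        side = block // (2 ** n_units)  # spatial size after the pooling layers
        self.enc_out = nn.Linear(channels * side * side, feat_dim)  # FC (encoding output)
        self.dec_in = nn.Linear(feat_dim, channels * side * side)   # FC (decoding input)
        dec = []
        for _ in range(n_units):  # N decoder unit structures
            dec += [nn.Conv2d(channels, channels, 3, padding=1),  # Conv
                    nn.Upsample(scale_factor=2),                  # Upsampling
                    nn.ReLU()]                                    # Active
        self.dec_mid = nn.Sequential(*dec)
        self.dec_out = nn.Conv2d(channels, 1, 3, padding=1)  # output layer
        self.side, self.channels = side, channels

    def encode(self, x):  # x: (B, 1, block, block) residual blocks
        return self.enc_out(self.enc_mid(x).flatten(1))  # video feature data

    def decode(self, a):
        h = self.dec_in(a).view(-1, self.channels, self.side, self.side)
        return self.dec_out(self.dec_mid(h))  # decoded residual blocks

    def forward(self, x):
        return self.decode(self.encode(x))
```

For example, Fig7StyleAutoencoder()(torch.randn(4, 1, 8, 8)) returns a batch of four decoded 8x8 residual blocks.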
It should be noted that the autoencoding network in the embodiments of the present disclosure can also be replaced by another deep network, such as a GAN (Generative Adversarial Network); such a network likewise uses a deep network to replace the entire transformation, quantization, inverse transformation and inverse quantization portion of the video encoding and decoding process, and the method of training the deep network can be the same as the method of training the autoencoding network. Whether an autoencoding network or another deep network is used, it needs to support the basic operations of deep learning, such as Convolution, Deconvolution, ReLU (activation), Sigmoid (the prediction and output function of logistic regression), Full-Connection, and Reshape (a function that re-adjusts the numbers of rows, columns and dimensions of a matrix).
In addition, an embodiment of the present disclosure further provides an electronic device, which may include: one or more processors; and a storage device on which one or more programs are stored; where, when the one or more programs are executed by the one or more processors, the one or more processors implement the video transmission method described above.
An embodiment of the present disclosure further provides a computer storage medium on which a computer program is stored, where the video transmission method described above is implemented when the program is executed.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods disclosed above and the functional modules/units in the devices may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic, illustrative sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that features, characteristics and/or elements described in connection with a particular embodiment may be used alone, or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (10)

  1. A video transmission method, comprising:
    using a final encoding network in a pre-trained autoencoding network to encode prediction residual data of a video frame to be transmitted to obtain video feature data, the autoencoding network comprising the final encoding network and a final decoding network; and
    transmitting the video feature data to a decoding end device.
  2. The method according to claim 1, wherein the method further comprises the following steps of training to obtain the autoencoding network:
    adjusting an initial network according to transmission scene parameters of sample video frames to obtain an intermediate network; and
    training the intermediate network according to prediction residual data of the sample video frames to obtain the autoencoding network, wherein the transmission scene parameters of the sample video frames are the same as the transmission scene parameters of the video frame to be transmitted.
  3. The method according to claim 2, wherein the initial network comprises an initial encoding network and an initial decoding network, and the intermediate network comprises an intermediate encoding network and an intermediate decoding network; and
    training the intermediate network according to the prediction residual data of the sample video frames to obtain the autoencoding network comprises:
    inputting the prediction residual data of the sample video frames into the intermediate encoding network to obtain intermediate feature data output by the intermediate encoding network;
    inputting the intermediate feature data into the intermediate decoding network to obtain intermediate prediction residual data output by the intermediate decoding network; and
    when the prediction residual data of the sample video frames, the intermediate feature data and the intermediate prediction residual data do not meet preset optimization stop conditions, adjusting parameters of the intermediate network until the prediction residual data of the sample video frames and the intermediate feature data and intermediate prediction residual data obtained after adjustment meet the preset optimization stop conditions.
  4. The method according to claim 3, wherein the prediction residual data of the sample video frames, the intermediate feature data and the intermediate prediction residual data meeting the preset optimization stop conditions comprises:
    a data dimension of the intermediate feature data being smaller than a preset dimension threshold, and an average variance between the prediction residual data of the sample video frames and the intermediate prediction residual data being smaller than a preset variance threshold.
  5. The method according to claim 3, wherein
    the initial encoding network comprises an initial encoding input layer, an initial encoding output layer and a plurality of initial encoding intermediate layers, and the initial decoding network comprises an initial decoding input layer, an initial decoding output layer and a plurality of initial decoding intermediate layers; the intermediate encoding network comprises an intermediate encoding input layer, an intermediate encoding output layer and a plurality of intermediate encoding intermediate layers, and the intermediate decoding network comprises an intermediate decoding input layer, an intermediate decoding output layer and a plurality of intermediate decoding intermediate layers; the final encoding network comprises a final encoding input layer, a final encoding output layer and a plurality of final encoding intermediate layers, and the final decoding network comprises a final decoding input layer, a final decoding output layer and a plurality of final decoding intermediate layers;
    the types of the initial encoding intermediate layers, the initial decoding intermediate layers, the intermediate encoding intermediate layers, the intermediate decoding intermediate layers, the final encoding intermediate layers and the final decoding intermediate layers comprise the following three kinds: convolution layers, pooling layers and activation layers;
    the initial encoding output layer, the initial decoding input layer, the intermediate encoding output layer, the intermediate decoding input layer, the final encoding output layer and the final decoding input layer are all fully connected layers; and
    the types of the initial encoding input layer, the initial decoding output layer, the intermediate encoding input layer, the intermediate decoding output layer, the final encoding input layer and the final decoding output layer are all the same.
  6. The method according to claim 5, wherein adjusting the parameters of the intermediate network comprises:
    adjusting the number of intermediate encoding intermediate layers and the number of intermediate decoding intermediate layers, and adjusting the number of neurons in the intermediate encoding intermediate layers and the number of neurons in the intermediate decoding intermediate layers.
  7. The method according to claim 5, wherein the transmission scene parameters comprise a scene category identifier and a transmission bandwidth, and adjusting the initial network according to the transmission scene parameters of the sample video frames to obtain the intermediate network comprises:
    adjusting, according to the scene category identifier and the transmission bandwidth, the number of initial encoding intermediate layers and the number of initial decoding intermediate layers, and adjusting the dimension of output data of the initial encoding output layer.
  8. A video transmission method, comprising:
    receiving video feature data transmitted by an encoding end device, wherein the video feature data is obtained by the encoding end device encoding prediction residual data of a video frame to be transmitted using a final encoding network in a pre-trained autoencoding network;
    decoding the video feature data using a final decoding network in the autoencoding network to obtain decoded prediction residual data; and
    determining a post-transmission video frame according to the decoded prediction residual data.
  9. An electronic device, comprising:
    one or more processors; and
    a storage device on which one or more programs are stored;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video transmission method according to any one of claims 1 to 8.
  10. A computer storage medium, on which a computer program is stored, wherein the video transmission method according to any one of claims 1 to 8 is implemented when the program is executed.
PCT/CN2023/110191 2022-08-01 2023-07-31 Video transmission method, electronic device and computer storage medium WO2024027635A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210914809.XA CN117544778A (zh) 2022-08-01 2022-08-01 Video transmission method, electronic device and computer storage medium
CN202210914809.X 2022-08-01

Publications (1)

Publication Number Publication Date
WO2024027635A1 (zh)

Family

ID=89781238

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110191 WO2024027635A1 (zh) 2022-08-01 2023-07-31 Video transmission method, electronic device and computer storage medium

Country Status (2)

Country Link
CN (1) CN117544778A (zh)
WO (1) WO2024027635A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110753225A (zh) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 一种视频压缩方法、装置及终端设备
CN111163320A (zh) * 2018-11-07 2020-05-15 合肥图鸭信息科技有限公司 一种视频压缩方法及***
CN111161363A (zh) * 2018-11-07 2020-05-15 合肥图鸭信息科技有限公司 一种图像编码模型训练方法及装置
WO2021063118A1 (en) * 2019-10-02 2021-04-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method and apparatus for image processing
CN113256744A (zh) * 2020-02-10 2021-08-13 武汉Tcl集团工业研究院有限公司 一种图像编码、解码方法及***


Also Published As

Publication number Publication date
CN117544778A (zh) 2024-02-09

Similar Documents

Publication Publication Date Title
WO2018103568A1 (zh) Cloud desktop content encoding and decoding method and apparatus, and ***
US11451827B2 (en) Non-transform coding
WO2020237646A1 (zh) Image processing method and device, and computer-readable storage medium
WO2022088631A1 (zh) Image encoding method, image decoding method and related apparatus
CN113822147B (zh) Deep compression method for collaborative machine semantic tasks
WO2013067949A1 (zh) Matrix encoding method and apparatus, and decoding method and apparatus
WO2024125099A1 (zh) Variable-bit-rate image compression method, ***, apparatus, terminal and storage medium
CN111726614A (zh) HEVC coding optimization method based on spatial-domain downsampling and deep-learning reconstruction
CN109922339A (zh) Image coding framework combining multi-sampling-rate downsampling with super-resolution reconstruction
WO2023279961A1 (zh) Video image encoding and decoding method and apparatus
CN102857760B (zh) Feedback-free rate-optimized distributed video encoding and decoding method and ***
CN116055726A (zh) Low-latency layered video coding method, computer device and medium
US20140241423A1 (en) Image coding and decoding methods and apparatuses
WO2024027635A1 (zh) Video transmission method, electronic device and computer storage medium
CN103826122B (zh) Complexity-balanced video encoding method and corresponding decoding method
CN116527909A (zh) Method, apparatus and device for transmitting coding parameters, storage medium and program product
CN103533351B (zh) Image compression method using multiple quantization tables
WO2024007843A9 (zh) Encoding and decoding method and apparatus, and computer device
KR20040104831A (ko) Apparatus and method for compressing image data
Doutsi et al. Retina-inspired video codec
CN116527903B (zh) Image shallow compression method and decoding method
Hu et al. Efficient image compression method using image super-resolution residual learning network
WO2022257130A1 (zh) Encoding and decoding method, bitstream, encoder, decoder, *** and storage medium
WO2023245544A1 (zh) Encoding and decoding method, bitstream, encoder, decoder and storage medium
CN117676149B (zh) Image compression method based on frequency-domain decomposition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849342

Country of ref document: EP

Kind code of ref document: A1