WO2019228082A1 - Compression method and system for frequent transmission of deep neural networks - Google Patents

Compression method and system for frequent transmission of deep neural networks

Info

Publication number
WO2019228082A1
WO2019228082A1 (PCT/CN2019/082384)
Authority
WO
WIPO (PCT)
Prior art keywords
deep neural
neural network
model
compression
transmitted
Prior art date
Application number
PCT/CN2019/082384
Other languages
English (en)
French (fr)
Inventor
段凌宇 (Lingyu Duan)
陈子谦 (Ziqian Chen)
楼燚航 (Yihang Lou)
黄铁军 (Tiejun Huang)
Original Assignee
北京大学 (Peking University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学 (Peking University)
Priority to US17/057,882 (published as US20210209474A1)
Publication of WO2019228082A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks
    • G06N 3/0495 — Quantised networks; Sparse networks; Compressed networks
    • G06N 3/08 — Learning methods
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/098 — Distributed learning, e.g. federated learning
    • G06N 20/00 — Machine learning

Definitions

  • the invention belongs to the field of artificial intelligence technology, and particularly relates to a compression method and system for frequent transmission of deep neural networks.
  • Figure 1 shows the traditional deep neural network compression algorithm.
  • traditional deep neural network compression can optionally use data-driven or non-data-driven methods.
  • different algorithms (or none) such as pruning, low-rank decomposition, convolution kernel selection and model reconstruction are applied to the deep neural network,
  • generating a preliminarily compressed deep neural network model; knowledge transfer or retraining is then optionally applied, the above steps are repeated, and a preliminarily compressed deep neural network model is finally produced.
  • for the most part, the preliminarily compressed deep neural network model can no longer be decompressed and restored to the original network model.
  • the network model is then optionally quantized, after which the deep neural network is optionally encoded, finally generating an encoded, quantized deep neural network model.
  • Figure 2 shows a schematic diagram of the process of transmission on the network using the traditional deep neural network compression method.
  • the original network can be compressed by quantization or encoding, and the encoded compressed deep neural network transmitted.
  • after decoding at the receiving end, a quantized compressed deep neural network is obtained.
  • the present invention provides a compression method and system for frequent transmission of deep neural networks, extends deep neural network compression into the transmission domain, and exploits the potential redundancy between deep neural network models to reduce their overhead under frequent transmission; that is, compression is performed using the multiple models available under frequent transmission.
  • a compression method for frequent transmission of a deep neural network, including:
  • generating one or more prediction residuals from the model differences between the model to be transmitted and historically transmitted models, transmitting them, and, at the receiving end, replacing or accumulating onto the originally stored deep neural network model to generate the received deep neural network.
  • the method specifically includes: the transmitting end sends the deep neural network to be transmitted to the compression end, and the compression end obtains data information and an organization manner of the one or more deep neural networks to be transmitted;
  • the prediction module at the compression end is based on the current and historical transmission of one or more deep neural network models, performs model prediction compression of multiple transmissions, and generates prediction residuals of one or more deep neural networks to be transmitted;
  • the quantization module at the compression end uses one or more quantization methods to quantize the prediction residuals based on the generated one or more prediction residuals to generate one or more quantized prediction residuals;
  • the encoding module on the compression side uses the encoding method to encode the quantized prediction residuals based on one or more generated quantized prediction residuals, generates one or more encoded prediction residuals, and transmits them;
  • the decompression end receives one or more encoded prediction residuals, and the decompression module at the decompression end uses a corresponding decoding method to decode the encoded prediction residuals to generate one or more quantized prediction residuals;
  • the model prediction and decompression module at the decompression end generates a received deep neural network at the receiving end through multiple model predictions based on one or more quantized prediction residuals and the previously stored deep neural network at the receiving end.
  • the data information and organization of the deep neural network include part or all of the data and network structure of the deep neural network.
  • in the frequent-transmission setting, the data information and organization of the one or more deep neural network models historically transmitted to the corresponding receiving end can be obtained.
  • if no historically transmitted model exists, an empty model is set as the default history transmission model.
  • the model prediction compression exploits the redundancy between multiple complete or predicted models for compression, in one of the following ways: computing the overall residual between the deep neural network model to be transmitted and a historically transmitted deep neural network model,
  • or transmitting the deep neural network model to be transmitted as residuals of one or more internal layers, or as residuals measured per convolution kernel.
  • the model prediction compression includes one or more kinds of data information and organization derived from one or more residual compression granularities or deep neural networks.
  • the multiple models of historical transmission at the receiving end are complete lossless models, or lossy partial models.
  • the quantization methods include outputting the raw data directly, precision control of the weights to be transmitted, or a k-means nonlinear quantization algorithm.
  • the manner of predicting the multiple models includes: replacing or accumulating one or more deep neural network models stored originally.
  • the manner of the multiple model prediction includes: receiving one or more quantized prediction residuals simultaneously or non-simultaneously, and combining or accumulating part or all of the stored one or more deep neural networks.
  • a compression system for frequent transmission of deep neural networks including:
  • the model prediction compression module, based on one or more deep neural network models of the current and historical transmissions, combines some or all of the model differences between the model to be transmitted and the historically transmitted models to generate one or more prediction residuals, and transmits the information required for the related prediction;
  • the model prediction decompression module, based on the received one or more quantized prediction residuals and the deep neural network stored at the receiving end, replaces or accumulates onto the originally stored deep neural network model to generate the received deep neural network;
  • the model prediction compression module and the model prediction decompression module can add, delete and modify the historically transmitted deep neural network models and the stored deep neural networks.
  • the advantage of the present invention is that it combines the redundancy among the multiple models of a deep neural network under frequent transmission, uses the knowledge shared between deep neural networks to compress, and reduces the size and bandwidth required for transmission. Under the same bandwidth limit, deep neural networks can be transmitted more effectively, while task-specific compression at the front end remains possible, rather than only partial restoration of the network after task-specific compression.
  • FIG. 1 shows a flowchart of a conventional deep neural network compression algorithm
  • FIG. 2 is a schematic diagram showing a compression process of applying a conventional deep neural network compression algorithm to network transmission
  • FIG. 3 is a schematic diagram illustrating a compression process of a deep neural network transmission on a network according to the present invention
  • FIG. 4 is a schematic flowchart of a compression method for frequent transmission of a deep neural network according to the present invention
  • FIG. 5 shows a schematic flowchart of frequent transmission compression for a deep neural network in the case of a deep neural network model considering transmission preliminary compression
  • FIG. 6 shows a flowchart of compression of a deep neural network network under frequent transmission conditions provided by the present invention
  • FIG. 7 shows a schematic diagram of a multi-model prediction module proposed by the present invention in consideration of potential redundancy between deep neural network models for compression.
  • FIG. 3 shows a schematic diagram of the compression process of the present invention considering the transmission of a deep neural network on the network.
  • based on one or more deep neural network models of the current and historical transmissions, some or all of the model differences between the model to be transmitted and the historically transmitted models are combined to generate one or more prediction residuals, and the information required for the related prediction is transmitted; based on the received one or more quantized prediction residuals and the deep neural network stored at the receiving end, the originally stored deep neural network model is replaced or accumulated onto, generating the received deep neural network.
  • the deep neural network is delivered, lossily or losslessly, to the sending side.
  • the deep neural network to be transmitted is compressed, and the compressed data is transmitted.
  • depending on the bandwidth conditions, the compressed data is smaller or far smaller than the original model.
  • for example, a CNN model before compression is 400 MB,
  • while the compressed data transmitted for the model is far less than 400 MB.
  • after decompression, the reconstructed CNN model is again 400 MB, and this model is used for image retrieval, segmentation and/or classification tasks, speech recognition, and so on.
  • FIG. 4 shows a flowchart of a compression method for frequent transmission of a deep neural network according to the present invention.
  • a feasible algorithm for multiple transmission model prediction is given in combination with the content of the invention, but is not limited thereto.
  • suppose a VGG-16-retrain model needs to be transmitted, and the previously transmitted model, e.g. original-vgg-16, is available at both the receiving end and the sending end. Based on the present invention, it is not necessary to transmit the original model to be transmitted, namely VGG-16-retrain, directly.
  • from the parameter residuals of each layer of VGG-16-retrain, a to-be-transmitted model with a smaller data range and less information can be obtained.
  • likewise, the convolution layers with same-sized kernels of a deep neural network are taken as the basic unit.
  • a base convolution layer can serve as the compression basis for all convolution kernels of the same size; combined with the data distribution, a residual with a narrower data distribution can be obtained.
  • alternatively, one or more convolution kernels can be used as the compression basis.
  • every convolution kernel of every convolution layer can then undergo compression such as residual compression or quantization, finally generating the prediction residuals.
  • compared with direct transmission, the redundancy between multiple models is exploited and compressed, finally generating a prediction residual with a relatively small amount of information; in theory, combined with a lossless prediction residual, the original network can be restored losslessly.
  • the scheme also requires less bandwidth and data. By combining different network structures with multiple prediction models and selecting an appropriate prediction model and structure, a prediction residual with a higher compression ratio can be obtained, and the information required for the related prediction is transmitted.
  • traditional compression methods focus on specialized compression of a deep neural network for a given task, whereas from the transmission perspective a broad, non-task-specific compression method must be adopted.
  • the traditional method can solve the bandwidth problem to a certain extent, but it essentially produces a preliminarily compressed model without incorporating historical deep neural network information; that is, considerable redundancy remains between models, and a preliminarily compressed (unencoded) deep neural network model is transmitted.
  • the present invention can also exploit the redundancy between different preliminarily compressed deep neural networks, or between them and uncompressed networks, to compress further, achieving a higher compression ratio at the transmission stage and saving transmission bandwidth.
  • the present invention provides a process for compressing a deep neural network network under frequent transmission conditions. It includes the following steps:
  • the transmitting end sends the deep neural network to be transmitted to the compression end, and the compression end obtains the data information and organization method of the one or more deep neural networks to be transmitted.
  • the data information and organization of the deep neural network include part or all of the data and network structure of the deep neural network, so a neural network to be transmitted may constitute one or more data and organization of the deep neural network.
  • the prediction module at the compression end performs model prediction compression for multiple transmissions based on one or more deep neural network models of the current and historical transmission, and generates prediction residuals of one or more deep neural networks to be transmitted.
  • model prediction compression is an algorithm module that compresses across the multiple models of the current transmission and the corresponding receiving end's historical transmissions; it includes, but is not limited to, computing the overall residual between the model to be transmitted and a historically transmitted model, or transmitting the model to be transmitted as residuals of one or more internal layers, or as residuals measured in other units such as per convolution kernel. Finally, combining the different multi-model compression granularities, prediction residuals for one or more deep neural networks are generated.
  • the one or more model prediction compressions include, but are not limited to, one or more kinds of data information and organization derived from one or more residual compression granularities or deep neural networks.
  • the multiple historically transmitted models at the receiving end can be complete lossless models or lossy partial models; this does not affect the computation of redundancy between the models, which can be compensated by blank-filling or other means, or unified with an appropriate representation of the deep neural network model.
  • after the residuals are computed, they can be output directly, or a feasible compression algorithm can be applied to the prediction residuals to control the transmitted size.
  • the quantization module at the compression end uses one or more quantization methods to quantize the prediction residuals to generate one or more quantized prediction residuals.
  • the quantization methods include outputting the raw data directly, i.e. applying no quantization.
  • quantization means applying, to the received one or more prediction residuals, algorithms such as (but not limited to) the following to control the transmitted size: precision control of the weights to be transmitted (e.g. limiting 32-bit floats to n decimal places, or converting them to powers of 2^n), or nonlinear quantization algorithms such as k-means, generating one or more quantized prediction residuals.
  • for one prediction residual, one or more iteratively transmitted quantized prediction residuals can be generated for different needs.
  • for example, 32-bit floating-point data can be quantized into three sets of 8-bit quantized prediction residuals.
  • depending on the need, all or only some of the one or more quantized prediction residuals are transmitted.
  • the encoding module at the compression end uses the encoding method to encode the quantized prediction residuals based on one or more generated quantized prediction residuals, generates one or more encoded prediction residuals, and transmits them.
  • one or more encoding methods may be used to encode and transmit one or more quantized prediction residuals. It is then converted into a bit stream and sent to the network for transmission.
  • One or more encoded prediction residuals are received at the decompression end, and the decompression module at the decompression end adopts a corresponding decoding method to decode the encoded prediction residuals to generate one or more quantized prediction residuals.
  • one or more decoding methods corresponding to the encoding end may be used to decode one or more encoded prediction residuals to generate one or more quantized prediction residuals.
  • S6: the model prediction decompression module at the decompression end, based on one or more quantized prediction residuals and the deep neural network previously stored at the receiving end, generates the received deep neural network at the receiving end through multi-model prediction.
  • in the model prediction decompression module, the received one or more quantized prediction residuals are combined with the deep neural network stored at the receiving end, including replacing or accumulating onto the one or more originally stored deep neural network models, to generate the received deep neural network.
  • one or more quantized prediction residuals can be received simultaneously or at different times, combined with partial or full accumulation onto, or replacement of, the one or more stored deep neural networks; finally, through an organization scheme, the received deep neural network is generated and the transmission completed.
  • the present invention considers the potential redundancy between deep neural network models for compression and proposes a multi-model prediction module, including a compression module and a decompression module, which at the compression and decompression ends makes use of the historically stored, otherwise "useless" deep neural network information.
  • in the model prediction compression module, based on one or more deep neural network models of the current and historical transmissions, some or all of the model differences between the model to be transmitted and the historically transmitted models are combined to generate one or more prediction residuals, and the information required for the related prediction is transmitted.
  • in the model prediction decompression module, based on the received one or more quantized prediction residuals and the deep neural network stored at the receiving end, the originally stored deep neural network model is replaced or accumulated onto, generating the received deep neural network.
  • in the model prediction compression and decompression modules, historically transmitted deep neural network models and stored deep neural networks can be added, deleted and modified.
  • the present invention combines the redundancy among the multiple models of a deep neural network under frequent transmission and uses the knowledge shared between deep neural networks to compress, reducing the required size and bandwidth.
  • under the same bandwidth limit, deep neural networks can be transmitted more effectively, while task-specific compression of the deep neural network at the front end remains possible, rather than only partial restoration of the network after task-specific compression.
  • the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from the embodiment.
  • the modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination.
  • the various component embodiments of the present invention may be implemented by hardware, or by software modules running on one or more processors, or by a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used to implement some or all functions of some or all components of a device for creating a virtual machine according to an embodiment of the present invention.
  • the invention may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing part or all of the method described herein.
  • Such a program that implements the present invention may be stored on a computer-readable medium or may have the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a compression method and system for frequent transmission of deep neural networks, extending deep neural network compression into the transmission domain: the potential redundancy between deep neural network models is exploited for compression, reducing the overhead of deep neural networks under frequent transmission. The advantage of the invention is that it combines the redundancy among the multiple models of a deep neural network under frequent transmission and uses the knowledge shared between deep neural networks to compress, reducing the size and bandwidth required for transmission. Under the same bandwidth limit, deep neural networks can be transmitted more effectively, while task-specific compression at the front end remains possible, rather than only partial restoration of the network after task-specific compression.

Description

Compression method and system for frequent transmission of deep neural networks
Technical Field
The invention belongs to the field of artificial intelligence technology, and specifically relates to a compression method and system for frequent transmission of deep neural networks.
Background Art
With the development of artificial intelligence, deep neural networks have demonstrated powerful capabilities, achieved excellent results in many fields, and a variety of deep neural network models continue to evolve and spread widely across networks. However, as deep neural networks develop, the enormous computing resources and storage overhead they require at run time have also drawn attention; hence, to reduce the size and computational demands of deep neural networks while maintaining their strong performance, many compression methods have been proposed. For example, network pruning, singular value decomposition, binary deep neural network construction, knowledge distillation and similar techniques, combined with quantization and Huffman coding, can compress a deep neural network to a certain extent and yield a lightweight network. Most methods compress for a single given task and retrain the original network; compression takes a long time, and the compressed network cannot necessarily be decompressed.
Figure 1 shows a traditional deep neural network compression algorithm. As shown in Figure 1, a traditional approach optionally uses data-driven or non-data-driven schemes, applies (or skips) different algorithms such as pruning, low-rank decomposition, convolution kernel selection and model reconstruction to the deep neural network to generate a preliminarily compressed model, then optionally applies knowledge transfer or retraining, repeats the above steps, and finally produces a preliminarily compressed deep neural network model. Moreover, a preliminarily compressed model can for the most part no longer be decompressed and restored to the initial, original network model.
After the preliminarily compressed deep neural network model is obtained, the network model is optionally quantized, and then optionally encoded, finally generating an encoded, quantized deep neural network model.
Figure 2 shows the process of using a traditional deep neural network compression method for transmission over a network. As shown in Figure 2, since current traditional deep network compression compresses from the perspective of a single model, we group it under single-model compression methods. Optionally, the original network can be compressed by quantization or encoding, and the encoded compressed deep neural network transmitted; at the decoding end, after the received encoded compressed model is decoded, a quantized compressed deep neural network is obtained.
However, existing methods all start from the perspective of "reducing the storage and computation overhead of deep neural networks", whereas with deep neural networks being updated and transmitted over networks frequently, the transmission overhead they cause is itself a problem demanding a solution. Indirectly reducing transmission overhead by reducing storage size is feasible, but under the broader condition of frequent transmission of deep neural networks, a method is needed that compresses the network at the transmission stage, so that the model can be compressed efficiently at the sending end, the transmitted compressed model decompressed at the receiving end, and the properties of the original deep neural network preserved to the greatest extent. For example, when bandwidth is limited but receiver storage is not a concern and the receiving end frequently receives deep neural network models, a compression method and system for deep neural network transmission is needed.
Summary of the Invention
Aiming at the high bandwidth overhead of deep neural networks under frequent transmission, the invention provides a compression method and system for frequent transmission of deep neural networks, extending deep neural network compression into the transmission domain and exploiting the potential redundancy between deep neural network models for compression, reducing the overhead of deep neural networks under frequent transmission; that is, compressing by means of the multiple models available under frequent transmission.
According to one aspect of the invention, a compression method for frequent transmission of deep neural networks is provided, comprising:
based on one or more deep neural network models of the current and historical transmissions, combining some or all of the model differences between the model to be transmitted and the historically transmitted models to generate one or more prediction residuals, and transmitting the information required for the related prediction;
based on the received one or more quantized prediction residuals combined with the deep neural network stored at the receiving end, replacing or accumulating onto the originally stored deep neural network model to generate the received deep neural network.
Preferably, the method specifically comprises: the sending end delivers the deep neural network to be transmitted to the compression end, and the compression end obtains the data information and organization of the one or more deep neural networks to be transmitted;
the prediction module of the compression end, based on one or more deep neural network models of the current and historical transmissions, performs model prediction compression across multiple transmissions and generates prediction residuals of the one or more deep neural networks to be transmitted;
the quantization module of the compression end, based on the one or more generated prediction residuals, applies one or more quantization schemes to the prediction residuals to generate one or more quantized prediction residuals;
the encoding module of the compression end, based on the one or more generated quantized prediction residuals, encodes the quantized prediction residuals to generate one or more encoded prediction residuals and transmits them;
the decompression end receives the one or more encoded prediction residuals, and the decompression module of the decompression end applies the corresponding decoding method to decode the encoded prediction residuals, generating one or more quantized prediction residuals;
the model prediction decompression module of the decompression end, based on the one or more quantized prediction residuals and the deep neural network previously stored at the receiving end, generates the received deep neural network at the receiving end through multi-model prediction.
Preferably, the data information and organization of the deep neural network comprise some or all of the deep neural network's data and network structure.
Preferably, in the frequent-transmission setting the compression end can obtain the data information and organization of the one or more deep neural network models historically transmitted to the corresponding receiving end; if no historically transmitted deep neural network model exists, an empty model is set as the default history transmission model.
Preferably, the model prediction compression exploits the redundancy between multiple complete or predicted models for compression, in one of the following ways: computing the overall residual between the deep neural network model to be transmitted and a historically transmitted deep neural network model, or transmitting the model to be transmitted as residuals of one or more internal layers, or as residuals measured per convolution kernel.
Preferably, the model prediction compression comprises one or more kinds of data information and organization derived from one or more residual compression granularities or deep neural networks.
More preferably, the multiple historically transmitted models at the receiving end are complete lossless models, or lossy partial models.
Preferably, the quantization schemes include outputting the raw data directly, precision control of the weights to be transmitted, or a k-means nonlinear quantization algorithm.
Preferably, the multi-model prediction comprises replacing or accumulating onto the one or more originally stored deep neural network models.
Preferably, the multi-model prediction comprises receiving one or more quantized prediction residuals simultaneously or at different times, combined with partial or full accumulation onto, or replacement of, the one or more originally stored deep neural networks.
According to another aspect of the invention, a compression system for frequent transmission of deep neural networks is also provided, comprising:
a model prediction compression module which, based on one or more deep neural network models of the current and historical transmissions, combines some or all of the model differences between the model to be transmitted and the historically transmitted models to generate one or more prediction residuals, and transmits the information required for the related prediction;
a model prediction decompression module which, based on the received one or more quantized prediction residuals combined with the deep neural network stored at the receiving end, replaces or accumulates onto the originally stored deep neural network model to generate the received deep neural network;
wherein the model prediction compression module and the model prediction decompression module can add, delete and modify the historically transmitted deep neural network models and the stored deep neural networks.
The advantage of the invention is that it combines the redundancy among the multiple models of a deep neural network under frequent transmission, uses the knowledge shared between deep neural networks to compress, and reduces the size and bandwidth required for transmission. Under the same bandwidth limit, deep neural networks can be transmitted more effectively, while task-specific compression at the front end remains possible, rather than only partial restoration of the network after task-specific compression.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 shows a flowchart of a traditional deep neural network compression algorithm;
Figure 2 shows the compression flow when a traditional deep neural network compression algorithm is applied to network transmission;
Figure 3 shows the compression flow of the invention for transmitting deep neural networks over a network;
Figure 4 shows the flow of the compression method for frequent transmission of deep neural networks proposed by the invention;
Figure 5 shows the flow of frequent-transmission compression of a deep neural network in the case where a preliminarily compressed model is transmitted;
Figure 6 shows a flowchart of compressing a deep neural network under frequent-transmission conditions as provided by the invention;
Figure 7 shows a schematic diagram of the multi-model prediction module proposed by the invention, which compresses by exploiting the potential redundancy between deep neural network models.
Detailed Description of the Embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here; rather, these embodiments are provided so that the disclosure is understood more thoroughly and its scope is conveyed fully to those skilled in the art.
Figure 3 shows the compression flow of the invention for transmitting deep neural networks over a network. Based on one or more deep neural network models of the current and historical transmissions, some or all of the model differences between the model to be transmitted and the historically transmitted models are combined to generate one or more prediction residuals, and the information required for the related prediction is transmitted; based on the received one or more quantized prediction residuals combined with the deep neural network stored at the receiving end, the originally stored deep neural network model is replaced or accumulated onto, generating the received deep neural network.
As shown in Figure 3, under a given bandwidth, the deep neural network is delivered, lossily or losslessly, to the sending side. The deep neural network to be transmitted is compressed and the compressed data transmitted; depending on the bandwidth conditions, the compressed data is smaller or far smaller than the original model. For example, a CNN model is 400 MB before compression, while the compressed data actually transmitted is far less than 400 MB. At the receiving end it is decompressed and restored to a lossy or lossless version of the initially transmitted model and used for different tasks; for example, after decompression the reconstructed CNN model is again 400 MB and is used for image retrieval, segmentation and/or classification tasks, speech recognition, and so on.
Figure 4 shows the flow of the compression method of the invention for frequent transmission of deep neural networks. As shown in Figure 4, in combination with the Summary above, one feasible algorithm for multi-transmission model prediction is given, without being limited thereto.
Suppose, for example, that a VGG-16-retrain model needs to be transmitted and the previously transmitted model, e.g. original-vgg-16, is available at both the receiving end and the sending end. Based on the invention, the original model to be transmitted, VGG-16-retrain, need not be transmitted directly: from the parameter residuals of each layer, a to-be-transmitted model with a smaller data range and less information is obtained. Likewise, taking the convolution layers with same-sized kernels of a deep neural network as the basic unit, one base convolution layer can serve as the compression basis for all kernels of the same size; combined with the data distribution, residual layers to be transmitted with a narrower data distribution are obtained. Similarly, one or more convolution kernels can serve as the compression basis, and for the VGG-16-retrain to be transmitted, every convolution kernel of every convolution layer can undergo compression such as residual compression or quantization, finally generating the prediction residuals.
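The per-layer residual idea described here can be sketched as follows. This is a minimal illustration rather than the patented system: random tensors with a hypothetical layer name stand in for original-vgg-16 and VGG-16-retrain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the previously transmitted model (e.g. original-vgg-16):
# maps a hypothetical layer name to its weight tensor.
base_model = {"conv1": rng.standard_normal((64, 3, 3, 3)).astype(np.float32)}

# A retrained model (e.g. VGG-16-retrain) typically differs from its
# ancestor only by small updates.
retrained = {k: v + np.float32(0.01) * rng.standard_normal(v.shape).astype(np.float32)
             for k, v in base_model.items()}

# Sender: transmit only the per-layer residuals instead of the full model.
residuals = {k: retrained[k] - base_model[k] for k in base_model}

# Receiver: accumulate the residuals onto the stored model to reconstruct.
reconstructed = {k: base_model[k] + residuals[k] for k in base_model}

assert all(np.allclose(reconstructed[k], retrained[k]) for k in base_model)
# The residual occupies a much narrower value range than the weights,
# which is what makes it cheaper to quantize and encode.
assert np.abs(residuals["conv1"]).max() < 0.1 * np.abs(retrained["conv1"]).max()
```

The same idea applies at coarser or finer granularity (whole model, layer, or single kernel) by changing what the dictionary keys index.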
Compared with direct transmission, the redundancy between multiple models is exploited and compressed, finally producing a prediction residual carrying a relatively small amount of information; in theory, combined with a lossless prediction residual, the original network can be restored losslessly, while bandwidth and data requirements are reduced. By combining different network structures with multiple prediction models and choosing an appropriate prediction model and prediction structure, a prediction residual with a higher compression ratio can be obtained, and the information required for the related prediction is transmitted as well.
Traditional compression methods focus on specialized compression of a deep neural network for a given task, whereas from a transmission point of view a broad, non-task-specific compression method must be adopted. The traditional approach can solve the bandwidth problem to a certain extent, but it essentially produces a preliminarily compressed model without incorporating historical deep neural network information; that is, considerable redundancy remains between models. In other words, a preliminarily compressed (unencoded) deep neural network model is transmitted. As shown in Figure 5, the invention can also exploit the redundancy between different preliminarily compressed deep neural networks, or between them and uncompressed networks, to compress further, achieving a higher compression ratio at the transmission stage and saving transmission bandwidth.
As shown in Figure 6, in a first aspect the invention provides a flow for compressing a deep neural network under frequent-transmission conditions, comprising the following steps:
S1: The sending end delivers the deep neural network to be transmitted to the compression end, and the compression end obtains the data information and organization of the one or more deep neural networks to be transmitted. The data information and organization of a deep neural network comprise some or all of its data and network structure, so one neural network to be transmitted may give rise to the data information and organization of one or more deep neural networks.
S2: The prediction module of the compression end, based on one or more deep neural network models of the current and historical transmissions, performs model prediction compression across multiple transmissions and generates prediction residuals of the one or more deep neural networks to be transmitted.
In the frequent-transmission setting, the compression end can obtain the data information and organization of the one or more deep neural network models historically transmitted to the corresponding receiving end. If no historically transmitted deep neural network model exists, an empty model can be set as the default history transmission model.
Model prediction compression is an algorithm module that compresses across the multiple models of the current transmission and the corresponding receiving end's historical transmissions; it includes, but is not limited to, computing the overall residual between the model to be transmitted and a historically transmitted model, or transmitting the model to be transmitted as residuals of one or more internal layers, or as residuals measured in other units such as per convolution kernel. Finally, combining the different multi-model compression granularities, prediction residuals for one or more deep neural networks are generated.
The one or more model prediction compressions include, but are not limited to, one or more kinds of data information and organization derived from one or more residual compression granularities or deep neural networks.
The multiple historically transmitted models at the receiving end may be complete lossless models or lossy partial models; this does not affect the computation of redundancy between the models, which can be compensated by blank-filling or other means, or unified with an appropriate representation of the deep neural network model.
After the residuals are computed, they can be output directly, or a feasible compression algorithm can be applied to the prediction residuals to control the transmitted size.
S3: The quantization module of the compression end, based on the one or more generated prediction residuals, applies one or more quantization schemes to the prediction residuals to generate one or more quantized prediction residuals.
The quantization schemes include outputting the raw data directly, i.e. applying no quantization.
Quantization means applying, to the received one or more prediction residuals, algorithms such as (but not limited to) the following to control the transmitted size: precision control of the weights to be transmitted (e.g. limiting 32-bit floats to n decimal places, or converting them to powers of 2^n), or nonlinear quantization algorithms such as k-means, generating one or more quantized prediction residuals.
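The two quantization options named here can be sketched roughly as follows: precision control by decimal rounding, plus a toy 1-D k-means codebook. A real system would also transmit the codebook and indices; the parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
residual = (0.01 * rng.standard_normal(1000)).astype(np.float32)

# (a) Precision control: keep only n decimal places of each weight.
n = 3
coarse = np.round(residual, n)
assert np.abs(coarse - residual).max() <= 0.5 * 10 ** -n + 1e-6

# (b) k-means-style nonlinear quantization: map every value to the nearest
# of k shared centroids, so only small indices plus a tiny codebook need
# to be transmitted. (A few Lloyd iterations on the 1-D values.)
k = 16
centroids = np.linspace(residual.min(), residual.max(), k)
for _ in range(10):
    idx = np.abs(residual[:, None] - centroids[None, :]).argmin(axis=1)
    for j in range(k):
        if np.any(idx == j):
            centroids[j] = residual[idx == j].mean()

dequantized = centroids[idx]        # what the receiver reconstructs
assert idx.max() < k                # 16 centroids: indices fit in 4 bits
assert np.abs(dequantized - residual).max() < np.abs(residual).max()
```

Because the residual's value range is narrow, even a small codebook keeps the quantization error well below the magnitude of the values themselves.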
For one prediction residual, one or more iteratively transmitted quantized prediction residuals can be generated for different needs; for example, 32-bit floating-point data can be quantized into three sets of 8-bit quantized prediction residuals, and depending on the need, all or only some of the one or more quantized prediction residuals are transmitted.
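The three-sets-of-8-bit example can be illustrated with a simple progressive scheme. The exact quantizer is not specified in the text, so uniform 8-bit quantization at each round is assumed here:

```python
import numpy as np

rng = np.random.default_rng(2)
residual = rng.standard_normal(256)   # float64 stand-in for a residual tensor

def quantize_8bit(x):
    """Uniform 8-bit quantization; returns codes plus the (scale, offset)
    the receiver needs to dequantize. Assumes x is not constant."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / 255
    codes = np.round((x - lo) / scale).astype(np.uint8)
    return codes, scale, lo

# Three refinement rounds: each round quantizes what the previous rounds
# failed to capture, so the receiver's copy improves with every packet.
recovered = np.zeros_like(residual)
errors = []
for _ in range(3):
    codes, scale, lo = quantize_8bit(residual - recovered)
    recovered += codes * scale + lo
    errors.append(np.abs(residual - recovered).max())

# The reconstruction error shrinks with every 8-bit transmission, which is
# why a receiver can stop after any subset of the packets.
assert errors[0] > errors[1] > errors[2]
```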
Hence one or more quantized prediction residuals are finally generated.
S4: The encoding module of the compression end, based on the one or more generated quantized prediction residuals, encodes the quantized prediction residuals to generate one or more encoded prediction residuals and transmits them.
In the encoding module, one or more encoding methods can be applied to the one or more quantized prediction residuals before transmission; the result is then converted into a bit stream and sent into the network for transmission.
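The text does not name a specific coder, so as a stand-in any general-purpose lossless coder can turn the quantized residuals into a compact bit stream; zlib is used here purely for illustration:

```python
import zlib
import numpy as np

rng = np.random.default_rng(3)
# Quantized residuals are small integers with a skewed distribution,
# which a generic coder exploits well; 50% of the values are 0 here.
codes = rng.choice(16, size=10_000, p=[0.5] + [0.5 / 15] * 15).astype(np.uint8)

payload = zlib.compress(codes.tobytes(), level=9)  # "encoded prediction residual"
assert len(payload) < codes.nbytes                 # smaller than the raw bytes

# Receiver side: decoding restores the exact quantized residual,
# so the encoding stage itself is lossless.
decoded = np.frombuffer(zlib.decompress(payload), dtype=np.uint8)
assert np.array_equal(decoded, codes)
```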
S5: The decompression end receives the one or more encoded prediction residuals, and the decompression module of the decompression end applies the corresponding decoding method to decode the encoded prediction residuals, generating one or more quantized prediction residuals.
In the decompression module, one or more decoding methods corresponding to the encoding end can be applied to the one or more encoded prediction residuals, generating one or more quantized prediction residuals.
S6: The model prediction decompression module at the decompression end, based on the one or more quantized prediction residuals and the deep neural network previously stored at the receiving end, generates the received deep neural network at the receiving end through multi-model prediction.
In the model prediction decompression module, the received one or more quantized prediction residuals are combined with the deep neural network stored at the receiving end, including replacing or accumulating onto the one or more originally stored deep neural network models, to generate the received deep neural network.
The model prediction decompression module can receive one or more quantized prediction residuals simultaneously or at different times, combine them with partial or full accumulation onto, or replacement of, the one or more stored deep neural networks, and finally, through an organization scheme, generate the received deep neural network and complete the transmission.
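The replace-or-accumulate behaviour at the receiving end can be sketched with a small model store; the model names and the `mode` flag are hypothetical, not part of the patented system:

```python
import numpy as np

# Receiver-side store of previously reconstructed models (names hypothetical).
store = {"model_a": np.array([1.0, 2.0, 3.0], dtype=np.float32)}

def apply_update(store, name, payload, mode):
    """Accumulate a dequantized residual onto a stored model, or replace a
    stored model outright (e.g. when no usable history exists)."""
    if mode == "accumulate":
        store[name] = store[name] + payload
    elif mode == "replace":
        store[name] = payload
    else:
        raise ValueError(f"unknown mode: {mode}")
    return store[name]

# A residual update to an existing model...
updated = apply_update(store, "model_a",
                       np.array([0.1, -0.2, 0.0], dtype=np.float32), "accumulate")
assert np.allclose(updated, [1.1, 1.8, 3.0])

# ...and a wholesale replacement where no history is stored yet.
apply_update(store, "model_b", np.array([5.0], dtype=np.float32), "replace")
assert np.allclose(store["model_b"], [5.0])
```

Updates arriving at different times simply call the same routine repeatedly, which matches the accumulate-then-organize description above.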
As shown in Figure 7, in a second aspect the invention considers compression that exploits the potential redundancy between deep neural network models and proposes a multi-model prediction module, comprising compression and decompression modules, which at the compression and decompression ends makes use of the historically stored, otherwise "useless" deep neural network information.
1: In the model prediction compression module, based on one or more deep neural network models of the current and historical transmissions, some or all of the model differences between the model to be transmitted and the historically transmitted models are combined to generate one or more prediction residuals, and the information required for the related prediction is transmitted.
2: In the model prediction decompression module, the received one or more quantized prediction residuals are combined with the deep neural network stored at the receiving end, including replacing or accumulating onto the originally stored deep neural network model, to generate the received deep neural network.
3: In the model prediction compression and decompression modules, historically transmitted deep neural network models and stored deep neural networks can be added, deleted and modified.
Through the above method and system, the invention combines the redundancy among the multiple models of a deep neural network under frequent transmission and uses the knowledge shared between deep neural networks to compress, reducing the size and bandwidth required for transmission. Under the same bandwidth limit, deep neural networks can be transmitted more effectively, while task-specific compression at the front end remains possible, rather than only partial restoration of the network after task-specific compression.
It should be noted that:
The algorithms and displays provided here are not inherently related to any particular computer, virtual device or other equipment. Various general-purpose devices can also be used with the teachings herein. The structure required to construct such devices is apparent from the description above. Moreover, the invention is not directed at any particular programming language; it should be understood that the content of the invention described here can be implemented in a variety of programming languages, and the descriptions of specific languages above are made to disclose the best mode of the invention.
The specification provided here sets out numerous specific details. It will be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, in the description of exemplary embodiments above the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim; rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into it, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components in the embodiments can be combined into one module or unit or component, and they can moreover be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of a virtual-machine creation apparatus according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (e.g. a computer program and a computer program product) for carrying out part or all of the methods described here. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
应该注意的是上述实施例对本发明进行说明而不是对本发明进行限制,并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中,不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中,这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。
以上所述,仅为本发明较佳的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (13)

  1. A compression method for frequent transmission of deep neural networks, comprising:
    based on one or more deep neural network models of the current and historical transmissions, combining some or all of the model to be transmitted with some or all of the model differences with respect to the historically transmitted models, generating one or more prediction residuals, and transmitting the information required for the prediction;
    combining one or more received quantized prediction residuals with a deep neural network stored at the receiving end, replacing or accumulating onto the originally stored deep neural network model, to generate a received deep neural network.
  2. The method according to claim 1, wherein the method specifically comprises: the transmitting end feeding the deep neural network to be transmitted into the compression end, and the compression end obtaining the data information and organization scheme of the one or more deep neural networks to be transmitted;
    the prediction module at the compression end performing multi-transmission model prediction compression based on the one or more deep neural network models of the current and historical transmissions, producing prediction residuals of the one or more deep neural networks to be transmitted;
    the quantization module at the compression end quantizing the one or more produced prediction residuals using one or more quantization schemes, generating one or more quantized prediction residuals;
    the encoding module at the compression end encoding the one or more produced quantized prediction residuals, generating one or more encoded prediction residuals for transmission;
    the decompression end receiving the one or more encoded prediction residuals, and the decompression module at the decompression end decoding the encoded prediction residuals with the corresponding decoding method, producing one or more quantized prediction residuals;
    the model prediction decompression module at the decompression end generating the received deep neural network at the receiving end through multi-model prediction, based on the one or more quantized prediction residuals and the deep neural network previously stored at the receiving end.
  3. The method according to claim 2, wherein
    the data information and organization scheme of the deep neural network comprise the data and network structure of some or all of the deep neural network.
  4. The method according to claim 2, wherein,
    in the frequent-transmission setting, the compression end is able to obtain the data information and organization scheme of the one or more historically transmitted deep neural network models of the corresponding receiving end, and, if there is no historically transmitted deep neural network model, an empty model is set as the default historically transmitted model.
  5. The method according to claim 2, wherein
    the model prediction compression exploits the redundancy among multiple complete or predicted models.
  6. The method according to claim 5, wherein
    the model prediction compression is one of the following: computing and transmitting the overall residual between the deep neural network model to be transmitted and a historically transmitted deep neural network model, or the residual of one or more internal layer structures of the model to be transmitted, or a residual measured at the convolution-kernel level.
  7. The method according to claim 2, wherein
    the model prediction compression comprises one or more residual compression granularities or one or more kinds of data information and organization schemes of the deep neural network.
  8. The method according to claim 4, wherein
    the multiple historically transmitted models of the receiving end are complete lossless models and/or lossy partial models.
  9. The method according to claim 2, wherein
    the quantization scheme comprises outputting the original data directly, or controlling the precision of the weights to be transmitted, or applying the k-means non-linear quantization algorithm.
  10. The method according to claim 2, wherein
    the multi-model prediction comprises: replacing or accumulating onto the one or more originally stored deep neural network models.
  11. The method according to claim 2, wherein
    the multi-model prediction comprises: receiving one or more quantized prediction residuals simultaneously or at different times, combined with a partial or complete accumulation onto, or replacement of, the one or more originally stored deep neural networks.
  12. A compression system for frequent transmission of deep neural networks, comprising:
    a model prediction compression module that, based on one or more deep neural network models of the current and historical transmissions, combines some or all of the model to be transmitted with some or all of the model differences with respect to the historically transmitted models, generates one or more prediction residuals, and transmits the information required for the prediction;
    a model prediction decompression module that combines one or more received quantized prediction residuals with a deep neural network stored at the receiving end, replacing or accumulating onto the originally stored deep neural network model, to generate a received deep neural network.
  13. The system according to claim 12, wherein, in the model prediction compression module and the model prediction decompression module, the historically transmitted deep neural network models and the stored deep neural networks can be added, deleted, or modified.
PCT/CN2019/082384 2018-05-29 2019-04-12 Compression method and system for frequent transmission of deep neural network WO2019228082A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/057,882 US20210209474A1 (en) 2018-05-29 2019-04-12 Compression method and system for frequent transmission of deep neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810528239.4A CN108665067B (zh) 2018-05-29 2018-05-29 Compression method and system for frequent transmission of deep neural network
CN201810528239.4 2018-05-29

Publications (1)

Publication Number Publication Date
WO2019228082A1 true WO2019228082A1 (zh) 2019-12-05

Family

ID=63777949

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082384 WO2019228082A1 (zh) 2018-05-29 2019-04-12 Compression method and system for frequent transmission of deep neural network

Country Status (3)

Country Link
US (1) US20210209474A1 (zh)
CN (1) CN108665067B (zh)
WO (1) WO2019228082A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12013958B2 (en) 2022-02-22 2024-06-18 Bank Of America Corporation System and method for validating a response based on context information
US12050875B2 (en) 2022-02-22 2024-07-30 Bank Of America Corporation System and method for determining context changes in text

Families Citing this family (11)

Publication number Priority date Publication date Assignee Title
US11037330B2 (en) * 2017-04-08 2021-06-15 Intel Corporation Low rank matrix compression
CN108665067B (zh) * 2018-05-29 2020-05-29 Peking University Compression method and system for frequent transmission of deep neural network
US10785681B1 (en) * 2019-05-31 2020-09-22 Huawei Technologies Co., Ltd. Methods and apparatuses for feature-driven machine-to-machine communications
CN111814955B (zh) * 2020-06-19 2024-05-31 Zhejiang Dahua Technology Co., Ltd. Quantization method and device for neural network model, and computer storage medium
CN116134791A (zh) * 2020-09-19 2023-05-16 Huawei Technologies Co., Ltd. Channel information reporting method and apparatus
CN115150614A (zh) * 2021-03-30 2022-10-04 China Telecom Corporation Limited Method, apparatus, and system for transmitting image features
CN116527561A (zh) * 2022-01-20 2023-08-01 Beijing University of Posts and Telecommunications Residual propagation method and residual propagation apparatus for a network model
CN114422606B (zh) * 2022-03-15 2022-06-28 Harbin Institute of Technology (Shenzhen) Communication overhead compression method, apparatus, device, and medium for federated learning
CN114422802B (zh) * 2022-03-28 2022-08-09 Zhejiang Smart Video Security Innovation Center Co., Ltd. Codebook-based autoencoder image compression method
US20240097703A1 (en) * 2022-09-20 2024-03-21 Hong Kong Applied Science and Technology Research Institute Company Limited Hardware Implementation of Frequency Table Generation for Asymmetric-Numeral-System-Based Data Compression
US20240204804A1 (en) * 2022-12-16 2024-06-20 Industrial Technology Research Institute Data processing system and data processing method for deep neural network model

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104735459A (zh) * 2015-02-11 2015-06-24 Peking University Compression method and system for local feature descriptors of video, and video compression method
CN106557812A (zh) * 2016-11-21 2017-04-05 Peking University Compression and acceleration scheme for deep convolutional neural networks based on the DCT transform
CN107396124A (zh) * 2017-08-29 2017-11-24 Nanjing University Video compression method based on deep neural networks
CN107688850A (zh) * 2017-08-08 2018-02-13 Beijing Deephi Technology Co., Ltd. Deep neural network compression method
CN108665067A (zh) * 2018-05-29 2018-10-16 Peking University Compression method and system for frequent transmission of deep neural network

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
JP4717860B2 (ja) * 2007-08-22 2011-07-06 眞一郎 湯村 Data compression method, image display method, and display image enlargement method
US8805106B2 (en) * 2008-09-26 2014-08-12 Futurewei Technologies, Inc. System and method for compressing and decompressing images and video
US10192327B1 (en) * 2016-02-04 2019-01-29 Google Llc Image compression with recurrent neural networks
CN106127297B (zh) * 2016-06-02 2019-07-12 Institute of Automation, Chinese Academy of Sciences Acceleration and compression method for deep convolutional neural networks based on tensor decomposition
KR20240104228A (ko) * 2016-07-14 2024-07-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Predictive picture coding using transform-based residual coding
CN107679617B (zh) * 2016-08-22 2021-04-09 Xilinx Electronic Technology (Beijing) Co., Ltd. Multi-iteration deep neural network compression method
US10621486B2 (en) * 2016-08-12 2020-04-14 Beijing Deephi Intelligent Technology Co., Ltd. Method for optimizing an artificial neural network (ANN)
EP3507773A1 (en) * 2016-09-02 2019-07-10 Artomatix Ltd. Systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models, a multiscale synthesis framework and novel network architectures
CN106485316B (zh) * 2016-10-31 2019-04-02 Beijing Baidu Netcom Science and Technology Co., Ltd. Neural network model compression method and apparatus
CN107644252A (zh) * 2017-03-10 2018-01-30 Nanjing University Recurrent neural network model compression method mixing multiple mechanisms
CN109302608B (zh) * 2017-07-25 2021-06-22 Huawei Technologies Co., Ltd. Image processing method, device, and system
KR102535361B1 (ko) * 2017-10-19 2023-05-24 Samsung Electronics Co., Ltd. Video encoder using machine learning and data processing method thereof
CN107832847A (zh) * 2017-10-26 2018-03-23 Peking University Neural network model compression method based on sparsified back-propagation training
CN107832837B (zh) * 2017-11-28 2021-09-28 Nanjing University Convolutional neural network compression and decompression method based on the principle of compressed sensing
US11295208B2 (en) * 2017-12-04 2022-04-05 International Business Machines Corporation Robust gradient weight compression schemes for deep learning applications
WO2019118639A1 (en) * 2017-12-12 2019-06-20 The Regents Of The University Of California Residual binary neural network
US20190287217A1 (en) * 2018-03-13 2019-09-19 Microsoft Technology Licensing, Llc Machine learning system for reduced network bandwidth transmission of content
US10783622B2 (en) * 2018-04-25 2020-09-22 Adobe Inc. Training and utilizing an image exposure transformation neural network to generate a long-exposure image from a single short-exposure image
WO2019219846A1 (en) * 2018-05-17 2019-11-21 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor
US10855986B2 (en) * 2018-05-29 2020-12-01 Qualcomm Incorporated Bandwidth compression for neural network systems


Also Published As

Publication number Publication date
CN108665067A (zh) 2018-10-16
US20210209474A1 (en) 2021-07-08
CN108665067B (zh) 2020-05-29

Similar Documents

Publication Publication Date Title
WO2019228082A1 (zh) Compression method and system for frequent transmission of deep neural network
US11057634B2 (en) Content adaptive optimization for neural data compression
US12026925B2 (en) Channel-wise autoregressive entropy models for image compression
WO2020237646A1 (zh) Image processing method, device, and computer-readable storage medium
CN110602494A (zh) Deep-learning-based image encoding and decoding system and encoding and decoding methods
CN111641826B (zh) Method, apparatus, and system for encoding and decoding data
US20230186927A1 (en) Compressing audio waveforms using neural networks and vector quantizers
Li et al. Multiple description coding based on convolutional auto-encoder
Hong et al. Efficient neural image decoding via fixed-point inference
JP2020053820A (ja) Quantization and encoder creation method, compressor creation method, compressor creation device, and program
CN108632630A (zh) Binary image coding method combining bit operations and probabilistic prediction
CN111050170A (zh) GAN-based image compression system construction method, compression system, and method
CN116527943B (zh) Extreme image compression method and system based on vector quantization indices and generative models
WO2023184980A1 (zh) Codebook-based autoencoder image compression method
CN114501031B (zh) Compression encoding and decompression method and apparatus
Lu et al. Image Compression Based on Mean Value Predictive Vector Quantization.
US20240212221A1 (en) Rate-adaptive codec for dynamic point cloud compression
CN111294055B (zh) Encoding and decoding method for data compression based on an adaptive dictionary
CN115329952B (zh) Model compression method, apparatus, and readable storage medium
WO2024134350A1 (en) Rate-adaptive codec for dynamic point cloud compression
US20240244218A1 (en) Encoding method, decoding method, bitstream, encoder, decoder, storage medium, and system
Moon et al. Local Non-linear Quantization for Neural Network Compression in MPEG-NNR
KR20230134856A (ko) 정규화 플로우를 활용한 오디오 신호를 부호화 및 복호화 하는 방법 및 그 학습 방법
JP2024527536A (ja) Compression of audio waveforms using neural networks and vector quantizers
WO2023222313A1 (en) A method, an apparatus and a computer program product for machine learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19810187

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 300421)

122 Ep: pct application non-entry in european phase

Ref document number: 19810187

Country of ref document: EP

Kind code of ref document: A1