CN112232509A - Edge computing model compression optimization method, device, equipment and readable medium

Info

Publication number
CN112232509A
CN112232509A (application CN202011078937.2A)
Authority
CN
China
Prior art keywords
model
pruning
edge
compressed
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202011078937.2A
Other languages
Chinese (zh)
Inventor
赵冰
张清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011078937.2A
Publication of CN112232509A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a compression optimization method for an edge computing model, comprising the following steps: training an initial model, and pruning the initial model based on the contribution values of its neurons; adjusting the parameters of the pruned model, and iterating the pruning process to obtain a compressed model; and performing accelerated optimization on the compressed model based on TensorRT, and deploying the optimized model at the edge. The invention also discloses an edge computing model compression optimization device, computer equipment, and a readable storage medium. By combining model pruning with TensorRT, the method compresses and optimizes deep learning models, which helps bring structurally complex, large cloud models to edge deployment. With only a small loss of model performance, the storage space and computing power required by the model are effectively reduced, the edge device can deploy more models, real-time analysis of multi-model, multi-channel video is realized, and the performance of the edge device is utilized to the greatest extent.

Description

Edge computing model compression optimization method, device, equipment and readable medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a compression optimization method, device, equipment, and readable medium for an edge computing model.
Background
With the development of deep learning technology, more and more artificial intelligence solutions are reaching production deployment. In some scenarios, because of limits on model computation speed, network transmission speed, and on-site conditions, the traditional approach of transmitting data to a cloud-hosted model and returning the computed result to on-site equipment no longer meets the requirements of those scenarios.
Edge computing offers a solution to these problems: the deep learning model is deployed directly on edge devices, which compute results as soon as data is received, without depending on a cloud environment. However, edge devices generally have limited storage space and often host multiple deep learning models; oversized models slow down model loading and consume excessive memory, storage, and other resources.
NVIDIA, currently a mainstream manufacturer in the field of edge computing devices, offers with its Jetson series an integrated technical solution combining hardware computing power with software development components. The hardware shares its architecture with current mainstream cloud server equipment, so models developed in the cloud can run seamlessly on the edge devices. Its software components include TensorRT, which provides a model acceleration and optimization scheme and enables further optimization of model inference speed at the edge.
The prior art is mainly concerned with accelerated optimization of models and pays insufficient attention to compressing model volume, so the optimized model remains large. Although TensorRT provides a quantization scheme that can compress model volume to some extent, its main objective is to improve inference speed; it does not support mainstream model compression algorithms, so it cannot effectively compress models or save device storage space.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a compression optimization method, apparatus, device, and readable medium for an edge computing model that combine model pruning with TensorRT to compress and optimize deep learning models. This helps bring structurally complex, large cloud models to edge deployment and meets model deployment requirements under the limited computing and storage resources of edge devices. With only a small loss of model performance, the storage space and computing power required by the model are effectively reduced, the edge device can deploy more models, real-time analysis of multi-model, multi-channel video is realized, and the performance of the edge device is utilized to the greatest extent.
Based on the above purpose, one aspect of the embodiments of the present invention provides a method for compressing and optimizing an edge computing model, including the following steps: training an initial model, and pruning the initial model based on the contribution values of its neurons; adjusting the parameters of the pruned model, and iterating the pruning process to obtain a compressed model; and performing model accelerated optimization on the compressed model based on TensorRT, and deploying the accelerated and optimized model at the edge.
In some embodiments, pruning the initial model based on the contribution values of the neurons comprises: ranking the contribution values of the neurons, and removing neurons whose contribution values are below a threshold.
In some embodiments, pruning the initial model based on the contribution values of the neurons comprises: selecting a scaling factor in the model's normalization layer as the criterion for judging a neuron's contribution value; and selecting layers whose scaling factor is below the threshold for pruning.
In some embodiments, the method further comprises: penalizing the scaling factor through a loss function, so that the scaling factor gradually decreases during the iterative pruning process.
In some embodiments, adjusting the parameters of the pruned model and iterating the pruning process comprises: adjusting the parameters of the pruned model based on model performance loss and model compression ratio; and iteratively pruning the parameter-adjusted model based on the contribution values of the neurons.
In some embodiments, training an initial model and pruning the initial model based on the contribution values of the neurons comprises: training the initial model on a cloud server, and pruning it based on the contribution values of the neurons; and adjusting the parameters of the pruned model and iterating the pruning process to obtain a compressed model comprises: adjusting the parameters of the pruned model on the cloud server, and iterating the pruning process to obtain the compressed model.
In some embodiments, performing model accelerated optimization on the compressed model based on TensorRT comprises: performing model acceleration optimization on the compressed model based on TensorRT at the edge.
In another aspect of the embodiments of the present invention, an edge computing model compression optimization apparatus is also provided, including: a training module, configured to train an initial model and prune it based on the contribution values of the neurons; a compression module, configured to adjust the parameters of the pruned model and iterate the pruning process to obtain a compressed model; and a deployment module, configured to perform model accelerated optimization on the compressed model based on TensorRT and deploy the accelerated and optimized model at the edge.
In some embodiments, the training module is further configured to rank the contribution values of the neurons and remove neurons whose contribution values are below a threshold.
In some embodiments, the training module is further configured to select a scaling factor in the model's normalization layer as the criterion for judging a neuron's contribution value, and to select layers whose scaling factor is below the threshold for pruning.
In some embodiments, the training module is further configured to penalize the scaling factor through a loss function, so that the scaling factor gradually decreases during the iterative pruning process.
In some embodiments, the compression module is further configured to adjust the parameters of the pruned model based on model performance loss and model compression ratio, and to iteratively prune the parameter-adjusted model based on the contribution values of the neurons.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the above method steps.
The invention has the following beneficial technical effects: combining model pruning with TensorRT to compress and optimize deep learning models helps bring structurally complex, large cloud models to edge deployment and meets model deployment requirements under the limited computing and storage resources of edge devices. With only a small loss of model performance, the storage space and computing power required by the model are effectively reduced, the edge device can deploy more models, real-time analysis of multi-model, multi-channel video is realized, and the performance of the edge device is utilized to the greatest extent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are obviously only some embodiments of the present invention, and those skilled in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a schematic diagram of an embodiment of the edge computing model compression optimization method provided by the present invention;
FIG. 2 is a schematic diagram of an embodiment of the edge computing model compression optimization apparatus provided by the present invention;
FIG. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention;
FIG. 4 is a schematic diagram of an embodiment of a computer-readable storage medium provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish between two entities or parameters that share a name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and the following embodiments do not repeat this note.
In view of the above, according to a first aspect of the embodiments of the present invention, an embodiment of an edge computing model compression optimization method is provided. FIG. 1 is a schematic diagram illustrating an embodiment of the edge computing model compression optimization method provided by the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
S01, training to obtain an initial model, and pruning the initial model based on the contribution values of the neurons;
S02, adjusting the parameters of the pruned model, and iterating the pruning process to obtain a compressed model; and
S03, performing model accelerated optimization on the compressed model based on TensorRT, and deploying the accelerated and optimized model at the edge.
In this embodiment, model compression based on model pruning technology is applied to the deep learning model after training is completed on a cloud server, and the compressed model is further optimized and accelerated with NVIDIA TensorRT. Finally, the compressed and optimized model is deployed at the edge, reducing the model's occupation of the edge device's storage resources.
In this embodiment, model compression and optimization for a target detection task is taken as an example, the goal being to compress and accelerate a target detection model on a Jetson edge computing device. First, a YOLOv3 target detection network model is trained on a cloud server. After training, based on model pruning technology, once the compression performance targets the model is expected to reach are specified, the compression process automatically applies different degrees of sparsity to different layers of the model to explore the compressible space. After compression, the model's detection performance and compression ratio are tested, and the optimal compressed model is obtained by balancing performance loss against compression ratio. Once compression is finished, the model network is optimized and adjusted with the TensorRT tool to accelerate model computation, and the compressed, optimized model is deployed on a Jetson device. In server-side tests, with a model accuracy loss of 11.7%, the model volume was compressed by 82%, the detection speed was improved by 113.8%, and the parameter count was compressed by 82%. With an acceptable loss of accuracy, the model volume and parameter count are greatly compressed and the detection speed is greatly improved.
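To make the flow above concrete, the stages could be chained as in the following sketch. This is not the patent's published code: every helper name (iterative_prune, bn_gamma_threshold, and build_engine refer to the sketches given with the embodiments below), the YOLOv3 input resolution, and the file names are illustrative assumptions.

```python
import torch

def compress_and_deploy(model, train_fn, eval_fn):
    # 1. Initial training on the cloud server.
    train_fn(model)
    # 2. Iterative pruning with fine-tuning (sketched below).
    model = iterative_prune(model, train_fn, eval_fn, bn_gamma_threshold)
    # 3. Export the compressed model to ONNX for TensorRT.
    dummy = torch.randn(1, 3, 416, 416)  # assumed YOLOv3 input shape
    torch.onnx.export(model, dummy, "yolov3_pruned.onnx")
    # 4. On the Jetson device, build the accelerated engine (sketched below).
    build_engine("yolov3_pruned.onnx", "yolov3_pruned.engine")
```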
In some embodiments of the invention, pruning the initial model based on the contribution values of the neurons comprises: ranking the contribution values of the neurons, and removing neurons whose contribution values are below a threshold.
In this embodiment, neurons in the network are ranked according to their contribution to the performance of the model, and neurons with lower rankings are removed according to a threshold, resulting in a smaller and faster network.
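A minimal sketch of this ranking step, using the L1 norm of each convolution filter as an assumed proxy for a neuron's contribution (the patent does not fix a particular contribution measure for this embodiment):

```python
import torch
import torch.nn as nn

def low_contribution_channels(conv: nn.Conv2d, threshold: float):
    """Return the output-channel indices whose contribution score falls
    below the threshold, weakest first."""
    # One score per output channel: L1 norm of that filter's weights.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    order = torch.argsort(scores)  # ascending rank of contribution
    return [int(i) for i in order if scores[i] < threshold]
```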
In some embodiments of the invention, pruning the initial model based on the contribution values of the neurons comprises: selecting a scaling factor in the model's normalization layer as the criterion for judging a neuron's contribution value; and selecting layers whose scaling factor is below the threshold for pruning.
In this embodiment, the scaling factor γ of the batch normalization (BN) layer is selected as the importance criterion: the smaller the value of γ, the lower the importance of the corresponding connection channel, and channels of lower importance are pruned first.
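A sketch of deriving the pruning threshold from the BN scaling factors, under the assumption that a global percentile over all γ values is used (the patent only states that channels with γ below a threshold are pruned):

```python
import torch
import torch.nn as nn

def bn_gamma_threshold(model: nn.Module, prune_ratio: float = 0.5) -> float:
    """Gather |gamma| from every BN layer and return the value below
    which `prune_ratio` of all channels fall."""
    gammas = torch.cat([
        m.weight.detach().abs().flatten()
        for m in model.modules()
        if isinstance(m, nn.BatchNorm2d)
    ])
    return torch.quantile(gammas, prune_ratio).item()
```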
In some embodiments of the invention, the method further comprises: penalizing the scaling factor through a loss function, so that the scaling factor gradually decreases during the iterative pruning process.
In the present embodiment, a smooth-L1 loss function is used to penalize γ, driving small values of γ toward 0. The loss function is defined as:

$$L = \sum_{(x,y)} l\big(f(x, W),\, y\big) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)$$

where (x, y) are a training input and its target, W denotes the trainable weights, g(·) is the penalty function guiding sparsity over the set Γ of scaling factors, λ balances the two terms, and the smooth-L1 formulation gives the penalty a smooth curve at zero.
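A sketch of this objective in training code, assuming a cross-entropy task loss and a balance weight lam of 1e-4 (both are illustrative; the patent fixes neither):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def slimming_loss(model: nn.Module, outputs, targets, lam: float = 1e-4):
    """Task loss plus a smooth-L1 penalty on the BN scaling factors,
    driving small gammas toward zero during iterative pruning."""
    task = F.cross_entropy(outputs, targets)
    penalty = sum(
        F.smooth_l1_loss(m.weight, torch.zeros_like(m.weight),
                         reduction="sum")
        for m in model.modules() if isinstance(m, nn.BatchNorm2d)
    )
    return task + lam * penalty
```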
In some embodiments of the present invention, adjusting the parameters of the pruned model and iterating the pruning process comprises: adjusting the parameters of the pruned model based on model performance loss and model compression ratio; and iteratively pruning the parameter-adjusted model based on the contribution values of the neurons.
In this embodiment, an iterative train-prune-repeat method is adopted to gradually reduce the parameter weights, avoiding the drop in model performance caused by removing too many neuron connections at once. This keeps the model's performance loss as small as possible while achieving a large compression of the model's volume and parameters.
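A sketch of the train-prune-repeat loop. The pruning schedule, the soft masking approach, and the stopping rule based on accuracy drop are illustrative assumptions (the rule here loosely matches the 11.7% accuracy loss reported in the embodiment above):

```python
import torch
import torch.nn as nn

def mask_low_gamma_channels(model: nn.Module, threshold: float):
    """Soft-prune by zeroing every BN channel whose |gamma| is below
    the threshold."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            keep = (m.weight.detach().abs() >= threshold).float()
            m.weight.data.mul_(keep)
            m.bias.data.mul_(keep)

def iterative_prune(model, train_fn, eval_fn, threshold_fn,
                    prune_step=0.1, max_acc_drop=0.12, rounds=8):
    """Alternate pruning and fine-tuning so no single round removes too
    many connections at once; stop when the accuracy drop versus the
    unpruned baseline becomes unacceptable."""
    baseline = eval_fn(model)
    ratio = 0.0
    for _ in range(rounds):
        ratio += prune_step                         # prune a little more
        mask_low_gamma_channels(model, threshold_fn(model, ratio))
        train_fn(model)                             # parameter adjustment
        if baseline - eval_fn(model) > max_acc_drop:
            break                                   # balance loss vs. ratio
    return model
```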
In some embodiments of the invention, training an initial model and pruning the initial model based on the contribution values of the neurons comprises: training the initial model on a cloud server, and pruning it based on the contribution values of the neurons; and adjusting the parameters of the pruned model and iterating the pruning process to obtain a compressed model comprises: adjusting the parameters of the pruned model on the cloud server, and iterating the pruning process to obtain the compressed model.
In some embodiments of the invention, performing model accelerated optimization on the compressed model based on TensorRT comprises: performing model acceleration optimization on the compressed model based on TensorRT at the edge.
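A sketch of the edge-side acceleration step with the TensorRT 8.x Python API, assuming the compressed model has been exported to ONNX (the file names and the FP16 flag are illustrative):

```python
import tensorrt as trt

def build_engine(onnx_path: str, engine_path: str, fp16: bool = True):
    """Parse the ONNX model and serialize an optimized TensorRT engine
    for inference on the edge device."""
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("failed to parse ONNX model")
    config = builder.create_builder_config()
    if fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)  # lower-precision kernels
    engine = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine)
```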
In this embodiment, model pruning and TensorRT are combined to compress and optimize the deep learning model, which helps bring structurally complex, large cloud models to edge deployment and meets model deployment requirements under the limited computing and storage resources of edge devices. With only a small loss of model performance, the storage space and computing power required by the model are effectively reduced; with the method provided by the invention, the edge device can deploy more models, real-time analysis of multi-model, multi-channel video is realized, and the performance of the edge device is utilized to the greatest extent.
It should be particularly noted that the steps in the embodiments of the edge computing model compression optimization method described above may be interchanged, replaced, added, or deleted; such reasonable permutations and transformations of the edge computing model compression optimization method also belong to the scope of the present invention, and the scope of protection should not be limited to the embodiments.
In view of the above object, according to a second aspect of the embodiments of the present invention, an edge computing model compression optimization apparatus is provided. Fig. 2 is a schematic diagram illustrating an embodiment of the edge computing model compression optimization apparatus provided by the present invention. As shown in fig. 2, the embodiment of the present invention includes the following modules: a training module S11, configured to train an initial model and prune it based on the contribution values of the neurons; a compression module S12, configured to adjust the parameters of the pruned model and iterate the pruning process to obtain a compressed model; and a deployment module S13, configured to perform model accelerated optimization on the compressed model based on TensorRT and deploy the accelerated and optimized model at the edge.
In some embodiments of the invention, the training module S11 is further configured to rank the contribution values of the neurons and remove neurons whose contribution values are below a threshold.
In some embodiments of the invention, the training module S11 is further configured to select a scaling factor in the model's normalization layer as the criterion for judging a neuron's contribution value, and to select layers whose scaling factor is below the threshold for pruning.
In some embodiments of the invention, the training module S11 is further configured to penalize the scaling factor through a loss function, so that the scaling factor gradually decreases during the iterative pruning process.
In some embodiments of the invention, the compression module S12 is further configured to adjust the parameters of the pruned model based on model performance loss and model compression ratio, and to iteratively prune the parameter-adjusted model based on the contribution values of the neurons.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device. Fig. 3 is a schematic diagram of an embodiment of a computer device provided by the present invention. As shown in fig. 3, an embodiment of the present invention includes the following means: at least one processor S21; and a memory S22, the memory S22 storing computer instructions S23 executable on the processor, the instructions when executed by the processor implementing the steps of the above method.
The invention also provides a computer-readable storage medium. FIG. 4 is a schematic diagram illustrating an embodiment of the computer-readable storage medium provided by the present invention. As shown in fig. 4, the computer-readable storage medium S31 stores a computer program S32 that, when executed by a processor, performs the method described above.
Finally, it should be noted that, as one of ordinary skill in the art can appreciate, all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program of the edge computing model compression optimization method can be stored in a computer-readable storage medium, and when executed, can include the processes of the method embodiments described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Furthermore, the methods disclosed according to embodiments of the present invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. Which when executed by a processor performs the above-described functions defined in the methods disclosed in embodiments of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary, and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the spirit of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention are intended to be included within their scope of protection.

Claims (10)

1. An edge computing model compression optimization method, characterized by comprising the following steps:
training to obtain an initial model, and pruning the initial model based on the contribution value of the neuron;
adjusting parameters of the model obtained after pruning, and iterating the pruning process to obtain a compressed model; and
performing model accelerated optimization on the compressed model based on TensorRT, and deploying the accelerated and optimized model at the edge.
2. The edge computing model compression optimization method of claim 1, wherein pruning the initial model based on the contribution values of the neurons comprises:
ranking the contribution values of the neurons, and removing neurons whose contribution values are below a threshold.
3. The edge computing model compression optimization method of claim 1, wherein pruning the initial model based on the contribution values of the neurons comprises:
selecting a scaling factor in a model normalization layer as a judgment basis for the contribution value of the neuron;
and selecting the layer with the scaling factor smaller than the threshold value for pruning.
4. The edge computing model compression optimization method of claim 3, further comprising:
and penalizing the scaling factor through a loss function, so that the scaling factor is gradually reduced in the iterative pruning process.
5. The edge computing model compression optimization method of claim 1, wherein adjusting the parameters of the pruned model and iterating the pruning process comprises:
adjusting parameters of the model obtained after pruning based on model performance loss and model compression ratio;
and carrying out iterative pruning on the model after parameter adjustment based on the contribution value of the neuron.
6. The edge computing model compression optimization method of claim 1, wherein an initial model is obtained by training, and pruning the initial model based on contribution values of neurons comprises:
training on a cloud server to obtain an initial model, and pruning the initial model based on the contribution value of a neuron;
adjusting the parameters of the pruned model and iterating the pruning process to obtain a compressed model comprises: adjusting the parameters of the pruned model on the cloud server, and iterating the pruning process to obtain the compressed model.
7. The edge computing model compression optimization method of claim 1, wherein performing model-accelerated optimization on the compressed model based on TensorRT comprises:
performing model acceleration optimization on the compressed model based on TensorRT at the edge.
8. An edge computing model compression optimization apparatus, characterized by comprising:
the training module is configured for training to obtain an initial model and pruning the initial model based on the contribution value of the neuron;
the compression module is configured to adjust parameters of the model obtained after pruning and iterate the pruning process to obtain a compressed model; and
the deployment module, configured to perform model accelerated optimization on the compressed model based on TensorRT and deploy the accelerated and optimized model at the edge.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202011078937.2A 2020-10-10 2020-10-10 Edge computing model compression optimization method, device, equipment and readable medium Withdrawn CN112232509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011078937.2A CN112232509A (en) 2020-10-10 2020-10-10 Edge computing model compression optimization method, device, equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011078937.2A CN112232509A (en) 2020-10-10 2020-10-10 Edge computing model compression optimization method, device, equipment and readable medium

Publications (1)

Publication Number Publication Date
CN112232509A 2021-01-15

Family

ID=74113253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011078937.2A Withdrawn CN112232509A (en) 2020-10-10 2020-10-10 Edge calculation model compression optimization method, device, equipment and readable medium

Country Status (1)

Country Link
CN (1) CN112232509A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927173A (en) * 2021-04-12 2021-06-08 平安科技(深圳)有限公司 Model compression method and device, computing equipment and storage medium
WO2022217704A1 (en) * 2021-04-12 2022-10-20 平安科技(深圳)有限公司 Model compression method and apparatus, computing device and storage medium
CN113486936A (en) * 2021-06-28 2021-10-08 国网宁夏电力有限公司电力科学研究院 Icing detection method, device and system for power transmission line equipment and storage medium
CN113741863A (en) * 2021-07-29 2021-12-03 南方电网深圳数字电网研究院有限公司 Application program generation method based on algorithm model, electronic device and storage medium
WO2023160290A1 (en) * 2022-02-23 2023-08-31 京东方科技集团股份有限公司 Neural network inference acceleration method, target detection method, device, and storage medium

Similar Documents

Publication Publication Date Title
CN112232509A (en) Edge calculation model compression optimization method, device, equipment and readable medium
CN111738401A (en) Model optimization method, grouping compression method, corresponding device and equipment
CN111814966A (en) Neural network architecture searching method, neural network application method, device and storage medium
CN112215353B (en) Channel pruning method based on variational structure optimization network
CN111199740B (en) Unloading method for accelerating automatic voice recognition task based on edge calculation
CN113220457A (en) Model deployment method, model deployment device, terminal device and readable storage medium
CN112446491A (en) Real-time automatic quantification method and real-time automatic quantification system for neural network model
CN111627458A (en) Sound source separation method and equipment
CN108664993B (en) Dense weight connection convolutional neural network image classification method
CN110795235B (en) Method and system for deep learning and cooperation of mobile web
CN110674924B (en) Deep learning inference automatic quantification method and device
CN112200311A (en) 4-bit quantitative reasoning method, device, equipment and readable medium
CN111126595A (en) Method and equipment for model compression of neural network
CN111488990B (en) Model clipping method, device, equipment and medium based on performance perception
CN111860771A (en) Convolutional neural network computing method applied to edge computing
CN116258941A (en) Yolox target detection lightweight improvement method based on Android platform
CN109325590A (en) For realizing the device for the neural network processor that computational accuracy can be changed
CN107508692B (en) Communication system design method, device and communication system
CN113962388A (en) Neural network channel pruning method based on hardware accelerated sensing
WO2022246986A1 (en) Data processing method, apparatus and device, and computer-readable storage medium
CN112165402A (en) Method and device for predicting network security situation
CN112214791A (en) Privacy policy optimization method and system based on reinforcement learning and readable storage medium
CN114298294B (en) Neural network memory optimization method and device based on hardware accelerator
CN112329923A (en) Model compression method and device, electronic equipment and readable storage medium
CN113516240A (en) Neural network structured progressive pruning method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210115