CN113537470A - Model quantization method and device, storage medium and electronic device - Google Patents

Model quantization method and device, storage medium and electronic device

Info

Publication number
CN113537470A
Authority
CN
China
Prior art keywords
target
channel
layer
channels
target channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110825902.9A
Other languages
Chinese (zh)
Inventor
赵梦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110825902.9A priority Critical patent/CN113537470A/en
Publication of CN113537470A publication Critical patent/CN113537470A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of machine learning technologies, and in particular, to a model quantization method and apparatus, a computer-readable storage medium, and an electronic device. The model includes a plurality of reference layers, where a reference layer includes at least one of a convolutional layer and a fully-connected layer. The method includes: acquiring outliers of the weights in a reference layer, and determining target channels in the reference layer according to the outliers; copying each target channel, and adjusting the weight values in each target channel according to the number of times that channel is copied, to obtain a target layer corresponding to the reference layer; and performing quantization processing on each target layer, and determining the number of input channels corresponding to each target channel according to the number of that target channel after copying, to complete quantization of the model. The technical scheme of the embodiments of the present disclosure mitigates the influence of outliers on the dynamic range and improves the accuracy of the quantized model.

Description

Model quantization method and device, storage medium and electronic device
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a model quantization method and apparatus, a computer-readable storage medium, and an electronic device.
Background
With the continuing development of machine learning, neural network models are being applied ever more widely. To bridge the gap between the computing power demanded by these models and the computing power available on mobile terminals, model compression has in recent years become a popular research topic in deep learning; among compression techniques, model quantization can not only improve computational efficiency but also reduce memory footprint and energy consumption.
Model quantization methods in the prior art mainly comprise threshold-clipping quantization and outlier-aware quantization. Threshold-clipping quantization compresses outliers to a threshold T; although this removes the influence of outliers on the dynamic range, the outliers themselves are distorted. Outlier-aware quantization requires dedicated hardware and is therefore unsuitable for commercial mobile devices such as mobile phones.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The purpose of the present disclosure is to provide a model quantization method, a model quantization apparatus, a computer-readable medium, and an electronic device, so as to mitigate, at least to some extent, the influence of outliers on the dynamic range and to improve the accuracy of the quantized model.
According to a first aspect of the present disclosure, there is provided a method of model quantization, the model comprising a plurality of reference layers, wherein the reference layers comprise at least one of a convolutional layer and a fully-connected layer, the method comprising:
acquiring an outlier of the weight in the reference layer, and determining a target channel in the reference layer according to the outlier;
copying each target channel, and adjusting the weight value in each target channel according to the copying times of each target channel to obtain a target layer corresponding to the reference layer;
and performing quantization processing on each target layer, and determining the number of input channels corresponding to each target channel according to the number of each target channel so as to finish the quantization of the model.
According to a second aspect of the present disclosure, there is provided an apparatus for model quantization, the model comprising a plurality of reference layers, wherein the reference layers comprise at least one of convolutional layers and fully-connected layers, the apparatus comprising:
the determining module is used for acquiring an outlier of the weight in the reference layer and determining a target channel in the reference layer according to the outlier;
the copying module is used for copying each target channel and adjusting the weight value in each target channel according to the copying times of each target channel to obtain a target layer corresponding to the reference layer;
and the quantization module is used for performing quantization processing on each target layer and determining the number of input channels corresponding to each target channel according to the number of each target channel so as to finish the quantization of the model.
According to a third aspect of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the above-mentioned method.
According to a fourth aspect of the present disclosure, there is provided an electronic apparatus, comprising:
a processor; and
a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the above-described method.
According to the model quantization method provided by the embodiments of the present disclosure, outliers of the weights in a reference layer are obtained, and target channels in the reference layer are determined according to the outliers; each target channel is copied, and the weight values in the target channel are adjusted according to the number of copies to obtain a target layer corresponding to the reference layer; the target layer is then quantized, and the number of input channels corresponding to each target channel is determined according to the number of that target channel, so as to complete quantization of the model. Compared with the prior art, first, the target channel where an outlier is located is copied and its weight values are rescaled according to the number of copies, which reduces the influence of the outlier on the model quantization precision and removes its influence on the dynamic range. Second, the number of input channels corresponding to each target channel is matched to the number of that target channel, so the output result is unchanged while the recognition accuracy of the quantized model is improved. Third, no dedicated hardware is introduced, so a mobile terminal can also complete quantization of the model, which broadens the applicability of the model quantization method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure may be applied;
FIG. 3 schematically illustrates a flow chart of a method of model quantization in an exemplary embodiment of the disclosure;
FIG. 4 schematically illustrates a flow chart for obtaining a target channel in an exemplary embodiment of the disclosure;
FIG. 5 schematically illustrates a schematic diagram of replicating an input channel in an exemplary embodiment of the disclosure;
FIG. 6 schematically illustrates a flow chart of a model quantization method when the number of replications is 1 in an exemplary embodiment of the present disclosure;
FIG. 7 schematically illustrates a diagram of a pedestrian motion pattern recognition model in an exemplary embodiment of the present disclosure;
fig. 8 schematically illustrates a composition diagram of a model quantizing device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 is a schematic diagram illustrating a system architecture of an exemplary application environment to which a model quantization method and apparatus according to an embodiment of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices with model quantification functionality including, but not limited to, desktop computers, laptop computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The model quantization method provided by the embodiment of the present disclosure is generally executed by the terminal devices 101, 102, and 103, and accordingly, the model quantization apparatus is generally disposed in the terminal devices 101, 102, and 103. However, it is easily understood by those skilled in the art that the model quantization method provided in the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the model quantization apparatus may also be disposed in the server 105, which is not particularly limited in the exemplary embodiment. For example, in an exemplary embodiment, the user may upload the model to the server 105 through the terminal devices 101, 102, and 103, and the server may perform quantization on the model through the model quantization method provided in the embodiment of the present disclosure, and transmit the quantized model to the terminal devices 101, 102, and 103.
The exemplary embodiment of the present disclosure provides an electronic device for implementing a model quantization method, which may be the terminal device 101, 102, 103 or the server 105 in fig. 1. The electronic device includes at least a processor and a memory for storing executable instructions of the processor, the processor configured to perform a model quantification method via execution of the executable instructions.
The following takes the mobile terminal 200 in fig. 2 as an example, and exemplifies the configuration of the electronic device. It will be appreciated by those skilled in the art that the configuration of figure 2 can also be applied to fixed type devices, in addition to components specifically intended for mobile purposes. In other embodiments, mobile terminal 200 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The interfacing relationship between the components is only schematically illustrated and does not constitute a structural limitation of the mobile terminal 200. In other embodiments, the mobile terminal 200 may also interface differently than shown in fig. 2, or a combination of multiple interfaces.
As shown in fig. 2, the mobile terminal 200 may specifically include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, a button 294, and a Subscriber Identity Module (SIM) card interface 295. Wherein the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, and the like.
Processor 210 may include one or more processing units, such as: the Processor 210 may include an Application Processor (AP), a modem Processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband Processor, and/or a Neural-Network Processing Unit (NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors.
The NPU is a Neural-Network (NN) computing processor, which processes input information quickly by using a biological Neural Network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the mobile terminal 200, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
A memory is provided in the processor 210. The memory may store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transmission instructions, and notification instructions, and execution is controlled by processor 210.
The charge management module 240 is configured to receive a charging input from a charger. The power management module 241 is used for connecting the battery 242, the charging management module 240 and the processor 210. The power management module 241 receives the input of the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. Wherein, the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals; the mobile communication module 250 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the mobile terminal 200; the modem processor may include a modulator and a demodulator; the Wireless communication module 260 may provide a solution for Wireless communication including a Wireless Local Area Network (WLAN) (e.g., a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), and the like, applied to the mobile terminal 200. In some embodiments, antenna 1 of the mobile terminal 200 is coupled to the mobile communication module 250 and antenna 2 is coupled to the wireless communication module 260, such that the mobile terminal 200 may communicate with networks and other devices via wireless communication techniques.
The mobile terminal 200 implements a display function through the GPU, the display screen 290, the application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 290 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
The mobile terminal 200 may implement a photographing function through the ISP, the camera module 291, the video codec, the GPU, the display screen 290, the application processor, and the like. The ISP is used for processing data fed back by the camera module 291; the camera module 291 is used for capturing still images or videos; the digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals; the video codec is used to compress or decompress digital video, and the mobile terminal 200 may also support one or more video codecs.
The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the mobile terminal 200. The external memory card communicates with the processor 210 through the external memory interface 222 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
Internal memory 221 may be used to store computer-executable program code, which includes instructions. The internal memory 221 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the mobile terminal 200, and the like. In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk Storage device, a Flash memory device, a Universal Flash Storage (UFS), and the like. The processor 210 executes various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The mobile terminal 200 may implement an audio function through the audio module 270, the speaker 271, the receiver 272, the microphone 273, the earphone interface 274, the application processor, and the like. Such as music playing, recording, etc.
The depth sensor 2801 is used to acquire depth information of a scene. In some embodiments, a depth sensor may be provided to the camera module 291.
The pressure sensor 2802 is used to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 2802 may be disposed on the display screen 290. Pressure sensor 2802 can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like.
The gyro sensor 2803 may be used to determine a motion gesture of the mobile terminal 200. In some embodiments, the angular velocity of the mobile terminal 200 about three axes (i.e., x, y, and z axes) may be determined by the gyroscope sensor 2803. The gyro sensor 2803 can be used to photograph anti-shake, navigation, body-feel game scenes, and the like.
In addition, other functional sensors, such as an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., may be provided in the sensor module 280 according to actual needs.
Other devices for providing auxiliary functions may also be included in mobile terminal 200. For example, the keys 294 include a power-on key, a volume key, and the like, and a user can generate key signal inputs related to user settings and function control of the mobile terminal 200 through key inputs. Further examples include indicator 292, motor 293, SIM card interface 295, etc.
In the related art, deep learning is widely applied and has become the mainstream direction of machine learning, but as network structures grow more complex, the demand for computing power keeps rising. With the growing computing power of end devices, deep learning is increasingly applied on the mobile side.
The intelligent sensing hub is a combined software-and-hardware solution built on a low-power MCU and a lightweight RTOS operating system, whose main function is to connect to and process data from various sensor devices. Owing to its low power consumption, deep learning models are increasingly being deployed on it as well.
To bridge the gap between the computing power demanded on the mobile side and the computing power available there, model compression has in recent years become a popular research topic in deep learning; model quantization in particular can not only improve computational efficiency but also reduce memory footprint and energy consumption. Model quantization algorithms have therefore long been a popular research direction for deep learning on mobile terminals.
The mainstream quantization method linearly maps the entire range of the weight distribution onto the quantization grid, so rarely occurring outliers are mapped into the grid as well. When the data contain an outlier, the distribution becomes highly uneven: mapping by the maximum value wastes part of the dynamic range, and the quantization precision loss is large. There are two common ways to address this problem. One is Clipping: find an optimal threshold T and clip the outliers to T before applying linear quantization, so as to minimize the loss of quantization accuracy; for example, NVIDIA's TensorRT uses relative entropy (KL-divergence) to find the optimal threshold T. The other is Outlier-Aware Quantization, which quantizes outliers and central values separately, using a low-precision grid for the central values and a high-precision grid for the outliers, thereby reducing the influence of outliers on the quantization precision.
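For a concrete sense of how a single outlier inflates the quantization step, consider the following sketch of symmetric linear quantization (a minimal NumPy illustration; the function name, the 8-bit symmetric scheme and the example values are assumptions made for this illustration, not part of the disclosure):

    import numpy as np

    def linear_quantize(w, num_bits=8):
        """Symmetric linear quantization: map [-max|w|, max|w|] onto the integer grid."""
        qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
        scale = np.abs(w).max() / qmax            # a single outlier inflates this scale
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        return q.astype(np.int8), scale

    # Most weights are small; one outlier (8.0) dominates the dynamic range.
    w = np.array([0.02, -0.03, 0.01, 8.0, 0.05])
    q, scale = linear_quantize(w)
    print(q)          # the small weights all round to 0
    print(q * scale)  # dequantized values show the precision loss

With the outlier present, the scale is set by 8.0 rather than by the small weights, so the small weights collapse to zero after quantization, which is exactly the dynamic-range waste described above.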
The model quantization method and the model quantization apparatus according to exemplary embodiments of the present disclosure are specifically described below.
Fig. 3 shows a flow of a model quantization method in the present exemplary embodiment, the model including a plurality of reference layers, wherein the reference layers include at least one of a convolutional layer and a fully-connected layer, the model quantization method including the steps of:
step S310, acquiring an outlier of the weight in the reference layer, and determining a target channel in the reference layer according to the outlier;
step S320, copying the target channel, and adjusting a weight value in the target channel according to the copying times to obtain a target layer corresponding to the reference layer;
step S330, carrying out quantization processing on the target layer, and determining the number of input channels corresponding to the target channels according to the number of the target channels to finish the quantization of the model.
Compared with the prior art, first, the target channel where an outlier is located is copied and its weight values are rescaled according to the number of copies, which reduces the influence of the outlier on the model quantization precision and removes its influence on the dynamic range. Second, the number of input channels corresponding to each target channel is matched to the number of that target channel, so the output result is unchanged while the recognition accuracy of the quantized model is improved. Third, no dedicated hardware is introduced, so the mobile terminal can also complete quantization of the model, which broadens the applicability of the model quantization method.
The respective steps of the model quantization method in the present disclosure are explained in detail below.
In step S310, an outlier of the weight in the reference layer is obtained, and a target channel in the reference layer is determined according to the outlier.
In an exemplary embodiment, the model of the present disclosure may be a CNN model, a DNN model, or another type of model, which is not specifically limited in this exemplary embodiment.
In this example embodiment, the model may include a plurality of reference layers, where a reference layer may include at least one of a convolutional layer and a fully-connected layer. For each reference layer, the present disclosure may obtain the outliers of the weights in that layer, for example by obtaining a histogram of the weights in the reference layer and determining the outliers among the weights from it; the outliers may also be obtained in another manner, which is not specifically limited in this example embodiment.
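As one possible illustration of the histogram-based approach, the sketch below marks as outliers the weights whose magnitudes fall in the extreme tail of the weight histogram (the tail fraction, bin count and helper name are illustrative assumptions, not prescribed by this disclosure):

    import numpy as np

    def find_weight_outliers(weights, tail_fraction=1e-3, bins=256):
        """Mark weights whose magnitude lies in the extreme tail of the weight histogram."""
        flat = np.abs(weights).ravel()
        hist, edges = np.histogram(flat, bins=bins)
        cdf = np.cumsum(hist) / flat.size
        cut = np.searchsorted(cdf, 1.0 - tail_fraction)   # bin where the tail begins
        threshold = edges[min(cut + 1, bins)]
        return np.abs(weights) > threshold, threshold

    # Weights of a convolutional layer: (out_channels, in_channels, kH, kW)
    w = np.random.randn(16, 8, 3, 3).astype(np.float32)
    w[3, 2, 1, 1] = 12.0                                  # inject an outlier
    mask, thr = find_weight_outliers(w)
    print(np.argwhere(mask))                              # typically -> [[3 2 1 1]]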
In the present exemplary embodiment, after determining the outlier of the weight, the target channel in the reference layer may be determined according to the outlier, and as shown in fig. 4, determining the target channel in the reference layer according to the outlier may include steps S410 to S430.
In step S410, a reference channel corresponding to each outlier is determined, and the number of outliers in each reference channel is determined.
In an example embodiment of the present disclosure, after determining the outliers, reference channels corresponding to the outliers may be determined, and the number of the outliers in each of the reference channels may be counted.
In step S420, a separation channel coefficient is set according to the number of reference channels and the number of channels in the reference layer.
In this exemplary embodiment, the number of the reference channels may be obtained first, and a separation channel coefficient may then be determined according to it. Specifically, the total number of channels in the reference layer may be determined first, and the separation channel coefficient may be set according to the ratio of the number of reference channels to the total number of channels, the separation channel coefficient being positively correlated with this ratio. The correlation parameter (the proportionality factor) may be 1, or may be 0.5, 0.3, and so on, and may also be customized according to user requirements, which is not specifically limited in this exemplary embodiment.
In step S430, a target channel is determined in the reference channel according to the number of outliers in the reference channel and the separation channel coefficient.
In this exemplary embodiment, after the separation channel coefficient is obtained, the number of target channels may be determined according to the separation channel coefficient and the total number of channels in the reference layer; specifically, the product of the separation channel coefficient and the total number of channels may be used as the number of target channels. When the correlation parameter is 1, all reference channels may be used as the target channels.
When the correlation parameter is less than 1, the priority of each reference channel may be determined according to the number of outliers in it, a larger number of outliers giving a higher priority. After the priorities are determined, the target channels are selected according to the number of target channels and the priorities, that is, as many reference channels as the number of target channels are selected, in order of priority, as the target channels.
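A minimal sketch of how steps S410 to S430 might be realized is given below; the per-input-channel grouping, the proportionality constant expand_ratio and the helper name are assumptions chosen to match the description above, not a definitive implementation:

    import numpy as np

    def select_target_channels(weights, outlier_mask, expand_ratio=1.0):
        """Pick the input channels to split, ordered by how many outliers they contain.

        weights / outlier_mask have shape (out_ch, in_ch, kH, kW); expand_ratio is the
        correlation parameter relating the separation channel coefficient to the
        fraction of channels that contain outliers.
        """
        in_channels = weights.shape[1]
        outliers_per_channel = outlier_mask.sum(axis=(0, 2, 3))    # count per input channel
        reference_channels = np.flatnonzero(outliers_per_channel)  # channels with outliers

        # Separation channel coefficient: proportional to the share of reference channels.
        split_coeff = expand_ratio * len(reference_channels) / in_channels
        num_targets = int(round(split_coeff * in_channels))

        # Priority: channels with more outliers are split first.
        priority = np.argsort(-outliers_per_channel)
        return [int(c) for c in priority[:num_targets] if outliers_per_channel[c] > 0]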
In step S320, each target channel is copied, and the weight value in the target channel is adjusted according to the number of copies, to obtain a target layer corresponding to the reference layer.
In this example embodiment, after the target channels are determined, each target channel may be copied and its weight values adjusted according to the copying. Specifically, the following operations may be performed for each of the target channels.
Specifically, the number of the target channels after copying may be determined according to the number of times the target channel is copied. In one example embodiment, each time a target channel is copied once, one target channel is added, so the number of target channels is the number of copies plus 1. In another exemplary embodiment, each copy doubles the number of target channels, i.e., if the number of copies is m, the number n of target channels becomes 2^m. After the number of target channels after copying is determined, the weight values in the target channels may be adjusted according to that number using the following weight-value adjustment formula:
w_new = w_old / n
wherein w_new denotes the adjusted weight value, n denotes the number of the target channels, and w_old denotes the weight value before adjustment.
In this example embodiment, different numbers of copies may be performed on different target channels, where the number of copies may be determined according to the size of the outlier, and the size of the outlier may be positively correlated with the number of copies, that is, the larger the outlier is, the more copies may be made there, and specific related parameters may be customized according to user requirements, which is not specifically limited in this example embodiment.
In another embodiment of the present disclosure, when the target channel is copied, the weights at the non-outlier positions in the copied target channel may all be set to 0; in that case, when the weight values in the target channel are adjusted with the weight-value adjustment formula, only the outlier weights in the target channel need to be adjusted.
After the above operations are performed for each target channel, the target layer corresponding to the reference layer is obtained. In this exemplary embodiment, after the target layer is obtained, it may be checked whether an outlier still exists among the weights of the target channels in the target layer; if so, the target channel where the outlier is located may be copied once more and the weight values in that target channel halved. This is repeated until no target channel contains an outlier, so as to reduce the influence of outliers on the model quantization precision and remove their influence on the dynamic range.
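Under the same illustrative assumptions as the earlier sketches, the copying and weight rescaling of a single target channel, repeated until no copy still contains an outlier, might look like the following (the helper name and the single-channel scope are assumptions for illustration):

    import numpy as np

    def split_channel(weights, channel, threshold):
        """Duplicate one input channel and halve its weights until no copy exceeds the
        outlier threshold (n copies, each weight becomes w_old / n with n = 2**m).

        Returns the new weight tensor and the indices the channel was expanded into;
        this sketch splits one channel at a time, so when several channels are split
        the higher indices should be processed first to keep lower indices valid.
        """
        copies = weights[:, channel:channel + 1]                     # (out, 1, kH, kW)
        while np.abs(copies).max() > threshold:
            copies = np.concatenate([copies, copies], axis=1) / 2.0  # n -> 2n, w -> w/2
        new_weights = np.concatenate(
            [weights[:, :channel], copies, weights[:, channel + 1:]], axis=1)
        return new_weights, list(range(channel, channel + copies.shape[1]))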
In step S330, the target layer is quantized, and the number of input channels corresponding to the target channel is determined according to the number of the target channel, so as to complete the quantization of the model.
In an example embodiment of the present disclosure, the target layer may be quantized in a linear quantization manner, and after the quantization is completed, the number of input channels corresponding to each target channel may be determined according to the number of each target channel, so that the number of the target channels is the same as the number of the input channels corresponding to the target channel.
Specifically, a custom layer may be configured for the quantized target layer; the custom layer copies the input channels corresponding to the target channels according to the number of the target channels, so that the number of the target channels is the same as the number of the corresponding input channels. In this way, even after the target channels are copied, the obtained output result remains unchanged, which improves the accuracy of model quantization. In this example embodiment, the custom layer may be an Outlier Channel Splitting (OCS) layer.
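A minimal, framework-agnostic sketch of such a custom layer is shown below; the class name OutlierChannelSplit and its interface are assumptions for illustration only:

    import numpy as np

    class OutlierChannelSplit:
        """Custom layer inserted in front of the quantized target layer: it repeats each
        input channel as many times as the corresponding target channel was copied, so
        the channel counts match and the layer output is unchanged."""

        def __init__(self, repeats):
            self.repeats = np.asarray(repeats)   # repeats[i] = copies of input channel i

        def __call__(self, x):
            # x has shape (batch, channels, H, W); duplicate along the channel axis
            return np.repeat(x, self.repeats, axis=1)

    # Example: original channel 2 was split into two copies, the others kept once.
    ocs = OutlierChannelSplit(repeats=[1, 1, 2, 1])
    x = np.random.randn(1, 4, 8, 8)
    print(ocs(x).shape)                          # (1, 5, 8, 8)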
For example, referring to FIG. 5, assume that the weight w3 in FIG. 5 is an outlier. The target channel [w3, w4] containing w3 is copied once and all the weights in that channel are halved. At the same time, the input channel x2 corresponding to the target channel containing w3 is also copied, so that the obtained result is still [y1, y2].
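The example of FIG. 5 can be checked numerically with the following sketch (the weight and input values are illustrative; only w3 is treated as an outlier):

    import numpy as np

    W = np.array([[1.0, 6.0],      # row for y1: [w1, w3], w3 = 6.0 is the outlier
                  [2.0, 0.5]])     # row for y2: [w2, w4]
    x = np.array([0.3, 0.7])       # inputs [x1, x2]
    y = W @ x                      # original outputs [y1, y2]

    # Split the target (input) channel [w3, w4]: copy it once, halve both copies,
    # and duplicate the corresponding input x2.
    W_split = np.array([[1.0, 3.0, 3.0],
                        [2.0, 0.25, 0.25]])
    x_split = np.array([0.3, 0.7, 0.7])

    print(y, W_split @ x_split)    # identical outputs; max |weight| drops from 6.0 to 3.0

The two printed vectors are identical, while the largest weight magnitude is halved, so the dynamic range that the subsequent linear quantization must cover shrinks accordingly.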
In an example embodiment of the present disclosure, referring to fig. 6, in the model quantization method of this embodiment, first, step S610 is performed to set a separation channel coefficient, then step S620 is performed to find a weight outlier, then step S630 is performed to copy a target channel where the outlier is located, and reduce the weight value in the target channel by half to obtain a target layer, then step S640 is performed to linearly quantize the target layer, and finally step S650 may be performed to insert a custom layer to adjust the number of input channels corresponding to the target channel.
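Tying the steps of fig. 6 together, an end-to-end flow over a single reference layer might look like the sketch below; it reuses the illustrative helpers sketched earlier (find_weight_outliers, select_target_channels, split_channel, linear_quantize and OutlierChannelSplit), all of which are assumed names rather than part of this disclosure:

    import numpy as np

    def quantize_reference_layer(weights, expand_ratio=1.0):
        """Steps S610-S650 for one convolutional reference layer; a fully-connected layer
        can be treated the same way by viewing its (out, in) matrix as (out, in, 1, 1)."""
        outlier_mask, threshold = find_weight_outliers(weights)               # S620: locate outliers
        targets = select_target_channels(weights, outlier_mask, expand_ratio) # S610: separation coefficient

        repeats = np.ones(weights.shape[1], dtype=int)
        for ch in sorted(targets, reverse=True):                              # S630: split and halve
            weights, new_idx = split_channel(weights, ch, threshold)
            repeats[ch] = len(new_idx)

        q_weights, scale = linear_quantize(weights)                           # S640: linear quantization
        input_adapter = OutlierChannelSplit(repeats)                          # S650: custom input layer
        return q_weights, scale, input_adapter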
The details of the above steps have already been described in detail, and therefore, are not described herein.
The following specific example is given to make the technical effect of the disclosed model quantization method clearer.
In an example embodiment of the present disclosure, a pedestrian motion recognition model may be taken as an example. As shown in fig. 7, the model may include at least one convolutional layer and a fully-connected layer 740, for example a first convolutional layer 710, a second convolutional layer 720, and an nth convolutional layer 730; the pedestrian motion recognition model may further include a softmax function 750 before the output. The number of convolutional layers may be 3, 4, or more, and may also be customized according to user requirements, which is not specifically limited in this exemplary embodiment.
In the present exemplary embodiment, the recognition accuracy of the model before quantization, the recognition accuracy after quantization with a prior-art model quantization method, and the recognition accuracy after quantization with the model quantization method of the present disclosure may all be calculated, and a model recognition accuracy comparison table may be constructed from the accuracies in these three cases.
When the pedestrian motion recognition model is quantized, at least one of the plurality of convolutional layers or the fully-connected layer may be linearly quantized, or both the convolutional layers and the fully-connected layer may be linearly quantized; this may also be customized according to user requirements, which is not specifically limited in this exemplary embodiment.
When the model is quantized, a Post-Training Quantization (PTQ) method may be adopted, that is, quantization that requires no training process and can be carried out on a common CPU/GPU/DSP, which also makes it suitable for mobile devices such as mobile phones and tablet computers.
The model identification accuracy comparison table is shown in table 1.
TABLE 1 model identification accuracy comparison table
In Table 1, float denotes the recognition accuracy of the unquantized model, uint8 denotes the recognition accuracy after quantization with the prior-art model quantization method, and uint8+OCS denotes the recognition accuracy after quantization with the model quantization method proposed in the present disclosure. As can be seen from Table 1, the recognition accuracy of the model quantized with the method proposed in the present disclosure is about 0.21% higher than that of the model quantized with the prior-art method, demonstrating the technical effect of improving the recognition accuracy of the quantized model.
The pedestrian motion model is quantized after training: the original 32-bit model is quantized into an 8-bit model, the model size is reduced by a factor of 4 and the performance can be improved by about 1.5 times, but the recognition accuracy of the trained model normally drops. By adopting the model quantization method of the present disclosure for post-training quantization, the influence of outliers on model accuracy can be limited.
In summary, in the present exemplary embodiment, the target channel where an outlier is located is copied and its weight values are modified according to the number of copies, which reduces the influence of the outlier on the model quantization precision and removes its influence on the dynamic range. In addition, the number of input channels corresponding to each target channel is determined according to the number of that target channel, so the output result is unchanged while the recognition accuracy of the quantized model is improved. Finally, no dedicated hardware is introduced, so the mobile terminal can also complete quantization of the model, which broadens the applicability of the model quantization method.
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Further, referring to fig. 8, an apparatus 800 for model quantization is further provided in the present exemplary embodiment, and includes a determining module 810, a copying module 820, and a quantizing module 830. Wherein:
the determining module 810 may be configured to obtain an outlier of the weight in the reference layer, and determine a target channel in the reference layer according to the outlier; the copying module 820 may be configured to copy each target channel, and adjust a weight value in each target channel according to the number of times of copying each target channel to obtain a target layer corresponding to the reference layer; the quantization module 830 may be configured to perform quantization processing on each target layer, and determine the number of input channels corresponding to each target channel according to the number of each target channel to complete quantization of the model.
In an exemplary embodiment, when obtaining the outliers of the weights in the reference layer, the determining module 810 may first obtain a histogram of the weights of the reference layer, and then determine the outliers among the weights from the histogram.
When determining the target channels in the reference layer according to the outliers, the determining module 810 may determine the reference channel corresponding to each outlier and the number of outliers in each reference channel; set a separation channel coefficient according to the number of reference channels and the total number of channels in the reference layer; and determine the target channels among the reference channels according to the number of outliers in each reference channel and the separation channel coefficient. Specifically, the number of target channels may be determined from the separation channel coefficient and the total number of channels in the reference layer; the priority of each reference channel may be determined according to the number of outliers in it; and the target channels may be determined among the reference channels according to the number of target channels and the priorities.
In an example embodiment of the present disclosure, the model quantization apparatus may further include a verification module, configured to check whether the weights in the target channels of the target layer still contain outliers; if so, each target channel where an outlier is located is copied once and the weight values in that target channel are halved, so as to update the target layer.
In this exemplary embodiment, the replication module 820 may first determine the number of each target channel according to the replication times of each target channel when adjusting the weight value in each target channel according to the replication times of each target channel; then, the weight value in each target channel is adjusted according to the number of each target channel by using a weight value adjusting formula, wherein the weight value adjusting formula is as follows:
w_new = w_old / n
wherein w_new denotes the adjusted weight value, n denotes the number of the target channels, and w_old denotes the weight value before adjustment.
In this exemplary embodiment, when determining the number of input channels corresponding to each target channel according to the number of the target channels so as to complete quantization of the model, the quantization module 830 may configure a custom layer for the target layer, where the custom layer is configured to copy the input channels corresponding to each target channel according to the number of the target channels, so that the number of the target channels is the same as the number of the input channels corresponding to them.
The specific details of each module in the above apparatus have been described in detail in the method section, and details that are not disclosed may refer to the method section, and thus are not described again.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the disclosure described in the above-mentioned "exemplary methods" section of this specification, when the program product is run on the terminal device.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A method of model quantization, the model comprising a plurality of reference layers, wherein the reference layers comprise at least one of a convolutional layer and a fully-connected layer, the method comprising:
acquiring an outlier of the weight in the reference layer, and determining a target channel in the reference layer according to the outlier;
copying each target channel, and adjusting the weight value in each target channel according to the copying times of each target channel to obtain a target layer corresponding to the reference layer;
and performing quantization processing on each target layer, and determining the number of input channels corresponding to each target channel according to the number of each target channel so as to finish the quantization of the model.
2. The method of claim 1, wherein the determining the target channel in the reference layer from the outliers comprises:
determining a reference channel corresponding to each outlier, and determining the number of the outliers in each reference channel;
setting a separation channel coefficient according to the number of the reference channels and the total number of the channels in the reference layer;
and determining a target channel in the reference channel according to the number of outliers in the reference channel and the separation channel coefficient.
3. The method of claim 2, wherein determining a target channel in a reference channel based on the number of outliers in the reference channel and the separation channel coefficient comprises:
determining the number of the target channels according to the separation channel coefficient and the total number of the channels in the reference layer;
determining a priority of each of the reference channels according to a number of outliers in each of the reference channels;
and determining a target channel in the reference channels according to the number of the target channels and the priority.
4. The method of claim 1, wherein prior to performing quantization processing on each of the target layers, the method further comprises:
if the weight in the target channel in the target layer has an outlier;
copying each target channel where the outlier is located once, and halving the weight value in each target channel to update the target layer.
5. The method of claim 1, wherein adjusting the weight value in each of the target channels according to the number of copies of each of the target channels comprises:
determining the number of each target channel according to the copying times of each target channel;
adjusting the weight value of each target channel according to the number of each target channel by using a weight value adjustment formula, wherein the weight value adjustment formula is as follows:
w_new = w_old / n
wherein w_new denotes the adjusted weight value, n denotes the number of the target channels, and w_old denotes the weight value before adjustment.
6. The method of claim 1, wherein determining the number of input channels corresponding to each of the target channels according to the number of the target channels comprises:
and configuring a custom layer for the target layer, wherein the custom layer is used for copying the input channels corresponding to the target channels according to the quantity of the target channels, so that the quantity of the target channels is the same as that of the input channels corresponding to the target channels.
7. The method of claim 1, wherein obtaining outliers of weights in the reference layer comprises:
obtaining a histogram of the weights of the reference layer;
determining outliers in the weights from the histogram.
8. An apparatus for model quantization, the model comprising a plurality of reference layers, wherein the reference layers comprise at least one of convolutional layers and fully-connected layers, the apparatus comprising:
the determining module is used for acquiring an outlier of the weight in the reference layer and determining a target channel in the reference layer according to the outlier;
the copying module is used for copying each target channel and adjusting the weight value in each target channel according to the copying times of each target channel to obtain a target layer corresponding to the reference layer;
and the quantization module is used for performing quantization processing on each target layer and determining the number of input channels corresponding to each target channel according to the number of each target channel so as to finish the quantization of the model.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the model quantization method as claimed in any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the model quantization method of any one of claims 1 to 7.
CN202110825902.9A 2021-07-21 2021-07-21 Model quantization method and device, storage medium and electronic device Pending CN113537470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110825902.9A CN113537470A (en) 2021-07-21 2021-07-21 Model quantization method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110825902.9A CN113537470A (en) 2021-07-21 2021-07-21 Model quantization method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113537470A true CN113537470A (en) 2021-10-22

Family

ID=78129218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110825902.9A Pending CN113537470A (en) 2021-07-21 2021-07-21 Model quantization method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113537470A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117217318A (en) * 2023-11-07 2023-12-12 瀚博半导体(上海)有限公司 Text generation method and device based on Transformer network model
CN117217318B (en) * 2023-11-07 2024-01-26 瀚博半导体(上海)有限公司 Text generation method and device based on Transformer network model

Similar Documents

Publication Publication Date Title
CN109086709B (en) Feature extraction model training method and device and storage medium
CN111414736B (en) Story generation model training method, device, equipment and storage medium
JP7324838B2 (en) Encoding method and its device, apparatus and computer program
CN110263131B (en) Reply information generation method, device and storage medium
CN112562019A (en) Image color adjusting method and device, computer readable medium and electronic equipment
CN111950570B (en) Target image extraction method, neural network training method and device
CN110147533B (en) Encoding method, apparatus, device and storage medium
CN111476783A (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN111860841B (en) Optimization method, device, terminal and storage medium of quantization model
JP2023508062A (en) Dialogue model training method, apparatus, computer equipment and program
CN111866483A (en) Color restoration method and device, computer readable medium and electronic device
CN113742082A (en) Application resource allocation method and device, computer readable medium and terminal
CN110555102A (en) media title recognition method, device and storage medium
CN113744286A (en) Virtual hair generation method and device, computer readable medium and electronic equipment
CN111916097A (en) Method and system for Gaussian weighted self-attention for speech enhancement
CN113537470A (en) Model quantization method and device, storage medium and electronic device
CN113284206A (en) Information acquisition method and device, computer readable storage medium and electronic equipment
CN111414737B (en) Story generation model training method, device, equipment and storage medium
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
CN115936092A (en) Neural network model quantization method and device, storage medium and electronic device
CN110990549A (en) Method and device for obtaining answers, electronic equipment and storage medium
CN114996515A (en) Training method of video feature extraction model, text generation method and device
CN111310701B (en) Gesture recognition method, device, equipment and storage medium
CN113240599A (en) Image toning method and device, computer-readable storage medium and electronic equipment
CN113989121A (en) Normalization processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination