WO2020207174A1 - Method and apparatus for generating a quantized neural network - Google Patents


Info

Publication number
WO2020207174A1
WO2020207174A1 PCT/CN2020/078586
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network
initial neural
quantized
parameters
Prior art date
Application number
PCT/CN2020/078586
Other languages
English (en)
Chinese (zh)
Inventor
刘阳
Original Assignee
北京字节跳动网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 filed Critical 北京字节跳动网络技术有限公司
Publication of WO2020207174A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for generating a quantized neural network.
  • a BN (Batch Normalization) layer is usually connected after the convolutional layer in a neural network.
  • the BN layer is used to normalize the output of the convolutional layer before passing the output of the convolutional layer to other layers, so as to improve the convergence speed of the neural network.
  • the BN layer includes a normalization parameter for multiplying the output of the convolution layer to normalize the output of the convolution layer.
  • the output of the convolutional layer is obtained by convolution of the weight of the convolutional layer and the input of the convolutional layer.
  • the data types of weights and normalization parameters are usually floating point.
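The relationship described above, in which the BN layer multiplies the convolution output by a normalization parameter, is why the two can be folded into a single network parameter, as the disclosure does below. A minimal sketch (not from the patent text; the scale formula gamma / sqrt(var + eps) is the common BN formulation and is an assumption here):

```python
import math

def bn_scale(gamma: float, running_var: float, eps: float = 1e-5) -> float:
    """Multiplicative normalization parameter of one BN channel (assumed form)."""
    return gamma / math.sqrt(running_var + eps)

def fold_weight(conv_weight: float, gamma: float, running_var: float) -> float:
    """Network parameter = convolution weight x BN normalization parameter."""
    return conv_weight * bn_scale(gamma, running_var)

# Because (w * x) * s == (w * s) * x, the folded weight yields the same output
# as convolving first and normalizing afterwards.
w, x, gamma, var = 0.5, 2.0, 1.2, 0.25
assert abs(fold_weight(w, gamma, var) * x - (w * x) * bn_scale(gamma, var)) < 1e-12
```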
  • the embodiments of the present disclosure propose a method and apparatus for generating a quantized neural network.
  • the embodiments of the present disclosure provide a method for generating a quantized neural network.
  • the method includes: obtaining a training sample set and an initial neural network, wherein a training sample includes sample information and a sample result predetermined for the sample information;
  • the initial neural network includes original floating-point network parameters;
  • an original floating-point network parameter is the product of a floating-point weight of a convolutional layer in the initial neural network and a floating-point normalization parameter of the batch normalization layer connected to the convolutional layer;
  • the method further includes: converting the original floating-point network parameters in the initial neural network into integer network parameters; generating a quantized initial neural network based on the converted integer network parameters; selecting training samples from the training sample set, and executing the following training steps: using the sample information in the selected training samples as the input of the quantized initial neural network and the sample results in the selected training samples as the expected output of the quantized initial neural network, training the quantized initial neural network; and, in response to determining that training of the quantized initial neural network is completed, generating a quantized neural network based on the trained quantized initial neural network.
  • generating a quantized initial neural network based on the converted integer network parameters includes: converting the converted integer network parameters into floating-point network parameters, and determining the initial neural network including the converted floating-point network parameters as the quantized initial neural network.
  • converting the original floating-point network parameters in the initial neural network into integer network parameters includes: converting the floating-point weights corresponding to the original floating-point network parameters into integer weights, and converting the floating-point normalization parameters corresponding to the original floating-point network parameters into integer normalization parameters; and integrating the converted integer weights and integer normalization parameters to obtain the integer network parameters.
  • the method further includes: in response to determining that the quantized initial neural network has not finished training, performing the following steps: selecting a training sample from the unselected training samples included in the training sample set; adjusting the parameters of the quantized initial neural network to obtain new floating-point network parameters; converting the new floating-point network parameters into new integer network parameters, and generating a new quantized initial neural network based on the new integer network parameters; and continuing to perform the training steps using the most recently selected training sample and the newly generated quantized initial neural network.
  • generating a new quantized initial neural network based on the new integer network parameters includes: converting the new integer network parameters into floating-point network parameters, and determining the quantized initial neural network including the converted floating-point network parameters as the new quantized initial neural network.
  • the method further includes: sending the quantized neural network to a user terminal, so that the user terminal stores the received quantized neural network.
  • the embodiments of the present disclosure provide a method for processing information, the method comprising: obtaining information to be processed and a target quantized neural network, wherein the target quantized neural network is generated by the method of any one of the embodiments of the above-mentioned first aspect; and inputting the information to be processed into the target quantized neural network to obtain and output a processing result.
  • an embodiment of the present disclosure provides an apparatus for generating a quantized neural network.
  • the apparatus includes: a first acquiring unit configured to acquire a training sample set and an initial neural network, wherein a training sample includes sample information and a sample result predetermined for the sample information, and the initial neural network includes original floating-point network parameters;
  • an original floating-point network parameter is the product of a floating-point weight of a convolutional layer in the initial neural network and a floating-point normalization parameter of the batch normalization layer connected to the convolutional layer;
  • a conversion unit is configured to convert the original floating-point network parameters in the initial neural network into integer network parameters; a generation unit is configured to generate a quantized initial neural network based on the converted integer network parameters; a first execution unit is configured to select training samples from the training sample set and perform the following training steps: use the sample information in the selected training samples as the input of the quantized initial neural network and the sample results in the selected training samples as the expected output of the quantized initial neural network, and train the quantized initial neural network; in response to determining that the quantized initial neural network has been trained, generate a quantized neural network based on the trained quantized initial neural network.
  • the generating unit is further configured to: convert the converted integer network parameters into floating-point network parameters, and determine the initial neural network including the converted floating-point network parameters as the quantized initial neural network.
  • the conversion unit includes: a conversion module configured to convert the floating-point weights corresponding to the original floating-point network parameters into integer weights, and to convert the floating-point normalization parameters corresponding to the original floating-point network parameters into integer normalization parameters; and an integration module configured to integrate the converted integer weights and integer normalization parameters to obtain the integer network parameters.
  • the apparatus further includes: a second execution unit configured to, in response to determining that the quantized initial neural network has not finished training, perform the following steps: select a training sample from the unselected training samples included in the training sample set; adjust the parameters of the quantized initial neural network to obtain new floating-point network parameters; convert the new floating-point network parameters into new integer network parameters, and generate a new quantized initial neural network based on the new integer network parameters; and continue to perform the training steps using the most recently selected training sample and the newly generated quantized initial neural network.
  • the second execution unit is further configured to: convert the new integer network parameters into floating-point network parameters, and determine the quantized initial neural network including the converted floating-point network parameters as the new quantized initial neural network.
  • the apparatus further includes: a sending unit configured to send the quantized neural network to a user terminal, so that the user terminal can store the received quantized neural network.
  • an embodiment of the present disclosure provides an apparatus for processing information, the apparatus including: a second acquisition unit configured to acquire information to be processed and a target quantized neural network, wherein the target quantized neural network is generated by the method of any one of the embodiments of the above first aspect; and an input unit configured to input the information to be processed into the target quantized neural network to obtain and output a processing result.
  • the embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the foregoing first aspect or second aspect.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the method of any one of the foregoing first aspect or second aspect is implemented.
  • the method and apparatus for generating a quantized neural network provided by the embodiments of the present disclosure obtain a training sample set and an initial neural network, where the initial neural network includes original floating-point network parameters; the original floating-point network parameters in the initial neural network are then converted into integer network parameters, and a quantized initial neural network is generated based on the converted integer network parameters.
  • training samples are selected from the training sample set, and the following training steps are performed: the sample information in the selected training samples is used as the input of the quantized initial neural network, the sample results in the selected training samples are used as the expected output of the quantized initial neural network, and the quantized initial neural network is trained; in response to determining that training of the quantized initial neural network is completed, a quantized neural network is generated based on the trained quantized initial neural network.
  • in this way, the floating-point network parameters of the neural network are converted into integer network parameters during training, adding quantization constraints to the network parameters of the neural network, which helps to reduce the storage space occupied by the neural network and the CPU consumption when the neural network is used for information processing, and to improve the efficiency of information processing.
  • moreover, compared with prior-art schemes that quantize network parameters directly after training the neural network, the solution of the present disclosure can reduce the accuracy loss caused by quantizing network parameters and improve the accuracy of the quantized neural network; furthermore, an electronic device that uses the quantized neural network of the present disclosure for information processing can have a more accurate information processing function than an electronic device that uses a prior-art quantized neural network.
  • FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for generating a quantized neural network according to the present disclosure;
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a quantized neural network according to an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of another embodiment of a method for generating a quantized neural network according to the present disclosure;
  • FIG. 5 is a schematic structural diagram of an embodiment of an apparatus for generating a quantized neural network according to the present disclosure;
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method for generating a quantized neural network, the apparatus for generating a quantized neural network, the method for processing information, or the apparatus for processing information of the present disclosure can be applied.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software, can be installed on the terminal devices 101, 102, and 103.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • the terminal devices 101, 102, 103 can be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on.
  • if the terminal devices 101, 102, and 103 are software, they can be installed in the electronic devices listed above; they can be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module, which is not specifically limited here.
  • the server 105 may be a server that provides various services, for example, a model processing server that processes the initial neural network sent by the terminal devices 101, 102, and 103.
  • the model processing server can analyze and process the received initial neural network and other data, and feed back the processing result (for example, the quantitative neural network) to the terminal device.
  • the method for generating a quantized neural network is generally executed by the server 105, and accordingly, the apparatus for generating a quantized neural network is generally provided in the server 105; in addition, the method for processing information provided by the embodiments of the present disclosure is generally executed by the terminal devices 101, 102, 103, and correspondingly, the apparatus for processing information is generally provided in the terminal devices 101, 102, 103.
  • the server can be hardware or software.
  • the server can be implemented as a distributed server cluster composed of multiple servers, or as a single server.
  • if the server is software, it can be implemented as multiple pieces of software or software modules (for example, for providing distributed services), or as a single piece of software or software module, which is not specifically limited here.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; according to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the method for generating a quantized neural network includes the following steps:
  • Step 201: Obtain a training sample set and an initial neural network.
  • the execution body of the method for generating a quantized neural network (for example, the server shown in FIG. 1) can obtain the training sample set and the initial neural network remotely or locally through a wired or wireless connection.
  • the training samples in the training sample set include sample information and sample results predetermined for the sample information.
  • the sample information is the information that can be processed by the initial neural network, which can include but is not limited to at least one of the following: text, image, audio, and video.
  • the initial neural network may be a neural network for face recognition
  • the sample information may be a sample face image.
  • the sample result is the expected result obtained by processing the sample information using the initial neural network (for example, gender information used to characterize the gender of the person corresponding to the sample face image).
  • the initial neural network can be an untrained neural network or a trained neural network.
  • the function or input and output of the initial neural network can be predetermined.
  • the above-mentioned execution subject can obtain a training sample set for training the initial neural network.
  • the initial neural network includes original floating-point network parameters.
  • the original floating-point network parameter is the product of the floating-point weight of the convolutional layer in the initial neural network and the floating-point normalization parameter of the batch normalization layer connected to the convolutional layer.
  • the initial neural network includes a convolutional layer and a batch normalization layer.
  • the convolutional layer includes floating-point weights. Floating point weights can be used to perform convolution operations with the input of the convolutional layer to obtain the output of the convolutional layer.
  • the batch normalization layer can be connected to the convolutional layer and used to normalize the output of the convolutional layer.
  • the batch normalization layer includes a floating-point normalization parameter used to multiply the output of the convolutional layer to perform normalization processing on the output of the convolutional layer.
  • the convolutional layer and the batch normalization layer can be regarded as one network structure; the output of this network structure is determined by the product of the output of the convolutional layer and the floating-point normalization parameter.
  • the output of the convolutional layer is determined by the convolution of the floating-point weight with the input of the convolutional layer; therefore, the output of the above network structure can be determined by integrating the floating-point normalization parameter and the floating-point weight and convolving the result with the input of the convolutional layer.
  • here, the input of the convolutional layer is the input variable of the aforementioned network structure, and the product of the floating-point normalization parameter and the floating-point weight is the parameter of the aforementioned network structure.
  • thus, the product of the floating-point normalization parameter and the floating-point weight can be determined as the floating-point network parameter.
  • the original floating-point network parameters are the network parameters that are included in the initial neural network and are to be quantized.
  • the quantization of floating-point data refers to converting floating-point data into integer data within a certain value range.
  • the value range is limited by the number of bits of the integer data; for example, if the integer data to be converted is 8 bits, the value range is [0, 255]. It should be noted that, in this embodiment, when the original floating-point network parameters are quantized, the number of bits of the integer network parameters to be obtained can be predetermined by a technician.
  • since floating-point data can record information after the decimal point, it has higher precision.
  • integer data does not record information after the decimal point, so it occupies less storage space, and calculations performed on integer data are faster.
  • in the prior art, the weights and normalization parameters in a neural network are usually stored as floating-point data.
  • Step 202: Convert the original floating-point network parameters in the initial neural network into integer network parameters.
  • the above-mentioned execution body can convert the original floating-point network parameters in the initial neural network into integer network parameters.
  • the above-mentioned execution body may first determine the number of bits of the integer network parameters to be converted, and then use various existing methods to convert the original floating-point network parameters in the initial neural network into integer network parameters; it can be understood that converting the original floating-point network parameters into integer network parameters is equivalent to adding quantization constraints to the initial neural network.
  • for example, the original floating-point network parameters include the value "21.323", and it is determined in advance that the number of bits of the integer weight to be converted is eight; the value range of the integer network parameters can then be determined as [0, 255], and the value "21.323" in the original floating-point network parameters can be directly converted into the integer network parameter "21" by rounding.
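The rounding-based conversion in the example can be sketched as follows. This is a minimal illustration, not the patent's implementation; the clamp to the representable range is an assumption, since the text only specifies rounding and the eight-bit value range:

```python
def quantize(value: float, n_bits: int = 8) -> int:
    """Convert a floating-point network parameter to an integer one by
    rounding, clamped to the unsigned n-bit range [0, 2**n_bits - 1]."""
    lo, hi = 0, (1 << n_bits) - 1
    return min(hi, max(lo, round(value)))

# The worked example: the original floating-point parameter 21.323
# becomes the integer network parameter 21.
assert quantize(21.323) == 21
assert quantize(300.0) == 255  # values outside the 8-bit range are clamped
```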
  • the above-mentioned execution body can convert the original floating-point network parameters in the initial neural network into integer network parameters through the following steps: first, convert the floating-point weights corresponding to the original floating-point network parameters into integer weights, and convert the floating-point normalization parameters corresponding to the original floating-point network parameters into integer normalization parameters; then, integrate the converted integer weights and integer normalization parameters to obtain the integer network parameters.
  • this implementation first adds quantization constraints to the floating-point weights and floating-point normalization parameters, and then uses the quantization-constrained integer weights and integer normalization parameters to obtain the integer network parameters, which can reduce the accuracy loss of the quantized integer network parameters and helps to improve the accuracy of the quantized initial neural network.
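One possible reading of the two-step conversion just described is sketched below. The "integration" operation is not spelled out in the text; since the network parameter is defined earlier as the product of the weight and the normalization parameter, multiplying the two quantized values is shown here purely as an illustrative assumption:

```python
def quantize(value: float, n_bits: int = 8) -> int:
    # Round and clamp to the unsigned n-bit range (clamping is an assumption).
    lo, hi = 0, (1 << n_bits) - 1
    return min(hi, max(lo, round(value)))

def integer_network_param(fp_weight: float, fp_norm: float) -> int:
    """Quantize the floating-point weight and the floating-point normalization
    parameter separately, then integrate them (here: multiply, mirroring the
    definition of the network parameter as their product)."""
    return quantize(fp_weight) * quantize(fp_norm)
```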
  • Step 203: Generate a quantized initial neural network based on the converted integer network parameters.
  • the above-mentioned execution body can generate a quantized initial neural network.
  • the above-mentioned execution body may directly determine the initial neural network including the converted integer network parameters as the quantized initial neural network; or it may further process the initial neural network including the converted integer network parameters, and determine the processed initial neural network as the quantized initial neural network.
  • the above-mentioned execution body can generate a quantized initial neural network through the following steps: convert the converted integer network parameters into floating-point network parameters, and determine the initial neural network including the converted floating-point network parameters as the quantized initial neural network.
  • converting integer network parameters into floating-point network parameters is the inverse of converting the original floating-point network parameters into integer network parameters; the converted integer network parameters can be converted back by referring to the steps for converting the original floating-point network parameters into integer network parameters, thereby obtaining floating-point network parameters.
  • for example, the converted integer network parameter is "21"; from the value "21.323" in the original floating-point network parameters, it can be seen that the floating-point network parameters are accurate to three decimal places, so the integer network parameter "21" can be converted into the floating-point network parameter "21.000".
  • floating-point data can have higher precision than integer data; therefore, adding quantization constraints to the initial neural network and then converting the integer network parameters back into floating-point network parameters helps to improve training accuracy and obtain more accurate training results in the subsequent training of the initial neural network.
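The inverse conversion from the example, turning the integer "21" back into a floating-point value at the original three-decimal precision, can be sketched as:

```python
def dequantize(int_param: int, decimals: int = 3) -> float:
    """Represent an integer network parameter as a floating-point value with a
    fixed number of decimal places (three in the text's example: 21 -> 21.000)."""
    return round(float(int_param), decimals)

# Round trip from the worked example: the parameter is floating point again,
# but the fractional part lost to quantization (0.323) is not recovered.
assert dequantize(21) == 21.0
```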
  • Step 204: Select training samples from the training sample set, and perform the following training steps: use the sample information in the selected training samples as the input of the quantized initial neural network and the sample results in the selected training samples as the expected output of the quantized initial neural network, and train the quantized initial neural network; in response to determining that training of the quantized initial neural network is completed, generate a quantized neural network based on the trained quantized initial neural network.
  • the above-mentioned execution subject may select training samples from the training sample set, and perform the following training steps:
  • Step 2041: Use the sample information in the selected training sample as the input of the quantized initial neural network, use the sample result in the selected training sample as the expected output of the quantized initial neural network, and train the quantized initial neural network.
  • the above-mentioned execution body can use machine learning methods to train the quantized initial neural network.
  • specifically, the above-mentioned execution body inputs the sample information into the quantized initial neural network to obtain an actual result, and then uses a preset loss function to calculate the difference between the actual result and the sample result in the training sample.
  • for example, the L2 norm can be used as the loss function to calculate the difference between the actual result and the sample result in the training sample.
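As a sketch, the L2-norm difference mentioned above can be computed as follows (whether the plain or squared L2 norm is intended is not specified in the text; the plain norm is shown):

```python
import math

def l2_difference(actual, expected) -> float:
    """L2 norm of the difference between the network's actual output and the
    sample result from the training sample."""
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))

assert abs(l2_difference([1.0, 2.0], [1.0, 0.0]) - 2.0) < 1e-12
```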
  • Step 2042: In response to determining that training of the quantized initial neural network is completed, generate a quantized neural network based on the trained quantized initial neural network.
  • the above-mentioned execution body can determine whether the current training of the quantized initial neural network meets a preset completion condition, and if so, it can determine that training of the quantized initial neural network is completed.
  • the completion condition may include but is not limited to at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold.
  • the above-mentioned execution body may generate a quantized neural network based on the trained quantized initial neural network in response to determining that training is completed.
  • the quantized neural network is a neural network that has been trained and whose network parameters are integer network parameters.
  • specifically, the above-mentioned execution body may directly determine the trained quantized initial neural network as the quantized neural network; or, in response to determining that the network parameters in the trained quantized initial neural network are floating-point network parameters, the above-mentioned execution body can convert the floating-point network parameters in the trained quantized initial neural network into integer network parameters, and then determine the trained quantized initial neural network including the converted integer network parameters as the quantized neural network.
  • in response to determining that the quantized initial neural network has not finished training, the execution body may perform the following steps: select a training sample from the unselected training samples included in the training sample set; adjust the parameters of the quantized initial neural network to obtain new floating-point network parameters; convert the new floating-point network parameters into new integer network parameters, and generate a new quantized initial neural network based on the new integer network parameters; and use the most recently selected training sample and the newly generated quantized initial neural network to continue to perform the above training steps (steps 2041-2042).
  • various implementations may be adopted to adjust the parameters of the quantized initial neural network based on the difference between the actual result obtained by calculation and the sample result in the training sample.
  • the BP (Back Propagation) algorithm and the SGD (Stochastic Gradient Descent) algorithm can be used to adjust the parameters of the quantized initial neural network.
  • the adjusted parameters are usually floating-point; therefore, after the adjustment, the quantized initial neural network obtains new floating-point network parameters.
  • the above-mentioned execution body can then convert the new floating-point network parameters into new integer network parameters, so that quantization constraints are again added to the quantized initial neural network, and a new quantized initial neural network is generated.
  • the above-mentioned execution subject may generate a new quantized initial neural network in various ways based on the new integer network parameters.
  • the quantized initial neural network including the new integer network parameters can be directly determined as the new quantized initial neural network.
  • the above-mentioned execution subject may also process the quantized initial neural network including the new integer network parameters, and determine the processed quantized initial neural network as a new quantized initial neural network.
  • the above-mentioned execution body can also generate a new quantized initial neural network through the following steps: convert the new integer network parameters back into floating-point network parameters, and determine the quantized initial neural network containing the converted floating-point network parameters as the new quantized initial neural network.
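The adjust-then-requantize cycle described above can be sketched as follows. This is a simplified, hypothetical illustration (scalar parameters, plain gradient descent) rather than the patent's implementation; the names `training_cycle`, `lr`, and `grads` are assumptions:

```python
# Hypothetical sketch of the training cycle described above:
# adjust parameters (yielding floating-point values), quantize them
# to integers, then convert back to floating-point for the next step.
def training_cycle(params, grads, lr=0.1):
    # 1. gradient step produces new floating-point network parameters
    new_float = [p - lr * g for p, g in zip(params, grads)]
    # 2. convert to new integer network parameters (quantization constraint)
    new_int = [round(p) for p in new_float]
    # 3. convert back to floating-point so the next adjustment can proceed
    return [float(p) for p in new_int]

params = [2.3, -1.6]
grads = [3.0, -4.0]
print(training_cycle(params, grads))  # [2.0, -1.0]
```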
  • the above-mentioned execution body may send the quantized neural network to the user terminal, so that the user terminal can store the received quantized neural network.
  • the quantized neural network with quantization constraints can take up less storage space; through this implementation, the storage resources of the user terminal can be saved.
  • FIG. 3 is a schematic diagram of an application scenario of the method for generating a quantized neural network according to this embodiment.
  • the server 301 can first obtain the training sample set 302 and the initial neural network 303, where the training samples in the training sample set 302 include sample information and sample results predetermined for the sample information.
  • the initial neural network 303 includes the original floating-point network parameters 304 (for example, "2.134").
  • the original floating-point network parameter 304 is the product of the floating-point weight of the convolutional layer in the initial neural network 303 and the floating-point normalization parameter of the batch normalization layer connected to the convolutional layer.
  • the server 301 converts the original floating-point network parameter 304 in the initial neural network 303 into an integer network parameter 305 (for example, "2").
  • the server 301 generates a quantized initial neural network 306 based on the converted integer network parameter 305.
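The scenario above (an original floating-point parameter such as "2.134", formed as the product of a convolutional weight and its batch-normalization parameter, rounded to "2") can be illustrated with a hypothetical sketch; the specific weight and normalization values are invented for illustration:

```python
# Hypothetical illustration: the original floating-point network parameter
# is the product of a convolutional-layer weight and the normalization
# parameter of the batch normalization layer connected to it.
conv_weight = 1.1  # floating-point weight of the convolutional layer (assumed)
bn_scale = 1.94    # floating-point normalization parameter (assumed)

original_param = conv_weight * bn_scale  # approximately 2.134, as in FIG. 3
integer_param = round(original_param)    # converted integer network parameter

print(round(original_param, 3), integer_param)  # 2.134 2
```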
  • the server 301 can select a training sample 3021 from the training sample set 302 and perform the following training steps: use the sample information 30211 in the selected training sample 3021 as the input of the quantized initial neural network 306, and use the sample result 30212 in the selected training sample 3021 as the expected output of the quantized initial neural network 306, to train the quantized initial neural network 306; in response to determining that the training of the quantized initial neural network 306 is completed, generate a quantized neural network 307 based on the trained quantized initial neural network 306.
  • the method provided by the above-mentioned embodiments of the present disclosure converts the floating-point network parameters in the neural network into integer network parameters during the training of the neural network, thereby adding quantization constraints to the network parameters; this helps reduce the storage space occupied by the neural network and the CPU consumption during information processing, and thus improves the efficiency of information processing. Moreover, compared with the prior art, in which the network parameters of an already trained neural network are quantized directly to generate a quantized neural network, the solution of the present disclosure can reduce the accuracy loss caused by quantizing the network parameters and improve the accuracy of the quantized neural network. Furthermore, an electronic device that uses the quantized neural network of the present disclosure for information processing can achieve more accurate information processing than prior-art electronic devices that use quantized neural networks for information processing.
  • FIG. 4 shows a flow 400 of an embodiment of a method for processing information.
  • the process 400 of the method for processing information includes the following steps:
  • Step 401 Obtain the information to be processed and the target quantization neural network.
  • the execution body of the method for processing information can obtain the information to be processed and the target quantized neural network remotely or locally through a wired or wireless connection.
  • the target quantization neural network is generated by using the method of any one of the embodiments corresponding to FIG. 2.
  • the target quantitative neural network is a quantitative neural network to be used for information processing.
  • the information to be processed can be the information that the target quantization neural network can process. It can include but is not limited to at least one of the following: text, image, audio, and video.
  • the target quantization neural network is a model used for face recognition, and the information to be processed may be a face image.
  • the information to be processed may be pre-stored on the execution subject, or sent to the execution subject by other electronic devices.
  • the processing result can be the output result of the target quantified neural network.
  • Step 402 Input the information to be processed into the target quantization neural network to obtain the processing result and output.
  • the above-mentioned execution subject may input the information to be processed into the target quantization neural network to obtain the processing result output by the target quantization neural network.
  • the above-mentioned execution body can output the processing result.
  • the above-mentioned execution subject may output the processing result to other electronic devices connected in communication, or may output and display the processing result.
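Steps 401-402 amount to obtaining a quantized network, forwarding the information through it, and outputting the result. A minimal sketch, in which a simple callable with an integer weight stands in for the target quantized neural network (both the callable and the weight are assumptions for illustration):

```python
# Hypothetical sketch of steps 401-402: input the information to be
# processed into the target quantized neural network and return the
# processing result produced by the network.
def process(information, quantized_network):
    result = quantized_network(information)  # forward pass
    return result

# stand-in for a quantized network whose parameters are integers (assumed)
quantized_network = lambda x: sum(2 * v for v in x)  # integer weight 2
print(process([1, 2, 3], quantized_network))  # 12
```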
  • the method provided by the embodiments of the present disclosure adopts the quantized neural network generated in any embodiment corresponding to FIG. 2, which makes the quantized neural network suitable for user terminals and helps reduce the consumption of the user terminal's storage resources. Moreover, when the user terminal uses the quantized neural network for information processing, the low complexity of the quantized neural network improves the efficiency of the user terminal's information processing and reduces its CPU consumption. In addition, because the quantized neural network sent to the user terminal is obtained by adding quantization constraints during training, its accuracy loss is smaller than that of prior-art quantized neural networks generated by adding quantization constraints to an already trained network, and the user terminal can therefore achieve more accurate information processing and output by using the quantized neural network of the present disclosure.
  • the present disclosure provides an embodiment of a device for generating a quantized neural network.
  • the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be specifically applied to various electronic devices.
  • the apparatus 500 for generating a quantized neural network in this embodiment includes: a first acquisition unit 501, a conversion unit 502, a generation unit 503, and a first execution unit 504.
  • the first obtaining unit 501 is configured to obtain a training sample set and an initial neural network, where the training samples include sample information and sample results predetermined for the sample information, the initial neural network includes original floating-point network parameters, and each original floating-point network parameter is the product of a floating-point weight of a convolutional layer in the initial neural network and a floating-point normalization parameter of the batch normalization layer connected to the convolutional layer;
  • the conversion unit 502 is configured to convert the original floating-point network parameters in the initial neural network into integer network parameters;
  • the generation unit 503 is configured to generate a quantized initial neural network based on the converted integer network parameters;
  • the first execution unit 504 is configured to select a training sample from the training sample set and perform the following training steps: use the sample information in the selected training sample as the input of the quantized initial neural network, use the sample result in the selected training sample as the expected output of the quantized initial neural network, and train the quantized initial neural network; in response to determining that the training of the quantized initial neural network is completed, generate a quantized neural network based on the trained quantized initial neural network.
  • the first acquisition unit 501 of the apparatus 500 for generating a quantized neural network may remotely or locally acquire the training sample set and the initial neural network through a wired connection or a wireless connection.
  • the training samples in the training sample set include sample information and sample results predetermined for the sample information.
  • the sample information is the information that can be processed by the initial neural network, which can include but is not limited to at least one of the following: text, image, audio, and video.
  • the initial neural network can be an untrained neural network or a trained neural network.
  • the initial neural network includes original floating-point network parameters.
  • the original floating-point network parameter is the product of the floating-point weight of the convolutional layer in the initial neural network and the floating-point normalization parameter of the batch normalization layer connected to the convolutional layer.
  • the conversion unit 502 can convert the original floating-point network parameters in the initial neural network into integer network parameters.
  • the generation unit 503 may generate a quantized initial neural network.
  • the first execution unit 504 may select a training sample from the training sample set and perform the following training steps: use the sample information in the selected training sample as the input of the quantized initial neural network, use the sample result in the selected training sample as the expected output of the quantized initial neural network, and train the quantized initial neural network; in response to determining that the training of the quantized initial neural network is completed, generate a quantized neural network based on the trained quantized initial neural network.
  • the generating unit 503 may be further configured to: convert the converted integer network parameters into floating-point network parameters, and determine the initial neural network containing the converted floating-point network parameters as the quantized initial neural network.
  • the conversion unit 502 may include: a conversion module (not shown in the figure) configured to convert the floating-point weight corresponding to an original floating-point network parameter into an integer weight, and to convert the floating-point normalization parameter corresponding to the original floating-point network parameter into an integer normalization parameter; and a multiplication module (not shown in the figure) configured to multiply the converted integer weight by the integer normalization parameter to obtain an integer network parameter.
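The two modules just described (separate conversion of the weight and the normalization parameter, followed by taking their product) can be sketched as follows; this is a hypothetical illustration with invented values, not the patent's implementation:

```python
# Hypothetical sketch of the conversion module and the product module:
# convert the floating-point weight and the floating-point normalization
# parameter to integers separately, then multiply them to obtain the
# integer network parameter.
def conversion_module(float_weight, float_norm):
    return round(float_weight), round(float_norm)

def multiplication_module(int_weight, int_norm):
    return int_weight * int_norm

w, n = conversion_module(2.2, 3.4)  # (2, 3)
print(multiplication_module(w, n))  # 6
```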
  • the apparatus 500 may further include: a second execution unit (not shown in the figure), configured to perform the following steps in response to determining that the quantized initial neural network has not finished training: select a training sample from the unselected training samples in the training sample set; adjust the parameters of the quantized initial neural network to obtain new floating-point network parameters; convert the new floating-point network parameters into new integer network parameters, and generate a new quantized initial neural network based on the new integer network parameters; and use the most recently selected training sample and the newly generated quantized initial neural network to continue the training steps.
  • the second execution unit may be further configured to: convert the new integer network parameters into floating-point network parameters, and determine the quantized initial neural network containing the converted floating-point network parameters as the new quantized initial neural network.
  • the apparatus 500 may further include: a sending unit (not shown in the figure), configured to send the quantized neural network to a user terminal, so that the user terminal can store the received quantized neural network.
  • the device 500 provided by the above-mentioned embodiment of the present disclosure converts the floating-point weights in the neural network into integer weights during the training of the neural network, thereby adding quantization constraints to the weights; this helps reduce the storage space occupied by the neural network and the CPU consumption when the neural network is used for information processing, and thus improves the efficiency of information processing. Compared with the prior art, in which the weights of an already trained neural network are quantized directly to generate a quantized neural network, the solution of the present disclosure can reduce the precision loss caused by quantizing the weights and improve the accuracy of the quantized neural network. Furthermore, an electronic device that uses the quantized neural network of the present disclosure for information processing can achieve more accurate information processing than prior-art electronic devices that use quantized neural networks for information processing.
  • the present disclosure provides an embodiment of a device for processing information.
  • the device embodiment corresponds to the method embodiment shown in FIG. 4.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 600 for processing information in this embodiment includes: a second acquiring unit 601 and an input unit 602.
  • the second obtaining unit 601 is configured to obtain the information to be processed and the target quantized neural network, where the target quantized neural network is generated using the method of any one of the embodiments corresponding to FIG. 2; the input unit 602 is configured to input the information to be processed into the target quantized neural network to obtain and output the processing result.
  • the second acquiring unit 601 of the apparatus 600 for processing information may obtain the information to be processed and the target quantized neural network remotely or locally through a wired or wireless connection.
  • the target quantization neural network is generated by using the method of any one of the embodiments corresponding to FIG. 2.
  • the target quantitative neural network is a quantitative neural network to be used for information processing.
  • the information to be processed can be the information that the target quantization neural network can process. It can include but is not limited to at least one of the following: text, image, audio, and video.
  • the input unit 602 can input the information to be processed into the target quantized neural network to obtain and output the processing result of the target quantized neural network.
  • the apparatus 600 provided in the above-mentioned embodiment of the present disclosure adopts the quantized neural network generated in any embodiment corresponding to FIG. 2, which makes the quantized neural network applicable to user terminals and helps reduce the consumption of the user terminal's storage resources. Moreover, when the user terminal uses the quantized neural network for information processing, the low complexity of the quantized neural network improves the efficiency of the user terminal's information processing and reduces its CPU consumption. In addition, because the quantized neural network sent to the user terminal is obtained by adding quantization constraints during training, its precision loss is smaller than that of a quantized neural network generated by adding quantization constraints to an already trained network, and the user terminal can therefore achieve more accurate information processing and output by using the quantized neural network of the present disclosure.
  • FIG. 7 shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 1) 700 suitable for implementing the embodiments of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (e.g. Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc.
  • the electronic device shown in FIG. 7 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 700 may include a processing device (such as a central processing unit, a graphics processor, etc.) 701, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703.
  • the RAM 703 also stores various programs and data required for the operation of the electronic device 700.
  • the processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • the following devices can be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; a storage device 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709.
  • the communication device 709 may allow the electronic device 700 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 7 shows an electronic device 700 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 709, or installed from the storage device 708, or installed from the ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the computer-readable medium described in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: obtain a training sample set and an initial neural network, where the training samples include sample information and sample results predetermined for the sample information, the initial neural network includes original floating-point network parameters, and each original floating-point network parameter is the product of a floating-point weight of a convolutional layer in the initial neural network and a floating-point normalization parameter of the batch normalization layer connected to the convolutional layer; convert the original floating-point network parameters in the initial neural network into integer network parameters, and generate a quantized initial neural network based on the converted integer network parameters; select a training sample from the training sample set and perform the following training steps: use the sample information in the selected training sample as the input of the quantized initial neural network, use the sample result in the selected training sample as the expected output of the quantized initial neural network, and train the quantized initial neural network; in response to determining that the training of the quantized initial neural network is completed, generate a quantized neural network based on the trained quantized initial neural network.
  • the electronic device can also be caused to: obtain the information to be processed and the target quantized neural network, where the target quantized neural network is generated by the method of any one of the embodiments corresponding to FIG. 2; and input the information to be processed into the target quantized neural network to obtain and output the processing result.
  • the computer program code used to perform the operations of the present disclosure can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more for realizing the specified logical function Executable instructions.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart can be implemented by a dedicated hardware-based system that performs the specified functions or operations Or it can be realized by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in a software manner, or may be implemented in a hardware manner.
  • the name of the unit does not constitute a limitation on the unit itself under certain circumstances.
  • the first acquisition unit can also be described as "a unit for acquiring a training sample set and an initial neural network".

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method and apparatus for generating a quantized neural network. A particular embodiment of the method comprises: obtaining a training sample set and an initial neural network; converting an original floating-point network parameter in the initial neural network into an integer network parameter; generating a quantized initial neural network based on the integer network parameter obtained by conversion; selecting a training sample from the training sample set and performing the following steps: using sample information in the training sample as the input of the quantized initial neural network, using a sample result in the training sample as the expected output of the quantized initial neural network, and training the quantized initial neural network; and, if it is determined that the training of the quantized initial neural network is completed, generating a quantized neural network based on the trained quantized initial neural network. The embodiment helps reduce the storage space occupied by the neural network and reduce CPU consumption when the neural network is used for information processing, so that the efficiency of information processing is improved.
PCT/CN2020/078586 2019-04-11 2020-03-10 Procédé et appareil de génération de réseau neuronal quantifié WO2020207174A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910288941.2A CN109961141A (zh) 2019-04-11 2019-04-11 用于生成量化神经网络的方法和装置
CN201910288941.2 2019-04-11

Publications (1)

Publication Number Publication Date
WO2020207174A1 true WO2020207174A1 (fr) 2020-10-15

Family

ID=67026033

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/078586 WO2020207174A1 (fr) 2019-04-11 2020-03-10 Procédé et appareil de génération de réseau neuronal quantifié

Country Status (2)

Country Link
CN (1) CN109961141A (fr)
WO (1) WO2020207174A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961141A (zh) * 2019-04-11 2019-07-02 北京字节跳动网络技术有限公司 Method and apparatus for generating quantized neural network
CN110443165B (zh) * 2019-07-23 2022-04-29 北京迈格威科技有限公司 Neural network quantization method, image recognition method, apparatus and computer device
CN110852421B (zh) * 2019-11-11 2023-01-17 北京百度网讯科技有限公司 Model generation method and apparatus
CN111340226B (zh) * 2020-03-06 2022-01-25 北京市商汤科技开发有限公司 Training and testing method, apparatus and device for a quantized neural network model
CN112308226B (zh) * 2020-08-03 2024-05-24 北京沃东天骏信息技术有限公司 Quantization of neural network model, and method and apparatus for outputting information
CN113011569B (zh) * 2021-04-07 2024-06-18 开放智能机器(上海)有限公司 Offline quantization parameter injection method and apparatus, electronic device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590460A (zh) * 2017-09-12 2018-01-16 北京达佳互联信息技术有限公司 Face classification method, apparatus and intelligent terminal
CN108509179A (zh) * 2018-04-04 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating model
CN109165736A (zh) * 2018-08-08 2019-01-08 北京字节跳动网络技术有限公司 Information processing method and apparatus applied to convolutional neural network
CN109961141A (zh) * 2019-04-11 2019-07-02 北京字节跳动网络技术有限公司 Method and apparatus for generating quantized neural network


Also Published As

Publication number Publication date
CN109961141A (zh) 2019-07-02

Similar Documents

Publication Publication Date Title
WO2020207174A1 (fr) Method and apparatus for generating quantized neural network
WO2020155907A1 (fr) Method and apparatus for generating cartoon-style conversion model
CN110021052B (zh) 2021-11-02 Method and apparatus for generating fundus image generation model
US11651198B2 (en) 2023-05-16 Data processing method and apparatus for neural network
CN112149699B (zh) 2024-06-07 Method and apparatus for generating model, and method and apparatus for recognizing image
CN110009101B (zh) 2020-09-25 Method and apparatus for generating quantized neural network
JP7437517B2 (ja) 2024-02-22 Method, apparatus, electronic device and computer-readable medium for generating prediction information
CN111985831A (zh) 2020-11-24 Cloud computing resource scheduling method and apparatus, computer device and storage medium
WO2023185515A1 (fr) Feature extraction method and apparatus, storage medium, and electronic device
CN111683156A (zh) 2020-09-18 Information push method and apparatus, electronic device and computer-readable medium
WO2022012178A1 (fr) Objective function generation method and apparatus, electronic device, and computer-readable medium
CN114420135A (zh) 2022-04-29 Voiceprint recognition method and apparatus based on attention mechanism
CN113468344A (zh) 2021-10-01 Entity relation extraction method and apparatus, electronic device and computer-readable medium
CN110046670B (zh) 2021-02-12 Feature vector dimensionality reduction method and apparatus
CN109598344B (zh) 2020-09-29 Model generation method and apparatus
CN111653261A (zh) 2020-09-11 Speech synthesis method and apparatus, readable storage medium and electronic device
WO2022121800A1 (fr) Sound source positioning method and apparatus, and electronic device
CN113593527B (zh) 2024-02-20 Method and apparatus for generating acoustic features, training a speech model, and recognizing speech
CN116072108A (zh) 2023-05-05 Model generation method, speech recognition method, apparatus, medium and device
CN113823312B (zh) 2023-11-07 Speech enhancement model generation method and apparatus, and speech enhancement method and apparatus
CN111709784B (zh) 2023-07-28 Method, apparatus, device and medium for generating user retention time
CN113780534A (zh) 2021-12-10 Network model compression method, image generation method, apparatus, device and medium
CN111680754A (zh) 2020-09-18 Image classification method and apparatus, electronic device and computer-readable storage medium
CN112070163B (zh) 2024-06-07 Image segmentation model training and image segmentation method, apparatus and device
CN112015625B (zh) 2023-11-14 Alarm device control method and apparatus, electronic device and computer-readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20788619

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.02.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20788619

Country of ref document: EP

Kind code of ref document: A1