CN114065900A - Data processing method and data processing device

Info

Publication number: CN114065900A
Application number: CN202010753701.8A
Authority: CN (China)
Prior art keywords: fixed point number, processing, normalization
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 刘默翰, 赵磊, 刘华彦, 白立勋, 石巍巍, 隋志成, 周力
Current/Original Assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Huawei Technologies Co Ltd; priority to CN202010753701.8A

Classifications

    • G06N3/045 Combinations of networks (under G Physics; G06 Computing; G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology)
    • G06N3/048 Activation functions
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the present application relates to the field of artificial intelligence, and in particular to a data processing method and a data processing apparatus in the field of model compression. The method includes: performing target processing on input data using a trained neural network to obtain a target result, and outputting the target result. The target processing includes: performing first processing on the input data using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to a fixed point number correspondence; and obtaining the target result according to the second fixed point number. The second fixed point number is equal to a third fixed point number obtained by performing second processing on the first fixed point number, where the second processing includes: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with a target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the third fixed point number. The method can reduce the computational complexity and the occupied storage space.

Description

Data processing method and data processing device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a data processing method and a data processing apparatus.
Background
Artificial intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is the branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
Currently, neural networks such as the recurrent neural network (RNN), the gated recurrent unit (GRU), the long short term memory (LSTM) network, the bidirectional LSTM (BiLSTM), and the convolutional neural network (CNN) are increasingly applied on terminals (such as mobile phones and tablet computers). For example, a terminal may employ an RNN to handle timing-related processing tasks such as automatic speech recognition (ASR) and semantic understanding. However, the computing capability of the device side (i.e., the terminal side) is limited, and implementing an inference task on the device side with a neural network that performs floating point computation is slow. Therefore, a data processing method with higher computational efficiency needs to be studied.
Disclosure of Invention
The embodiment of the application discloses a data processing method and a data processing device, which can reduce the computational complexity, improve the model reasoning speed and reduce the occupied storage space.
In a first aspect, an embodiment of the present application provides a data processing method, the method including: performing target processing on input data using a trained neural network to obtain a target result, the input data comprising a plurality of computer-processable signals, where the target processing includes: performing first processing on the input data using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to a fixed point number correspondence; and obtaining the target result according to the second fixed point number; the fixed point number correspondence includes the correspondence between the first fixed point number and the second fixed point number, the second fixed point number is equal to a third fixed point number obtained by performing second processing on the first fixed point number, and the second processing includes: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with a target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the third fixed point number; the target nonlinear activation function is the activation function employed by the neural network; and outputting the target result.
The execution subject of the embodiment of the present application is a data processing apparatus. By performing the operation of determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence, the data processing apparatus solves the same technical problem as it would by performing the second processing on the first fixed point number, and obtains the second fixed point number. That is to say, the data processing apparatus achieves the purpose of the second processing on the first fixed point number without performing the operations of "dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with the target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain a third fixed point number". It should be understood that the computational complexity of performing those dequantization, activation, and quantization operations is higher than that of determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence. Therefore, determining the second fixed point number according to the fixed point number correspondence reduces the computational complexity and improves the model inference speed. In addition, to realize the second processing on the first fixed point number directly, the data processing apparatus would need to perform floating point operations and occupy a large storage space; because the data processing apparatus uses fixed-point operations throughout the target processing, the occupied storage space is reduced.
In the embodiment of the present application, the second fixed point number corresponding to the first fixed point number is determined according to the fixed point number correspondence; this reduces the computational complexity, improves the model inference speed, and reduces the occupied storage space.
In a possible implementation manner, the determining, according to the fixed point number correspondence, the second fixed point number corresponding to the first fixed point number includes: looking up the second fixed point number corresponding to the first fixed point number in a fixed point number correspondence table, the fixed point number correspondence table including the fixed point number correspondence.
The data processing apparatus may pre-store the fixed point number correspondence table. In this implementation, the second fixed point number corresponding to the first fixed point number can be obtained accurately and quickly by looking it up in the fixed point number correspondence table.
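As an illustration only, the following minimal sketch (not the patent's reference implementation; the names and the int8 format are assumptions) shows how such a pre-stored table turns the dequantize-activate-quantize sequence into a single array lookup:

import numpy as np

def apply_activation_lut(first_fp: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Map each first fixed point number to its second fixed point number.

    lut is a precomputed 256-entry table holding, for every possible int8
    value, the result of dequantize -> activation -> quantize.
    """
    # Shift int8 values in [-128, 127] to table indices in [0, 255].
    return lut[first_fp.astype(np.int16) + 128]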
In one possible implementation manner, the first processing includes performing matrix multiplication on the input data using weight data; the first processing includes no quantization or dequantization operation, the values included in the input data are fixed point numbers, and the values included in the weight data are fixed point numbers.
In this implementation, fixed-point operations are used throughout the first processing performed by the data processing apparatus, so the computational complexity is low and few storage resources are consumed.
In a possible implementation manner, the obtaining the target result according to the second fixed point number includes: performing first normalization processing on the second fixed point number to obtain a fourth fixed point number; the first normalization processing includes: calculating the value of a first formula to obtain a third floating point number, wherein the parameters in the first formula comprise the floating point number obtained by performing data type conversion on the second fixed point number; quantizing the third floating point number to obtain a fifth fixed point number; calculating the sum of the fifth fixed point number and the sixth fixed point number to obtain the fourth fixed point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed point number is an offset value in the first normalization formula; and obtaining the target result according to the fourth fixed point number.
In the process of performing the first normalization processing on the second fixed point number to obtain the fourth fixed point number, only one quantization operation needs to be executed and no dequantization operation is needed, so the amount of computation is small. That is, normalizing one fixed point number into another fixed point number requires only a single quantization operation.
In this implementation, the second fixed point number is normalized using the first normalization formula, which is obtained by mathematically transforming and merging the normalization formula and the dequantization formula; only the one necessary quantization operation is retained, which reduces the computational complexity of the normalization operation and improves the operational efficiency.
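For illustration, a minimal sketch of the first normalization processing follows; the concrete first formula and the parameter names (gamma, mu, sigma2, eps, out_scale, bias_q) are assumptions, since the patent only requires that the first formula be part of the first normalization formula and that the offset value be pre-quantized as the sixth fixed point number:

import numpy as np

def first_normalization(second_fp: np.ndarray, gamma: float, mu: float,
                        sigma2: float, eps: float, out_scale: float,
                        bias_q: np.ndarray) -> np.ndarray:
    # First formula: the scale-and-center part of the normalization,
    # evaluated on the float conversion of the second fixed point number.
    third_float = gamma * (second_fp.astype(np.float32) - mu) / np.sqrt(sigma2 + eps)
    # The single remaining quantization yields the fifth fixed point number.
    fifth_fp = np.round(third_float / out_scale).astype(np.int32)
    # Adding the offset value (the sixth fixed point number) yields the fourth.
    return fifth_fp + bias_q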
In one possible implementation manner, the target nonlinear activation function is obtained by multiplying a scaling factor in the first normalization formula by an initial nonlinear activation function.
In this implementation, by incorporating the scaling factor in the first normalization formula into the nonlinear activation function, the amount of computation of the normalization operation can be reduced.
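Written out (a sketch; that the folded factor is exactly \gamma/\sqrt{\sigma^2+\epsilon} is an assumption, based on the normalization form y_i = \gamma (x_i - \mu)/\sqrt{\sigma^2 + \epsilon} + \beta used elsewhere in this application), the target nonlinear activation function Act'(·) is obtained from the initial activation function Act(·) as

\mathrm{Act}'(x) = \frac{\gamma}{\sqrt{\sigma^2 + \epsilon}} \cdot \mathrm{Act}(x)

so that, with \mu and \sigma^2 fixed at inference time, the normalization after activation reduces to adding a single precomputed offset.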
In a possible implementation manner, the obtaining the target result according to the second fixed point number includes: carrying out second normalization processing on the second fixed point number to obtain a seventh fixed point number; the second normalization process includes: calculating a value of a second normalization formula to obtain a fourth floating point number, wherein parameters in the second normalization formula comprise floating point numbers obtained by performing data type conversion on the second fixed point number; quantizing the fourth floating-point number to obtain the seventh fixed-point number; and obtaining the target result according to the seventh fixed point number.
In this implementation, the second fixed point number is normalized using the second normalization formula, which is obtained by mathematically transforming and merging the normalization formula and the dequantization formula; only the one necessary quantization operation is retained, which reduces the computational complexity of the normalization operation and improves the operational efficiency.
In one possible implementation manner, the target nonlinear activation function is obtained by multiplying a scaling factor in the second normalization formula by an initial nonlinear activation function.
In this implementation, by incorporating the scaling factor in the second normalization formula into the nonlinear activation function, the amount of computation of the normalization operation can be reduced.
In one possible implementation, the neural network is any one of a recurrent neural network (RNN), a gated recurrent unit (GRU), a long short term memory (LSTM) network, a bidirectional LSTM (BiLSTM), a simple recurrent unit (SRU), or a future RNN variant.
The neural network may be a neural network model for processing timing-related features. It should be understood that the neural network of the present application may be an RNN, any current variant of an RNN, or any future variant of an RNN.
In one possible implementation, the plurality of computer-processable signals includes: at least one of a speech signal, a text signal, or an image signal.
In one possible implementation, the target nonlinear activation function is an activation function having an upper bound and a lower bound, or the target nonlinear activation function is a rectified linear unit (ReLU) family function having an upper bound.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including: a processing module, configured to perform target processing on input data using a trained neural network to obtain a target result, the input data comprising a plurality of computer-processable signals, where the target processing includes: performing first processing on the input data using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to a fixed point number correspondence; and obtaining the target result according to the second fixed point number; the fixed point number correspondence includes the correspondence between the first fixed point number and the second fixed point number, the second fixed point number is equal to a third fixed point number obtained by performing second processing on the first fixed point number, and the second processing includes: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with a target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the third fixed point number; the target nonlinear activation function is the activation function employed by the neural network; and an output module, configured to output the target result.
In a possible implementation manner, the processing module is specifically configured to look up the second fixed point number corresponding to the first fixed point number in a fixed point number correspondence table, the fixed point number correspondence table including the fixed point number correspondence.
In one possible implementation manner, the first processing includes performing matrix multiplication on the input data using weight data; the first processing includes no quantization or dequantization operation, the values included in the input data are fixed point numbers, and the values included in the weight data are fixed point numbers.
In a possible implementation manner, the processing module is specifically configured to perform a first normalization process on the second fixed-point number to obtain a fourth fixed-point number; the first normalization processing includes: calculating the value of a first formula to obtain a third floating point number, wherein the parameters in the first formula comprise the floating point number obtained by performing data type conversion on the second fixed point number; quantizing the third floating point number to obtain a fifth fixed point number; calculating the sum of the fifth fixed point number and the sixth fixed point number to obtain the fourth fixed point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed point number is an offset value in the first normalization formula; and obtaining the target result according to the fourth fixed point number.
In one possible implementation manner, the target nonlinear activation function is obtained by multiplying a scaling factor in the first normalization formula by an initial nonlinear activation function.
In a possible implementation manner, the processing module is specifically configured to perform a second normalization process on the second fixed-point numbers to obtain seventh fixed-point numbers; the second normalization process includes: calculating a value of a second normalization formula to obtain a fourth floating point number, wherein parameters in the second normalization formula comprise floating point numbers obtained by performing data type conversion on the second fixed point number; quantizing the fourth floating-point number to obtain the seventh fixed-point number; and obtaining the target result according to the seventh fixed point number.
In one possible implementation manner, the target nonlinear activation function is obtained by multiplying a scaling factor in the second normalization formula by an initial nonlinear activation function.
In one possible implementation, the neural network is any one of a recurrent neural network (RNN), a gated recurrent unit (GRU), a long short term memory (LSTM) network, a bidirectional LSTM (BiLSTM), a simple recurrent unit (SRU), or a future RNN variant.
In one possible implementation, the plurality of computer-processable signals includes: at least one of a speech signal, a text signal, or an image signal.
For the technical effects brought about by the second aspect or its various possible implementations, reference may be made to the description of the technical effects of the first aspect or the corresponding implementations.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including: a processor, a memory for storing code, and an output device; the processor is configured to execute the method provided by the first aspect or its various possible implementations by reading the code stored in the memory, and the output device is configured to output the target result.
In a fourth aspect, embodiments of the present application provide a computer program product comprising program instructions which, when executed by a processor, cause the processor to perform the method provided by the first aspect or its various possible implementations.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method provided by the first aspect or its various possible implementations.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
FIGS. 1A-1C illustrate application scenarios of a natural language processing system;
FIG. 2 is a schematic comparison diagram of normalization methods provided in an embodiment of the present application;
FIG. 3 is a process schematic diagram of an example of a data processing method provided in an embodiment of the present application;
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 5 is a process schematic diagram of an example of target processing of input data using a trained neural network according to an embodiment of the present application;
FIG. 6 is a schematic diagram of part of an example process for training a neural network using training samples according to an embodiment of the present application;
FIG. 7 is a flowchart of a natural language processing method according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a neural network processor according to an embodiment of the present application;
FIG. 10 is a block diagram of a partial structure of a terminal device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a server 1100 according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, the present application will be further described with reference to the accompanying drawings.
The terms "first" and "second," and the like in the description, claims, and drawings of the present application are used solely to distinguish between different objects and not to describe a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. Such as a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In this application, "at least one" means one or more, "a plurality" means two or more, and "at least two" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following items" or a similar expression refers to any combination of these items; for example, "at least one of a, b, or c" may represent: a; b; c; a and b; a and c; b and c; or a, b, and c.
The data processing method provided by the embodiment of the present application can be applied to scenarios such as image processing, speech recognition, and natural language processing, and is particularly suitable for scenarios in which a neural network (for example, a recurrent neural network) is used to process timing-related features.
The following describes a scenario in which the scheme of the present application can be applied.
As shown in fig. 1A, a natural language processing system includes a user equipment and a data processing apparatus.
The user equipment may be a terminal device such as a mobile phone, a personal computer, a tablet computer, a wearable device, a personal digital assistant, a game console, or an information processing center. The user equipment is the initiator of natural language processing tasks (for example, a translation task, a rephrase task, and the like), and a user typically initiates such a task through the user equipment. A rephrase task converts a natural language text into another text that has the same meaning but a different wording. For example, "What made the second world war happen" can be restated as "What was the reason for World War II".
The data processing apparatus may be a device or server having a data processing function, such as a cloud server, a web server, an application server, or a management server. The data processing apparatus receives a query statement, voice, text, or the like from the terminal device through an interactive interface, and then performs language data processing by means of machine learning, deep learning, search, reasoning, decision making, and the like, using a memory for storing data and a processor for executing data processing. The memory may be a general term including local storage and databases storing historical data, and the databases may reside on the data processing apparatus or on other network servers.
FIG. 1B shows another application scenario of the natural language processing system. In this scenario, the terminal device directly serves as a data processing apparatus, directly receives an input from a user, and directly processes the input by hardware of the terminal device itself, and the specific process is similar to that shown in fig. 1A, and reference may be made to the above description, which is not repeated herein.
As shown in fig. 1C, the user device may be the local device 101 or 102, the data processing apparatus may be the execution device 210, and the data storage system 250 may be integrated on the execution device 210, or may be disposed on a cloud or other network server.
Applied to a natural language processing scenario, the data processing method can increase the speed of executing a natural language processing task and reduce the storage space cost. It should be understood that when the data processing method of the embodiment of the present application is applied to other scenarios in which a neural network performs a prediction task (e.g., an image processing task), it can likewise increase the speed of performing the prediction task.
Since the embodiments of the present application involve extensive application of neural networks, for ease of understanding, the related terms and concepts of neural networks involved in the embodiments of the present application are described below first.
(1) Normalization processing
The essence of the neural network learning process is to learn the data distribution. If no normalization is performed, the distribution of each batch of training data differs: viewed broadly, the neural network must seek a balance among multiple distributions, and viewed narrowly, because the input distribution of each layer keeps changing, each layer must keep re-adapting its balance point, which clearly makes the neural network hard to converge. Moreover, normalizing only the input data (for example, dividing an input image by 255 to map it into [0, 1]) guarantees only that the data distribution of the input layer is the same; it cannot guarantee that the input of every layer has the same distribution, so normalization processing also needs to be added in the middle layers of the neural network.
Common normalization processes include: batch Normalization (BN), Layer Normalization (LN), Instance Normalization (IN), Group Normalization (GN), and the like.
The calculation flows of the four normalizations BN, LN, IN, and GN are almost the same and can be divided into four steps:
1) Calculate the mean of the given data set:

\mu = \frac{1}{m}\sum_{i=1}^{m} x_i    (1)

where \mu denotes the mean of the given data set X = \{x_1, x_2, \ldots, x_m\}, m denotes the number of values (each a scalar) in the given data set, and x_i denotes the i-th value in X.
2) Calculate the variance of the given data set:

\sigma^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2    (2)

where \sigma^2 denotes the variance of the given data set, and \mu, X, m, and x_i are as defined in formula (1).
3) Normalize the values in the given data set so that their mean is 0 and their variance is 1:

\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}    (3)

where \hat{x}_i denotes the value obtained by normalizing x_i, and \epsilon is a small constant that prevents division by 0.
4) Scale and shift to restore the distribution learned by this layer of the network:

y_i = \gamma \hat{x}_i + \beta    (4)

where y_i is the final value obtained by normalizing x_i. After normalization, \hat{x}_i still needs to be scaled and shifted; the two parameters \gamma and \beta used for this are learned by the neural network during the training phase.
Combining formula (3) and formula (4), it can be seen that the four normalization processes BN, LN, IN, and GN all satisfy:

y_i = \gamma \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta    (5)

where y_i is the final value obtained by normalizing x_i. It should be understood that formula (5) can also be written as x' = \gamma (x - \mu)/\sqrt{\sigma^2 + \epsilon} + \beta, where x' is the final value obtained by normalizing x.
Let us assume that the shape of the feature map is denoted [N, C, H, W], where N denotes the batch size, i.e., N samples; C denotes the number of channels; and H and W denote the height and width of the feature map, respectively. The main differences among these methods are:
1. BN normalizes over N, H, and W within a batch, while preserving the channel dimension C;
2. LN normalizes over C, H, and W in the channel direction, and is notably effective for RNNs;
3. IN normalizes over H and W on image pixels, and is used in style transfer;
4. GN groups the channels and then normalizes within each group.
FIG. 2 is a schematic comparison diagram of normalization methods provided in an embodiment of the present application. In FIG. 2, N is the batch (corresponding to the number of samples), C is the channel, and (H, W) are the height and width of the feature map; the feature map denoted 201 corresponds to the BN normalization method, 202 to LN, 203 to IN, and 204 to GN. In FIG. 2, the mean and variance are calculated from the values in the darker-colored portions, and normalization is performed accordingly.
As can be seen from FIG. 2, BN takes the N×H×W values of each channel (corresponding to a given data set) out separately for normalization, with one pair of γ, β per channel, so there are 2×C learnable parameters; LN takes the C×H×W values of each sample (corresponding to a given data set) out separately for normalization and is not affected by the batch size, so LN can be used in RNN networks; IN takes each H×W slice (corresponding to a given data set) out separately for normalization, independent of the channel and the batch size; GN divides the C channels into G groups, takes each (C/G)×H×W group (corresponding to a given data set) out separately for normalization, and finally merges the G normalized groups back into C×H×W.
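As an illustrative sketch (not from the patent), the four methods differ only in the axes over which the mean and variance of formula (3) are computed for a feature map of shape [N, C, H, W]:

import numpy as np

def normalize(x: np.ndarray, axes: tuple, eps: float = 1e-5) -> np.ndarray:
    """Formula (3): subtract the mean and divide by the standard deviation."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(8, 32, 16, 16).astype(np.float32)   # [N, C, H, W]
bn = normalize(x, axes=(0, 2, 3))   # BN: over N, H, W, per channel
ln = normalize(x, axes=(1, 2, 3))   # LN: over C, H, W, per sample
i_n = normalize(x, axes=(2, 3))     # IN: over H, W, per sample and channel
G = 4                               # GN: split C into G groups, normalize per group
gn = normalize(x.reshape(8, G, 32 // G, 16, 16), axes=(2, 3, 4)).reshape(x.shape)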
(2) Neural network
The neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept of 1 as inputs, and the output of the operation unit may be:

h_{W,b}(x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)    (6)

where s = 1, 2, \ldots, n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function, a tanh function, and the like. A neural network is a network formed by joining many such single neural units together, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field; the local receptive field may be a region composed of several neural units.
(3) A recurrent neural network (RNN) is used to process sequence data. In a traditional neural network model, the layers from the input layer through the hidden layer to the output layer are fully connected, while the nodes within each layer are unconnected. Although such ordinary neural networks solve many problems, they remain powerless for many others. For example, to predict the next word in a sentence, one generally needs the previous words, because the words in a sentence are not independent of each other. An RNN is called recurrent because its current output also depends on the previous outputs. Concretely, the network memorizes the previous information and applies it to the computation of the current output: the nodes in the hidden layer are now connected to one another, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is like training a conventional CNN in that the error backpropagation algorithm is also used, but with one difference: if the RNN is unrolled over time, its parameters, such as W, are shared across steps, which is not the case in the conventional neural networks exemplified above. Moreover, when using the gradient descent algorithm, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is called backpropagation through time (BPTT).
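As a minimal sketch (the notation W, U, b is assumed, not from the patent), a single step of a vanilla RNN applies the same shared parameters at every time step, and the hidden-layer input includes both the current input and the previous hidden state:

import numpy as np

def rnn_step(x_t: np.ndarray, h_prev: np.ndarray, W: np.ndarray,
             U: np.ndarray, b: np.ndarray) -> np.ndarray:
    # h_t depends on the current input x_t and on the hidden state h_prev
    # memorized from the previous moment; W, U, b are shared across steps.
    return np.tanh(W @ x_t + U @ h_prev + b)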
Given that convolutional neural networks exist, why recurrent neural networks? The reason is simple: a convolutional neural network rests on the assumption that elements are independent of one another, as are inputs and outputs, such as cats and dogs. But in the real world many elements are interconnected, such as stock prices changing over time. Or suppose someone says: "I like traveling, and my favorite place is Yunnan; when I have the chance in the future I will definitely go ___." To fill in the blank here, any human knows to fill in "Yunnan", inferring it from the context. But how can a machine do that? This is why the RNN arose: it aims to give machines the ability to remember as humans do. Therefore, the output of an RNN must depend on the current input information together with the memorized historical information.
(4) Fixed point operation
In a given rectangular coordinate system, points whose coordinates are all integers are called integer points, and the set of integer points is called the spatial grid. Operations on the spatial grid, i.e., operations on integers, are referred to as fixed-point operations.
The related terms and concepts of neural networks involved in the embodiments of the present application having been introduced above, the following describes, with reference to the accompanying drawings, a method that converts the matrix multiplication performed by a neural network (e.g., an RNN) from floating-point operations to fixed-point operations so as to increase the inference speed of the neural network (i.e., the speed at which the neural network performs a prediction task).
FIG. 3 is a process schematic diagram of an example of a data processing method according to an embodiment of the present application. The method flow in FIG. 3 may be understood as the process by which a data processing apparatus performs a prediction task using a neural network (i.e., an inference process). In FIG. 3, W1 represents weight data (e.g., a weight matrix), and the weight data W1 consists of fixed point numbers (e.g., data of type int8 or uint8); X1 represents input data (e.g., image data, voice data, etc.), which consists of floating point numbers (e.g., float type data); quantization quantizes the input data X1 from floating point numbers (e.g., float type data) to fixed point numbers (e.g., uint8 type data) X1'; matrix multiplication computes the product of the weight data W1 and X1' to obtain A1 (fixed point numbers); dequantization dequantizes A1 from fixed point numbers back to floating point numbers A1'; the addition (add) computes the sum of A1' and the offset B1 (floating point numbers) to obtain C1 (floating point numbers); activation processes C1 with a nonlinear activation function such as tanh or sigmoid to obtain D1 (floating point numbers); normalization normalizes D1 to obtain E1 (floating point numbers). The normalization operation is optional, not necessary. As can be seen from FIG. 3, the neural network (e.g., RNN) quantizes the input data online and then performs the matrix multiplication; the result of the matrix multiplication is then dequantized back to floating point numbers. Converting the matrix multiplication into a fixed-point operation improves the inference speed of the neural network. However, the data processing method in FIG. 3 not only performs quantization and dequantization frequently, but also performs many floating point operations, so its computational complexity is high. Because the computational complexity of floating-point operations is far higher than that of fixed-point operations, converting the floating point operations in this data processing method into fixed-point operations, on the premise of preserving computational accuracy, can effectively improve the computational efficiency, i.e., the inference speed. The data processing method with whole-process fixed-point processing provided by the embodiment of the present application uses fixed-point operations throughout, which both reduces the computational complexity and preserves the computational accuracy.
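For comparison with what follows, here is a minimal sketch of the FIG. 3 flow (names and the per-tensor scales are assumptions; a simplified symmetric quantization without offset is used):

import numpy as np

def fig3_pipeline(X1: np.ndarray, W1q: np.ndarray, B1: np.ndarray,
                  x_scale: float, w_scale: float) -> np.ndarray:
    X1q = np.clip(np.round(X1 / x_scale), -128, 127).astype(np.int8)  # quantization
    A1 = X1q.astype(np.int32) @ W1q.astype(np.int32)                  # matrix multiplication
    A1f = A1.astype(np.float32) * (x_scale * w_scale)                 # dequantization
    C1 = A1f + B1                                                     # add (floating point)
    D1 = np.tanh(C1)                                                  # activation (floating point)
    return D1                                                         # normalization (optional) omitted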
FIG. 4 is a flowchart of a data processing method according to an embodiment of the present application. As shown in FIG. 4, the method includes:
401. The data processing apparatus performs target processing on the input data using a trained neural network to obtain a target result.
The input data includes a plurality of computer-processable signals: at least one of a speech signal, a text signal, or an image signal. That is, the input data may be any of image data, voice data, text data, and the like. The neural network may be a neural network suited to processing timing-related features, such as a recurrent neural network (RNN), a gated recurrent unit (GRU), a long short term memory (LSTM) network, or a bidirectional LSTM (BiLSTM); it may also be another neural network that employs a nonlinear activation function, such as a convolutional neural network. The target processing includes: performing first processing on the input data using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence; and obtaining the target result according to the second fixed point number. The fixed point number correspondence includes the correspondence between the first fixed point number and the second fixed point number; the second fixed point number is equal to a third fixed point number obtained by performing second processing on the first fixed point number, where the second processing includes: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with the target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the third fixed point number; the target nonlinear activation function is the activation function employed by the neural network. In the present application, fixed point numbers are integers, for example of the types int8, uint8, int16, uint16, int32, uint32, and the like. The first fixed point number may be one fixed point number or all the fixed point numbers obtained by performing the first processing on the input data. In some embodiments, the target processing further includes a normalization operation (e.g., BN). Since BN can be decomposed offline into multiplication, addition, and the activation function, the data processing apparatus can, after applying the activation function, normalize the second fixed point number using BN so decomposed. The data processing apparatus may be a terminal device such as a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, a game console, an augmented reality (AR) device, or a virtual reality (VR) device, and may also be a server. The input data is fixed point data, such as int8 type data. In some embodiments, the target nonlinear activation function is a nonlinear activation function with an upper bound and a lower bound, such as a sigmoid function or a tanh function.
Optionally, before performing step 401, the data processing apparatus quantizes the original input data to obtain the fixed point input data. Optionally, the input data acquired or received by the data processing apparatus is already fixed point data. The first processing of the input data using the neural network may include performing convolution operations (corresponding to matrix multiplications), pooling, and the like on the input data using the neural network. Optionally, the first processing includes performing matrix multiplication on the input data using weight data; the first processing includes no quantization or dequantization operation, and the values included in the input data and in the weight data are fixed point numbers (i.e., integers). That is, the data processing apparatus may use fixed-point operations to perform the first processing on the input data with the neural network.
One possible implementation of determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence is as follows: looking up the second fixed point number corresponding to the first fixed point number in the fixed point number correspondence table, the fixed point number correspondence table including the fixed point number correspondence. The data processing apparatus may store the fixed point number correspondence table in advance or acquire it from another device (for example, a server). Optionally, every fixed point number that can be obtained by performing the first processing on any input data with the neural network is included in the fixed point number correspondence table. That is, the data processing apparatus can find, in the table, the fixed point number corresponding to an arbitrary fixed point number obtained by the first processing of the input data. The second fixed point number corresponding to the first fixed point number in the fixed point number correspondence table can be understood as the fixed point number obtained by the following operations: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with the target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the second fixed point number. It should be understood that the data processing apparatus achieves the purpose of the second processing on the first fixed point number by determining the second fixed point number according to the fixed point number correspondence, without actually performing those dequantization, activation, and quantization operations.
402. The data processing device outputs the target result.
In some embodiments, the data processing apparatus is a server, and one implementation of step 402 is as follows: the data processing apparatus transmits the target result (e.g., a speech recognition result, a rephrase result, a translation result, etc.) to the terminal device. In some embodiments, the data processing apparatus is a terminal device, and one implementation of step 402 is as follows: the data processing apparatus displays the target result (e.g., a speech recognition result, a rephrase result, a translation result, etc.) on a display screen, or plays the target result (e.g., a rephrase result, a translation result, etc.) through an audio device.
It should be understood that the data processing apparatus replaces the following operations with a lookup, in the fixed point number correspondence table, of the second fixed point number corresponding to the first fixed point number: dequantizing the first fixed point number to obtain a first floating point number, processing the first floating point number with the target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the second fixed point number. This greatly reduces the amount of computation, improves the computational efficiency, and reduces the occupied storage space. In addition, the data processing apparatus uses fixed-point operations throughout step 401 and does not need to perform quantization or dequantization operations, which further improves the computational efficiency.
In the embodiment of the present application, the second fixed point number corresponding to the first fixed point number is determined according to the fixed point number correspondence; this reduces the computational complexity and improves the inference speed.
An example of the target processing of input data using a trained neural network is described below.
FIG. 5 is a schematic process diagram of an example of target processing of input data using a trained neural network according to an embodiment of the present application. As shown in FIG. 5, 501 denotes a matrix multiplication operation, i.e., a matrix multiplication is performed on the input data X2 and the weight data W2 to obtain A2; 502 denotes an addition operation, i.e., the sum of A2 and the offset B2 is calculated to obtain C2 (a fixed point number); 503 denotes a table lookup operation, i.e., the D2 (fixed point number) corresponding to C2 is looked up in the fixed point number correspondence table; 504 denotes normalization, i.e., normalizing D2 yields E2, and this normalization operation is optional, not necessary. X2, W2, X2', B2, C2, and E2 are all fixed point numbers, and every data processing operation in FIG. 5 uses fixed-point arithmetic; that is, the whole target processing flow is fully fixed-point.
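A corresponding sketch of the FIG. 5 flow (same assumed names; lut stands for the fixed point number correspondence table, precomputed offline, and the int16 index range is an assumption):

import numpy as np

OFFSET = 2 ** 15  # assume the pre-activation values C2 fit in int16 and lut has 2**16 entries

def fig5_pipeline(X2q: np.ndarray, W2q: np.ndarray, B2q: np.ndarray,
                  lut: np.ndarray) -> np.ndarray:
    A2 = X2q.astype(np.int32) @ W2q.astype(np.int32)  # 501: fixed-point matrix multiplication
    C2 = A2 + B2q                                     # 502: fixed-point addition
    D2 = lut[C2 + OFFSET]                             # 503: table lookup, C2 -> D2
    return D2                                         # 504 (normalization) omitted here

Every step operates on integers; no quantization or dequantization appears in the online path.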
In some embodiments, the first processing of the input data using the neural network to obtain a first fixed point number may correspond to 501 and/or 502 in FIG. 5; determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence corresponds to 503 in FIG. 5; and obtaining the target result according to the second fixed point number corresponds to 504 and other operations in FIG. 5. In the data processing method flow of FIG. 4, determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence may be implemented by a table lookup, for example, looking up the second fixed point number corresponding to the first fixed point number in the fixed point number correspondence table. The technical purpose achieved by the table lookup operation 503 in FIG. 5 is equivalent to: dequantizing C2 to obtain C2' (a floating point number); processing C2' with a nonlinear activation function such as tanh or sigmoid to obtain C2'' (a floating point number); and quantizing C2'' to obtain D2 (a fixed point number). It should be understood that, in the data processing method provided by the embodiment of the present application, replacing the dequantization of the activation function's input value (corresponding to C2), the activation operation implemented with the activation function, and the quantization of the activation function's output value (corresponding to C2'') with a table lookup operation effectively reduces the amount of computation and the consumed storage resources. In terms of technical purpose, determining the second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence is equivalent to an operation that satisfies the following formula:
X_out = Q(Act(De(X_in)))    (7)

where X_in represents the first fixed point number, X_out represents the second fixed point number, De(·) represents the dequantization formula (corresponding to the dequantization operation), Act(·) represents the activation function (corresponding to the activation operation), and Q(·) represents the quantization formula (corresponding to the quantization operation). Act(·) may be an activation function with an upper bound and a lower bound, such as a sigmoid function or a tanh function.
Some possible implementations for obtaining the fixed-point number correspondence are described below.
In a first mode
The data processing apparatus trains the neural network using training samples and records the quantization parameters used to perform the quantization operations and/or dequantization operations while the neural network processes the training samples; based on the quantization parameters, the correspondence between the first fixed point number (i.e., X_in) and the second fixed point number (i.e., X_out), namely the fixed point number correspondence, is calculated by formula (7). An optional quantization formula (i.e., Q(·)) is as follows:
x_q = \mathrm{round}\left(s \cdot \frac{x - o}{\max(x) - \min(x)}\right)    (8)

where s is a quantization scaling factor, typically s = 2^k - 1, k being the quantization precision (an integer greater than 1), and o is the offset that keeps 0 aligned before and after the quantization of x. The quantization parameters may include one or more of max(x), min(x), and s, where max(x) denotes the maximum value of the data processed by the quantization operation and min(x) denotes the minimum value of that data.
Accordingly, the inverse quantization formula is as follows:

x = α·x_q + o    (9)

where α = (max(x) − min(x))/s, and o is the offset of the zero point of x before and after quantization. An example of training a neural network by a data processing apparatus using training samples is described below.
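Before turning to that example, formulas (8) and (9) can be written out as a minimal sketch. The function names, the choice of o = min(x) as the zero-point offset, and k = 8 are illustrative assumptions, not part of the original scheme:

```python
def quant_params(x_min, x_max, k=8):
    # s = 2^k - 1 quantization levels; alpha is the dequantization scale of formula (9).
    s = 2 ** k - 1
    alpha = (x_max - x_min) / s
    o = x_min  # zero-point offset; taking o = min(x) is an assumption
    return alpha, o

def quantize(x, alpha, o):
    # Formula (8): x_q = round(s * (x - o) / (max(x) - min(x))) = round((x - o) / alpha).
    return round((x - o) / alpha)

def dequantize(x_q, alpha, o):
    # Formula (9): x = alpha * x_q + o.
    return alpha * x_q + o
```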
Fig. 6 is a schematic diagram of part of an example process for training a neural network by using training samples according to an embodiment of the present application. The part of the training process shown in fig. 6 includes the following operations: calculating the sum (corresponding to the Add operation) of A3' (e.g., A3' is the result of a matrix multiplication operation) and B3 (corresponding to an offset value) to obtain C3 (a floating point number); quantizing C3 to obtain C3', and saving the first quantization parameter (i.e., the parameter used for quantizing C3); inverse quantizing C3' with the first quantization parameter to obtain C3; processing C3 with an activation function to obtain D3 (a floating point number); quantizing D3 to obtain D3', and saving the second quantization parameter (i.e., the parameter used for quantizing D3); and inverse quantizing D3' with the second quantization parameter to obtain D3. In some embodiments, the quantization parameters in formula (7) include the first quantization parameter and the second quantization parameter. As can be seen from fig. 6, when the data processing apparatus trains the neural network by using the training samples, floating point calculation is adopted throughout, with quantization and inverse quantization operations inserted, and all the quantization parameters (i.e., the first quantization parameter and the second quantization parameter) are recorded during training. It will be appreciated that merging the operations within the dashed box in fig. 6 results in X_out = Q(Act(De(X_in))), i.e., formula (7), where C3' corresponds to X_in and D3' corresponds to X_out. Because a nonlinear activation function with upper and lower bounds makes the input (corresponding to X_in) and the output (corresponding to X_out) equivalent to a one-to-one correspondence (i.e., the fixed point number correspondence), the table lookup operation 503 in fig. 5 is, for technical purposes, equivalent to the operations in the dashed box in fig. 6. Illustratively, the table lookup operation 503 in fig. 5 is equivalent to X_out = Q(sigmoid(De(X_in))) or X_out = Q(tanh(De(X_in))). That is, the data processing method provided by the present application replaces the inverse quantization operation, the activation operation, and the quantization operation in the dashed box in fig. 6 with the table lookup operation in fig. 5. The operations within the dashed box in fig. 6 can be computed off-line, and since the nonlinear activation function has upper and lower bounds, computing the inputs and outputs yields a one-to-one correspondence from fixed point number (corresponding to X_in) to fixed point number (corresponding to X_out), namely the fixed point number correspondence. The data processing apparatus or another device may determine the fixed point number correspondence using the quantization parameters recorded when the neural network was trained and formula (7). Optionally, the data processing apparatus substitutes, as input values into formula (7), all fixed point numbers possibly obtained by performing the first processing on the input data by using the neural network, calculates the fixed point number corresponding to each such fixed point number, and stores the correspondence between the fixed point numbers, that is, the fixed point number correspondence.
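As a concrete illustration of materializing formula (7) off-line, the correspondence can be precomputed for every representable k-bit input. The sketch below reuses the quantize()/dequantize() helpers from the earlier sketch; the example scale and offset values are assumptions chosen only so that tanh's range fits the output quantizer:

```python
import math

def build_lut(act, alpha_in, o_in, alpha_out, o_out, k=8):
    # Offline: X_out = Q(Act(De(X_in))) for every possible k-bit fixed point input,
    # i.e. formula (7) materialized as a lookup table.
    lut = []
    for x_in in range(2 ** k):
        x_f = dequantize(x_in, alpha_in, o_in)    # De(): fixed point -> floating point
        y_f = act(x_f)                            # Act(): e.g. tanh or sigmoid
        y_q = quantize(y_f, alpha_out, o_out)     # Q(): floating point -> fixed point
        lut.append(max(0, min(2 ** k - 1, y_q)))  # clamp to the k-bit range
    return lut

# Online, operation 503 collapses to one indexed read per activation input.
lut = build_lut(math.tanh, alpha_in=0.05, o_in=-6.4, alpha_out=2 / 255, o_out=-1.0)
x_out = lut[137]
```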
Mode two
The data processing apparatus trains a neural network by using training samples, and records, in the process of processing the training samples by using the neural network, the correspondence between the fixed point number corresponding to the input value of the activation function and the fixed point number corresponding to the output value of the activation function, to obtain the fixed point number correspondence. Taking the training of the neural network in fig. 6 as an example, the data processing apparatus may record the correspondence between C3' and D3', where C3' (corresponding to the first fixed point number) is the fixed point number corresponding to the input value C3 of the activation function, i.e., C3 can be obtained by inverse quantizing C3', and D3' (corresponding to the second fixed point number) is the fixed point number corresponding to the output value D3 of the activation function, i.e., D3' can be obtained by quantizing D3. In some embodiments, the data processing apparatus may train the neural network using a large number of training samples, and record the correspondences between the fixed point numbers corresponding to the input values of the activation function and the fixed point numbers corresponding to the output values of the activation function in the process of processing the training samples, to obtain the fixed point number correspondence. For example, when the data processing apparatus trains the neural network by using a first training sample, it records 100 groups of correspondences between fixed point numbers corresponding to input values of the activation function and fixed point numbers corresponding to output values of the activation function; when training with a second training sample, it records the correspondences (i.e., new correspondences) for input values of the activation function not recorded before; the training is repeated multiple times until no new correspondence between a fixed point number corresponding to an input value of the activation function and a fixed point number corresponding to an output value of the activation function can be obtained. It should be understood that when the data processing apparatus trains the neural network with a large number of training samples, it is possible to record the correspondence between the fixed point number corresponding to an arbitrary input value of the activation function and the fixed point number corresponding to the output value, that is, the fixed point number correspondence.
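A minimal sketch of this recording process follows: during pseudo-quantized training, each observed (activation input, activation output) fixed point pair is stored the first time it appears, and training can stop adding entries once no sample produces a new correspondence. The dictionary-based bookkeeping and function name are assumptions for illustration:

```python
correspondence = {}  # fixed point correspondence table: C3' -> D3'

def record_pair(c3_q, d3_q):
    # Record the pair the first time this activation input value is observed.
    if c3_q not in correspondence:
        correspondence[c3_q] = d3_q
        return True   # a new correspondence was added
    return False      # already recorded; no new entry for this sample
```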
The fixed point calculation problem of the nonlinear activation function can thus be solved by converting the inverse quantization and quantization operations before and after the activation function in the pseudo fixed point quantization training process (such as the training flow of fig. 6) into a fixed point table lookup, through a mathematically equivalent transformation performed off-line.
The foregoing embodiments describe a scheme in which fixed point operations are adopted throughout the prediction processing performed with a neural network, that is, a full-flow fixed point scheme. In some embodiments, the data processing apparatus needs to perform a normalization operation, such as BN, LN, IN or GN, in the course of performing the prediction processing using the neural network. For example, when performing a prediction task using an RNN or one of its variants, the data processing apparatus usually needs to perform a normalization operation, such as LN or BN, to improve the stability of the neural network model. Because normalization operations such as LN require on-line calculation and, unlike BN, cannot be decomposed off-line into multiply-add and activation functions, they interrupt the fixed point data flow and add quantization and inverse quantization operations. Some possible implementations of obtaining the above target result based on the above second fixed point number are described below.
One general normalization formula is as follows:

x′ = γ · (x − μ)/√(σ² + ε) + β    (10)

where μ denotes the mean of a given data set, σ² denotes the variance of the given data set, γ denotes a scaling factor (corresponding to scale), β denotes an offset value (corresponding to shift), x denotes any number in the given data set, x′ denotes the value obtained by normalizing x, γ and β are learned by the neural network during the training phase, and ε is a fixed value. BN takes each N×H×W slice (corresponding to a given data set) per channel separately and normalizes it; there is one pair of γ and β per channel, so the learnable parameters number 2×C. LN takes each C×H×W slice (corresponding to a given data set) out separately for normalization; it is not influenced by the batch size, so it can be used in RNN networks. IN takes each H×W slice (corresponding to a given data set) out separately for normalization, independent of the channel and the batch size. GN divides the channels C into G groups, takes each (C/G)×H×W slice (corresponding to a given data set) out separately for normalization, and finally merges the G normalized groups back into C×H×W. In some embodiments, the data processing apparatus may calculate a mean and a variance from each batch of data (i.e., a given data set) in the training samples when training the neural network; in the application phase (i.e., the phase in which the neural network performs the prediction task), the data processing apparatus may directly use the mean of all the means (e.g., the batch means) obtained during training, and for the standard deviation, i.e., √(σ² + ε), the data processing apparatus may employ an unbiased estimate based on the variances (e.g., the batch variances) obtained during training.
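For reference, a plain floating point implementation of formula (10) in its LN flavour can be sketched as follows; numpy, the slice shape, and the ε value are assumptions for illustration:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Formula (10): x' = gamma * (x - mu) / sqrt(sigma^2 + eps) + beta,
    # with mu and sigma^2 taken over one whole C*H*W slice (the LN flavour).
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.randn(64, 8, 8).astype(np.float32)  # one C x H x W slice
y = layer_norm(x, gamma=1.0, beta=0.0)
```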
γ in formula (10) may be incorporated into the activation function; for example, the initial nonlinear activation function (i.e., the original activation function of the neural network) is multiplied by γ to obtain the target nonlinear activation function, and the function of the activation function is implemented by the table lookup method in the foregoing embodiments. β can be directly expressed as a fixed point number. Thus, only

(x − μ)/√(σ² + ε)

needs to be fixed-pointed for the whole data processing flow to be fixed point.
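A minimal sketch of folding γ into the activation follows; it assumes the build_lut() helper from the earlier sketch, and the γ value and the adjusted output quantizer range are illustrative assumptions:

```python
import math

def make_target_act(init_act, gamma):
    # Target nonlinear activation = gamma * initial nonlinear activation, so the
    # scaling factor disappears from the normalization formula and is absorbed
    # by the table lookup that implements the activation.
    return lambda x: gamma * init_act(x)

target_act = make_target_act(math.tanh, gamma=0.87)
# Output range is now [-0.87, 0.87], so the output quantizer is rescaled to match.
lut = build_lut(target_act, alpha_in=0.05, o_in=-6.4,
                alpha_out=2 * 0.87 / 255, o_out=-0.87)
```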
An alternative quantization formula is as follows:

x_q = round(s · (x − o)/(max(x) − min(x)))    (11)

where s is a quantization scaling factor, typically 2^k − 1, where k is the quantization precision and k is an integer greater than 1; o is the offset of the zero point of x before and after quantization. Accordingly, the inverse quantization formula is x = α·x_q + o, where α = (max(x) − min(x))/s.
Substituting x = α·x_q + o into

(x − μ)/√(σ² + ε)

gives

c · float(x_q − μ_q)/√(float(σ_q)² + ε_q)

where μ_q and σ_q denote the mean and the standard deviation expressed in the quantized domain (μ = α·μ_q + o and σ = α·σ_q), ε_q corresponds to ε carried into the quantized domain, and c is a constant that can be computed off-line.
Since γ in formula (10) can be incorporated into the initial nonlinear activation function of the neural network to obtain the target nonlinear activation function, and β can directly adopt a fixed point number, the normalization processing performed by the data processing apparatus according to formula (10) only needs to fixed-point

(x − μ)/√(σ² + ε)

for the whole process to be fixed point. Since substituting x = α·x_q + o into (x − μ)/√(σ² + ε) yields c · float(x_q − μ_q)/√(float(σ_q)² + ε_q), the data processing apparatus only needs to calculate the quantized value corresponding to this expression (i.e., the fixed point number obtained by quantizing c · float(x_q − μ_q)/√(float(σ_q)² + ε_q)) to realize full-flow fixed point calculation.
In the case of a neural network implemented using computer programming (corresponding to program code), power and square-root operations, for example, do not support fixed point computation; thus, the fixed point numbers may be converted directly to floating point numbers during the normalization process (note that the conversion here is not inverse quantization, but a direct data type conversion). Finally, after the normalization calculation is finished, the output (corresponding to the fifth fixed point number) is quantized again according to the general quantization formula (11), using the maximum and minimum values of the result (corresponding to the third floating point number). In some embodiments, when a data processing apparatus or another device trains a neural network, fixed point operations may be used to compute the mean μ and/or the variance σ² of a given data set, which can improve computational performance. That is, the mean μ_q in the numerator can be calculated with fixed point numerical operations (corresponding to fixed point operations), and the difference between the results of the fixed point calculation and those of the floating point calculation is acceptable. The main benefit here is that an integer mean computation has considerable performance advantages over a floating point mean computation. Similarly, using an integer mean in the variance calculation also improves performance.
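A sketch of computing the quantized-domain statistics with integer arithmetic, converting to floating point only for the square root, is shown below. numpy, the int64 accumulation width, and the function name are assumptions:

```python
import math
import numpy as np

def quantized_stats(x_q):
    # x_q: integer (fixed point) values of one data set. The mean and the
    # variance are accumulated with integer arithmetic for performance; only
    # the final square root is done after a direct type conversion to float.
    x_q = x_q.astype(np.int64)
    mu_q = int(x_q.sum()) // x_q.size                    # integer mean
    var_q = int(((x_q - mu_q) ** 2).sum()) // x_q.size   # integer variance
    sigma = math.sqrt(float(var_q))                      # type conversion, not dequantization
    return mu_q, var_q, sigma
```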
Mode one
An alternative implementation of obtaining the above target result according to the second fixed point number is as follows: performing first normalization processing on the second fixed point number to obtain a fourth fixed point number, where the first normalization processing includes: calculating the value of a first formula to obtain a third floating point number, where the parameters in the first formula include the floating point number obtained by performing data type conversion on the second fixed point number; quantizing the third floating point number to obtain a fifth fixed point number; and calculating the sum of the fifth fixed point number and the sixth fixed point number to obtain the fourth fixed point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed point number is an offset value in the first normalization formula; and obtaining the target result according to the fourth fixed point number.
Optionally, the first normalization formula may satisfy the following formula:

x″ = Q(γ · c · float(x_q − μ_q)/√(float(σ_q)² + ε_q)) + β    (12)

where the parameters in formula (12) that also appear in formula (11) have the same meanings as in formula (11); x″ represents the fourth fixed point number, γ represents a scaling factor, β represents the sixth fixed point number (corresponding to an offset value), x_q represents the second fixed point number, the expression inside Q() represents the third floating point number, Q(...) represents the fifth fixed point number, float(x_q − μ_q) indicates that (x_q − μ_q) is converted to a floating point number, float(σ_q)² indicates that (σ_q)² is converted to a floating point number, and c is a constant (which can be computed off-line). Calculating the value of the first formula to obtain the third floating point number may be: calculating c · float(x_q − μ_q)/√(float(σ_q)² + ε_q) to obtain the third floating point number. Optionally, the target nonlinear activation function is obtained by multiplying the initial nonlinear activation function by the scaling factor γ in the first normalization formula. That is, the scaling factor γ is incorporated into the nonlinear activation function, and the data processing apparatus actually calculates c · float(x_q − μ_q)/√(float(σ_q)² + ε_q) (i.e., the third floating point number) and then quantizes it to obtain the fifth fixed point number. Optionally, calculating the value of the first formula to obtain the third floating point number may instead be: calculating γ · c · float(x_q − μ_q)/√(float(σ_q)² + ε_q) to obtain the third floating point number, in which case γ is not incorporated into the activation function. It should be understood that the first normalization processing normalizes the second fixed point number according to formula (12). In some embodiments, the data processing apparatus may process in advance the quantization parameters recorded during the training of the neural network (which may include max(x), min(x) and s) and the parameters required for the normalization processing (e.g., μ, σ², σ) to obtain the parameters in formula (12), e.g., μ_q = (μ − o)/α and σ_q = σ/α, incorporate γ into the activation function to obtain the target nonlinear activation function, and quantize β into a fixed point number. Thus, the data processing apparatus can obtain all the parameters necessary for performing the first normalization processing on the second fixed point number to obtain the fourth fixed point number. Illustratively, the data processing apparatus converts the fixed point operands among x_q, μ_q, c and σ_q to floating point numbers by data type conversion (e.g., float(x_q − μ_q) and float(σ_q)²), calculates c · float(x_q − μ_q)/√(float(σ_q)² + ε_q) to obtain the third floating point number, quantizes the third floating point number to obtain the fifth fixed point number, and calculates the sum of the fifth fixed point number and the sixth fixed point number (corresponding to β) to obtain the fourth fixed point number (corresponding to x″). The fourth fixed point number is the fixed point number obtained by performing the first normalization processing on the second fixed point number. That is, in the process of obtaining the fixed point number by performing the first normalization processing on the second fixed point number in the above manner, only one necessary quantization operation is performed, and no inverse quantization operation is needed.
In this implementation, the second fixed point number is normalized by the first normalization formula, which is obtained by mathematically transforming and merging the normalization formula and the inverse quantization formula; only one necessary quantization operation is retained, which can reduce the computational complexity of the normalization operation and improve operational efficiency.
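Putting mode one together, a minimal sketch of the first normalization processing per formula (12) follows, assuming γ has been folded into the activation function and reusing the quantize() helper from the earlier sketch; all parameter names (c, eps_q, alpha_out, o_out, beta_q) are assumptions:

```python
import math

def first_normalization(x_q, mu_q, sigma_q2, beta_q, c, eps_q, alpha_out, o_out):
    # First formula: the third floating point number, computed from direct
    # data type conversions of the fixed point operands (not inverse quantization).
    third_float = c * float(x_q - mu_q) / math.sqrt(float(sigma_q2) + eps_q)
    # The only quantization of the flow yields the fifth fixed point number.
    fifth_fixed = quantize(third_float, alpha_out, o_out)
    # Adding the sixth fixed point number (the offset beta) gives the fourth.
    return fifth_fixed + beta_q
```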
Mode two
Another alternative implementation of obtaining the above target result according to the second fixed point number is as follows: performing second normalization processing on the second fixed point number to obtain a seventh fixed point number; the second normalization processing includes: calculating the value of a second normalization formula to obtain a fourth floating point number, where the parameters in the second normalization formula include the floating point number obtained by performing data type conversion on the second fixed point number; and quantizing the fourth floating point number to obtain the seventh fixed point number; and obtaining the target result according to the seventh fixed point number.
Optionally, the second normalization formula may satisfy the following formula:

x‴ = Q(γ · c · float(x_q − μ_q)/√(float(σ_q)² + ε_q) + β)    (13)

where x‴ in formula (13) represents the seventh fixed point number, x_q represents the second fixed point number, and the parameters in formula (13) other than x‴ have the same meanings as the parameters in formula (12). Calculating the value of the second normalization formula to obtain the fourth floating point number may be: calculating the value of the expression inside Q() in formula (13) to obtain the fourth floating point number, which is then quantized to obtain the seventh fixed point number. In some embodiments, γ and β in formula (13) are both floating point numbers. In some embodiments, β in formula (13) is a fixed point number and γ is a floating point number. In some embodiments, β in formula (13) is a fixed point number, and γ, a floating point number, is incorporated into the initial nonlinear activation function.
In this implementation, the second fixed point number is normalized by the second normalization formula, which is obtained by mathematically transforming and merging the normalization formula and the inverse quantization formula; only one necessary quantization operation is retained, which can reduce the computational complexity of the normalization operation and improve operational efficiency.
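For comparison with mode one, a sketch of the second normalization processing per formula (13) keeps γ and β in the floating point expression and quantizes once at the very end; as before, the helper quantize() and all parameter names are assumptions:

```python
import math

def second_normalization(x_q, mu_q, sigma_q2, gamma, beta, c, eps_q, alpha_out, o_out):
    # The fourth floating point number: the whole normalization, including gamma
    # and beta, evaluated in floating point after direct type conversion.
    fourth_float = (gamma * c * float(x_q - mu_q)
                    / math.sqrt(float(sigma_q2) + eps_q) + beta)
    # A single quantization at the end yields the seventh fixed point number.
    return quantize(fourth_float, alpha_out, o_out)
```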
The present solution can be applied to a variety of scenarios, and how to perform natural language processing tasks using a data processing apparatus is described below. Fig. 7 is a flowchart of a natural language processing method according to an embodiment of the present application, and as shown in fig. 7, the method may include:
701. The data processing apparatus obtains a natural language text to be processed.
The natural language text to be processed may be a sentence currently to be processed by the data processing apparatus. The data processing apparatus can process the received natural language text or the natural language text obtained by recognizing the voice sentence by sentence.
In the scenarios of fig. 1A and 1C, obtaining the natural language text to be processed may mean that the data processing apparatus receives data such as voice or text sent by the user equipment and obtains the natural language text to be processed from the received data. For example, the data processing apparatus receives 2 sentences sent by the user equipment; it obtains the 1st sentence (natural language text to be processed), processes the 1st sentence by using the neural network obtained by training, and outputs the result obtained by processing the 1st sentence; it then obtains the 2nd sentence (natural language text to be processed), processes the 2nd sentence by using the neural network obtained by training, and outputs the result obtained by processing the 2nd sentence.
In the scenario of fig. 1B, obtaining the natural language text to be processed may mean that the terminal device directly receives data, such as voice or text, input by a user and obtains the natural language text to be processed from the received data. For example, the terminal device receives 2 sentences input by the user; it obtains the 1st sentence (natural language text to be processed), processes the 1st sentence by using the deep neural network obtained by training, and outputs the result obtained by processing the 1st sentence; it then obtains the 2nd sentence (natural language text to be processed), processes the 2nd sentence by using the deep neural network obtained by training, and outputs the result obtained by processing the 2nd sentence.
702. Perform target processing on the natural language text by using the trained recurrent neural network, and output the target result obtained by processing the natural language text.
The recurrent neural network may be replaced with any of GRU, LSTM, BiLSTM, etc. The target process may be translation, rephrasing, summary generation, etc. The target result is another natural language text resulting from processing the natural language text. For example, the target result is a natural language text resulting from translating the natural language text. As another example, the target result is another natural language text that is a repetition of the natural language text. The natural language text to be processed may be considered an input sequence and the target result (another natural language text) resulting from the data processing means processing the natural language text may be considered a generation sequence. The natural language text to be processed is an example of the input data in fig. 4, and the target process in fig. 7 is the target process in fig. 4. It should be understood that the method flow in fig. 7 is one example of the method flow in fig. 4.
In the embodiment of the application, a second fixed point number corresponding to the first fixed point number is determined according to the corresponding relation of the fixed point numbers; the computational complexity can be reduced, and the reasoning speed can be improved.
The foregoing embodiments describe data processing methods, and the following describes the structure of a data processing apparatus that implements these methods. Fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and as shown in fig. 8, the data processing apparatus may include:
a processing module 801, configured to perform target processing on input data by using the trained neural network to obtain a target result; the input data includes a plurality of computer-processable signals, and the target processing includes: performing first processing on the input data by using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to the corresponding relation of the fixed point numbers; obtaining the target result according to the second fixed point number; the fixed-point number correspondence relationship includes a correspondence relationship between the first fixed-point number and the second fixed-point number, the second fixed-point number is equal to a third fixed-point number obtained by performing a second process on the first fixed-point number, and the second process includes: performing inverse quantization on the first fixed point number to obtain a first floating point number, processing the first floating point number by using a target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain the third fixed point number; the target nonlinear activation function is an activation function adopted by the neural network;
an output module 802, configured to output the target result.
In an optional implementation manner, the processing module 801 is specifically configured to search the second fixed point number corresponding to the first fixed point number in the fixed point number correspondence table; the fixed point number corresponding relation table comprises the fixed point number corresponding relation.
In an alternative implementation, the first processing includes performing a matrix multiplication operation on the input data by using weight data, the first processing does not include a quantization operation and an inverse quantization operation, the input data includes fixed-point numbers, and the weight data includes fixed-point numbers.
In an optional implementation manner, the processing module 801 is specifically configured to perform a first normalization process on the second fixed-point numbers to obtain fourth fixed-point numbers; the first normalization processing includes: calculating the value of a first formula to obtain a third floating point number, wherein the parameter in the first formula comprises the floating point number obtained by performing data type conversion on the second fixed point number; quantizing the third floating point number to obtain a fifth fixed point number; calculating the sum of the fifth fixed-point number and the sixth fixed-point number to obtain the fourth fixed-point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed point number is an offset value in the first normalization formula; and obtaining the target result according to the fourth fixed point number.
In an alternative implementation, the target nonlinear activation function is obtained by multiplying the scaling factor in the first normalization formula by the initial nonlinear activation function.
In an optional implementation manner, the processing module 801 is specifically configured to perform a second normalization process on the second fixed-point numbers to obtain seventh fixed-point numbers; the second normalization process includes: calculating the value of a second normalization formula to obtain a fourth floating point number, wherein the parameters in the second normalization formula comprise the floating point number obtained by performing data type conversion on the second fixed point number; quantizing said fourth floating-point number to obtain said seventh fixed-point number; and obtaining the target result according to the seventh fixed point number.
In an alternative implementation, the target nonlinear activation function is obtained by multiplying the scaling factor in the second normalization formula by the initial nonlinear activation function.
In an optional implementation manner, the neural network is any one of a recurrent neural network RNN, a gated recurrent unit GRU, a long-short term memory LSTM and a bidirectional long-short term memory BiLSTM.
In an alternative implementation, the plurality of computer-processable signals includes: at least one of a speech signal, a text signal, or an image signal.
The method executed by the data processing apparatus using the neural network in the foregoing embodiment may be implemented in a neural-Network Processing Unit (NPU). Fig. 9 is a schematic structural diagram of a neural network processor according to an embodiment of the present application.
The neural network processor NPU 90 is mounted as a coprocessor on a host CPU (Host CPU), and tasks (e.g., natural language processing tasks) are assigned by the host CPU. The core portion of the NPU is the arithmetic circuit 903; the controller 904 controls the arithmetic circuit 903 to extract matrix data from memory and perform multiplication.
In some implementations, the arithmetic circuit 903 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit 903 is a two-dimensional systolic array. The arithmetic circuit 903 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 903 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 902 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 901, performs matrix operations with matrix B, and stores partial or final results of the obtained matrix in an accumulator (accumulator) 908.
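The data flow just described amounts to a matrix multiply whose partial sums live in a wide accumulator; a scalar sketch follows (the PE array parallelizes the same loops). The int8-style inputs and 32-bit accumulator width are assumptions for illustration:

```python
def matmul_accumulate(A, B):
    # A: M x K, B: K x N, entries are small fixed point numbers (e.g. int8);
    # partial and final sums are kept in a wide accumulator, as in accumulator 908.
    M, K, N = len(A), len(A[0]), len(B[0])
    C = [[0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = 0  # wide (e.g. 32-bit) accumulator
            for k in range(K):
                acc += A[m][k] * B[k][n]
            C[m][n] = acc
    return C
```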
The unified memory 906 is used to store input data as well as output data. The weight data is directly transferred to the weight memory 902 through a Direct Memory Access Controller (DMAC) 905. The input data is also carried into the unified memory 906 by the DMAC.
A Bus Interface Unit (BIU) 510 is used for the interaction of the AXI bus with the DMAC and with the instruction fetch buffer (Instruction Fetch Buffer) 909.
The bus interface unit 510 is also used for the instruction fetch memory 909 to fetch instructions from the external memory, and for the memory unit access controller 905 to fetch the original data of the input matrix a or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 906 or to transfer weight data into the weight memory 902 or to transfer input data into the input memory 901.
The vector calculation unit 907 includes a plurality of operation processing units and, when necessary, further processes the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. It is mainly used for non-convolution/FC layer computation in the neural network, such as pooling (Pooling), batch normalization (Batch Normalization), local response normalization (Local Response Normalization), and the like.
In some implementations, the vector calculation unit 907 can store the processed output vectors to the unified buffer 906. For example, the vector calculation unit 907 may apply a non-linear function to the output of the arithmetic circuit 903, such as a vector of accumulated values, to generate the activation values. In some implementations, the vector calculation unit 907 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 903, e.g., for use in subsequent layers in a neural network.
An instruction fetch buffer (instruction fetch buffer) 909 is connected to the controller 904 and stores instructions used by the controller 904.

The unified memory 906, the input memory 901, the weight memory 902, and the instruction fetch memory 909 are all on-chip memories.
The operations of the layers in the deep neural network shown in fig. 3 may be performed by the matrix calculation unit 212 or the vector calculation unit 907. It should be understood that the data processing method performed by the data processing apparatus using the neural network in the foregoing embodiments may be implemented in other processors.
The data processing method based on the neural network is realized by the NPU, so that the efficiency of the data processing device for executing the prediction task by using the neural network can be greatly improved.
The data processing apparatus in the embodiment of the present invention is described below from the viewpoint of hardware processing.
Fig. 10 is a block diagram of a partial structure of a terminal device according to an embodiment of the present application. Referring to fig. 10, the terminal device includes: radio Frequency (RF) circuit 1010, memory 1020, input unit 1030, display unit 1040, sensor 1050, audio circuit 1060, wireless fidelity (WiFi) module 1070, System On Chip (SoC) 1080, and power supply 1090. The terminal device in fig. 10 may be an example of the data processing apparatus in the foregoing embodiment.
The memory 1020 includes DDR memory, but may also include high-speed random access memory, or other storage units such as nonvolatile memory, for example, at least one disk storage device, flash memory device, or other volatile solid-state memory device.
Those skilled in the art will appreciate that the terminal device configuration shown in fig. 10 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following specifically describes each constituent component of the terminal device with reference to fig. 10:
the RF circuit 1010 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, for processing the received downlink information of the base station to the SoC 1080; in addition, the uplink data is transmitted to the base station. In general, RF circuit 1010 includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 1010 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 1020 can be used for storing software programs and modules, and the SoC1080 executes various functional applications and data processing of the terminal device by running the software programs and modules stored in the memory 1020. The memory 1020 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, a translation function, a repeat function, and the like), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the terminal device, and the like.
The input unit 1030 may be used to receive input data (e.g., natural language text, voice data, etc.) and generate key signal inputs related to user settings and function control of the terminal device. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also referred to as a touch screen, may collect touch operations by a user (e.g., operations by a user on or near the touch panel 1031 using any suitable object or accessory such as a finger, a stylus, etc.) and drive corresponding connection devices according to a preset program. Illustratively, the touch panel 1031 is configured to receive natural language text input by a user and input the natural language text to the SoC 1080. Alternatively, the touch panel 1031 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts the touch information into contact point coordinates, sends the contact point coordinates to the SoC1080, and can receive and execute commands sent by the SoC 1080. In addition, the touch panel 1031 may be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The input unit 1030 may include other input devices 1032 in addition to the touch panel 1031. In particular, other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, a joystick, a touch screen, a microphone, and the like. Input device 1032 includes a microphone that can receive voice data input by a user and input the voice data to SoC 1080.
The SoC1080 executes the data processing method provided by the present application to perform target processing on the input data input by the input unit 1030 by running the software programs and modules stored in the memory 1020, so as to obtain a target result. For example, the SoC1080 may perform the data processing method provided in the present application to process the natural language text after converting the voice data input by the input unit 1030 into the natural language text, so as to obtain the target result.
The display unit 1040 may be used to display information input by a user or information provided to the user and various menus of the terminal device. The Display unit 1040 may include a Display panel 1041, and optionally, the Display panel 1041 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. The display unit 1040 may be used to display a target result obtained by the SoC1080 processing the input data. Further, the touch panel 1031 can cover the display panel 1041, and when the touch panel 1031 detects a touch operation thereon or nearby, the touch operation is transmitted to the SoC1080 to determine the type of the touch event, and then the SoC1080 provides a corresponding visual output on the display panel 1041 according to the type of the touch event. Although in fig. 10, touch panel 1031 and display panel 1041 are two separate components to implement input and output functions of the terminal device, in some embodiments, touch panel 1031 and display panel 1041 may be integrated to implement input and output functions of the terminal device.
The terminal device may also include at least one sensor 1050, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 1041 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 1041 and/or the backlight when the terminal device moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), detect the magnitude and direction of gravity when stationary, and can be used for applications (such as horizontal and vertical screen switching, related games, magnetometer attitude calibration) for recognizing the attitude of the terminal device, and related functions (such as pedometer and tapping) for vibration recognition; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured in the terminal device, detailed description is omitted here.
Audio circuitry 1060, speaker 1061, microphone 1062 may provide an audio interface between the user and the terminal device. The audio circuit 1060 can transmit the electrical signal converted from the received audio data to the speaker 1061, and the electrical signal is converted into a sound signal by the speaker 1061 and output; on the other hand, the microphone 1062 converts the collected sound signal into an electrical signal, which is received by the audio circuit 1060 and converted into audio data, and the audio data is output to the SoC1080 for processing, and then is sent to another terminal device via the RF circuit 1010, or the audio data is output to the memory 1020 for further processing.
WiFi belongs to short-distance wireless transmission technology, and the terminal device can help the user send and receive e-mail, browse web pages, access streaming media, etc. through the WiFi module 1070, which provides the user with wireless broadband internet access. Although fig. 10 shows the WiFi module 1070, it is understood that it does not belong to the essential constitution of the terminal device, and may be omitted entirely as needed within the scope not changing the essence of the invention.
The SoC1080 is a control center of the terminal device, connects various parts of the terminal device with various interfaces and lines, and executes various functions of the terminal device and processes data by running or executing software programs and/or modules stored in the memory 1020 and calling data stored in the memory 1020, thereby monitoring the terminal device as a whole. Optionally, SoC1080 may include multiple processing units, such as CPUs or various service processors (e.g., NPUs); SoC1080 may also integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It is to be understood that the modem processor described above may not be integrated into SoC 1080.
The terminal device also includes a power supply 1090 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the SoC1080 via a power management system to manage charging, discharging, and power consumption via the power management system.
Although not shown, the terminal device may further include a camera, a bluetooth module, and the like, which are not described herein.
Fig. 11 is a schematic structural diagram of a server 1100 according to an embodiment of the present disclosure, where the server 1100 may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 1122 (e.g., one or more processors) and a memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) for storing an application program 1142 or data 1144. Memory 1132 and storage media 1130 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 1122 may be provided in communication with the storage medium 1130 to execute a series of instruction operations in the storage medium 1130 on the server 1100. The server 1100 may be a data processing apparatus as provided herein.
The server 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input-output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The steps performed by the data processing apparatus in the above-described embodiment may be based on the server configuration shown in fig. 11. In particular, the central processor 1122 may implement the functions of the processing module 801 in fig. 8. The wireless network interface 1150 may implement the functions of the output module 802. The wireless network interface 1150 may also receive input data from the terminal devices.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is only one type of division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the technical effects of the solutions provided by the embodiments of the present application. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit may be stored in a computer-readable storage medium if it is implemented in the form of a software functional unit and sold or used as a separate product. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a readable storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned readable storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In addition, the present application also provides a computer program for implementing the operations and/or processes performed by the data processing apparatus in the method embodiments provided by the present application.
The present application also provides a computer-readable storage medium having stored therein computer code, which, when run on a computer, causes the computer to perform the operations and/or processes performed by the data processing apparatus in the method embodiments provided herein.
The present application also provides a computer program product comprising computer code or a computer program which, when run on a computer, causes the operations and/or processes performed by the data processing apparatus in the method embodiments provided herein to be carried out.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto; any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, and these shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

Claims (15)

1. A data processing method, comprising:
performing target processing on input data by using the neural network obtained by training to obtain a target result; the input data comprises a plurality of computer processable signals, the target processing comprising: performing first processing on the input data by using the neural network to obtain a first fixed point number; determining a second fixed point number corresponding to the first fixed point number according to the corresponding relation of the fixed point numbers; obtaining the target result according to the second fixed point number; the fixed point number corresponding relation comprises a corresponding relation between the first fixed point number and the second fixed point number, the second fixed point number is equal to a third fixed point number obtained by performing second processing on the first fixed point number, and the second processing comprises the following steps: carrying out inverse quantization on the first fixed point number to obtain a first floating point number, processing the first floating point number by utilizing a target nonlinear activation function to obtain a second floating point number, and quantizing the second floating point number to obtain a third fixed point number; the target nonlinear activation function is an activation function adopted by the neural network;
and outputting the target result.
2. The method according to claim 1, wherein the determining a second fixed point number corresponding to the first fixed point number according to the fixed point number correspondence comprises:
searching the second fixed point number corresponding to the first fixed point number in the fixed point number corresponding relation table; the fixed point number corresponding relation table comprises fixed point number corresponding relations, and any corresponding relation in the fixed point number corresponding relation table is a corresponding relation between a fixed point number corresponding to the input value of the target nonlinear activation function and a fixed point number corresponding to the output value of the target nonlinear activation function.
3. The method according to claim 1 or 2, wherein the obtaining the target result according to the second fixed-point number comprises:
performing first normalization processing on the second fixed point number to obtain a fourth fixed point number; the first normalization processing includes: calculating the value of a first formula to obtain a third floating point number, wherein the parameters in the first formula comprise the floating point number obtained by performing data type conversion on the second fixed point number; quantizing the third floating point number to obtain a fifth fixed point number; calculating the sum of the fifth fixed point number and the sixth fixed point number to obtain the fourth fixed point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed point number is an offset value in the first normalization formula;
and obtaining the target result according to the fourth fixed point number.
4. The method of claim 3, wherein the target nonlinear activation function is obtained by multiplying a scaling factor in the first normalization formula by an initial nonlinear activation function.
5. The method according to claim 1 or 2, wherein the obtaining the target result according to the second fixed-point number comprises:
carrying out second normalization processing on the second fixed point number to obtain a seventh fixed point number; the second normalization process includes: calculating a value of a second normalization formula to obtain a fourth floating point number, wherein parameters in the second normalization formula comprise floating point numbers obtained by performing data type conversion on the second fixed point number; quantizing the fourth floating-point number to obtain the seventh fixed-point number;
and obtaining the target result according to the seventh fixed point number.
6. The method of claim 5, wherein the target nonlinear activation function is obtained by multiplying a scaling factor in the second normalization formula by an initial nonlinear activation function.
7. The method of any of claims 1 to 6, wherein the plurality of computer-processable signals comprises: at least one of a speech signal, a text signal, or an image signal.
8. A data processing apparatus, comprising:
a processing module configured to perform target processing on input data using a trained neural network to obtain a target result; the input data comprises a plurality of computer-processable signals, and the target processing comprises: performing first processing on the input data using the neural network to obtain a first fixed-point number; determining a second fixed-point number corresponding to the first fixed-point number according to a fixed-point correspondence; and obtaining the target result according to the second fixed-point number; the fixed-point correspondence comprises a correspondence between the first fixed-point number and the second fixed-point number, and the second fixed-point number is equal to a third fixed-point number obtained by performing second processing on the first fixed-point number, the second processing comprising: dequantizing the first fixed-point number to obtain a first floating-point number, processing the first floating-point number with a target nonlinear activation function to obtain a second floating-point number, and quantizing the second floating-point number to obtain the third fixed-point number; the target nonlinear activation function is the activation function adopted by the neural network;
and an output module configured to output the target result.
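The second processing named in claim 8 is the dequantize, activate, requantize chain from which each stored correspondence is derived. A sketch under the same assumed int8 scheme and helper functions as the earlier sketches (in_scale and out_scale are illustrative names, not terms from the claims):

    def second_processing(first_fixed, in_scale, out_scale, target_act):
        # Dequantize the first fixed-point number -> first floating-point number.
        first_float = dequantize(first_fixed, in_scale)
        # Apply the target nonlinear activation -> second floating-point number.
        second_float = target_act(first_float)
        # Quantize -> third fixed-point number, which the correspondence
        # stores as the second fixed-point number.
        return quantize(second_float, out_scale)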
9. The data processing apparatus of claim 8, wherein
the processing module is specifically configured to look up, in a fixed-point correspondence table, the second fixed-point number corresponding to the first fixed-point number; the fixed-point correspondence table comprises the fixed-point correspondence, and each entry in the table is a correspondence between a fixed-point number corresponding to an input value of the target nonlinear activation function and a fixed-point number corresponding to the resulting output value of the target nonlinear activation function.
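Because an int8 input can take only 256 distinct values, the table of claim 9 can be built offline by running the second processing once per code, after which inference reduces to an array index. A sketch reusing second_processing and target_act from above; the 256-entry layout and the example scales are assumptions:

    def build_correspondence_table(in_scale, out_scale, target_act):
        # One entry per possible int8 input code, from -128 to 127.
        codes = np.arange(-128, 128, dtype=np.int8)
        return second_processing(codes, in_scale, out_scale, target_act)

    table = build_correspondence_table(in_scale=0.05, out_scale=0.01,
                                       target_act=target_act)

    def lookup_second_fixed(first_fixed):
        # Shift the signed int8 code into the 0..255 index range.
        return table[int(first_fixed) + 128]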
10. The data processing apparatus of claim 8 or 9, wherein
the processing module is specifically configured to perform first normalization processing on the second fixed-point number to obtain a fourth fixed-point number, the first normalization processing comprising: evaluating a first formula to obtain a third floating-point number, wherein the parameters of the first formula comprise a floating-point number obtained by data type conversion of the second fixed-point number; quantizing the third floating-point number to obtain a fifth fixed-point number; and computing the sum of the fifth fixed-point number and a sixth fixed-point number to obtain the fourth fixed-point number; the first formula is a part of a first normalization formula corresponding to the first normalization processing, and the sixth fixed-point number is an offset value in the first normalization formula;
and to obtain the target result according to the fourth fixed-point number.
11. The data processing apparatus of claim 10, wherein the target nonlinear activation function is obtained by multiplying an initial nonlinear activation function by a scaling factor in the first normalization formula.
12. The data processing apparatus of claim 8 or 9, wherein
the processing module is specifically configured to perform second normalization processing on the second fixed-point number to obtain a seventh fixed-point number, the second normalization processing comprising: evaluating a second normalization formula to obtain a fourth floating-point number, wherein the parameters of the second normalization formula comprise a floating-point number obtained by data type conversion of the second fixed-point number; and quantizing the fourth floating-point number to obtain the seventh fixed-point number;
and to obtain the target result according to the seventh fixed-point number.
13. The data processing apparatus of claim 12, wherein the target nonlinear activation function is obtained by multiplying an initial nonlinear activation function by a scaling factor in the second normalization formula.
14. The data processing apparatus of claim 13, wherein the plurality of computer-processable signals comprises at least one of a speech signal, a text signal, or an image signal.
15. A computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the method of any one of claims 1 to 7.
CN202010753701.8A 2020-07-30 2020-07-30 Data processing method and data processing device Pending CN114065900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010753701.8A CN114065900A (en) 2020-07-30 2020-07-30 Data processing method and data processing device

Publications (1)

Publication Number Publication Date
CN114065900A true CN114065900A (en) 2022-02-18

Family

ID=80227415

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010753701.8A Pending CN114065900A (en) 2020-07-30 2020-07-30 Data processing method and data processing device

Country Status (1)

Country Link
CN (1) CN114065900A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114841325A (en) * 2022-05-20 2022-08-02 安谋科技(中国)有限公司 Data processing method and medium of neural network model and electronic device
CN115328438A (en) * 2022-10-13 2022-11-11 华控清交信息科技(北京)有限公司 Data processing method and device and electronic equipment
CN115879513A (en) * 2023-03-03 2023-03-31 深圳精智达技术股份有限公司 Data hierarchical standardization method and device and electronic equipment
CN115879513B (en) * 2023-03-03 2023-11-14 深圳精智达技术股份有限公司 Hierarchical standardization method and device for data and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination