CN118228789A - Method and system for detecting abnormal electroencephalogram signals by accelerating long-short-term memory network FPGA hardware and capable of being quantized efficiently - Google Patents
- Publication number
- CN118228789A (application CN202410548521.4A)
- Authority
- CN
- China
- Prior art keywords: short, long, quantized, memory network, time memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to an efficiently quantizable long short-term memory (LSTM) network, an FPGA hardware acceleration method for it, and a method and system for detecting abnormal electroencephalogram (EEG) signals, comprising the following steps: quantizing the efficiently quantizable LSTM network; deploying the quantized network into the programmable logic (PL) of the FPGA hardware accelerator by compiling it into Verilog code and generating an acceleration IP core; transmitting the input signal, quantization biases, and weights over an AXI bus; and, after the quantized LSTM network finishes its calculation, transmitting the output data back to the ARM processor unit (PS) of the FPGA hardware accelerator over the same AXI bus. The invention significantly reduces the memory footprint of the LSTM network and lowers its operating power consumption, which facilitates the deployment and efficient operation of LSTM networks on low-power edge hardware devices and promotes real-time processing and response.
Description
Technical Field
The invention relates to an FPGA (field programmable gate array) hardware acceleration method for a long short-term memory network, and to a method and system for detecting abnormal electroencephalogram signals, belonging to the technical fields of neural networks, artificial intelligence, and FPGAs.
Background
The recurrent neural network (RNN), and especially the long short-term memory network built from long short-term memory units (Long Short-Term Memory, LSTM), is a neural network specially designed for processing time-series data that can be trained end to end. It has excellent performance and wide application in fields such as time-series analysis and natural language processing, and is also becoming a research hotspot in electroencephalogram (EEG) signal analysis.
Compared with traditional time-series modeling methods, the long short-term memory network can better integrate and learn discriminative EEG features from the raw EEG signal. However, an LSTM network has a large number of parameters, which are usually stored as 32-bit or 16-bit floating-point numbers; the occupied memory space is therefore too large, and the network is difficult to deploy on mobile phones or other low-power edge computing hardware. Currently, the main approach to this problem is to quantize the model parameters into low-bit-width integers, which eases FPGA hardware deployment. However, existing low-bit-width quantization methods generally require training-time calibration during quantization, or data calibration after quantization, to preserve the model's original accuracy; the flow is cumbersome and limits the deployment and application of LSTM networks on FPGAs. To address these problems, a novel efficiently quantizable LSTM network and an FPGA hardware acceleration method for it are proposed and applied to abnormal EEG signal detection.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides an FPGA hardware acceleration method for an efficiently quantizable long short-term memory network. The invention provides a long short-term memory unit with normalized cell state output, which makes the LSTM network convenient to quantize and avoids the data calibration operation of traditional post-training quantization methods. Meanwhile, a quantization scheme for the nonlinear activation function of the LSTM unit, based on a lookup table of a piecewise Sigmoid approximation, is provided; approximating the nonlinear activation function with a piecewise function allows quantization without loss of accuracy even at low quantization bit widths. An efficiently quantizable LSTM network FPGA hardware accelerator is also provided.
The invention further provides an abnormal electroencephalogram signal detection method and system based on the long-short-term memory network FPGA hardware accelerator capable of being quantized efficiently.
Term interpretation:
1. Quantization: a process of mapping a set of parameters to another value range through a mathematical transformation, usually implemented as a linear mapping. The original parameter values are typically floating-point numbers, while the quantized parameter values are typically integers.
3. Training: a group of data is input into the neural network, the obtained output is compared with the labels of that data to compute an error, the gradient of each parameter is obtained through the back-propagation algorithm, and the parameters are updated accordingly.
4. Long short-term memory unit: a special building block of recurrent neural networks. Each unit contains three gates: a forget gate (deciding which information to discard), an input gate (deciding which new information to write into the cell state), and an output gate (deciding the next hidden state). This gating structure enables long-range state maintenance and transmission between units, so long-range dependencies can be learned and memorized effectively.
5. Long short-term memory network: a neural network formed by connecting a number of long short-term memory units in sequence, used for processing, predicting, and classifying time-series data. Through its gating mechanism the network maintains and updates information in each unit, allowing information to be kept and passed over long distances as time-series data is processed.
6. EEG feature extraction based on a convolutional neural network: features of the EEG signal are extracted with a convolutional neural network (CNN). The CNN contains one or more convolutional layers, can take the raw EEG signal as input, and automatically extracts EEG features through its convolutional, activation, and pooling layers.
The technical scheme of the invention is as follows:
An efficiently quantizable long short-term memory network FPGA hardware acceleration method runs on an FPGA hardware accelerator comprising an ARM processor unit (PS) and a programmable logic unit (PL); the method comprises the following steps:
the long-short time memory network capable of being quantized efficiently consists of a plurality of long-short time memory units which are connected in sequence and capable of being quantized efficiently;
Performing parameter quantization on the trained long-short-time memory network with floating point parameters and capable of being quantized efficiently to obtain a quantized long-short-time memory network;
compiling the quantized long-short-time memory network into Verilog codes, generating IP cores for acceleration, and deploying the IP cores in a programmable logic unit PL;
the ARM processor unit PS is responsible for data preparation and preprocessing, and also performs the softmax mapping operation;
the configuration parameters of each unit of the quantized LSTM network, including the number of input/output channels, the feature-vector dimension, the number of neurons, and the quantization coefficients, are first transmitted from the ARM processor unit (PS) side to the programmable logic unit (PL) over an AXI-Lite (simplified Advanced eXtensible Interface) bus;
the input signal, quantization biases, and weights are transmitted from the PS side to the PL over an Advanced eXtensible Interface (AXI) bus; after the quantized LSTM unit finishes its calculation on the PL side, the output data is transmitted back to the PS side over the same AXI bus;
the parameters of the next quantized LSTM unit are then configured and a new round of feature-vector calculation starts; the process repeats until the calculation of the whole quantized LSTM network is complete;
And outputting a calculation result.
According to the invention, the piecewise activation function of the efficiently quantizable long short-term memory unit is given by formula (1):

σ_p(x) = 0, if x ≤ −T_a;  σ_p(x) = x/(2·T_a) + 1/2, if −T_a < x < T_a;  σ_p(x) = 1, if x ≥ T_a    (1)

wherein T_a is an adjustable Sigmoid-approximation parameter;

Further preferably, T_a = 2.5;
According to the invention, the cell state parameter in the efficiently quantizable long short-term memory unit is calculated as formula (2):

c_t = σ_p(f_t ⊙ c_{t−1} + i_t ⊙ g_t)    (2)

wherein ⊙ is the Hadamard product, c_t is the cell state at time step t of the efficiently quantizable LSTM unit, and i_t, f_t, g_t are the input gate value, forget gate value, and cell candidate value at time step t.
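The piecewise activation and the normalized cell-state update of formulas (1) and (2) can be sketched as follows. The hard-Sigmoid-style closed form of σ_p, with `t_a` as the adjustable approximation parameter, is an assumption consistent with the lookup-table values listed later in the description; function and argument names are illustrative.

```python
import numpy as np

def sigma_p(x, t_a=2.5):
    """Piecewise (hard-Sigmoid-style) approximation of the Sigmoid used
    by the quantizable LSTM unit: 0 below -t_a, 1 above t_a, and a
    linear ramp x/(2*t_a) + 0.5 in between (assumed form)."""
    return np.clip(x / (2.0 * t_a) + 0.5, 0.0, 1.0)

def cell_state(f_t, c_prev, i_t, g_t, t_a=2.5):
    """Normalized cell-state update of formula (2): the usual LSTM
    combination f*c_{t-1} + i*g is passed through sigma_p so that the
    cell state stays in [0, 1], which simplifies post-training
    quantization."""
    return sigma_p(f_t * c_prev + i_t * g_t, t_a)
```

Because every gate and state value is bounded in [0, 1], no data-dependent calibration of activation ranges is needed before quantization.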
According to the invention, the parameter quantization process of the trained long-short-time memory network with floating point parameters comprises the following steps:
(1) Initializing quantization parameters, including:
Let the overall weight matrix W = [W_f; W_i; W_o; W_g] and the overall cyclic weight matrix R = [R_f; R_i; R_o; R_g];
Initializing all weight quantization bit widths B WR =8;
Initializing a piecewise function lookup table quantization bit width B LUT =4;
initializing a fixed point digital width B Fix =24;
Initializing the weight quantization factor M_WR = 2^(B_WR−1) − 1 = 127;
Initializing a weight matrix quantization scale Q W=max(|W|)/MWR;
initializing a cyclic weight matrix quantization scale Q R=max(|R|)/MWR; wherein, the max (·) function is a maximum value function, and the |·| function is an absolute value function;
initializing an input quantization scale Q X=1/MWR;
Initializing the hidden unit quantization scale, the input weight quantization scale, the input cyclic weight quantization scale, and the candidate unit quantization scale;
(2) Generating the quantization lookup table, comprising: first, with −T_a as the start value and T_a as the end value, an arithmetic sequence I_d is generated with increment (i.e., step size) 2·T_a/2^(B_LUT+1); then the quantization lookup table of the piecewise function σ_p is calculated as L_d = round(M_WR · σ_p(I_d)).
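The lookup-table construction can be sketched as below. The step size 2·T_a/2^(B_LUT+1) (giving 33 entries for B_LUT = 4) and the scaling by M_WR = 2^(B_WR−1) − 1 = 127 are assumptions, but together they reproduce exactly the 33-entry table listed in the description.

```python
import numpy as np

def sigma_p(x, t_a=2.5):
    # Piecewise Sigmoid approximation of formula (1) (assumed hard-Sigmoid form).
    return np.clip(x / (2.0 * t_a) + 0.5, 0.0, 1.0)

def make_quant_lut(t_a=2.5, b_lut=4, b_wr=8):
    """Build the quantization lookup table L_d for sigma_p.

    I_d is an arithmetic sequence from -t_a to t_a with step
    2*t_a / 2**(b_lut + 1), i.e. 33 points for b_lut = 4; entries are
    sigma_p(I_d) scaled by M_WR = 2**(b_wr - 1) - 1 = 127 and rounded."""
    m_wr = 2 ** (b_wr - 1) - 1
    n = 2 ** (b_lut + 1)                   # 32 intervals -> 33 points
    i_d = np.linspace(-t_a, t_a, n + 1)    # arithmetic sequence I_d
    l_d = np.round(m_wr * sigma_p(i_d, t_a)).astype(int)
    return i_d, l_d
```

Since the table is fixed once the bit widths are chosen, it can be precomputed offline and stored in the PL's block RAM, as the embodiment describes.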
(3) Quantizing the weight matrices, comprising:
the quantized forget gate weight matrix, as shown in formula (3): Ŵ_f = round(W_f / Q_W) (3);
the quantized input gate weight matrix, as shown in formula (4): Ŵ_i = round(W_i / Q_W) (4);
the quantized output gate weight matrix, as shown in formula (5): Ŵ_o = round(W_o / Q_W) (5);
the quantized cell candidate weight matrix, as shown in formula (6): Ŵ_g = round(W_g / Q_W) (6);
the quantized forget gate cyclic weight matrix, as shown in formula (7): R̂_f = round(R_f / Q_R) (7);
the quantized input gate cyclic weight matrix, as shown in formula (8): R̂_i = round(R_i / Q_R) (8);
the quantized output gate cyclic weight matrix, as shown in formula (9): R̂_o = round(R_o / Q_R) (9);
the quantized cell candidate cyclic weight matrix, as shown in formula (10): R̂_g = round(R_g / Q_R) (10);
the quantized forget gate bias (formula (11)); the quantized input gate bias (formula (12)); the quantized output gate bias (formula (13)); and the quantized cell candidate bias (formula (14)).
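The weight-matrix quantization of formulas (3)–(10) can be sketched as plain symmetric round-to-nearest scaling, under the assumption that the four gate matrices share one scale Q_W = max(|W|)/M_WR computed over the stacked matrix W = [W_f; W_i; W_o; W_g] (the cyclic matrices would be handled identically with Q_R); all names are illustrative.

```python
import numpy as np

def quantize_weights(w_f, w_i, w_o, w_g, b_wr=8):
    """Symmetric uniform quantization of the four LSTM weight matrices:
    W_hat = round(W / Q_W), with Q_W = max(|W|) / M_WR shared across
    the stacked matrix W = [W_f; W_i; W_o; W_g] (assumed scheme)."""
    m_wr = 2 ** (b_wr - 1) - 1                     # 127 for 8-bit weights
    w = np.concatenate([w_f, w_i, w_o, w_g])       # overall weight matrix W
    q_w = np.abs(w).max() / m_wr                   # shared quantization scale
    quant = lambda m: np.clip(np.round(m / q_w), -m_wr, m_wr).astype(int)
    return [quant(m) for m in (w_f, w_i, w_o, w_g)], q_w
```

The returned scale q_w is kept alongside the integer matrices so the PS side can configure the PL's dequantization coefficients.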
(4) Calculating the quantized gate and state parameter values, comprising: the quantized forget gate value (formula (15)); the quantized input gate value (formula (16)); the quantized output gate value (formula (17)); the quantized cell candidate value (formula (18)); the quantized cell state parameter (formula (19)); and the quantized hidden state parameter (formula (20)).
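On the PL side, the quantized gate values of formulas (15)–(20) are obtained by table lookup instead of evaluating σ_p in floating point. The exact fixed-point index arithmetic is defined by the patent's formulas; the sketch below only illustrates the idea, using the 33-entry table given in the description and an assumed nearest-index mapping from the pre-activation to a table slot.

```python
import numpy as np

# Quantization lookup table listed in the description (33 entries, T_a = 2.5)
L_D = np.array([0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52,
                56, 60, 64, 67, 71, 75, 79, 83, 87, 91, 95, 99, 103,
                107, 111, 115, 119, 123, 127])

def lut_sigma(x, t_a=2.5):
    """Evaluate the piecewise activation through the lookup table:
    clamp the pre-activation to [-t_a, t_a], map it linearly to the
    nearest table index, and return the stored integer (assumed
    mapping; the patent's fixed-point formulas define the real one)."""
    n = len(L_D) - 1
    idx = np.round((np.clip(x, -t_a, t_a) + t_a) / (2 * t_a) * n).astype(int)
    return L_D[idx]
```

The returned integers are the activation values already scaled by M_WR = 127, so downstream Hadamard products stay in pure integer arithmetic.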
An efficiently quantizable long short-term memory network FPGA hardware accelerator comprises an ARM processor unit (PS) and a programmable logic unit (PL) connected through an AXI-Lite (simplified Advanced eXtensible Interface) bus, and implements the efficiently quantizable LSTM network FPGA hardware acceleration method described above.
An abnormal electroencephalogram signal detection method based on the efficiently quantizable long short-term memory network FPGA hardware accelerator comprises the following steps:
a data acquisition module consisting of an electroencephalogram amplifier and an A/D converter is adopted to acquire an electroencephalogram signal to be detected and store the electroencephalogram signal into a computer;
training a long-short-time memory network capable of being quantized efficiently;
the trained long-short-time memory network capable of being quantized efficiently is deployed on the FPGA hardware accelerator of the long-short-time memory network capable of being quantized efficiently;
extracting characteristics of an electroencephalogram signal to be detected;
inputting the characteristics of the electroencephalogram signals to be detected into a long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently to obtain output values;
performing inverse quantization on the output value of the efficiently quantizable LSTM network FPGA hardware accelerator to obtain the dequantized floating-point features, and outputting the detection result (abnormal EEG or normal EEG) after softmax function mapping;
If the detection result is abnormal electroencephalogram, alarming is carried out through an alarm module.
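The PS-side post-processing of the steps above can be sketched as follows: dequantize the accelerator's integer output (multiplying by the quantization scale, as in formula (27)), apply softmax, and pick the class. The two-class layout and its ordering are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(h_hat_int, q_x):
    """Dequantize the accelerator's integer feature output (h = h_hat * Q_X,
    performed on the PS side) and map it through softmax to an
    'abnormal EEG' / 'normal EEG' decision (assumed class ordering)."""
    h = np.asarray(h_hat_int, dtype=float) * q_x   # inverse quantization
    p = softmax(h)
    return ("abnormal EEG" if p[0] > p[1] else "normal EEG"), p
```

An "abnormal EEG" result would then trigger the alarm module described below.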
According to the invention, the training process of the long-short-time memory network capable of being quantified efficiently comprises the following steps:
firstly, acquiring electroencephalogram data for training by a data acquisition module consisting of an electroencephalogram amplifier and an A/D converter;
secondly, according to the set number of units in the efficiently quantizable LSTM network, initializing the floating-point parameters and weight matrices of every unit, including:
the cell state parameter c_0 at the initial time step t = 0, initialized to 0;
the hidden state parameter h_0 at the initial time step t = 0, initialized to 0;
the forget gate weight matrix W_f, input gate weight matrix W_i, output gate weight matrix W_o, and cell candidate weight matrix W_g, initialized to random numbers;
the forget gate cyclic weight matrix R_f, input gate cyclic weight matrix R_i, output gate cyclic weight matrix R_o, and cell candidate cyclic weight matrix R_g, initialized to random numbers;
the forget gate bias b_f, input gate bias b_i, output gate bias b_o, and cell candidate bias b_g, all initialized to 0.
Then, according to these parameters and weight matrices, the gate and state parameter values are calculated, including:
for the data u_t of the t-th time step of the extracted EEG features, the floating-point value of each gate is calculated as follows:
the forget gate value f_t, as shown in formula (21): f_t = σ_p(W_f u_t + R_f h_{t−1} + b_f) (21), wherein h_{t−1} is the hidden state parameter of time step t−1;
the input gate value i_t, as shown in formula (22): i_t = σ_p(W_i u_t + R_i h_{t−1} + b_i) (22);
the output gate value o_t, as shown in formula (23): o_t = σ_p(W_o u_t + R_o h_{t−1} + b_o) (23);
the cell candidate value g_t, as shown in formula (24): g_t = σ_p(W_g u_t + R_g h_{t−1} + b_g) (24);
the cell state parameter c_t, as shown in formula (25): c_t = σ_p(f_t ⊙ c_{t−1} + i_t ⊙ g_t) (25);
the hidden state parameter h_t, as shown in formula (26): h_t = o_t ⊙ c_t (26).
Then, the loss function of the efficiently quantizable LSTM network, composed of a plurality of sequentially connected efficiently quantizable LSTM units, is calculated, and the floating-point parameters are iteratively updated from the loss value using the back-propagation algorithm; once the maximum number of iterations is reached, the network stops the iterative update and the floating-point values of all parameters and weight matrices are fixed and stored.
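One training-time step of the floating-point cell (the gate equations (21)–(26)) can be sketched as below. Applying the piecewise activation σ_p to the candidate gate g_t and to the cell state is an assumption consistent with the normalized-cell-state design; all names and the dict layout are illustrative.

```python
import numpy as np

def sigma_p(x, t_a=2.5):
    # Piecewise Sigmoid approximation of formula (1) (assumed hard-Sigmoid form).
    return np.clip(x / (2.0 * t_a) + 0.5, 0.0, 1.0)

def lstm_cell_forward(u_t, h_prev, c_prev, W, R, b, t_a=2.5):
    """One floating-point step of the quantizable LSTM cell.
    W, R, b are dicts keyed by 'f', 'i', 'o', 'g' holding the gate
    weight matrices, cyclic weight matrices, and biases."""
    f_t = sigma_p(W['f'] @ u_t + R['f'] @ h_prev + b['f'], t_a)  # forget gate (21)
    i_t = sigma_p(W['i'] @ u_t + R['i'] @ h_prev + b['i'], t_a)  # input gate (22)
    o_t = sigma_p(W['o'] @ u_t + R['o'] @ h_prev + b['o'], t_a)  # output gate (23)
    g_t = sigma_p(W['g'] @ u_t + R['g'] @ h_prev + b['g'], t_a)  # candidate (24)
    c_t = sigma_p(f_t * c_prev + i_t * g_t, t_a)                 # normalized cell state (25)
    h_t = o_t * c_t                                              # hidden state (26)
    return h_t, c_t
```

Because every intermediate value is already confined to [0, 1] during training, the trained parameters can be quantized afterwards without any calibration data.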
According to the invention, the dequantized floating-point feature is calculated as formula (27):

h_t = ĥ_t · Q_X    (27)

wherein ĥ_t is the quantized integer feature output by the LSTM network at time step t, and Q_X is the input quantization scale.
According to a preferred embodiment of the invention, the loss value is calculated as follows: the current loss value is computed from the output hidden-layer feature h_t through the loss function E, defined in formula (28), wherein θ represents all learnable parameters of the efficiently quantizable LSTM network, h_{t,j}^{(i)} denotes the j-th feature value of the hidden layer at time step t for the i-th sample, and m is the number of samples in the back-propagation optimization;
According to a preferred embodiment of the invention, the parameter update comprises: updating all learnable parameters of the efficiently quantizable LSTM network from the calculated loss value according to formula (29):

θ_{v+1} = θ_v − μ · ∂E(θ_v)/∂θ_v    (29)

wherein μ is the learning rate, θ_v represents all learnable parameters of the network at the v-th iteration, and ∂E(θ_v)/∂θ_v is the gradient of the loss function E(θ_v) with respect to θ_v. If v = N_max, where N_max = 200 is the set maximum number of iterations, the network stops the iterative update and the parameter values of the weight matrices are fixed; otherwise v is incremented by 1 and the gate and state parameter values are calculated again for the next iterative update.
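The update rule of formula (29) is plain gradient descent; a minimal sketch with the stated stopping rule N_max = 200 follows. The `grad_fn` callable stands in for the back-propagated gradient ∂E/∂θ and is an illustrative name.

```python
import numpy as np

def sgd_update(theta, grad, mu=0.01):
    """Gradient-descent step of formula (29):
    theta_{v+1} = theta_v - mu * dE/dtheta_v."""
    return theta - mu * grad

def train(theta, grad_fn, mu=0.01, n_max=200):
    """Iterate the update until the maximum iteration count n_max is
    reached, then return the fixed parameter values."""
    for _ in range(n_max):
        theta = sgd_update(theta, grad_fn(theta), mu)
    return theta
```

For example, minimizing E(θ) = θ² (gradient 2θ) from θ = 1 with μ = 0.1 drives θ toward 0 well within 200 iterations.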
According to the invention, the electroencephalogram signal characteristic extraction process comprises the following steps:
performing EEG feature extraction with a single-layer convolutional neural network comprising 8 single-channel one-dimensional convolution kernels of length 5; the single-layer CNN maps the raw EEG signal of each time step into a 1024-dimensional EEG feature.
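The single-layer feature extractor can be sketched as below. The 'same' padding and the 128-sample window per time step (8 channels × 128 = 1024) are assumptions filling in details the text does not state; kernel values here are placeholders for the trained ones.

```python
import numpy as np

def conv_features(eeg_window, kernels):
    """Single-layer CNN feature extraction: each 1-D kernel (the patent
    uses 8 single-channel kernels of length 5) is convolved with the
    raw-EEG window of one time step, and the 8 output channels are
    concatenated into the feature vector."""
    feats = [np.convolve(eeg_window, k, mode='same') for k in kernels]
    return np.concatenate(feats)

# Illustrative: 8 length-5 averaging kernels on a 128-sample window
kernels = [np.ones(5) / 5 for _ in range(8)]
u_t = conv_features(np.ones(128), kernels)   # 1024-dimensional feature
```

The resulting u_t is the per-time-step input fed to the quantized LSTM accelerator.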
An abnormal electroencephalogram signal detection system based on a long-short-term memory network FPGA hardware accelerator capable of being quantified efficiently comprises:
a data acquisition module configured to: collecting an electroencephalogram signal to be detected through an electroencephalogram amplifier and an A/D converter;
a feature extraction module configured to: extracting features of the electroencephalogram signals to be detected, and mapping the original electroencephalogram signals into feature vectors with certain dimensions;
An abnormal EEG signal detection module configured to: input the feature vector into the efficiently quantizable LSTM network FPGA hardware accelerator, dequantize the accelerator's output value to obtain the dequantized floating-point features, and output the detection result (abnormal EEG or normal EEG) after softmax function mapping;
an electroencephalogram abnormality alarm module configured to: and alarming the detected abnormal electroencephalogram according to the class label output by the abnormal electroencephalogram detection module.
The beneficial effects of the invention are as follows:
By quantizing the floating-point parameters of the efficiently quantizable LSTM network into low-bit-width signed integer parameters, the invention significantly reduces the memory footprint of the network and lowers its operating power consumption, which facilitates the deployment and efficient operation of LSTM networks on low-power edge hardware devices and promotes real-time processing and response. In addition, the proposed efficiently quantizable LSTM network completes quantization without the data calibration of traditional quantization methods and without any extra data, enhancing the flexibility of LSTM network quantization.
Drawings
FIG. 1 is a schematic diagram of a high-efficiency quantifiable long and short memory network FPGA hardware accelerator;
FIG. 2 is a schematic diagram of a high-efficiency quantifiable long-short-term memory unit with floating-point parameters according to the present invention;
FIG. 3 is a schematic diagram of a quantized long-short-term memory unit with high efficiency after parameter quantization according to the present invention;
FIG. 4 is a schematic diagram of an efficient quantifiable long-short-time memory network consisting of a plurality of sequentially connected efficient quantifiable long-short-time memory units;
FIG. 5 is a schematic diagram of a training process flow of the long and short term memory network of the present invention;
FIG. 6 is a schematic diagram of ten-fold cross-validation accuracy at different quantization bit widths;
FIG. 7 is a schematic diagram of an abnormal EEG signal detection system based on a high-efficiency quantifiable long-short-term memory network FPGA hardware accelerator;
Detailed Description
The invention is further illustrated in the following drawings and examples, to which the invention is not limited;
Example 1
An efficiently quantizable long short-term memory network FPGA hardware acceleration method runs on an FPGA hardware accelerator, as shown in FIG. 1; the FPGA hardware accelerator adopts a Xilinx Zynq ZedBoard, comprising the ARM processor unit (PS) of the Zynq-7000 SoC and a 7-series programmable logic unit (PL); the method comprises the following steps:
the long-short time memory network capable of being quantized efficiently consists of a plurality of long-short time memory units which are connected in sequence and capable of being quantized efficiently;
Performing parameter quantization on the trained long-short-time memory network with floating point parameters and capable of being quantized efficiently to obtain a quantized long-short-time memory network;
compiling the quantized long-short-time memory network into Verilog codes, generating IP cores for acceleration, and deploying the IP cores in a programmable logic unit PL;
the ARM processor unit PS is responsible for data preparation and preprocessing, including receiving input data and loading weights and configuration; it also performs the softmax mapping operation;
the configuration parameters of each unit of the quantized LSTM network, including the number of input/output channels, the feature-vector dimension, the number of neurons, and the quantization coefficients, are transmitted from the ARM processor unit (PS) side to the programmable logic unit (PL) over an AXI-Lite (simplified Advanced eXtensible Interface) bus;
the input signal, quantization biases, and weights are transmitted from the PS side to the PL over an Advanced eXtensible Interface (AXI) bus; after the quantized LSTM unit finishes its calculation on the PL side, the output data is transmitted back to the PS side over the same AXI bus;
parameter configuration of a next quantized long-short-time memory unit is carried out, a new round of feature vector calculation is started, and the process is repeated until the whole quantized long-short-time memory network calculation is completed;
And outputting a calculation result.
The quantized LSTM units share one quantization lookup table, which is precomputed and stored in the block RAM (BRAM) of the programmable logic PL; all weight data are stored in the DDR3 memory of the ARM processor unit PS. Table 1 details the resource utilization of the designed LSTM network FPGA hardware acceleration method running on the Xilinx Zynq ZedBoard. The on-chip power consumption of the whole hardware acceleration system is 1.778 W, of which the ARM processor unit PS accounts for 1.542 W and the programmable logic unit PL accounts for 0.236 W.
TABLE 1
Example 2
According to embodiment 1, the method for accelerating the hardware of the FPGA of the long-short-time memory network is characterized in that:
The piecewise activation function of the efficiently quantizable long short-term memory unit is given by formula (1):

σ_p(x) = 0, if x ≤ −T_a;  σ_p(x) = x/(2·T_a) + 1/2, if −T_a < x < T_a;  σ_p(x) = 1, if x ≥ T_a    (1)

wherein T_a is an adjustable Sigmoid-approximation parameter; T_a = 2.5;
the cell state parameter in the efficiently quantizable long short-term memory unit is calculated as formula (2):

c_t = σ_p(f_t ⊙ c_{t−1} + i_t ⊙ g_t)    (2)

wherein ⊙ is the Hadamard product, c_t is the cell state at time step t, and i_t, f_t, g_t are the input gate value, forget gate value, and cell candidate value at time step t. The structure of the efficiently quantizable LSTM unit with floating-point parameters is shown in FIG. 2, the structure of the unit after quantization to low-bit-width integer parameters is shown in FIG. 3, and the efficiently quantizable LSTM network formed by a plurality of sequentially connected units is shown in FIG. 4.
The efficiently quantizable LSTM network differs from the traditional LSTM network in two respects:
the piecewise activation function of formula (1) requires less computation and is easier to quantize than the Sigmoid activation function of the traditional LSTM network;
compared with the traditional LSTM network, the cell state calculation of formula (2) adds a step in which the cell state is normalized by the piecewise activation function σ_p.
The parameter quantization process of the trained long-short-time memory network capable of being quantized efficiently comprises the following steps:
(1) Initializing quantization parameters, including:
let the overall weight matrix w= [ W f;Wi;Wo;Wg ], then
Let the total cyclic weight matrix r= [ R f;Ri;Ro;Rg ], then
Initializing the weight quantization bit width B_WR = 8;
Initializing the piecewise-function lookup table quantization bit width B_LUT = 4;
Initializing the fixed-point number width B_Fix = 24;
Initializing the weight quantization factor M_WR = 2^(B_WR − 1) − 1 = 127;
Initializing the weight matrix quantization scale Q_W = max(|W|)/M_WR;
Initializing the cyclic weight matrix quantization scale Q_R = max(|R|)/M_WR; wherein max(·) is the maximum value function and |·| is the absolute value function;
Initializing the input quantization scale Q_X = 1/M_WR;
Initializing the hidden unit quantization scale;
Initializing the input weight quantization scale;
Initializing the input cyclic weight quantization scale;
Initializing the candidate unit quantization scale;
(2) Generating a quantization look-up table; comprising the following steps:
First, with −T_a as the start value and T_a as the end value, the increment (i.e., step size) is set to 2T_a/2^(B_LUT+1), generating an arithmetic sequence I_d of 2^(B_LUT+1) + 1 = 33 points;
Then, the piecewise function σ_T(·) is calculated at each point of I_d and scaled by M_WR to provide the quantization lookup table L_d; here,
L_d = {0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 67, 71, 75, 79, 83, 87, 91, 95, 99, 103, 107, 111, 115, 119, 123, 127}
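Under the clipped-linear reading of σ_T, the lookup-table generation can be sketched as follows; with T_a = 2.5, B_LUT = 4 and M_WR = 127 (all assumptions drawn from the initialization step above) it reproduces the 33-entry table L_d:

```python
def make_lut(t_a=2.5, b_lut=4, m_wr=127):
    """Build the quantization lookup table for the piecewise activation.

    Samples 2**(b_lut + 1) + 1 evenly spaced points from -t_a to t_a
    (the arithmetic sequence I_d) and stores the activation value scaled
    to integers in [0, m_wr].  Assumes the clipped-linear activation
    x/(2*t_a) + 1/2, which never saturates inside [-t_a, t_a].
    """
    n = 2 ** (b_lut + 1)                 # 32 intervals -> 33 table entries
    step = 2.0 * t_a / n                 # increment of the arithmetic sequence
    i_d = [-t_a + k * step for k in range(n + 1)]
    return [round(m_wr * (x / (2.0 * t_a) + 0.5)) for x in i_d]
```

Calling `make_lut()` with the defaults yields exactly the L_d values listed above, which is what motivates reading σ_T as the clipped-linear approximation.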
(3) Quantizing the weight matrix; comprising the following steps:
the quantized forgetting gate weight matrix is shown as formula (3):
the quantization input gate weight matrix is as shown in formula (4):
the quantization output gate weight matrix is as shown in formula (5):
the cell candidate weight matrix is quantified as shown in formula (6):
the quantized forgetting gate cyclic weight matrix is shown in formula (7):
the quantized input gate cyclic weight matrix is as shown in formula (8):
The quantization output gate cycle weight matrix is as shown in formula (9):
quantifying a cell candidate cyclic weight matrix as shown in formula (10):
quantifying the forgetting gate bias as shown in equation (11):
the input gate bias is quantized as shown in equation (12):
The quantization output gate bias is as shown in equation (13):
quantifying the cell candidate bias as shown in formula (14):
(4) Calculating quantized gate control and state parameter values; comprising the following steps:
calculating a quantized forget gate gating value:
Calculating a quantized input gate control value:
Calculating a quantized output gate control value:
Calculating a quantized cell candidate unit value:
calculating the quantized cell state parameter value:
calculating quantized hidden state parameter values:
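The weight-matrix quantization of step (3) follows the usual symmetric scheme: each floating-point matrix is divided by its quantization scale and rounded to a signed integer bounded by M_WR. A minimal sketch (the function name and pure-Python matrix representation are illustrative; the scale Q_W = max(|W|)/M_WR and the bound M_WR = 127 follow the initialization of step (1)):

```python
def quantize_matrix(w, m_wr=127):
    """Symmetric quantization of a weight matrix to signed integers.

    q_w = max(|w|) / m_wr maps the largest-magnitude weight to +/-m_wr;
    dequantization is simply w_q * q_w.  `w` is a list of rows.
    """
    q_w = max(abs(x) for row in w for x in row) / m_wr
    clip = lambda v: max(-m_wr, min(m_wr, v))
    w_q = [[clip(round(x / q_w)) for x in row] for row in w]
    return w_q, q_w
```

Each of the gate, cyclic and bias quantizations in formulas (3) through (14) would apply this same divide-round-clip pattern with its own scale.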
Example 3
A long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently comprises an ARM processor unit (PS) and a programmable logic unit (PL) of a Zynq-7000 SoC, connected through an Advanced eXtensible Interface (AXI-Lite) bus, and implements the FPGA hardware acceleration method for the long-short-time memory network capable of being quantized efficiently described above.
Example 4
An abnormal electroencephalogram signal detection method based on a long-short-term memory network FPGA hardware accelerator capable of being quantized efficiently comprises the following steps:
A data acquisition module consisting of an electroencephalogram amplifier and an A/D converter is adopted to acquire an electroencephalogram signal to be detected;
training a long-short-time memory network capable of being quantized efficiently;
The trained long-short-time memory network capable of being quantized efficiently is deployed on the FPGA hardware accelerator of the long-short-time memory network capable of being quantized efficiently, which is described in the embodiment 3;
extracting characteristics of an electroencephalogram signal to be detected;
inputting the characteristics of the electroencephalogram signals to be detected into a long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently to obtain output values;
Inverse quantization is carried out on the output value of the long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently to obtain the dequantized floating point number features, and the detection result (abnormal electroencephalogram or normal electroencephalogram) is output after softmax function mapping;
If the detection result is abnormal electroencephalogram, alarming is carried out through an alarm module.
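The last two steps of this method (inverse quantization and softmax mapping to a class label) can be sketched as follows; the function name, the parameter `q_x` standing in for the input quantization scale Q_X, and the label order (0 = normal, 1 = abnormal electroencephalogram) are illustrative assumptions:

```python
import math

def classify(h_int, q_x):
    """Dequantize the accelerator's integer output and map it through a
    numerically stable softmax to a class label (0 = normal, 1 = abnormal)."""
    h = [q_x * v for v in h_int]            # inverse quantization: h = Q_X * h_int
    m = max(h)
    e = [math.exp(v - m) for v in h]        # subtract max for stability
    total = sum(e)
    probs = [v / total for v in e]
    return probs.index(max(probs)), probs
```

The alarm module would then fire whenever the returned label corresponds to abnormal electroencephalogram.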
As shown in fig. 5, the training process of the long-short-time memory network capable of being quantified efficiently includes:
firstly, acquiring electroencephalogram data for training by a data acquisition module consisting of an electroencephalogram amplifier and an A/D converter;
secondly, according to the set number of units in the long-short-time memory network capable of being quantized efficiently, initializing the parameters and weight matrices of each unit in floating point format, including:
the cell state parameter c_0 at the initial time step t = 0 is initialized to 0;
the hidden state parameter h_0 at the initial time step t = 0 is initialized to 0;
the forget gate weight matrix W_f, input gate weight matrix W_i, output gate weight matrix W_o and cell candidate weight matrix W_g are initialized to random numbers;
the forget gate cyclic weight matrix R_f, input gate cyclic weight matrix R_i, output gate cyclic weight matrix R_o and cell candidate cyclic weight matrix R_g are initialized to random numbers;
the forget gate bias b_f, input gate bias b_i, output gate bias b_o and cell candidate bias b_g are all initialized to 0;
Then, according to the parameter and the weight matrix, calculating various gating and state parameter values, including:
For the data u_t of the t-th time step in the extracted electroencephalogram signal features, the floating point parameter value of each gate is calculated according to the following formulas:
calculating the forget gate gating value f_t, as shown in formula (21):
f_t = σ_T(W_f u_t + R_f h_{t−1} + b_f)  (21)
wherein h_{t−1} is the hidden state parameter of the (t−1)-th time step;
calculating the input gate gating value i_t, as shown in formula (22):
i_t = σ_T(W_i u_t + R_i h_{t−1} + b_i)  (22)
calculating the output gate gating value o_t, as shown in formula (23):
o_t = σ_T(W_o u_t + R_o h_{t−1} + b_o)  (23)
calculating the cell candidate unit value g_t, as shown in formula (24):
g_t = σ_T(W_g u_t + R_g h_{t−1} + b_g)  (24)
calculating the cell state parameter c_t, as shown in formula (25):
c_t = σ_T(f_t ⊙ c_{t−1} + i_t ⊙ g_t)  (25)
calculating the hidden state parameter h_t, as shown in formula (26):
h_t = o_t ⊙ c_t  (26)
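One time step of this modified cell can be sketched as below, under the assumption that every gate and the cell state use the piecewise (hard-sigmoid-style) activation; the scalar form and the parameter dictionary are illustrative simplifications of the matrix formulas:

```python
def hard_sigmoid(x, t_a=2.5):
    """Assumed clipped-linear approximation of the Sigmoid."""
    return max(0.0, min(1.0, x / (2.0 * t_a) + 0.5))

def lstm_step(u_t, h_prev, c_prev, p):
    """One time step of the modified LSTM cell (scalar sketch).

    p holds weights W_*, recurrent weights R_* and biases b_* for the
    forget (f), input (i), output (o) gates and cell candidate (g).
    """
    f_t = hard_sigmoid(p["W_f"] * u_t + p["R_f"] * h_prev + p["b_f"])  # forget gate
    i_t = hard_sigmoid(p["W_i"] * u_t + p["R_i"] * h_prev + p["b_i"])  # input gate
    o_t = hard_sigmoid(p["W_o"] * u_t + p["R_o"] * h_prev + p["b_o"])  # output gate
    g_t = hard_sigmoid(p["W_g"] * u_t + p["R_g"] * h_prev + p["b_g"])  # candidate
    c_t = hard_sigmoid(f_t * c_prev + i_t * g_t)   # activated cell state
    h_t = o_t * c_t                                # hidden state, no extra tanh
    return h_t, c_t
```

Because the cell state itself passes through the bounded activation, c_t always stays in [0, 1], which is what keeps the fixed-point ranges small enough to quantize aggressively.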
Then, the loss function of the efficiently quantizable long-short-time memory network formed by a plurality of sequentially connected efficiently quantizable long-short-time memory units is calculated, and the floating point parameters are iteratively updated according to the value of the loss function in combination with the back-propagation algorithm; after the maximum number of iterations is reached, the network stops the iterative updating, and the floating point values of each parameter and weight matrix are fixed and stored.
Calculating the dequantized floating point number features, as shown in formula (27):
h_t = Q_X · ĥ_t  (27)
wherein ĥ_t is the quantized integer feature output by the long-short-time memory network at the t-th time step, and Q_X is the input quantization scale.
Calculating a loss value; comprising the following steps:
according to the output hidden layer feature h_t, the current loss value is calculated through the loss function E so as to perform back-propagation optimization; the loss function E is defined as shown in formula (28):
wherein θ represents all the learnable parameters in the long-short-time memory network capable of being quantized efficiently, the hidden layer feature of the j-th dimension at the t-th time step of the i-th sample depends on θ, and m represents the number of samples in the back-propagation optimization process;
Parameter updating, comprising:
updating all the learnable parameters in the long-short-time memory network capable of being quantized efficiently from the calculated loss value according to formula (29):
θ_{v+1} = θ_v − μ∇E(θ_v)  (29)
wherein μ is the learning rate; θ_v represents all the learnable parameters in the long-short-time memory network capable of being quantized efficiently at the v-th iteration, and ∇E(θ_v) is the gradient of the loss function E(θ_v) with respect to θ_v; if v = N_max, where N_max = 200 is the set maximum number of iterations, the network stops the iterative updating and the parameter values of the weight matrices are fixed; otherwise, v is incremented by 1 and the gating and state parameter values continue to be calculated for iterative updating.
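The iterative update of formula (29) is ordinary gradient descent with learning rate μ capped at N_max iterations; a sketch of the loop (the quadratic toy loss in the usage line is illustrative only, not the network's loss function):

```python
def train(theta0, grad, mu=0.01, n_max=200):
    """Iterate theta_{v+1} = theta_v - mu * grad(theta_v) for n_max steps,
    mirroring formula (29); `grad` returns the loss gradient at theta."""
    theta = theta0
    for _ in range(n_max):
        theta = theta - mu * grad(theta)
    return theta

# Toy loss E(theta) = (theta - 3)**2, whose gradient is 2*(theta - 3):
theta_star = train(0.0, lambda t: 2.0 * (t - 3.0), mu=0.1)
```

With μ = 0.1 on this toy loss, the error shrinks by a factor 0.8 per step, so 200 iterations converge to the minimum at θ = 3 to machine precision.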
The electroencephalogram signal characteristic extraction process comprises the following steps:
Feature extraction is carried out by a single-layer convolutional neural network comprising 8 single-channel one-dimensional convolution kernels of length 5; the single-layer convolutional neural network maps the original electroencephalogram signal of each time step into an electroencephalogram signal feature of dimension 1024.
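Such a single-layer one-dimensional convolution can be sketched as follows ('valid' mode and stride 1 are assumptions, as the disclosure does not state padding or stride; under those assumptions, 8 kernels of length 5 over a 132-sample window give 8 × 128 = 1024 values, matching the stated feature dimension):

```python
def conv1d(signal, kernels):
    """Single-layer 1-D convolution: each kernel slides over the signal
    ('valid' mode, stride 1), producing one feature channel per kernel."""
    k_len = len(kernels[0])
    out_len = len(signal) - k_len + 1
    return [
        [sum(k[j] * signal[i + j] for j in range(k_len)) for i in range(out_len)]
        for k in kernels
    ]
```

Flattening the 8 output channels then yields the per-time-step feature vector fed to the long-short-time memory network.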
The dequantized floating point number features are mapped by the softmax function to output the category label.
The quantized long-short-time memory network is tested on abnormal electroencephalogram classification data using ten-fold cross validation; the performance of the different networks is shown in fig. 6. All accuracies in fig. 6 are the average accuracies over the ten folds. For reference, the accuracy of the traditional long-short-time memory network with the original floating point parameters is 97.67%. It can be seen that the network performance at 5-bit quantization improves by about 1% over the traditional long-short-time memory network. Moreover, the quantization lookup table of the long-short-time memory network capable of being quantized efficiently can be quantized to as low as 2 bits without losing model inference accuracy.
Example 5
An abnormal electroencephalogram signal detection system based on a long-short-term memory network FPGA hardware accelerator capable of being quantified with high efficiency is shown in fig. 7, and comprises:
a data acquisition module configured to: collecting an electroencephalogram signal to be detected through an electroencephalogram amplifier and an A/D converter;
a feature extraction module configured to: extracting features of the electroencephalogram signals to be detected, and mapping the original electroencephalogram signals into feature vectors with certain dimensions;
An abnormal electroencephalogram signal detection module configured to: input the feature vector into the long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently, dequantize the output value of the accelerator to obtain the dequantized floating point number features, and output the detection result (abnormal electroencephalogram or normal electroencephalogram) after softmax function mapping;
an electroencephalogram abnormality alarm module configured to: and alarming the detected abnormal electroencephalogram according to the class label output by the abnormal electroencephalogram detection module.
Claims (10)
1. A method for FPGA hardware acceleration of a long-short-time memory network capable of being quantized efficiently, characterized in that the method runs in an FPGA hardware accelerator comprising an ARM processor unit PS and a programmable logic unit PL, and comprises the following steps:
the long-short time memory network capable of being quantized efficiently consists of a plurality of long-short time memory units which are connected in sequence and capable of being quantized efficiently;
Performing parameter quantization on the trained long-short-time memory network with floating point parameters and capable of being quantized efficiently to obtain a quantized long-short-time memory network;
compiling the quantized long-short-time memory network into Verilog codes, generating IP cores for acceleration, and deploying the IP cores in a programmable logic unit PL;
the ARM processor unit PS is responsible for data preparation and preprocessing work and also for softmax mapping operation;
The configuration parameters of each unit of the quantized long-short-time memory network include the number of input or output channels, the feature vector dimension, the number of neurons and the quantization coefficients; these parameters are first transmitted from the ARM processor unit PS to the programmable logic unit PL through a simplified advanced extension interface bus;
the input signals, quantized biases and weights are transmitted from the ARM processor unit PS to the programmable logic unit PL through the advanced extension interface bus; after the quantized long-short-time memory unit completes its calculation at the programmable logic unit PL end, the output data are transmitted back to the ARM processor unit PS end through the same AXI bus;
parameter configuration of a next quantized long-short-time memory unit is carried out, a new round of feature vector calculation is started, and the process is repeated until the whole quantized long-short-time memory network calculation is completed;
And outputting a calculation result.
2. The method for FPGA hardware acceleration of the long-short-time memory network capable of being quantized efficiently according to claim 1, characterized in that the piecewise activation function σ_T(·) of the long-short-time memory unit capable of being quantized efficiently is:
σ_T(x) = 0, x ≤ −T_a;  σ_T(x) = x/(2T_a) + 1/2, −T_a < x < T_a;  σ_T(x) = 1, x ≥ T_a  (1)
wherein T_a is an adjustable Sigmoid function approximation parameter;
further preferably, T_a = 2.5.
3. The method for FPGA hardware acceleration of the long-short-time memory network capable of being quantized efficiently according to claim 2, characterized in that the calculation process of the cell state parameter in the long-short-time memory unit capable of being quantized efficiently is:
c_t = σ_T(f_t ⊙ c_{t−1} + i_t ⊙ g_t)  (2)
wherein ⊙ is the Hadamard product, c_t is the cell state of the t-th time step in the long-short-time memory unit capable of being quantized efficiently, and i_t, f_t, g_t are respectively the input gate control value, forget gate control value and cell candidate unit value of the t-th time step.
4. The method for accelerating the hardware of the FPGA of the long-short-time memory network capable of being quantized efficiently according to claim 1, wherein the parameter quantization process of the long-short-time memory network capable of being quantized efficiently after training comprises the following steps:
(1) Initializing quantization parameters, including:
let the overall weight matrix w= [ W f;Wi;Wo;Wg ], then
Let the total cyclic weight matrix r= [ R f;Ri;Ro;Rg ], then
Initializing the weight quantization bit width B_WR = 8;
Initializing the piecewise-function lookup table quantization bit width B_LUT = 4;
Initializing the fixed-point number width B_Fix = 24;
Initializing the weight quantization factor M_WR = 2^(B_WR − 1) − 1 = 127;
Initializing the weight matrix quantization scale Q_W = max(|W|)/M_WR;
Initializing the cyclic weight matrix quantization scale Q_R = max(|R|)/M_WR; wherein max(·) is the maximum value function and |·| is the absolute value function;
Initializing the input quantization scale Q_X = 1/M_WR;
Initializing the hidden unit quantization scale;
Initializing the input weight quantization scale;
Initializing the input cyclic weight quantization scale;
Initializing the candidate unit quantization scale;
(2) Generating a quantization look-up table; comprising the following steps:
First, with −T_a as the start value and T_a as the end value, the increment is set to 2T_a/2^(B_LUT+1), generating an arithmetic sequence I_d;
Then, the piecewise function σ_T(·) is calculated at each point of I_d and scaled by M_WR to provide the quantization lookup table L_d.
(3) Quantizing the weight matrix; comprising the following steps:
the quantized forgetting gate weight matrix is shown as formula (3):
the quantization input gate weight matrix is as shown in formula (4):
the quantization output gate weight matrix is as shown in formula (5):
the cell candidate weight matrix is quantified as shown in formula (6):
the quantized forgetting gate cyclic weight matrix is shown in formula (7):
the quantized input gate cyclic weight matrix is as shown in formula (8):
The quantization output gate cycle weight matrix is as shown in formula (9):
quantifying a cell candidate cyclic weight matrix as shown in formula (10):
quantifying the forgetting gate bias as shown in equation (11):
the input gate bias is quantized as shown in equation (12):
The quantization output gate bias is as shown in equation (13):
quantifying the cell candidate bias as shown in formula (14):
(4) Calculating quantized gate control and state parameter values; comprising the following steps:
calculating a quantized forget gate gating value:
Calculating a quantized input gate control value:
Calculating a quantized output gate control value:
Calculating a quantized cell candidate unit value:
calculating the quantized cell state parameter value:
calculating quantized hidden state parameter values:
5. A long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently, characterized by comprising an ARM processor unit PS and a programmable logic unit PL connected through an advanced extension interface bus, and implementing the method for FPGA hardware acceleration of the long-short-time memory network capable of being quantized efficiently.
6. An abnormal electroencephalogram signal detection method based on a long-short-term memory network FPGA hardware accelerator capable of being quantized with high efficiency is characterized by comprising the following steps:
A data acquisition module consisting of an electroencephalogram amplifier and an A/D converter is adopted to acquire an electroencephalogram signal to be detected;
training a long-short-time memory network capable of being quantized efficiently;
the trained long-short-time memory network capable of being quantized efficiently is deployed on the FPGA hardware accelerator of the long-short-time memory network capable of being quantized efficiently;
extracting characteristics of an electroencephalogram signal to be detected;
inputting the characteristics of the electroencephalogram signals to be detected into a long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently to obtain output values;
Inverse quantization is carried out on the output value of the long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently to obtain the dequantized floating point number features, and the detection result is output after softmax function mapping;
If the detection result is abnormal electroencephalogram, alarming is carried out through an alarm module.
7. The method for detecting abnormal electroencephalogram signals based on the high-efficiency quantifiable long-short-time memory network FPGA hardware accelerator according to claim 6, wherein the training process of the high-efficiency quantifiable long-short-time memory network comprises the following steps:
firstly, acquiring electroencephalogram data for training by a data acquisition module consisting of an electroencephalogram amplifier and an A/D converter;
secondly, according to the set number of units in the long-short-time memory network capable of being quantized efficiently, initializing the parameters and weight matrices of each unit in floating point format, including:
the cell state parameter c_0 at the initial time step t = 0 is initialized to 0;
the hidden state parameter h_0 at the initial time step t = 0 is initialized to 0;
the forget gate weight matrix W_f, input gate weight matrix W_i, output gate weight matrix W_o and cell candidate weight matrix W_g are initialized to random numbers;
the forget gate cyclic weight matrix R_f, input gate cyclic weight matrix R_i, output gate cyclic weight matrix R_o and cell candidate cyclic weight matrix R_g are initialized to random numbers;
the forget gate bias b_f, input gate bias b_i, output gate bias b_o and cell candidate bias b_g are all initialized to 0;
Then, according to the parameter and the weight matrix, calculating various gating and state parameter values, including:
For the data u_t of the t-th time step in the extracted electroencephalogram signal features, the floating point parameter value of each gate is calculated according to the following formulas:
calculating the forget gate gating value f_t, as shown in formula (21):
f_t = σ_T(W_f u_t + R_f h_{t−1} + b_f)  (21)
wherein h_{t−1} is the hidden state parameter of the (t−1)-th time step;
calculating the input gate gating value i_t, as shown in formula (22):
i_t = σ_T(W_i u_t + R_i h_{t−1} + b_i)  (22)
calculating the output gate gating value o_t, as shown in formula (23):
o_t = σ_T(W_o u_t + R_o h_{t−1} + b_o)  (23)
calculating the cell candidate unit value g_t, as shown in formula (24):
g_t = σ_T(W_g u_t + R_g h_{t−1} + b_g)  (24)
calculating the cell state parameter c_t, as shown in formula (25):
c_t = σ_T(f_t ⊙ c_{t−1} + i_t ⊙ g_t)  (25)
calculating the hidden state parameter h_t, as shown in formula (26):
h_t = o_t ⊙ c_t  (26)
then, the loss function of the efficiently quantizable long-short-time memory network formed by a plurality of sequentially connected efficiently quantizable long-short-time memory units is calculated, and the floating point parameters are iteratively updated according to the value of the loss function in combination with the back-propagation algorithm; after the maximum number of iterations is reached, the network stops the iterative updating, and the floating point values of each parameter and weight matrix are fixed and stored;
Further preferably, the dequantized floating point number features are calculated as shown in formula (27):
h_t = Q_X · ĥ_t  (27)
wherein ĥ_t is the quantized integer feature output by the long-short-time memory network at the t-th time step, and Q_X is the input quantization scale.
8. The method for detecting abnormal electroencephalogram signals based on the long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently, characterized by calculating a loss value, comprising the following steps:
calculating the current loss value through the loss function E according to the output hidden layer feature h_t; the loss function E is defined as shown in formula (28):
wherein θ represents all the learnable parameters in the long-short-time memory network capable of being quantized efficiently, the hidden layer feature of the j-th dimension at the t-th time step of the i-th sample depends on θ, and m represents the number of samples in the back-propagation optimization process;
further preferably, the parameter updating comprises:
updating all the learnable parameters in the long-short-time memory network capable of being quantized efficiently from the calculated loss value according to formula (29):
θ_{v+1} = θ_v − μ∇E(θ_v)  (29)
wherein μ is the learning rate; θ_v represents all the learnable parameters in the long-short-time memory network capable of being quantized efficiently at the v-th iteration, and ∇E(θ_v) is the gradient of the loss function E(θ_v) with respect to θ_v; if v = N_max, where N_max = 200 is the set maximum number of iterations, the network stops the iterative updating and the parameter values of the weight matrices are fixed; otherwise, v is incremented by 1 and the gating and state parameter values continue to be calculated for iterative updating.
9. The method for detecting abnormal electroencephalogram signals based on the high-efficiency quantized long-short-term memory network FPGA accelerator according to claim 6, wherein the electroencephalogram signal feature extraction process comprises the following steps:
Feature extraction is carried out by a single-layer convolutional neural network comprising 8 single-channel one-dimensional convolution kernels of length 5; the single-layer convolutional neural network maps the original electroencephalogram signal of each time step into an electroencephalogram signal feature of dimension 1024.
10. An abnormal electroencephalogram signal detection system based on a long-short-term memory network FPGA hardware accelerator capable of being quantified efficiently is characterized by comprising:
a data acquisition module configured to: collecting an electroencephalogram signal to be detected through an electroencephalogram amplifier and an A/D converter;
a feature extraction module configured to: extracting features of the electroencephalogram signals to be detected, and mapping the original electroencephalogram signals into feature vectors with certain dimensions;
An abnormal electroencephalogram signal detection module configured to: input the feature vector into the long-short-time memory network FPGA hardware accelerator capable of being quantized efficiently, dequantize its output value to obtain the dequantized floating point number features, and output the detection result after softmax function mapping;
an electroencephalogram abnormality alarm module configured to: and alarming the detected abnormal electroencephalogram according to the class label output by the abnormal electroencephalogram detection module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410548521.4A CN118228789A (en) | 2024-05-06 | 2024-05-06 | Method and system for detecting abnormal electroencephalogram signals by accelerating long-short-term memory network FPGA hardware and capable of being quantized efficiently |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118228789A true CN118228789A (en) | 2024-06-21 |
Family
ID=91499613
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118228789A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |