CN112529799A - Optical aberration distortion correction system based on FPGA convolutional neural network structure - Google Patents

Optical aberration distortion correction system based on FPGA convolutional neural network structure

Info

Publication number
CN112529799A
CN112529799A (application CN202011418118.8A)
Authority
CN
China
Prior art keywords
module
neural network
convolutional neural
convolution
layers
Prior art date
Legal status
Pending
Application number
CN202011418118.8A
Other languages
Chinese (zh)
Inventor
刘国栋
胡流森
吴小龑
吴凌远
Current Assignee
Institute of Fluid Physics of CAEP
Original Assignee
Institute of Fluid Physics of CAEP
Priority date
Filing date
Publication date
Application filed by Institute of Fluid Physics of CAEP filed Critical Institute of Fluid Physics of CAEP
Priority to CN202011418118.8A priority Critical patent/CN112529799A/en
Publication of CN112529799A publication Critical patent/CN112529799A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/80Geometric correction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image


Abstract

The invention discloses an optical aberration distortion correction system based on an FPGA (field programmable gate array) convolutional neural network structure, which comprises a detection camera, a correction component and an FPGA convolutional neural network model. The detection camera is a CCD (charge-coupled device) camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror. The FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module. Compared with the prior art, the optical aberration distortion correction system provided by the invention has the advantages of low power consumption, high running speed and high efficiency.

Description

Optical aberration distortion correction system based on FPGA convolutional neural network structure
Technical Field
The invention relates to the technical field of optics, in particular to an optical aberration distortion correction system based on an FPGA convolutional neural network structure.
Background
Turbulence effects in the atmosphere cause light intensity fluctuation, light spot drift and similar effects when light propagates through the atmosphere. These effects reduce the concentration of laser energy delivered in far-field transmission and degrade the resolution of optical imaging systems. Improving the beam quality of a laser system and the resolving power of an optical imaging system therefore requires correcting the aberration distortion caused by atmospheric turbulence.
A convolutional neural network can learn the mapping between target images containing turbulence aberration information and the turbulent aberration distortion itself, so that the distortion can be recovered from acquired target images and then corrected. However, the convolutional neural network is computationally heavy and struggles to keep up with the high-frequency phase distortion caused by atmospheric turbulence.
Disclosure of Invention
The invention aims to overcome the defects that the calculation amount of a convolutional neural network in the prior art is large and the requirement of high-frequency phase distortion caused by atmospheric turbulence is difficult to meet, and provides an optical aberration distortion correction system based on an FPGA convolutional neural network structure.
The purpose of the invention is mainly realized by the following technical scheme:
an optical aberration distortion correction system based on an FPGA (field programmable gate array) convolutional neural network structure comprises a detection camera, a correction component and an FPGA convolutional neural network model, wherein the detection camera is a CCD (charge-coupled device) camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module.
This technical scheme does not depend on an optical beacon or a wavefront sensor; it realizes wavefront sensing and correction by establishing a deep neural network oriented to the real physical process of atmospheric turbulence and imaging the target directly through the optical system, finally achieving detection and reconstruction of the wavefront aberration. The scheme adopts an FPGA convolutional neural network model and uses the convolution module to carry out convolution operations; the nonlinear-function sigmoid module realizes the sigmoid function in the convolutional neural network; the pooling module further pools the output of the convolution layers to reduce the dimensionality of the feature output; the intermediate-quantity storage module stores intermediate variables of the convolutional neural network calculation; and the fully connected layer module realizes the calculation function of the convolutional neural network perceptron and produces the output.
It should be noted that the detection camera in the optical aberration distortion correction system of this technical scheme realizes detection imaging of the target object; the correction component mainly compensates the aberration distortion of atmospheric turbulence; the FPGA serves as the carrier of the convolutional neural network model, and the control voltages of the deformable mirror are calculated by the deep neural network model from the pictures collected by the detection camera. The detection camera has 2048 × 2048 pixels and a frame rate of 90 fps; the deformable mirror is a piezoelectric deformable mirror comprising 40 actuators, a mirror with a pupil diameter of 10 mm, a silver film with a protective layer, and a maximum refresh rate of 4 kHz.
Furthermore, the network structure of the FPGA convolutional neural network model is divided into 9 layers; each layer comprises 3 convolution modules, 1 addition module and 1 convolution module connected in sequence, and the different layers are connected by pooling or interpolation.
The FPGA convolutional neural network model designed by this technical scheme comprises 9 different processing layers, connected to one another by pooling or interpolation. With the system working at 200 MHz and one frame taking about 636315 cycles, the processing frame rate is 200 MHz / 636315 ≈ 314 fps; at 300 MHz it reaches about 471 fps, and above 320 MHz about 502 fps, a clearly accelerated calculation speed. It should be noted that in this technical scheme every internal operation layer of each layer can run completely in parallel. Preferably, in this embodiment, the convolution modules at layers 1 and 9 contain 16 channels, those at layers 2 and 8 contain 32 channels, the layer-3 convolution module contains 64 channels, those at layers 4 and 6 contain 128 channels, and those at layers 5 and 7 contain 256 channels.
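The frame-rate figures above follow directly from the clock frequency and the cycles needed per frame; a small Python sketch (using the ~636315 cycles-per-frame figure implied by the 200 MHz example) reproduces them:

```python
# Hypothetical sketch: frame rates implied by the patent's cycle count.
CYCLES_PER_FRAME = 636315  # total cycles the network needs for one frame

def frame_rate(clock_hz: float, cycles: int = CYCLES_PER_FRAME) -> int:
    """Frames per second achievable at a given FPGA clock frequency."""
    return int(clock_hz / cycles)

if __name__ == "__main__":
    for mhz in (200, 300, 320):
        print(f"{mhz} MHz -> {frame_rate(mhz * 1e6)} fps")
```

At 200, 300 and 320 MHz this yields 314, 471 and 502 fps respectively, matching the figures in the text.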
Furthermore, layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, each up-sampling operation doubling the size of the intermediate layer while halving the number of channels. Layers 1 to 5 are connected through maximum pooling, layers 6 to 9 through up-convolution, and layers 1 to 4 are connected to layers 6 to 9 by one-to-one residual connections, so that part of the intermediate layers from the down-sampling path are copied to the up-sampling path and participate in up-sampling; the output of the up-sampling layers undergoes one more convolution operation to obtain the final output image.
Further, during maximum pooling and down-sampling, a maximum is determined as the result from 4 pixels at a time; during up-sampling, a line-buffer structure is adopted, and the required result is obtained by calculation over adjacent pixels.
For maximum pooling or down-sampling, this technical scheme determines the maximum result from 4 pixels each time by means of a 4-input comparator; for up-sampling, a structure similar to the convolution line buffer is adopted, and the required result is obtained by calculation over adjacent pixels. Preferably, the ReLU operation is performed by setting a comparator, and BN (batch normalization) can be implemented by means of a look-up table.
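A software model of the 4-input-comparator pooling described above might look as follows (a Python sketch of the behaviour, not the patent's hardware description):

```python
def max_pool_2x2(img):
    """2x2 max pooling: each output pixel is the maximum of 4 neighbouring
    input pixels, mirroring a 4-input comparator in hardware."""
    h, w = len(img), len(img[0])
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

For example, pooling a 4 × 4 map produces a 2 × 2 map, each entry being the maximum of its 2 × 2 block.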
Further, the convolution module is composed of 3 row registers with the length of 28, 3 row registers with the length of 12 and a multiplication and addition array of 3 multiplied by 3.
The row registers of the convolution module are connected end to end, and the tail of each row register is connected to one row of the multiply-add array through a data selector. Each cell of the multiply-add array comprises 2 registers and 1 multiplier, the 2 registers respectively storing one element of the convolution kernel and one pixel of the input image. This structure can perform convolutions with a 3 × 3 kernel when images of size 28 × 28 or 12 × 12 are input.
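Functionally, the 3 × 3 multiply-add array computes 9 products per output pixel and sums them; a plain Python reference model (ignoring the register-level streaming, and using the cross-correlation convention common in CNN hardware) is:

```python
def conv3x3(img, kernel):
    """Valid 3x3 convolution (cross-correlation form): for each output
    pixel, 9 products are formed and summed, as in the multiply-add array."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += img[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

A 28 × 28 input yields a 26 × 26 valid output; a 12 × 12 input yields 10 × 10.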
Further, the non-linear function sigmoid module stores a sigmoid function value corresponding to an argument in a ROM or a RAM in advance, wherein the argument is used as an address input, and the function value is used as an output of the module, so that the sigmoid function is realized.
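The ROM-based sigmoid can be modelled by precomputing a table and indexing it with the quantized argument; the table size and input range below are illustrative assumptions, not values from the patent:

```python
import math

def build_sigmoid_rom(n_entries=256, x_min=-8.0, x_max=8.0):
    """Precompute sigmoid values as a ROM table; the quantized argument
    serves as the read address, the stored value as the module output."""
    step = (x_max - x_min) / (n_entries - 1)
    return [1.0 / (1.0 + math.exp(-(x_min + k * step))) for k in range(n_entries)]

def sigmoid_lut(x, rom, x_min=-8.0, x_max=8.0):
    """Look up sigmoid(x) from the ROM, clamping x to the table range."""
    n = len(rom)
    k = round((min(max(x, x_min), x_max) - x_min) * (n - 1) / (x_max - x_min))
    return rom[k]
```

With 256 entries over [-8, 8] the lookup error near the origin is well under 1%, which is typically sufficient for fixed-point inference.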
Further, the pooling module is composed of 2 row registers with a length of 24, 2 row registers with a length of 8 and a multiply-add array of 2 × 2.
In this technical scheme, the row registers of the pooling module are connected end to end, and the tail of each row register is connected to one row of the multiply-add array through a data selector. Each cell of the multiply-add array comprises 2 registers and 1 multiplier, the 2 registers respectively storing one element of the kernel and one pixel of the input feature map. The module can pool a 24 × 24 output feature map into a 12 × 12 feature map, or an 8 × 8 output feature map into a 4 × 4 feature map.
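Since the pooling module is described with multipliers as well as registers, its behaviour can be read as weighted 2 × 2 pooling; the equal weights of 0.25 below (which make it average pooling) are an assumption for illustration:

```python
def pool_2x2_weighted(img, kernel=((0.25, 0.25), (0.25, 0.25))):
    """2x2 pooling via a 2x2 multiply-add array; with equal weights of
    0.25 this is average pooling (24x24 -> 12x12, or 8x8 -> 4x4)."""
    h, w = len(img), len(img[0])
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(2) for j in range(2))
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

A constant 24 × 24 input maps to a constant 12 × 12 output, confirming the stated size reduction.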
Furthermore, the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolution module is changed, the temporarily stored intermediate result is read out again and input to the changed convolution module.
The intermediate-quantity storage module in this technical scheme stores the intermediate variables generated during the convolutional neural network calculation; it comprises ram and ram_control, where ram is used for data storage and ram_control for controlling that storage. The module stores the result generated after each pooling layer; after the state changes, i.e. the connection of the convolution modules changes, the temporarily stored intermediate results are read out again and fed to the changed convolution module.
Furthermore, the full-connection layer module comprises 10 multiply-accumulate units, data and weight parameters are input into the full-connection layer module, and output results of the full-connection layer module can be obtained after 192 clock cycles by using the 10 multiply-accumulate units.
The fully connected layer in this technical scheme realizes the calculation function of the single-layer perceptron in the convolutional neural network, completing the full connection between 192 inputs and 10 outputs. The data and the corresponding weight parameters are fed to the 10 multiply-accumulators, and the output of the fully connected layer is obtained after 192 clock cycles. Since the 10 multiply-accumulate results must all pass through the sigmoid function, only one sigmoid module is used to save resources; the 10 results are therefore temporarily stored and serially input to the sigmoid module, and finally the fully connected layer also outputs its 10 results serially.
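The 10-MAC, 192-cycle schedule can be modelled by broadcasting one input element per clock cycle to all 10 accumulators:

```python
def fully_connected(inputs, weights):
    """192-input, 10-output fully connected layer as 10 parallel
    multiply-accumulators: one input element is broadcast per clock
    cycle, so all 10 sums are ready after 192 cycles."""
    assert len(inputs) == 192 and len(weights) == 10
    acc = [0.0] * 10                      # the 10 multiply-accumulate units
    for cycle in range(192):              # one input element per clock cycle
        x = inputs[cycle]
        for k in range(10):
            acc[k] += weights[k][cycle] * x
    return acc
```

In hardware the 10 MACs run in parallel; the inner loop here stands in for that parallelism.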
Further, the convolution calculation of the input image in the system adopts a parallel computing structure for the 3 × 3 convolution. The structure comprises 3 line buffers, each followed by 3 registers. As the input image streams in, the 3 line buffers prefetch the 3 rows of data to be processed, the 3 registers after each line buffer simultaneously hold the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are combined. During processing, the 3 line buffers continuously prefetch the data to be processed, and after each pixel shift all 9 register outputs are combined to yield the convolution result of the input image.
Owing to the limits of FPGA resources, a fully pipelined hardware implementation of the whole network structure above is impossible. After study, the inventors found that performing more calculations by hardware multiplexing — implementing one layer, or even part of one layer, in the FPGA — is more effective than the hardware architecture a fully pipelined implementation of the network would require. The FPGA convolutional neural network model uses convolution heavily, and all convolution kernels are 3 × 3; computed serially on a general-purpose CPU, each pixel's convolution needs 3 × 3 = 9 multiplication cycles, which takes a long time. The model therefore adopts a parallel computing structure for convolution, so that the 9 multiplications of each convolution proceed simultaneously, and the convolutions of multiple channels can also proceed simultaneously, resources permitting. For example, convolving 1 input image (128 × 128) with the kernels of 16 channels to generate 16 results takes about 128 × 128 × 9 × 16 = 2359296 multiplication cycles serially, whereas the parallel structure of this embodiment, with the 16 channels in parallel, needs only about 128 × 128 + 128 × 2 = 16640 cycles — roughly 141 times faster at the same operating frequency. In addition, the pipeline structure is designed so that once the convolution for one pixel has been computed, the convolution values of the subsequent pixels are output continuously.
That is, once the pipeline is full, each clock beat outputs one convolution result. By designing 3 line buffers, the 3 rows of data to be processed are prefetched in place; combined with the 9 registers that simultaneously access the first 3 pixels of each row, the 9 pixel values required for the next convolution are available after each pixel shift, so after the one-time pipeline-fill delay the convolution results are output continuously. Moreover, suitable structures can be designed between different layers and the sub-layers within each layer so that the pipeline stays full and the overall processing performance improves. Beyond the parallel computing structure, the system also includes a data rearrangement function and an off-chip memory: the convolution results of the input image are stored in the off-chip memory, where the data rearrangement is performed. Because of the limited on-chip resources of the FPGA, intermediate results must be written back to off-chip memory and read back for the next stage of calculation; and because of the parallel structure, the data to be read may not be one complete picture after another, but rather the same part of several pictures at the same time. Rearranging the data therefore lays out the data to be processed together contiguously in the storage space, making a single read convenient.
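The line-buffer pipeline can be modelled in software as a delay line of length 2 × width + 3 whose taps supply the 9 window pixels; once the line is full, each incoming pixel yields one output, matching the one-result-per-beat behaviour described above (a behavioural sketch, not the patent's hardware):

```python
from collections import deque

def conv3x3_stream(img, kernel):
    """Streaming valid 3x3 convolution through a delay line of length
    2*width + 3 (the '3 line buffers + 9 registers' structure): after
    the pipeline fills, each new pixel produces one output."""
    h, w = len(img), len(img[0])
    delay = deque(maxlen=2 * w + 3)       # models line buffers + window regs
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for n, pix in enumerate(p for row in img for p in row):
        delay.append(pix)
        r, c = divmod(n, w)
        if r >= 2 and c >= 2:             # pipeline full, window valid
            acc = 0
            for i in range(3):            # window row (0 = oldest line)
                for j in range(3):
                    d = (2 - i) * w + (2 - j)   # delay of window pixel (i, j)
                    acc += delay[-1 - d] * kernel[i][j]
            out[r - 2][c - 2] = acc
    return out
```

The result agrees with a direct (non-streaming) 3 × 3 valid convolution, while consuming the image strictly pixel by pixel in raster order.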
In conclusion, compared with the prior art, the invention has the following beneficial effects:
1. The invention does not depend on an optical beacon or a wavefront sensor; it realizes wavefront sensing and correction by establishing a deep neural network oriented to the real physical process of atmospheric turbulence and imaging the target directly through the optical system, finally achieving detection and reconstruction of the wavefront aberration. The invention adopts an FPGA convolutional neural network model and uses the convolution module for convolution operations; the nonlinear-function sigmoid module realizes the sigmoid function in the convolutional neural network; the pooling module further pools the output of the convolution layers to reduce the dimensionality of the feature output; the intermediate-quantity storage module stores intermediate variables of the convolutional neural network calculation; and the fully connected layer module realizes the calculation function of the convolutional neural network perceptron and produces the output.
2. The FPGA convolutional neural network model adopts a parallel computing structure in the convolution calculation, so that the 9 multiplications of each convolution proceed simultaneously, and the convolutions of multiple channels can also proceed simultaneously, resources permitting; through the pipeline-structure design, once the convolution for one pixel has been computed, the convolution values of the subsequent pixels are output continuously. Suitable structures can be designed between different layers and the sub-layers of each layer, so that the pipeline stays full and the overall processing performance improves.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an FPGA convolutional neural network model of the present invention;
FIG. 2 is a parallel computing architecture for convolution computation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1:
as shown in fig. 1 and fig. 2, the present embodiment includes a detection camera, a correction component and an FPGA convolutional neural network model, where the detection camera is a CCD camera, and the correction component includes a deformable mirror, a convex lens and a half-mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module.
Preferably, the network structure of the FPGA convolutional neural network model is divided into 9 layers; each layer includes 3 convolution modules, 1 addition module and 1 convolution module connected in sequence, and the different layers are connected by pooling or interpolation.
Preferably, layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, each up-sampling operation doubling the size of the intermediate layer while halving the number of channels; layers 1 to 5 are connected through maximum pooling, layers 6 to 9 through up-convolution, and layers 1 to 4 are connected to layers 6 to 9 by one-to-one residual connections, so that part of the intermediate layers of the down-sampling path are copied to the up-sampling path to participate in up-sampling; the output of the up-sampling layers undergoes one more convolution operation to obtain the final output image.
Preferably, during maximum pooling and down-sampling, a maximum is determined as the result from 4 pixels at a time; during up-sampling, a line-buffer structure is adopted, and the required result is obtained by calculation over adjacent pixels.
Preferably, the convolution module is composed of 3 row registers with a length of 28, 3 row registers with a length of 12 and a 3 × 3 multiply-add array.
Preferably, the non-linear function sigmoid module stores a sigmoid function value corresponding to an argument in advance in a ROM or a RAM, wherein the argument is used as an address input, and the function value is used as an output of the module, so as to realize the sigmoid function.
Preferably, the pooling module is composed of 2 line registers of length 24, 2 line registers of length 8, and a multiply-add array of 2 × 2.
Preferably, the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolution module is changed, the temporarily stored intermediate result is read again and input to the changed convolution module.
Preferably, the fully-connected layer module includes 10 multiply-accumulators, the data and the weight parameter are input into the fully-connected layer module, and the output result of the fully-connected layer module can be obtained after 192 clock cycles by using the 10 multiply-accumulators.
Preferably, the convolution calculation of the input image in the system adopts a parallel computing structure for the 3 × 3 convolution; the structure includes 3 line buffers, each followed by 3 registers; as the input image streams in, the 3 line buffers prefetch the 3 rows of data to be processed, the 3 registers after each line buffer simultaneously hold the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are combined; during processing, the 3 line buffers continuously prefetch the data to be processed, and after each pixel shift all 9 register outputs are combined to yield the convolution result of the input image.
In this embodiment, in fig. 1, Conv 3x3 ReLU + BN denotes a 3x3 convolution module with its activation function, ReLU + BN being the activation function; Add denotes addition; max pool 2x2 denotes 2x2 maximum pooling; and up-conv 2x2 denotes the 2x2 up-convolution. In fig. 2, reg is a register, buffer is a line buffer, mul denotes multiplication, h11, h12, h13, h21, h22, h23, h31, h32 and h33 are the data stored in the 9 registers, k11, k12, k13, k21, k22, k23, k31, k32 and k33 are the data they are multiplied by, and Add is addition. The activation function of the nonlinear-function sigmoid module is ReLU + BN.
The optical aberration distortion correction system based on the FPGA convolutional neural network structure provided by the embodiment has the following working process:
S1, establishing an FPGA convolutional neural network model based on deep-learning training of target images and aberration distortion;
S2, after the FPGA convolutional neural network model has been built, using the original target image and the turbulence-distorted target image as the model input, and the atmospheric-turbulence phase distortion as the model output;
S3, the wavefront corrector loads the driving signals onto each of its actuators so that it produces a deformation conjugate to the wavefront to be corrected, thereby correcting the wavefront aberration caused by atmospheric turbulence distortion and completing the correction of the wavefront to be corrected;
S4, to train the FPGA convolutional neural network, the main control computer loads low-order Zernike coefficients onto a liquid-crystal phase screen to generate phase distortion described by the Kolmogorov turbulence spectrum as the network output, and loads the phase distortion into the target imaging light path to obtain the simulated turbulence-distorted image of the target, which together with the original target image serves as the input;
s5, learning the parameters of the established convolutional neural network by reducing the function value of the loss function by adopting a random gradient descent algorithm during training of the FPGA convolutional neural network; the loss function is:
Figure BDA0002820904370000071
where Nx and Ny denote the number of pixels in the x and y directions, respectively, Yij denotes the pixel value of the actually loaded phase screen at coordinate (i, j),
Figure BDA0002820904370000072
the pixel value at coordinate (i, j) of the phase screen representing the output of the network model. Since the network processes grayscale images, the pixel value ranges are all [0,255 ]]。
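Reading the loss as the per-pixel mean of squared differences over the phase screen (the precise normalization is an assumption, since the original formula survives only as an image placeholder), a reference implementation is:

```python
def phase_screen_loss(y_true, y_pred):
    """Mean squared error over the Nx x Ny phase-screen pixels:
    (1/(Nx*Ny)) * sum_ij (Y_ij - Yhat_ij)^2."""
    nx, ny = len(y_true), len(y_true[0])
    return sum((y_true[i][j] - y_pred[i][j]) ** 2
               for i in range(nx) for j in range(ny)) / (nx * ny)
```

A perfect prediction gives a loss of 0; a uniform one-grey-level error gives 1.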
The data rearrangement method for the convolution calculation result of the input image comprises the following steps:
s1, acquiring all convolution calculation results to be rearranged, and establishing an MxN basic data set according to a preset row number M and a preset column number N;
s2, taking any convolution calculation result in the basic data set as a target object, and carrying out similarity calculation on the target object and all data in the basic data set one by one;
s3, establishing a set of convolution calculation results with similarity calculation results larger than a preset value in the basic data set as a similar data set of the target object;
s4, extracting the feature information of the similar data collection of all convolution calculation results, and establishing a feature information collection;
s5, obtaining the mapping relation between the basic data collection and the characteristic information collection;
s6, generating a plurality of two-dimensional data rearrangement paths for the feature information collection, screening an optimal data rearrangement path with the shortest rearrangement element distance from the plurality of two-dimensional data rearrangement paths, and rearranging the feature information collection according to the optimal data rearrangement path;
and S7, rearranging the data of the basic data set according to the mapping relation between the basic data set and the characteristic information set and the rearrangement result of the characteristic information set.
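A deliberately simplified sketch of steps S1 to S7, using the per-result mean as the feature and a 1-D sort as the shortest-distance rearrangement path (both illustrative assumptions, since the patent leaves the similarity metric and path search unspecified):

```python
def rearrange(results):
    """Toy version of the data rearrangement: extract a feature per
    result (S4), map results to features (S5), order features so that
    adjacent elements are close (S6, trivially a sort in 1-D), and
    apply the same order to the underlying results (S7)."""
    features = [sum(r) / len(r) for r in results]             # S4
    mapping = {k: features[k] for k in range(len(results))}   # S5
    order = sorted(mapping, key=mapping.get)                  # S6
    return [results[k] for k in order]                        # S7
```

Ordering by feature places results that will be read together next to each other, which is the point of the rearrangement: one contiguous off-chip read instead of many scattered ones.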
With this data rearrangement method, establishing the basic data set and performing the similarity calculation removes the interference of irrelevant information; extracting feature information reduces the volume of image data participating in the matching calculation, cutting redundant data during image processing while the effective information is fully acquired, and so improves the detection speed while accuracy is ensured.
To verify the effect of the optical aberration distortion correction system of this embodiment on optical aberration distortion, the inventors analysed the computational performance of its FPGA convolutional neural network model. Since the different layers of the network structure are connected by pooling or interpolation, the performance of each layer can be analysed separately; assuming all operation layers within each layer run completely in parallel, the per-layer analysis is as follows:
The first layer: the input data size is 128x128. With the computation fully parallelized, the numbers of multipliers, adders and buffers required are as follows:
1-1: 1conv16: multiplier: 9x16=144, buffer: 128x3=384B, adder: 8x16=128
Cycles to process: since the convolutions of the 16 channels can run simultaneously, and the 9 multiplications within each convolution can also run simultaneously, 16 results can be taken out every cycle once the initial latency has elapsed and the first 16 results have emerged. The time for this layer of operations can therefore be calculated as: (128x2+15) + 128x128 = 16655 cycles
1-2: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: since the convolutions of the 16 channels over 16 input maps can run simultaneously, and the 9 multiplications within each convolution can also run simultaneously, 16 results can be obtained every cycle once the initial latency has elapsed and the first 16 results have emerged. The time for this layer of operations is therefore about: (128x2+19) + 128x128 = 16659 cycles
1-3: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: (128x2+19) + 128x128 = 16659 cycles
1-4: adder: 16
Cycles to process: this operation is a point-to-point addition of 2 data streams; by design it can be merged with 1-3 and completed with only 1-2 additional cycles.
1-5: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: (128x2+19) + 128x128 = 16659 cycles
Suppose the FPGA provides the following resources: SLC: 600; memory: 32 Mb; DSP: 2520; I/O: 328, where each DSP can implement one 25bit x 18bit multiplier, so the network parameters need to be quantized. Because the number of DSPs is limited, Layer 1 cannot be fully pipelined; instead, each multiplier completes the multiplications required for one convolution. If no further pipelining between the internal sub-layers is considered, the operation time required for the first layer is about: 16655+16659+16659+16659 = 66632 cycles.
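The per-sub-layer cycle estimates above all follow one pattern: a pipeline-fill latency plus one pass over the 128x128 image. A short check that reproduces the first-layer total (the fill constants 15 and 19 are taken directly from the text; the helper name is ours):

```python
def first_layer_cycles():
    """Reproduces the first-layer estimate: each sub-layer pays a
    pipeline-fill latency of (128*2 + c) cycles, then emits one set of
    16 results per cycle across the 128x128 image."""
    sub_1_1 = (128 * 2 + 15) + 128 * 128      # 1-1: 16655 cycles
    sub_rest = (128 * 2 + 19) + 128 * 128     # 1-2, 1-3, 1-5: 16659 each
    # 1-4 is merged into 1-3 and adds only 1-2 cycles, so it is ignored here
    return sub_1_1 + 3 * sub_rest

total = first_layer_cycles()
```

Evaluating the helper gives 16655 + 3x16659 = 66632 cycles, matching the figure quoted above.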
The second layer: obtained by pooling the first layer; the input data size is 64x64, i.e. 1/4 the size of Layer 1.
2-1: 16conv32: multiplier: 9x16x32=4608, buffer: 64x3x16=3072, adder: 8x16x32=4096
Cycles to process: fully parallelizing this layer would require 4608 multipliers, which exceeds the FPGA's resources, so the multipliers must be multiplexed and the operation completed in 2 rounds; since the 2 rounds can be pipelined, the processing time is:
(64x2+15) + (64x64) + (64x64) = 8335 cycles
2-2: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192
Cycles to process: fully parallelizing this layer would require 9216 multipliers, which exceeds the FPGA's resources, so the multipliers must be multiplexed and the operation completed in 4 rounds; since the 4 rounds can be pipelined, the processing time is:
(64x2+15) + (64x64)x4 = 16527 cycles
2-3: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192, cycles: 16527 cycles
2-4: adder: 32
Cycles to process: this operation is a point-to-point addition of 2 data streams; by design it can be merged with 2-3 and completed with only 1-2 additional cycles.
2-5: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192, cycles: 16527 cycles
Due to the limited number of DSPs, the second layer cannot be fully pipelined either; each multiplier completes the multiplications required for one convolution. If no further pipelining between the internal sub-layers is considered, the operation time required for the second layer is about: 8335+16527+16527+16527 = 57916 cycles.
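The multiplexing decision above follows from comparing each sub-layer's fully parallel multiplier demand against the 2520 available DSPs. A sketch, assuming the rounds are fully pipelined as stated (function names are ours):

```python
import math

DSP_COUNT = 2520  # multipliers available on the assumed FPGA

def rounds_needed(multipliers):
    """Multiplexing rounds required when the fully parallel multiplier
    demand of a sub-layer exceeds the available DSPs."""
    return math.ceil(multipliers / DSP_COUNT)

def sub_layer_cycles(width, latency, multipliers):
    # pipeline fill, then one pipelined pass over the image per round
    return (width * 2 + latency) + (width * width) * rounds_needed(multipliers)

# Second layer: 2-1 needs 9x16x32 = 4608 multipliers (2 rounds);
# 2-2, 2-3 and 2-5 each need 9x32x32 = 9216 multipliers (4 rounds)
layer2 = (sub_layer_cycles(64, 15, 9 * 16 * 32)
          + 3 * sub_layer_cycles(64, 15, 9 * 32 * 32))
```

This yields 8335 + 3x16527 = 57916 cycles, matching the estimate above.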
By the same reasoning, the remaining layers can be estimated. The third layer: 57660 cycles; the fourth layer: 57532 cycles; the fifth layer: no more than 57532 cycles; the sixth layer: 32956+49293 = 82249 cycles; the seventh layer: 24813+49389 = 74202 cycles; the eighth layer: 33340+49581 = 82921 cycles; the ninth layer: 16655x2+16659x3 = 83287 cycles. The overall time estimate for layers 1 to 9 is 636315 cycles.
If the designed system operating frequency is 200MHz, the processing frame rate of the system is 200M/636315 ≈ 314 fps; at 300MHz it is about 471 fps; and at an operating frequency of 320MHz or above it can reach 502 fps.
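The frame-rate figures follow directly from the cycle total by dividing the clock frequency by the cycles per frame (integer division, matching the truncated values quoted):

```python
TOTAL_CYCLES = 636315  # overall estimate for layers 1-9 given above

def frame_rate_fps(clock_hz):
    """Integer frames per second when one frame costs TOTAL_CYCLES cycles."""
    return clock_hz // TOTAL_CYCLES

rates = {mhz: frame_rate_fps(mhz * 1_000_000) for mhz in (200, 300, 320)}
```

The dictionary evaluates to 314 fps at 200MHz, 471 fps at 300MHz and 502 fps at 320MHz, as stated in the text.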
The above analysis demonstrates that the system of this embodiment offers excellent performance. Note that the analysis only considers fixed-point quantization of the floating-point operations and does not change the network model. If pipelining is introduced between the layers and the operating frequency is further increased, the target of 500 fps is expected to be met.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An optical aberration distortion correction system based on an FPGA convolutional neural network structure, characterized by comprising a detection camera, a correction component and an FPGA convolutional neural network model, wherein the detection camera is a CCD camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear function sigmoid module, a pooling module, an intermediate quantity storage module and a fully-connected layer module; data undergo convolution and pooling operations through the convolution module and the pooling module, and the activated connections between layers are realized through the nonlinear function sigmoid module.
2. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the network structure of the FPGA convolutional neural network model is divided into 9 layers, each layer comprises 3 convolution modules, 1 element-wise addition module and 1 convolution module connected in sequence, and the layers are connected to one another by pooling or interpolation.
3. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 2, wherein layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, and each intermediate layer undergoes one up-sampling operation that doubles the feature-map size relative to the previous layer while halving the number of channels; layers 1 to 5 are connected through maximum pooling, layers 6 to 9 are connected through up-convolution, and layers 1 to 4 and layers 6 to 9 are connected in one-to-one correspondence through residual connections, so that part of the intermediate layers in the down-sampling process are copied to the up-sampling layers to participate in the up-sampling process; the output of layers 6 to 9 undergoes one more convolution operation to obtain the final output image.
4. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 3, wherein during maximum-pooling down-sampling, the maximum of 4 pixels at a time is taken as the result; in the up-sampling process, a line buffer structure is adopted, and the required result is obtained by computing with adjacent pixels.
5. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the convolution module is composed of 3 line registers of length 28, 3 line registers of length 12 and a 3x3 multiply-add array.
6. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the nonlinear function sigmoid module stores in advance, in ROM or RAM, the sigmoid function values corresponding to the argument values, the argument serving as the address input and the function value as the module output, thereby realizing the sigmoid function.
7. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the pooling module is composed of 2 line registers of length 24, 2 line registers of length 8 and a 2x2 multiply-add array.
8. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolutional module is changed, the temporarily stored intermediate result is read out again and input to the changed convolutional module.
9. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the fully-connected layer module comprises 10 multiply-accumulators; the data and the corresponding weight parameters are input to the fully-connected layer module, and by using the 10 multiply-accumulators the output result of the fully-connected layer module is obtained after 192 clock cycles.
10. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the convolution calculation of the input image in the system adopts a 3x3 cyclic parallel calculation structure comprising 3 line buffers, each followed by 3 registers; when the input image is fed in through 1 line buffer, the 3 line buffers prefetch the 3 lines of data to be processed, the 3 registers after each line buffer simultaneously access the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are merged; during data processing, the 3 line buffers continuously prefetch the data to be processed, the convolution calculation result of the input image is obtained by pixel-shifting and merging all outputs of the 9 registers, and data rearrangement is then performed on the convolution calculation result of the input image.
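Although the claims describe a hardware structure, the data flow of the 3x3 line-buffer convolution of claim 10 can be modeled behaviorally in software. The sketch below abstracts buffer depths and clocking; the function name and the row-slicing model are our assumptions, not the claimed circuit:

```python
def conv3x3_linebuffer(image, kernel):
    """Software model of the 3x3 line-buffer convolution of claim 10:
    three line buffers hold three consecutive image rows, the three
    registers behind each buffer expose a 3x3 pixel window, and the
    nine register outputs are multiplied by the kernel weights and
    summed as the window slides across the rows."""
    h, w = len(image), len(image[0])
    out = []
    for row in range(h - 2):
        lines = image[row:row + 3]          # the three prefetched rows
        out_row = []
        for col in range(w - 2):
            acc = 0
            for i in range(3):              # the 9 "register" outputs
                for j in range(3):
                    acc += lines[i][col + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out

img = [[r * 5 + c for c in range(5)] for r in range(5)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = conv3x3_linebuffer(img, identity)
```

With the identity kernel, the 3x3 output simply reproduces the interior pixels of the 5x5 input, which makes the window alignment easy to verify.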
CN202011418118.8A 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure Pending CN112529799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418118.8A CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011418118.8A CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Publications (1)

Publication Number Publication Date
CN112529799A true CN112529799A (en) 2021-03-19

Family

ID=74997166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418118.8A Pending CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Country Status (1)

Country Link
CN (1) CN112529799A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862654A * 2022-04-15 2022-08-05 Shandong Inspur Scientific Research Institute Co., Ltd. Method and system for realizing real-time template convolution on FPGA

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076915A1 (en) * 2008-09-25 2010-03-25 Microsoft Corporation Field-Programmable Gate Array Based Accelerator System
US20180046897A1 (en) * 2016-08-12 2018-02-15 Beijing Deephi Intelligence Technology Co., Ltd. Hardware accelerator for compressed rnn on fpga
CN108805267A (en) * 2018-05-28 2018-11-13 重庆大学 The data processing method hardware-accelerated for convolutional neural networks
CN108805274A (en) * 2018-05-28 2018-11-13 重庆大学 The hardware-accelerated method and system of Tiny-yolo convolutional neural networks based on FPGA
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm
CN109031654A (en) * 2018-09-11 2018-12-18 安徽农业大学 A kind of adaptive optics bearing calibration and system based on convolutional neural networks
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN110084739A (en) * 2019-03-28 2019-08-02 东南大学 A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN110648298A (en) * 2019-11-01 2020-01-03 中国工程物理研究院流体物理研究所 Optical aberration distortion correction method and system based on deep learning
CN110651277A (en) * 2019-08-08 2020-01-03 京东方科技集团股份有限公司 Computer-implemented method, computer-implemented diagnostic method, image classification apparatus, and computer program product
US20200284883A1 (en) * 2019-03-08 2020-09-10 Osram Gmbh Component for a lidar sensor system, lidar sensor system, lidar sensor device, method for a lidar sensor system and method for a lidar sensor device
CN111967468A (en) * 2020-08-10 2020-11-20 东南大学 FPGA-based lightweight target detection neural network implementation method


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
XIAO MENG et al.: "An FPGA-based accelerator platform implements for convolutional neural network", HP3C '19: Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications *
PANG Kuo: "Research on the design and application of optical imaging *** based on microlens arrays", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *
ZHANG Zuyang: "Research on hardware acceleration of deep neural networks", China Master's Theses Full-text Database, Information Science and Technology *
WANG Baofeng: "Research on laser fine-tracking technology based on adaptive optics", China Master's Theses Full-text Database, Information Science and Technology *
CHEN Chen et al.: "Design and implementation of a YOLOv2 accelerator based on the Zynq7000 FPGA heterogeneous platform", Journal of Frontiers of Computer Science and Technology *
MA Li: "Research on depth information estimation algorithms in computer vision", China Doctoral Dissertations Full-text Database, Information Science and Technology *


Similar Documents

Publication Publication Date Title
JP7329533B2 (en) Method and accelerator apparatus for accelerating operations
US9411726B2 (en) Low power computation architecture
US10540574B2 (en) Image compression method and related device
EP3944157A1 (en) Device and method for performing training of convolutional neural network
US20180197067A1 (en) Methods and apparatus for matrix processing in a convolutional neural network
CN108629406B (en) Arithmetic device for convolutional neural network
CN109121435A (en) Processing unit and processing method
US6947916B2 (en) IC for universal computing with near zero programming complexity
EP4276690A1 (en) Vector computation unit in a neural network processor
US20210357735A1 (en) Split accumulator for convolutional neural network accelerator
JP2021521516A (en) Accelerators and systems for accelerating operations
EP0570359B1 (en) Heuristic processor
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
US11244028B2 (en) Neural network processor and convolution operation method thereof
JPH03131965A (en) Two-dimensional contraction array and method for neural network
CN111260020B (en) Convolutional neural network calculation method and device
US10755169B2 (en) Hybrid non-uniform convolution transform engine for deep learning applications
US11983616B2 (en) Methods and apparatus for constructing digital circuits for performing matrix operations
JP2020107338A (en) Method and apparatus for processing convolution operation in neural network
JP2022541721A (en) Systems and methods that support alternate number formats for efficient multiplication
JP2022510237A (en) Camera self-calibration network
CN110377874B (en) Convolution operation method and system
CN112529799A (en) Optical aberration distortion correction system based on FPGA convolutional neural network structure
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
US11526305B2 (en) Memory for an artificial neural network accelerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination