CN112529799A - Optical aberration distortion correction system based on FPGA convolutional neural network structure - Google Patents

Optical aberration distortion correction system based on FPGA convolutional neural network structure

Info

Publication number
CN112529799A
CN112529799A (application CN202011418118.8A)
Authority
CN
China
Prior art keywords
module
neural network
convolutional neural
convolution
layers
Prior art date
Legal status
Pending
Application number
CN202011418118.8A
Other languages
Chinese (zh)
Inventor
刘国栋
胡流森
吴小龑
吴凌远
Current Assignee
Institute of Fluid Physics of CAEP
Original Assignee
Institute of Fluid Physics of CAEP
Priority date
Filing date
Publication date
Application filed by Institute of Fluid Physics of CAEP filed Critical Institute of Fluid Physics of CAEP
Priority to CN202011418118.8A priority Critical patent/CN112529799A/en
Publication of CN112529799A publication Critical patent/CN112529799A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/80Geometric correction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image


Abstract

The invention discloses an optical aberration distortion correction system based on an FPGA (field programmable gate array) convolutional neural network structure, which comprises a detection camera, a correction component and an FPGA convolutional neural network model. The detection camera is a CCD (charge-coupled device) camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror. The FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module. Compared with the prior art, the optical aberration distortion correction system provided by the invention has the advantages of low power consumption, high running speed and high efficiency.

Description

Optical aberration distortion correction system based on FPGA convolutional neural network structure
Technical Field
The invention relates to the technical field of optics, in particular to an optical aberration distortion correction system based on an FPGA convolutional neural network structure.
Background
Turbulence effects in the atmosphere cause light intensity fluctuation, light spot drift and similar effects when light propagates through the atmosphere. These effects reduce the concentration of laser energy delivered in far-field transmission and degrade the resolution of optical imaging systems. Improving the beam quality of a laser system and the resolving power of an optical imaging system therefore requires correcting the aberration distortion caused by atmospheric turbulence.
A convolutional neural network can learn the mapping between target images containing turbulence aberration information and the turbulent aberration distortion itself, so that the distortion can be recovered from acquired target images and then corrected. However, the convolutional neural network is computationally heavy and struggles to keep up with the high-frequency phase distortion caused by atmospheric turbulence.
Disclosure of Invention
The invention aims to overcome the defects that the calculation amount of a convolutional neural network in the prior art is large and the requirement of high-frequency phase distortion caused by atmospheric turbulence is difficult to meet, and provides an optical aberration distortion correction system based on an FPGA convolutional neural network structure.
The purpose of the invention is mainly realized by the following technical scheme:
an optical aberration distortion correction system based on an FPGA (field programmable gate array) convolutional neural network structure comprises a detection camera, a correction component and an FPGA convolutional neural network model, wherein the detection camera is a CCD (charge-coupled device) camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module.
This technical scheme does not depend on an optical beacon or a wavefront sensor; it realizes wavefront sensing and correction by establishing a deep neural network oriented to the real physical process of atmospheric turbulence and imaging the target directly through the optical system, finally achieving detection and reconstruction of the wavefront aberration. The scheme adopts an FPGA convolutional neural network model and uses the convolution module to carry out convolution operations; the nonlinear-function sigmoid module realizes the sigmoid function in the convolutional neural network; the pooling module further pools the output of the convolution layers to reduce the dimensionality of the feature output; the intermediate-quantity storage module stores intermediate variables of the convolutional neural network calculation; and the fully connected layer module realizes the calculation function of the convolutional neural network perceptron and produces the output.
It should be noted that the detection camera in the optical aberration distortion correction system of this technical scheme realizes detection imaging of the target object; the correction component mainly compensates the aberration distortion of atmospheric turbulence; the FPGA serves as the carrier of the convolutional neural network model, and the control voltages of the deformable mirror are calculated by the deep neural network model from the pictures collected by the detection camera. The detection camera has 2048 × 2048 pixels and a frame rate of 90 fps; the deformable mirror is a piezoelectric deformable mirror comprising 40 actuators, a mirror with a pupil diameter of 10 mm, a silver film with a protective layer, and a maximum refresh rate of 4 kHz.
Furthermore, the network structure of the FPGA convolutional neural network model is divided into 9 layers; each layer comprises 3 convolution modules, 1 addition module and 1 convolution module connected in sequence, and the different layers are connected by pooling or interpolation.
The FPGA convolutional neural network model designed by this technical scheme comprises 9 different processing layers, connected to one another by pooling or interpolation. With the system working at 200 MHz and one frame taking about 636315 cycles, the processing frame rate is 200 MHz / 636315 ≈ 314 fps; at 300 MHz it reaches about 471 fps, and above 320 MHz about 502 fps, a clearly accelerated calculation speed. It should be noted that in this technical scheme every internal operation layer of each layer can run completely in parallel. Preferably, in this embodiment, the convolution modules at layers 1 and 9 contain 16 channels, those at layers 2 and 8 contain 32 channels, the layer-3 convolution module contains 64 channels, those at layers 4 and 6 contain 128 channels, and those at layers 5 and 7 contain 256 channels.
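The frame-rate figures above follow directly from the clock frequency and the cycles needed per frame; a small Python sketch (using the ~636315 cycles-per-frame figure implied by the 200 MHz example) reproduces them:

```python
# Hypothetical sketch: frame rates implied by the patent's cycle count.
CYCLES_PER_FRAME = 636315  # total cycles the network needs for one frame

def frame_rate(clock_hz: float, cycles: int = CYCLES_PER_FRAME) -> int:
    """Frames per second achievable at a given FPGA clock frequency."""
    return int(clock_hz / cycles)

if __name__ == "__main__":
    for mhz in (200, 300, 320):
        print(f"{mhz} MHz -> {frame_rate(mhz * 1e6)} fps")
```

At 200, 300 and 320 MHz this yields 314, 471 and 502 fps respectively, matching the figures in the text.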
Furthermore, layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, each up-sampling operation doubling the size of the intermediate layer while halving the number of channels. Layers 1 to 5 are connected through maximum pooling, layers 6 to 9 through up-convolution, and layers 1 to 4 are connected to layers 6 to 9 by one-to-one residual connections, so that part of the intermediate layers from the down-sampling path are copied to the up-sampling path and participate in up-sampling; the output of the up-sampling layers undergoes one more convolution operation to obtain the final output image.
Further, during maximum pooling and down-sampling, a maximum is determined as the result from 4 pixels at a time; during up-sampling, a line-buffer structure is adopted, and the required result is obtained by calculation over adjacent pixels.
For maximum pooling or down-sampling, this technical scheme determines the maximum result from 4 pixels each time by means of a 4-input comparator; for up-sampling, a structure similar to the convolution line buffer is adopted, and the required result is obtained by calculation over adjacent pixels. Preferably, the ReLU operation is performed by setting a comparator, and BN (batch normalization) can be implemented by means of a look-up table.
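A software model of the 4-input-comparator pooling described above might look as follows (a Python sketch of the behaviour, not the patent's hardware description):

```python
def max_pool_2x2(img):
    """2x2 max pooling: each output pixel is the maximum of 4 neighbouring
    input pixels, mirroring a 4-input comparator in hardware."""
    h, w = len(img), len(img[0])
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

For example, pooling a 4 × 4 map produces a 2 × 2 map, each entry being the maximum of its 2 × 2 block.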
Further, the convolution module is composed of 3 row registers with the length of 28, 3 row registers with the length of 12 and a multiplication and addition array of 3 multiplied by 3.
The row registers of the convolution module are connected end to end, and the tail of each row register is connected to one row of the multiply-add array through a data selector. Each cell of the multiply-add array comprises 2 registers and 1 multiplier, the 2 registers respectively storing one element of the convolution kernel and one pixel of the input image. This structure can perform convolutions with a 3 × 3 kernel when images of size 28 × 28 or 12 × 12 are input.
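Functionally, the 3 × 3 multiply-add array computes 9 products per output pixel and sums them; a plain Python reference model (ignoring the register-level streaming, and using the cross-correlation convention common in CNN hardware) is:

```python
def conv3x3(img, kernel):
    """Valid 3x3 convolution (cross-correlation form): for each output
    pixel, 9 products are formed and summed, as in the multiply-add array."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += img[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out
```

A 28 × 28 input yields a 26 × 26 valid output; a 12 × 12 input yields 10 × 10.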
Further, the non-linear function sigmoid module stores a sigmoid function value corresponding to an argument in a ROM or a RAM in advance, wherein the argument is used as an address input, and the function value is used as an output of the module, so that the sigmoid function is realized.
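The ROM-based sigmoid can be modelled by precomputing a table and indexing it with the quantized argument; the table size and input range below are illustrative assumptions, not values from the patent:

```python
import math

def build_sigmoid_rom(n_entries=256, x_min=-8.0, x_max=8.0):
    """Precompute sigmoid values as a ROM table; the quantized argument
    serves as the read address, the stored value as the module output."""
    step = (x_max - x_min) / (n_entries - 1)
    return [1.0 / (1.0 + math.exp(-(x_min + k * step))) for k in range(n_entries)]

def sigmoid_lut(x, rom, x_min=-8.0, x_max=8.0):
    """Look up sigmoid(x) from the ROM, clamping x to the table range."""
    n = len(rom)
    k = round((min(max(x, x_min), x_max) - x_min) * (n - 1) / (x_max - x_min))
    return rom[k]
```

With 256 entries over [-8, 8] the lookup error near the origin is well under 1%, which is typically sufficient for fixed-point inference.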
Further, the pooling module is composed of 2 row registers with a length of 24, 2 row registers with a length of 8 and a multiply-add array of 2 × 2.
In this technical scheme, the row registers of the pooling module are connected end to end, and the tail of each row register is connected to one row of the multiply-add array through a data selector. Each cell of the multiply-add array comprises 2 registers and 1 multiplier, the 2 registers respectively storing one element of the kernel and one pixel of the input feature map. The module can pool a 24 × 24 output feature map into a 12 × 12 feature map, or an 8 × 8 output feature map into a 4 × 4 feature map.
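Since the pooling module is described with multipliers as well as registers, its behaviour can be read as weighted 2 × 2 pooling; the equal weights of 0.25 below (which make it average pooling) are an assumption for illustration:

```python
def pool_2x2_weighted(img, kernel=((0.25, 0.25), (0.25, 0.25))):
    """2x2 pooling via a 2x2 multiply-add array; with equal weights of
    0.25 this is average pooling (24x24 -> 12x12, or 8x8 -> 4x4)."""
    h, w = len(img), len(img[0])
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(2) for j in range(2))
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]
```

A constant 24 × 24 input maps to a constant 12 × 12 output, confirming the stated size reduction.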
Furthermore, the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolution module is changed, the temporarily stored intermediate result is read out again and input to the changed convolution module.
The intermediate-quantity storage module in this technical scheme stores the intermediate variables generated during the convolutional neural network calculation; it comprises ram and ram_control, where ram is used for data storage and ram_control for controlling that storage. The module stores the result generated after each pooling layer; after the state changes, i.e. the connection of the convolution modules changes, the temporarily stored intermediate results are read out again and fed to the changed convolution module.
Furthermore, the full-connection layer module comprises 10 multiply-accumulate units, data and weight parameters are input into the full-connection layer module, and output results of the full-connection layer module can be obtained after 192 clock cycles by using the 10 multiply-accumulate units.
The fully connected layer in this technical scheme realizes the calculation function of the single-layer perceptron in the convolutional neural network, completing the full connection between 192 inputs and 10 outputs. The data and the corresponding weight parameters are fed to the 10 multiply-accumulators, and the output of the fully connected layer is obtained after 192 clock cycles. Since the 10 multiply-accumulate results must all pass through the sigmoid function, only one sigmoid module is used to save resources; the 10 results are therefore temporarily stored and serially input to the sigmoid module, and finally the fully connected layer also outputs its 10 results serially.
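The 10-MAC, 192-cycle schedule can be modelled by broadcasting one input element per clock cycle to all 10 accumulators:

```python
def fully_connected(inputs, weights):
    """192-input, 10-output fully connected layer as 10 parallel
    multiply-accumulators: one input element is broadcast per clock
    cycle, so all 10 sums are ready after 192 cycles."""
    assert len(inputs) == 192 and len(weights) == 10
    acc = [0.0] * 10                      # the 10 multiply-accumulate units
    for cycle in range(192):              # one input element per clock cycle
        x = inputs[cycle]
        for k in range(10):
            acc[k] += weights[k][cycle] * x
    return acc
```

In hardware the 10 MACs run in parallel; the inner loop here stands in for that parallelism.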
Further, the convolution calculation of the input image in the system adopts a parallel computing structure for the 3 × 3 convolution. The structure comprises 3 line buffers, each followed by 3 registers. As the input image streams in, the 3 line buffers prefetch the 3 rows of data to be processed, the 3 registers after each line buffer simultaneously hold the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are combined. During processing, the 3 line buffers continuously prefetch the data to be processed, and after each pixel shift all 9 register outputs are combined to yield the convolution result of the input image.
Owing to the limits of FPGA resources, a fully pipelined hardware implementation of the whole network structure above is impossible. After study, the inventors found that performing more calculations by hardware multiplexing — implementing one layer, or even part of one layer, in the FPGA — is more effective than the hardware architecture a fully pipelined implementation of the network would require. The FPGA convolutional neural network model uses convolution heavily, and all convolution kernels are 3 × 3; computed serially on a general-purpose CPU, each pixel's convolution needs 3 × 3 = 9 multiplication cycles, which takes a long time. The model therefore adopts a parallel computing structure for convolution, so that the 9 multiplications of each convolution proceed simultaneously, and the convolutions of multiple channels can also proceed simultaneously, resources permitting. For example, convolving 1 input image (128 × 128) with the kernels of 16 channels to generate 16 results takes about 128 × 128 × 9 × 16 = 2359296 multiplication cycles serially, whereas the parallel structure of this embodiment, with the 16 channels in parallel, needs only about 128 × 128 + 128 × 2 = 16640 cycles — roughly 141 times faster at the same operating frequency. In addition, the pipeline structure is designed so that once the convolution for one pixel has been computed, the convolution values of the subsequent pixels are output continuously.
That is, once the pipeline is full, each clock beat outputs one convolution result. By designing 3 line buffers, the 3 rows of data to be processed are prefetched in place; combined with the 9 registers that simultaneously access the first 3 pixels of each row, the 9 pixel values required for the next convolution are available after each pixel shift, so after the one-time pipeline-fill delay the convolution results are output continuously. Moreover, suitable structures can be designed between different layers and the sub-layers within each layer so that the pipeline stays full and the overall processing performance improves. Beyond the parallel computing structure, the system also includes a data rearrangement function and an off-chip memory: the convolution results of the input image are stored in the off-chip memory, where the data rearrangement is performed. Because of the limited on-chip resources of the FPGA, intermediate results must be written back to off-chip memory and read back for the next stage of calculation; and because of the parallel structure, the data to be read may not be one complete picture after another, but rather the same part of several pictures at the same time. Rearranging the data therefore lays out the data to be processed together contiguously in the storage space, making a single read convenient.
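The line-buffer pipeline can be modelled in software as a delay line of length 2 × width + 3 whose taps supply the 9 window pixels; once the line is full, each incoming pixel yields one output, matching the one-result-per-beat behaviour described above (a behavioural sketch, not the patent's hardware):

```python
from collections import deque

def conv3x3_stream(img, kernel):
    """Streaming valid 3x3 convolution through a delay line of length
    2*width + 3 (the '3 line buffers + 9 registers' structure): after
    the pipeline fills, each new pixel produces one output."""
    h, w = len(img), len(img[0])
    delay = deque(maxlen=2 * w + 3)       # models line buffers + window regs
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for n, pix in enumerate(p for row in img for p in row):
        delay.append(pix)
        r, c = divmod(n, w)
        if r >= 2 and c >= 2:             # pipeline full, window valid
            acc = 0
            for i in range(3):            # window row (0 = oldest line)
                for j in range(3):
                    d = (2 - i) * w + (2 - j)   # delay of window pixel (i, j)
                    acc += delay[-1 - d] * kernel[i][j]
            out[r - 2][c - 2] = acc
    return out
```

The result agrees with a direct (non-streaming) 3 × 3 valid convolution, while consuming the image strictly pixel by pixel in raster order.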
In conclusion, compared with the prior art, the invention has the following beneficial effects:
1. The invention does not depend on an optical beacon or a wavefront sensor; it realizes wavefront sensing and correction by establishing a deep neural network oriented to the real physical process of atmospheric turbulence and imaging the target directly through the optical system, finally achieving detection and reconstruction of the wavefront aberration. The invention adopts an FPGA convolutional neural network model and uses the convolution module for convolution operations; the nonlinear-function sigmoid module realizes the sigmoid function in the convolutional neural network; the pooling module further pools the output of the convolution layers to reduce the dimensionality of the feature output; the intermediate-quantity storage module stores intermediate variables of the convolutional neural network calculation; and the fully connected layer module realizes the calculation function of the convolutional neural network perceptron and produces the output.
2. The FPGA convolutional neural network model adopts a parallel computing structure in the convolution calculation, so that the 9 multiplications of each convolution proceed simultaneously, and the convolutions of multiple channels can also proceed simultaneously, resources permitting; through the pipeline-structure design, once the convolution for one pixel has been computed, the convolution values of the subsequent pixels are output continuously. Suitable structures can be designed between different layers and the sub-layers of each layer, so that the pipeline stays full and the overall processing performance improves.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an FPGA convolutional neural network model of the present invention;
FIG. 2 is a parallel computing architecture for convolution computation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1:
as shown in fig. 1 and fig. 2, the present embodiment includes a detection camera, a correction component and an FPGA convolutional neural network model, where the detection camera is a CCD camera, and the correction component includes a deformable mirror, a convex lens and a half-mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear-function sigmoid module, a pooling module, an intermediate-quantity storage module and a fully connected layer module; data undergo convolution and pooling operations through the convolution and pooling modules, and the activated connections between layers are realized through the nonlinear-function sigmoid module.
Preferably, the network structure of the FPGA convolutional neural network model is divided into 9 layers; each layer includes 3 convolution modules, 1 addition module and 1 convolution module connected in sequence, and the different layers are connected by pooling or interpolation.
Preferably, layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, each up-sampling operation doubling the size of the intermediate layer while halving the number of channels; layers 1 to 5 are connected through maximum pooling, layers 6 to 9 through up-convolution, and layers 1 to 4 are connected to layers 6 to 9 by one-to-one residual connections, so that part of the intermediate layers of the down-sampling path are copied to the up-sampling path to participate in up-sampling; the output of the up-sampling layers undergoes one more convolution operation to obtain the final output image.
Preferably, during maximum pooling and down-sampling, a maximum is determined as the result from 4 pixels at a time; during up-sampling, a line-buffer structure is adopted, and the required result is obtained by calculation over adjacent pixels.
Preferably, the convolution module is composed of 3 row registers with a length of 28, 3 row registers with a length of 12 and a 3 × 3 multiply-add array.
Preferably, the non-linear function sigmoid module stores a sigmoid function value corresponding to an argument in advance in a ROM or a RAM, wherein the argument is used as an address input, and the function value is used as an output of the module, so as to realize the sigmoid function.
Preferably, the pooling module is composed of 2 line registers of length 24, 2 line registers of length 8, and a multiply-add array of 2 × 2.
Preferably, the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolution module is changed, the temporarily stored intermediate result is read again and input to the changed convolution module.
Preferably, the fully-connected layer module includes 10 multiply-accumulators, the data and the weight parameter are input into the fully-connected layer module, and the output result of the fully-connected layer module can be obtained after 192 clock cycles by using the 10 multiply-accumulators.
Preferably, the convolution calculation of the input image in the system adopts a parallel computing structure for the 3 × 3 convolution; the structure includes 3 line buffers, each followed by 3 registers; as the input image streams in, the 3 line buffers prefetch the 3 rows of data to be processed, the 3 registers after each line buffer simultaneously hold the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are combined; during processing, the 3 line buffers continuously prefetch the data to be processed, and after each pixel shift all 9 register outputs are combined to yield the convolution result of the input image.
In this embodiment, in fig. 1, Conv 3x3 ReLU + BN denotes a 3x3 convolution module with its activation function, ReLU + BN being the activation function; Add denotes addition; max pool 2x2 denotes 2x2 maximum pooling; and up-conv 2x2 denotes the 2x2 up-convolution. In fig. 2, reg is a register, buffer is a line buffer, mul denotes multiplication, h11, h12, h13, h21, h22, h23, h31, h32 and h33 are the data stored in the 9 registers, k11, k12, k13, k21, k22, k23, k31, k32 and k33 are the data they are multiplied by, and Add is addition. The activation function of the nonlinear-function sigmoid module is ReLU + BN.
The optical aberration distortion correction system based on the FPGA convolutional neural network structure provided by the embodiment has the following working process:
S1, establishing an FPGA convolutional neural network model based on deep-learning training of target images and aberration distortion;
S2, after the FPGA convolutional neural network model has been built, using the original target image and the turbulence-distorted target image as the model input, and the atmospheric-turbulence phase distortion as the model output;
S3, the wavefront corrector loads the driving signals onto each of its actuators so that it produces a deformation conjugate to the wavefront to be corrected, thereby correcting the wavefront aberration caused by atmospheric turbulence distortion and completing the correction of the wavefront to be corrected;
S4, to train the FPGA convolutional neural network, the main control computer loads low-order Zernike coefficients onto a liquid-crystal phase screen to generate phase distortion described by the Kolmogorov turbulence spectrum as the network output, and loads the phase distortion into the target imaging light path to obtain the simulated turbulence-distorted image of the target, which together with the original target image serves as the input;
s5, learning the parameters of the established convolutional neural network by reducing the function value of the loss function by adopting a random gradient descent algorithm during training of the FPGA convolutional neural network; the loss function is:
Figure BDA0002820904370000071
where Nx and Ny denote the number of pixels in the x and y directions, respectively, Yij denotes the pixel value of the actually loaded phase screen at coordinate (i, j),
Figure BDA0002820904370000072
the pixel value at coordinate (i, j) of the phase screen representing the output of the network model. Since the network processes grayscale images, the pixel value ranges are all [0,255 ]]。
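Reading the loss as the per-pixel mean of squared differences over the phase screen (the precise normalization is an assumption, since the original formula survives only as an image placeholder), a reference implementation is:

```python
def phase_screen_loss(y_true, y_pred):
    """Mean squared error over the Nx x Ny phase-screen pixels:
    (1/(Nx*Ny)) * sum_ij (Y_ij - Yhat_ij)^2."""
    nx, ny = len(y_true), len(y_true[0])
    return sum((y_true[i][j] - y_pred[i][j]) ** 2
               for i in range(nx) for j in range(ny)) / (nx * ny)
```

A perfect prediction gives a loss of 0; a uniform one-grey-level error gives 1.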
The data rearrangement method for the convolution calculation result of the input image comprises the following steps:
s1, acquiring all convolution calculation results to be rearranged, and establishing an MxN basic data set according to a preset row number M and a preset column number N;
s2, taking any convolution calculation result in the basic data set as a target object, and carrying out similarity calculation on the target object and all data in the basic data set one by one;
s3, establishing a set of convolution calculation results with similarity calculation results larger than a preset value in the basic data set as a similar data set of the target object;
s4, extracting the feature information of the similar data collection of all convolution calculation results, and establishing a feature information collection;
s5, obtaining the mapping relation between the basic data collection and the characteristic information collection;
s6, generating a plurality of two-dimensional data rearrangement paths for the feature information collection, screening an optimal data rearrangement path with the shortest rearrangement element distance from the plurality of two-dimensional data rearrangement paths, and rearranging the feature information collection according to the optimal data rearrangement path;
and S7, rearranging the data of the basic data set according to the mapping relation between the basic data set and the characteristic information set and the rearrangement result of the characteristic information set.
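A deliberately simplified sketch of steps S1 to S7, using the per-result mean as the feature and a 1-D sort as the shortest-distance rearrangement path (both illustrative assumptions, since the patent leaves the similarity metric and path search unspecified):

```python
def rearrange(results):
    """Toy version of the data rearrangement: extract a feature per
    result (S4), map results to features (S5), order features so that
    adjacent elements are close (S6, trivially a sort in 1-D), and
    apply the same order to the underlying results (S7)."""
    features = [sum(r) / len(r) for r in results]             # S4
    mapping = {k: features[k] for k in range(len(results))}   # S5
    order = sorted(mapping, key=mapping.get)                  # S6
    return [results[k] for k in order]                        # S7
```

Ordering by feature places results that will be read together next to each other, which is the point of the rearrangement: one contiguous off-chip read instead of many scattered ones.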
With this data rearrangement method, establishing the basic data set and performing the similarity calculation removes the interference of irrelevant information; extracting feature information reduces the volume of image data participating in the matching calculation, cutting redundant data during image processing while the effective information is fully acquired, and so improves the detection speed while accuracy is ensured.
To verify the effect of the optical aberration distortion correction system of this embodiment on optical aberration distortion, the inventors analysed the computational performance of its FPGA convolutional neural network model. Since the different layers of the network structure are connected by pooling or interpolation, the performance of each layer can be analysed separately; assuming all operation layers within each layer run completely in parallel, the per-layer analysis is as follows:
The first layer: the input data size is 128x128. With the computation fully parallelized, the numbers of multipliers, adders and buffers required are as follows:
1-1: 1conv16: multiplier: 9x16=144, buffer: 128x3=384B, adder: 8x16=128
Cycles to process: since the convolutions of the 16 channels can run simultaneously, and the 9 multiplications within each convolution can also run simultaneously, 16 results can be taken out every cycle once the initial latency has elapsed and the first 16 results have emerged. The time for this layer of operations can therefore be calculated as: (128x2+15) + 128x128 = 16655 cycles
1-2: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: since the convolutions of the 16 channels over 16 input maps can run simultaneously, and the 9 multiplications within each convolution can also run simultaneously, 16 results can be obtained every cycle once the initial latency has elapsed and the first 16 results have emerged. The time for this layer of operations is therefore about: (128x2+19) + 128x128 = 16659 cycles
1-3: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: (128x2+19) + 128x128 = 16659 cycles
1-4: adder: 16
Cycles to process: this operation is a point-to-point addition of 2 data streams; by design it can be merged with 1-3 and completed with only 1-2 additional cycles.
1-5: 16conv16: multiplier: 9x16x16=2304, buffer: 128x3x16=6144, adder: 8x16x16=2048
Cycles to process: (128x2+19) + 128x128 = 16659 cycles
Suppose the FPGA provides the following resources: SLC: 600; memory: 32 Mb; DSP: 2520; I/O: 328, where each DSP can implement one 25bit x 18bit multiplier, so the network parameters need to be quantized. Because the number of DSPs is limited, Layer 1 cannot be fully pipelined; instead, each multiplier completes the multiplications required for one convolution. If no further pipelining between the internal sub-layers is considered, the operation time required for the first layer is about: 16655+16659+16659+16659 = 66632 cycles.
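The per-sub-layer cycle estimates above all follow one pattern: a pipeline-fill latency plus one pass over the 128x128 image. A short check that reproduces the first-layer total (the fill constants 15 and 19 are taken directly from the text; the helper name is ours):

```python
def first_layer_cycles():
    """Reproduces the first-layer estimate: each sub-layer pays a
    pipeline-fill latency of (128*2 + c) cycles, then emits one set of
    16 results per cycle across the 128x128 image."""
    sub_1_1 = (128 * 2 + 15) + 128 * 128      # 1-1: 16655 cycles
    sub_rest = (128 * 2 + 19) + 128 * 128     # 1-2, 1-3, 1-5: 16659 each
    # 1-4 is merged into 1-3 and adds only 1-2 cycles, so it is ignored here
    return sub_1_1 + 3 * sub_rest

total = first_layer_cycles()
```

Evaluating the helper gives 16655 + 3x16659 = 66632 cycles, matching the figure quoted above.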
The second layer: obtained by pooling the first layer; the input data size is 64x64, i.e. 1/4 the size of Layer 1.
2-1: 16conv32: multiplier: 9x16x32=4608, buffer: 64x3x16=3072, adder: 8x16x32=4096
Cycles to process: fully parallelizing this layer would require 4608 multipliers, which exceeds the FPGA's resources, so the multipliers must be multiplexed and the operation completed in 2 rounds; since the 2 rounds can be pipelined, the processing time is:
(64x2+15) + (64x64) + (64x64) = 8335 cycles
2-2: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192
Cycles to process: fully parallelizing this layer would require 9216 multipliers, which exceeds the FPGA's resources, so the multipliers must be multiplexed and the operation completed in 4 rounds; since the 4 rounds can be pipelined, the processing time is:
(64x2+15) + (64x64)x4 = 16527 cycles
2-3: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192, cycles: 16527 cycles
2-4: adder: 32
Cycles to process: this operation is a point-to-point addition of 2 data streams; by design it can be merged with 2-3 and completed with only 1-2 additional cycles.
2-5: 32conv32: multiplier: 9x32x32=9216, buffer: 64x3x32=6144, adder: 8x32x32=8192, cycles: 16527 cycles
Due to the limited number of DSPs, the second layer cannot be fully pipelined either; each multiplier completes the multiplications required for one convolution. If no further pipelining between the internal sub-layers is considered, the operation time required for the second layer is about: 8335+16527+16527+16527 = 57916 cycles.
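The multiplexing decision above follows from comparing each sub-layer's fully parallel multiplier demand against the 2520 available DSPs. A sketch, assuming the rounds are fully pipelined as stated (function names are ours):

```python
import math

DSP_COUNT = 2520  # multipliers available on the assumed FPGA

def rounds_needed(multipliers):
    """Multiplexing rounds required when the fully parallel multiplier
    demand of a sub-layer exceeds the available DSPs."""
    return math.ceil(multipliers / DSP_COUNT)

def sub_layer_cycles(width, latency, multipliers):
    # pipeline fill, then one pipelined pass over the image per round
    return (width * 2 + latency) + (width * width) * rounds_needed(multipliers)

# Second layer: 2-1 needs 9x16x32 = 4608 multipliers (2 rounds);
# 2-2, 2-3 and 2-5 each need 9x32x32 = 9216 multipliers (4 rounds)
layer2 = (sub_layer_cycles(64, 15, 9 * 16 * 32)
          + 3 * sub_layer_cycles(64, 15, 9 * 32 * 32))
```

This yields 8335 + 3x16527 = 57916 cycles, matching the estimate above.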
By the same reasoning, the remaining layers can be estimated. The third layer: 57660 cycles; the fourth layer: 57532 cycles; the fifth layer: no more than 57532 cycles; the sixth layer: 32956+49293 = 82249 cycles; the seventh layer: 24813+49389 = 74202 cycles; the eighth layer: 33340+49581 = 82921 cycles; the ninth layer: 16655x2+16659x3 = 83287 cycles. The overall time estimate for layers 1 to 9 is 636315 cycles.
If the designed system operating frequency is 200MHz, the processing frame rate of the system is 200M/636315 ≈ 314 fps; at 300MHz it is about 471 fps; and at an operating frequency of 320MHz or above it can reach 502 fps.
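The frame-rate figures follow directly from the cycle total by dividing the clock frequency by the cycles per frame (integer division, matching the truncated values quoted):

```python
TOTAL_CYCLES = 636315  # overall estimate for layers 1-9 given above

def frame_rate_fps(clock_hz):
    """Integer frames per second when one frame costs TOTAL_CYCLES cycles."""
    return clock_hz // TOTAL_CYCLES

rates = {mhz: frame_rate_fps(mhz * 1_000_000) for mhz in (200, 300, 320)}
```

The dictionary evaluates to 314 fps at 200MHz, 471 fps at 300MHz and 502 fps at 320MHz, as stated in the text.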
The above analysis demonstrates that the system of this embodiment offers excellent performance. Note that the analysis only considers fixed-point quantization of the floating-point operations and does not change the network model. If pipelining is introduced between the layers and the operating frequency is further increased, the target of 500 fps is expected to be met.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An optical aberration distortion correction system based on an FPGA convolutional neural network structure, characterized by comprising a detection camera, a correction component and an FPGA convolutional neural network model, wherein the detection camera is a CCD camera, and the correction component comprises a deformable mirror, a convex lens and a semi-transparent semi-reflective mirror; the FPGA convolutional neural network model comprises a convolution module, a nonlinear function sigmoid module, a pooling module, an intermediate quantity storage module and a fully-connected layer module; data undergo convolution and pooling operations through the convolution module and the pooling module, and the activated connections between layers are realized through the nonlinear function sigmoid module.
2. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the network structure of the FPGA convolutional neural network model is divided into 9 layers, each layer comprises 3 convolution modules, 1 element-wise addition module and 1 convolution module connected in sequence, and the layers are connected to one another by pooling or interpolation.
3. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 2, wherein layers 1 to 4 of the FPGA convolutional neural network model are down-sampling layers, layer 5 is a bridging layer, and layers 6 to 9 are up-sampling layers; up-sampling is realized by transposed convolution, and each intermediate layer undergoes one up-sampling operation that doubles the feature-map size relative to the previous layer while halving the number of channels; layers 1 to 5 are connected through maximum pooling, layers 6 to 9 are connected through up-convolution, and layers 1 to 4 and layers 6 to 9 are connected in one-to-one correspondence through residual connections, so that part of the intermediate layers in the down-sampling process are copied to the up-sampling layers to participate in the up-sampling process; the output of layers 6 to 9 undergoes one more convolution operation to obtain the final output image.
4. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 3, wherein during maximum-pooling down-sampling, the maximum of 4 pixels at a time is taken as the result; in the up-sampling process, a line buffer structure is adopted, and the required result is obtained by computing with adjacent pixels.
5. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the convolution module is composed of 3 line registers of length 28, 3 line registers of length 12 and a 3x3 multiply-add array.
6. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the nonlinear function sigmoid module stores in advance, in ROM or RAM, the sigmoid function values corresponding to the argument values, the argument serving as the address input and the function value as the module output, thereby realizing the sigmoid function.
7. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the pooling module is composed of 2 line registers of length 24, 2 line registers of length 8 and a 2x2 multiply-add array.
8. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the intermediate quantity storage module is used for storing the result generated by each pooling module, and after the connection state of the convolutional module is changed, the temporarily stored intermediate result is read out again and input to the changed convolutional module.
9. The optical aberration distortion correction system based on the FPGA convolutional neural network structure of claim 1, wherein the fully-connected layer module comprises 10 multiply-accumulators; the data and the corresponding weight parameters are input to the fully-connected layer module, and by using the 10 multiply-accumulators the output result of the fully-connected layer module is obtained after 192 clock cycles.
10. The optical aberration distortion correction system based on the FPGA convolutional neural network structure as claimed in claim 1, wherein the convolution calculation of the input image in the system adopts a 3x3 cyclic parallel calculation structure comprising 3 line buffers, each followed by 3 registers; when the input image is fed in through 1 line buffer, the 3 line buffers prefetch the 3 lines of data to be processed, the 3 registers after each line buffer simultaneously access the first 3 pixels of that line buffer's output, and the outputs of the 9 registers are merged; during data processing, the 3 line buffers continuously prefetch the data to be processed, the convolution calculation result of the input image is obtained by pixel-shifting and merging all outputs of the 9 registers, and data rearrangement is then performed on the convolution calculation result of the input image.
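Although the claims describe a hardware structure, the data flow of the 3x3 line-buffer convolution of claim 10 can be modeled behaviorally in software. The sketch below abstracts buffer depths and clocking; the function name and the row-slicing model are our assumptions, not the claimed circuit:

```python
def conv3x3_linebuffer(image, kernel):
    """Software model of the 3x3 line-buffer convolution of claim 10:
    three line buffers hold three consecutive image rows, the three
    registers behind each buffer expose a 3x3 pixel window, and the
    nine register outputs are multiplied by the kernel weights and
    summed as the window slides across the rows."""
    h, w = len(image), len(image[0])
    out = []
    for row in range(h - 2):
        lines = image[row:row + 3]          # the three prefetched rows
        out_row = []
        for col in range(w - 2):
            acc = 0
            for i in range(3):              # the 9 "register" outputs
                for j in range(3):
                    acc += lines[i][col + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out

img = [[r * 5 + c for c in range(5)] for r in range(5)]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
result = conv3x3_linebuffer(img, identity)
```

With the identity kernel, the 3x3 output simply reproduces the interior pixels of the 5x5 input, which makes the window alignment easy to verify.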
CN202011418118.8A 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure Pending CN112529799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011418118.8A CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011418118.8A CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Publications (1)

Publication Number Publication Date
CN112529799A true CN112529799A (en) 2021-03-19

Family

ID=74997166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011418118.8A Pending CN112529799A (en) 2020-12-07 2020-12-07 Optical aberration distortion correction system based on FPGA convolutional neural network structure

Country Status (1)

Country Link
CN (1) CN112529799A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862654A * 2022-04-15 2022-08-05 Shandong Inspur Scientific Research Institute Co., Ltd. Method and system for realizing real-time template convolution on FPGA

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100076915A1 (en) * 2008-09-25 2010-03-25 Microsoft Corporation Field-Programmable Gate Array Based Accelerator System
US20180046897A1 (en) * 2016-08-12 2018-02-15 Beijing Deephi Intelligence Technology Co., Ltd. Hardware accelerator for compressed rnn on fpga
CN108805267A (en) * 2018-05-28 2018-11-13 重庆大学 The data processing method hardware-accelerated for convolutional neural networks
CN108805274A (en) * 2018-05-28 2018-11-13 重庆大学 The hardware-accelerated method and system of Tiny-yolo convolutional neural networks based on FPGA
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm
CN109031654A (en) * 2018-09-11 2018-12-18 安徽农业大学 A kind of adaptive optics bearing calibration and system based on convolutional neural networks
CN109948784A (en) * 2019-01-03 2019-06-28 重庆邮电大学 A kind of convolutional neural networks accelerator circuit based on fast filtering algorithm
CN110084739A (en) * 2019-03-28 2019-08-02 东南大学 A kind of parallel acceleration system of FPGA of the picture quality enhancement algorithm based on CNN
CN110648298A (en) * 2019-11-01 2020-01-03 中国工程物理研究院流体物理研究所 Optical aberration distortion correction method and system based on deep learning
CN110651277A (en) * 2019-08-08 2020-01-03 京东方科技集团股份有限公司 Computer-implemented method, computer-implemented diagnostic method, image classification apparatus, and computer program product
US20200284883A1 (en) * 2019-03-08 2020-09-10 Osram Gmbh Component for a lidar sensor system, lidar sensor system, lidar sensor device, method for a lidar sensor system and method for a lidar sensor device
CN111967468A (en) * 2020-08-10 2020-11-20 东南大学 FPGA-based lightweight target detection neural network implementation method


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
XIAO MENG et al.: "An FPGA-based accelerator platform implements for convolutional neural network", HP3C '19: Proceedings of the 3rd International Conference on High Performance Compilation, Computing and Communications *
PANG Kuo: "Research on the design and application of optical imaging *** based on microlens arrays", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *
ZHANG Zuyang: "Research on hardware acceleration of deep neural networks", China Master's Theses Full-text Database, Information Science and Technology *
WANG Baofeng: "Research on laser fine-tracking technology based on adaptive optics", China Master's Theses Full-text Database, Information Science and Technology *
CHEN Chen et al.: "Design and implementation of a YOLOv2 accelerator based on the Zynq7000 FPGA heterogeneous platform", Journal of Frontiers of Computer Science and Technology *
MA Li: "Research on depth information estimation algorithms in computer vision", China Doctoral Dissertations Full-text Database, Information Science and Technology *


Similar Documents

Publication Publication Date Title
JP7329533B2 (en) Method and accelerator apparatus for accelerating operations
US9411726B2 (en) Low power computation architecture
US10540574B2 (en) Image compression method and related device
EP3944157A1 (en) Device and method for performing training of convolutional neural network
US20180197067A1 (en) Methods and apparatus for matrix processing in a convolutional neural network
CN108629406B (en) Arithmetic device for convolutional neural network
CN109121435A (en) Processing unit and processing method
US6947916B2 (en) IC for universal computing with near zero programming complexity
EP4276690A1 (en) Vector computation unit in a neural network processor
US20210357735A1 (en) Split accumulator for convolutional neural network accelerator
JP2021521516A (en) Accelerators and systems for accelerating operations
EP0570359B1 (en) Heuristic processor
CN107633297B (en) Convolutional neural network hardware accelerator based on parallel fast FIR filter algorithm
US11244028B2 (en) Neural network processor and convolution operation method thereof
JPH03131965A (en) Two-dimensional contraction array and method for neural network
CN111260020B (en) Convolutional neural network calculation method and device
US10755169B2 (en) Hybrid non-uniform convolution transform engine for deep learning applications
US11983616B2 (en) Methods and apparatus for constructing digital circuits for performing matrix operations
JP2020107338A (en) Method and apparatus for processing convolution operation in neural network
JP2022541721A (en) Systems and methods that support alternate number formats for efficient multiplication
JP2022510237A (en) Camera self-calibration network
CN110377874B (en) Convolution operation method and system
CN112529799A (en) Optical aberration distortion correction system based on FPGA convolutional neural network structure
KR20230081697A (en) Method and apparatus for accelerating dilatational convolution calculation
US11526305B2 (en) Memory for an artificial neural network accelerator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination