CN113378388A

CN113378388A - Asymmetric light polarization device structure based on asynchronous reinforcement learning and design method thereof

Info

Publication number: CN113378388A
Application number: CN202110656127.9A
Authority: CN
Inventors: 高雅玙; 易楚翘; 杜庆国; 陈志伟
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2021-09-10

Abstract

The invention belongs to the technical field of polarizing device design and discloses an asymmetric light polarization device structure based on asynchronous reinforcement learning and a design method thereof. The method preprocesses a simulation data set, builds and initializes a transmittance prediction network, trains the network, and optimizes the structure of the asymmetric polarization conversion device with an asynchronous reinforcement learning algorithm. Using a deep neural network, the transmittance prediction network accurately predicts the transmittance properties from the structural data, and the asynchronous reinforcement learning algorithm inversely optimizes the structure of the asymmetric light polarization conversion device efficiently and quickly. By effectively down-sampling the data and reasonably dividing the data set, the residual-structure-based transmittance prediction network can be trained effectively and the device structure can be optimized with the asynchronous reinforcement learning algorithm, which improves both the design efficiency and the maximum transmittance of the device.

Description

Asymmetric light polarization device structure based on asynchronous reinforcement learning and design method thereof

Technical Field

The invention belongs to the technical field of polarization device design, and particularly relates to an asymmetric light polarization device structure based on asynchronous reinforcement learning and a design method thereof.

Background

Polarization is an important property of electromagnetic waves and is of broad research value in imaging, military applications, navigation, satellite communication and other fields. Conventional polarization control is achieved mainly with half-wave plates and dichroic crystals. The principle is that, as an electromagnetic wave propagates inside the medium, components with mutually perpendicular polarization directions accumulate a phase difference that grows with propagation distance, which results in polarization conversion. However, these conventional methods perform poorly, with low conversion efficiency and narrow bandwidth.

In recent decades, with the rapid development of artificial electromagnetic structures (metamaterials), their increasingly unique properties have offered new ways of solving the problems of traditional materials. A metamaterial arranges one or more kinds of sub-wavelength units in space according to a certain pattern, so that electromagnetic waves can be controlled arbitrarily at the sub-wavelength scale. Metamaterials can also exhibit properties that natural media do not possess, such as negative refractive index, optical activity and the inverse Doppler effect.

In general, any optical system composed of artificial sub-wavelength structures can be called a metamaterial. Metamaterials with polarization-control properties typically fall into two popular branches: anisotropic metamaterials (metasurfaces) and chiral metamaterials. An anisotropic metamaterial (metasurface) introduces different phase shifts along two orthogonal directions, so that the polarization responses in the two directions can be controlled independently. A chiral metamaterial lacks structural symmetry, so that electromagnetic coupling occurs within the structure, which allows the electromagnetic wave to be controlled effectively.

In the conventional design scheme for a polarization conversion metasurface structure, the design proceeds mainly through a two-stage loop of numerical simulation and manual adjustment, as shown in fig. 6. First, the structure type (anisotropic, chiral, or a combination of the two), the period of the structure (which determines the operating band of the polarizing device) and the material of each layer must be determined. Once these are fixed, each structural parameter of the metasurface (including the thickness of each layer, the refractive index of the dielectric layer and so on) is initialized to an arbitrary reasonable value, and the model required for numerical simulation is then built in a scripting language according to the requirements of the simulation tool (such as the FDTD software from Lumerical). Next, one structural parameter is selected, in order or according to some other logic, as the parameter to be tuned, while the remaining structural parameters are held fixed. The current structure is simulated with the numerical simulation tool to obtain the cross-polarization transmittance and the co-polarization transmittance of the metasurface. The performance of the current metasurface is then judged: if it is not optimal, expert prior knowledge guides how the value of the parameter being tuned should be adjusted; if it is optimal, the next structural parameter is selected for tuning. When all the structural parameters have been adjusted to be optimal, the design process is complete.

The disadvantages of the conventional polarization conversion metasurface design are mainly reflected in the following aspects:

The selection of the structure type, the period and the material of each layer of the polarization conversion metasurface depends on extensive engineering experience from earlier experiments, and the reasonableness of each parameter choice must be judged by experts, which wastes a great deal of manpower and material resources.

Each time a structural parameter of the polarization conversion metasurface is adjusted, a new simulation model must be built, which requires the intervention of a professional and consumes unnecessary time on building and fine-tuning the theoretical model.

Each time the structure of the polarization conversion metasurface is adjusted, not only must the simulation model be rebuilt, but a large amount of time must again be spent on simulation and model solving to obtain the cross-polarization transmittance T_yx and the co-polarization transmittance T_yy of the metasurface.

Both the evaluation of the simulation results and the adjustment of the polarization conversion metasurface require manual intervention, and because the structural data are multi-dimensional, it is difficult to adjust all parameters jointly by hand. Since one variable is tuned at a time and then fixed while the others are tuned, the resulting performance easily falls into a local optimum, and a globally optimal structure is hard to find.

Through the above analysis, the problems and defects of the prior art are as follows: in the traditional design process, the simulation time is too long, the intervention of professionals is needed, and the manual adjustment efficiency is low.

The difficulty in solving the above problems and defects is:

The invention can use reinforcement learning to select the structural and material properties of the polarization conversion metasurface automatically, which removes the dependence on experts. Moreover, the quality of each individual adjustment or selection no longer needs to be judged one at a time.

Relying on a deep neural network, the transmittance properties can be obtained directly from the structural properties of the polarization conversion metasurface, which overcomes the drawback that FDTD must re-model and re-simulate every time a new structure is to be evaluated.

Optimizing with reinforcement learning and a deep neural network overcomes the drawback that traditional one-variable-at-a-time optimization gets trapped in local optima. In addition, the structural or material parameters no longer need to be adjusted manually, which saves adjustment time, and the final result can reach the global optimum.

The significance of solving the problems and the defects is as follows:

Compared with the traditional optimization process, using reinforcement learning to optimize the structural or material parameters of the polarization conversion metasurface automatically saves the large amount of time experts would spend on judgment. The reinforcement learning algorithm also applies a uniform evaluation criterion to the optimization results, so the method is objective and free of human subjectivity.

The deep neural network obtains the transmittance properties directly from the structural properties of the polarization conversion metasurface, so the device no longer needs to be simulated on an expensive server. This saves a large amount of computing resources and computing time and provides a faster way to verify device performance.

Optimizing the structural and material parameters of the device with reinforcement learning and a deep neural network aims at global optimization and avoids the local-optimum problem, so that the polarization efficiency of the polarization conversion metasurface is superior to that obtained by traditional optimization methods.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides an asymmetric light polarization device structure based on asynchronous reinforcement learning and a design method thereof. The invention relates to a design method that optimizes the structural parameters of an asymmetric light polarization conversion device so as to maximize its transmittance, with which a metasurface device structure that meets the expected efficiency can be designed conveniently and efficiently.

The invention is realized in such a way that an asymmetric light polarization device structure design method based on asynchronous reinforcement learning comprises the following steps:

step one, preprocessing a simulation data set;

step two, building and initializing a transmittance prediction network;

step three, training the transmittance prediction network;

step four, optimizing the structure of the asymmetric polarization conversion device by using an asynchronous reinforcement learning algorithm.

Further, in the step one, the specific process of the simulation data set preprocessing is as follows:

in the data set preprocessing part, the obtained device structure data are normalized to between 0 and 1, and the obtained 83-dimensional transmission spectrum data are down-sampled 3:1 to 28-dimensional data.

Further, in step two, the specific process of building and initializing the transmittance prediction network is as follows:

the transmittance prediction network, a deep neural network based on a residual structure, receives the structural parameters of the asymmetric light polarization conversion device as input data and predicts the corresponding transmittance values in a very short time;

the basic unit adopted in all modules of the transmittance prediction network is a fully connected layer followed by a ReLU activation layer; batch normalization is then applied, and a residual structure is adopted as an additional path for error back-propagation.

Further, the basic structure of the transmittance prediction network is divided into three parts: the series input part SIN, the prediction output part PYN for the transmittance T_yy, and the prediction output part PXN for the transmittance T_yx;

the network takes the structural parameters of the asymmetric light polarization conversion device as input and outputs its transmittance properties T_yy and T_yx; in SIN, four fully connected (Dense) layers are used first, each activated with the ReLU function f(x) = max(0, x), followed by a batch normalization layer; the prediction output parts PYN and PXN are built in the same way from fully connected (FC) layers and ReLU activations, with a residual structure added, and their final activation is the LeakyReLU function:

f(x) = max(αx, x), where α is a small positive slope;

the residual structure facilitates the back-propagation of errors and helps avoid vanishing and exploding gradients.

Further, after receiving the structure of the asymmetric light polarization conversion device as input data, the series input part SIN passes it through 4 fully connected layers and ends with batch normalization, thereby extracting features of the structural data; a larger learning rate is used, and a larger batch of training data is used at a time during training; after the batch normalization layer, the extracted features serve as the input to both the PYN part and the PXN part;

PYN consists of 6 basic units and an output part; each basic unit comprises a fully connected (Dense) layer and a ReLU activation layer, the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the ReLU function; the output part is a fully connected layer of 28 neurons followed by a LeakyReLU activation;

PXN consists of 5 basic units and an output part; each basic unit comprises a fully connected (Dense) layer and a ReLU activation layer, the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the ReLU function; the output part is a fully connected layer of 28 neurons followed by a LeakyReLU activation.

Further, in the third step, the concrete process of the transmittance prediction network training is as follows:

for training the network, 80% of the obtained simulation data set is used as the training set and the remaining 20% as the test set, and all data take turns in grouped training and testing by 5-fold cross-validation.

Further, in the fourth step, the asynchronous reinforcement learning algorithm optimizes the structure of the asymmetric polarization conversion device, specifically:

the structure of the asymmetric light polarization conversion device is designed by adaptive optimization with an asynchronous reinforcement learning algorithm; the algorithm takes the transmittance prediction network as the surrogate function, the structure of the asymmetric light polarization conversion device as the independent variable, and the average of the transmittances T_yy and T_yx as the target value to be optimized; the structure is adjusted continuously so as to maximize the target value, and the optimal adjustment strategy is explored over the iterations; when the asynchronous reinforcement learning algorithm converges, the structure of the asymmetric light polarization conversion device is optimal.

Further, the asynchronous reinforcement learning algorithm specifically includes:

initializing maximum iteration times and boundary conditions, and adding a limiting condition; randomly initializing the structure of the asymmetric optical polarization device; during initialization, enabling the structure parameters to be in a preset range;

in the asynchronous reinforcement learning algorithm, obtaining the transmittances T_yy and T_yx from the structure of the asymmetric light polarization device through the transmittance prediction network, and then, by calculation, obtaining the return value of the asynchronous reinforcement learning algorithm;

the return value r is calculated as follows:

when a structural value of the asymmetric light polarization device exceeds its limit, r = -0.5;

when the maximum values of T_yy and T_yx are less than 0.3, r = exp(max(T_yy) + max(T_yx) - 1) - 1;

when the maximum values of T_yy and T_yx are both between 0.3 and 0.5, r = 0;

when the maximum value of T_yy is between 0.3 and 0.5 and that of T_yx is 0.5 or greater, or the maximum value of T_yx is between 0.3 and 0.5 and that of T_yy is 0.5 or greater, the expression for r is not reproduced in this text;

when the maximum values of T_yy and T_yx are both greater than 0.5, r = exp(max(T_yy) + max(T_yx) - 1).

Further, the adaptive termination condition of the asynchronous reinforcement learning algorithm is designed as follows:

after finding the optimal value in the iteration, comparing the optimal value with the optimal value found in the previous iteration, and if the optimal value found in the iteration is more optimal, continuing the iteration;

if not more optimal than before, the number of iterations is checked and if the number of iterations is greater than 2500 and no more optimal target value is found in the last 300 iterations, the optimization is ended.

Another object of the present invention is to provide an asymmetric light polarization device structure designed by the above design method based on asynchronous reinforcement learning. The structure has a double-layer anisotropic metasurface comprising a pair of metal resonant rods and a sub-wavelength metal grating, with a dielectric layer filling the space between the two layers.

By combining all the technical schemes, the invention has the advantages and positive effects that:

A residual structure is used in building the transmittance prediction network, which mitigates vanishing gradients during back-propagation and the network degradation problem, and thereby speeds up training of the transmittance prediction network. In addition, because transmittance values are all positive, the output activation functions of the network all use LeakyReLU, and a batch normalization layer is used, which speeds up training and convergence, keeps gradients from exploding or vanishing, and helps prevent overfitting during training.

(3) Effect of claim 3:

Dividing the network into the three parts SIN, PYN and PXN helps predict the transmittances T_yy and T_yx more accurately. The main function of SIN is to extract information from the structural parameters and pass it on to the next layers. Because T_yy and T_yx are relatively independent, they are predicted separately by the two output networks PYN and PXN, which reduces the number of network parameters to be trained and speeds up subsequent training.

(4) Technical or experimental effects by comparison:

With the methods set forth in claim 1, claim 2 and claim 3, the prediction accuracy of the transmittance prediction network for T_yy and T_yx reaches 96.6% and 95.5%, respectively. The transmittance prediction for an asymmetric light polarization device is completed within milliseconds, and the optimization process takes far less time than conventional optimization methods.

The invention provides a design method that combines a residual-structure-based transmittance prediction network with an asynchronous reinforcement learning algorithm. It exploits the ability of a deep neural network to approximate a nonlinear function to arbitrary precision, so that the transmittance prediction network accurately predicts the transmittance properties from the structural data, and the asynchronous reinforcement learning algorithm inversely optimizes the structure of the asymmetric light polarization conversion device efficiently and quickly. By effectively down-sampling the data and reasonably dividing the data set, the transmittance prediction network can be trained effectively and the device structure can be optimized with the asynchronous reinforcement learning algorithm, so that the design efficiency is improved and the efficiency of the final asymmetric light polarization conversion device is superior to that obtained by conventional design methods.

The invention also has the following advantages:

First, the polarization conversion metasurface provided by the invention consists of a pair of metal resonant rods and a sub-wavelength metal grating. It can realize asymmetric polarization conversion in the blue-light band, concentrating the electromagnetic energy distributed over two orthogonal linear polarization states into a single polarization state, and thus provides a feasible technical route for many low-loss optoelectronic applications.

Second, the transmittance is predicted with the transmittance prediction network instead of a traditional numerical simulation tool, which saves a large amount of simulation time and computing resources. For 12500 sets of asymmetric light polarization conversion device data, the transmittance prediction network needs only 0.59 seconds, whereas traditional FDTD simulation needs 762514 seconds, so the former is about 1.3 million times faster. Moreover, the latter must run on a high-performance server or a computer cluster, while the design scheme of the invention can complete the design process on an ordinary home computer.
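The ratio between the two timings quoted above can be checked with a few lines of arithmetic. The sketch below uses only the three figures given in the text (12500 structures, 0.59 seconds, 762514 seconds); the variable names are illustrative.

```python
# Timings quoted in the text for 12,500 device structures.
n_structures = 12500
fdtd_seconds = 762514.0     # conventional FDTD simulation
network_seconds = 0.59      # transmittance prediction network

# Ratio of the two run times, FDTD wall time in days, and network time per structure.
speedup = fdtd_seconds / network_seconds
fdtd_days = fdtd_seconds / 86400.0
per_structure_ms = network_seconds / n_structures * 1000.0

print(f"speed-up factor: {speedup:,.0f}")
print(f"FDTD total: {fdtd_days:.2f} days")
print(f"network time per structure: {per_structure_ms:.4f} ms")
```

The per-structure figure is consistent with the statement elsewhere in the text that a prediction completes within milliseconds.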

Third, the invention provides an asynchronous reinforcement learning algorithm which, combined with the transmittance prediction network, designs the structure of the asymmetric light polarization conversion device by adaptive inverse optimization so that the device reaches the expected efficiency. Replacing manual adjustment with the asynchronous reinforcement learning algorithm saves adjustment time and allows the parameters to be adjusted jointly to find the optimal discrete structure of the device, which overcomes the tendency of prior-art designs to leave the transmittance at a local optimum and lets the device efficiency reach the global optimum. Taking the design of an asymmetric light polarization conversion device with four-dimensional adjustable structural data as an example, the optimal average transmittance of the device designed with the residual-structure-based deep neural network and the asynchronous reinforcement learning algorithm is 21% higher than that of the device designed by traditional manual adjustment combined with FDTD simulation.

Fourth, the structural design method of the asymmetric light polarization conversion device is highly general: it can be used for the structural optimization of asymmetric polarization devices of different structure types and different materials, and the optimal structure can be found quickly without the intervention of professionals in the design process.

Drawings

Fig. 1 is a flowchart of a method for designing an asymmetric optical polarization device structure based on asynchronous reinforcement learning according to an embodiment of the present invention.

Fig. 2 is a flow chart of the structural design of the asymmetric optical polarization conversion device according to the embodiment of the present invention.

Fig. 3 is a schematic diagram of a transmittance prediction network according to an embodiment of the present invention.

Fig. 4 is a flowchart of designing a structure of an asymmetric optical polarization conversion device by using an asynchronous reinforcement learning algorithm in combination with a transmittance prediction network according to an embodiment of the present invention.

Fig. 5 is a flowchart of an asynchronous reinforcement learning algorithm according to an embodiment of the present invention.

Fig. 6 is a flow chart of a conventional asymmetric optical polarization conversion device design provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Aiming at the problems in the prior art, the invention provides an asymmetric light polarization device structure based on asynchronous reinforcement learning and a design method thereof, and the invention is described in detail below with reference to the accompanying drawings.

Those of ordinary skill in the art may also implement the asymmetric light polarization device structure design method based on asynchronous reinforcement learning using other steps; the method shown in fig. 1 is only one specific example.

As shown in fig. 1, the asymmetric light polarization device structure design method based on asynchronous reinforcement learning according to an embodiment of the present invention includes the following steps:

S101: preprocessing a simulation data set;

S102: building and initializing a transmittance prediction network;

S103: training the transmittance prediction network;

S104: optimizing the structure of the asymmetric polarization conversion device by using an asynchronous reinforcement learning algorithm.

In S101 provided by the embodiment of the present invention, a specific process of preprocessing a simulation data set is as follows:

In the data set preprocessing part, the obtained device structure data are normalized to between 0 and 1, and the obtained 83-dimensional transmission spectrum data are down-sampled 3:1 to 28-dimensional data. The normalization is:

X_norm = (X - X_min) / (X_max - X_min)

where X_norm is the normalized result, and X_min and X_max are the minimum and maximum values of the same dimension within the same batch of structural or material property data. After normalization, the T_yy and T_yx transmission spectrum data are down-sampled: for example, with the spectral points indexed 0, 1, 2, …, 81, 82, the points with indices 0, 3, 6, …, 81 are retained after down-sampling.
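The preprocessing described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the inventors' code: the function names and the sample values are hypothetical, and the handling of a constant column is an assumption that the text does not address.

```python
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one structural dimension to [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def downsample_3_to_1(spectrum: Sequence[float]) -> List[float]:
    """Keep every third spectral point, i.e. indices 0, 3, 6, ..."""
    return list(spectrum[::3])


# Hypothetical values for one structural dimension across a batch of four devices.
layer_values = [40.0, 55.0, 70.0, 100.0]
# Hypothetical 83-point transmission spectrum (indices 0..82).
spectrum = [i / 82 for i in range(83)]

normalized = min_max_normalize(layer_values)
reduced = downsample_3_to_1(spectrum)

print(normalized)
print(len(reduced))
```

Retaining indices 0, 3, …, 81 of an 83-point spectrum leaves 28 points, which matches the 28-neuron output layers of the prediction network.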

In S102 provided by the embodiment of the present invention, the specific process of building and initializing the transmittance prediction network is as follows:

The basic structure of the transmittance prediction network is divided into three parts: the series input part SIN, the prediction output part PYN for the transmittance T_yy, and the prediction output part PXN for the transmittance T_yx.

The network takes the structural parameters of the asymmetric light polarization conversion device as input and outputs its transmittance properties T_yy and T_yx. In SIN, four fully connected (Dense) layers are used first, each activated with the ReLU function f(x) = max(0, x), followed by a batch normalization layer. The prediction output parts PYN and PXN are built in the same way from fully connected (FC) layers and ReLU activations, with a residual structure added, and their final activation is the LeakyReLU function:

f(x) = max(αx, x), where α is a small positive slope.

The residual structure facilitates the back-propagation of errors and helps avoid vanishing and exploding gradients.

The series input part SIN receives the structure of the asymmetric light polarization conversion device as input data, passes it through 4 fully connected layers and ends with batch normalization, extracting features of the structural data; a larger learning rate is used, and a larger batch of training data is used at a time during training. After the batch normalization layer, the extracted features serve as the input to both the PYN and PXN parts. PYN consists of 6 basic units and an output part. Each basic unit comprises a fully connected (Dense) layer and a ReLU activation layer; the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the ReLU function. The output part is a fully connected layer of 28 neurons followed by a LeakyReLU activation. PXN consists of 5 basic units and an output part. Each basic unit comprises a fully connected (Dense) layer and a ReLU activation layer; the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the ReLU function. The output part is a fully connected layer of 28 neurons followed by a LeakyReLU activation. All weight parameters in the network are initialized with the Glorot uniform distribution, and all bias parameters are initialized to 0.
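The building blocks named above (Glorot uniform initialization, zero biases, a Dense plus ReLU unit with a residual path, and a LeakyReLU output) can be illustrated with a small forward pass in plain Python. This is only a sketch under stated assumptions: the width of 8, the identity skip connection, the LeakyReLU slope of 0.01 and all function names are hypothetical choices, and batch normalization and training are omitted.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def glorot_uniform(fan_in: int, fan_out: int, rng: random.Random) -> Matrix:
    """Weights drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: one dot product plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def leaky_relu(x: Vector, alpha: float = 0.01) -> Vector:
    # alpha = 0.01 is an assumed slope; the text does not state a value.
    return [v if v > 0 else alpha * v for v in x]


def residual_unit(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Dense + ReLU with an identity skip connection (requires equal in/out width)."""
    activated = relu(dense(x, weights, bias))
    return [a + v for a, v in zip(activated, x)]


rng = random.Random(0)
width = 8  # hypothetical hidden width, kept small for illustration
w_hidden = glorot_uniform(width, width, rng)
b_hidden = [0.0] * width            # biases initialized to 0
w_out = glorot_uniform(width, 28, rng)
b_out = [0.0] * 28

features = [rng.random() for _ in range(width)]  # stand-in for the SIN output
hidden = residual_unit(features, w_hidden, b_hidden)
prediction = leaky_relu(dense(hidden, w_out, b_out))
print(len(prediction))
```

In practice a deep learning framework would supply these layers and the batch normalization step; the plain-Python version is only meant to make the data flow explicit.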

In S103 provided by the embodiment of the present invention, the concrete process of the transmittance prediction network training is as follows:

For training the network, 80% of the obtained simulation data set is used as the training set and the remaining 20% as the test set, and all data take turns in grouped training and testing by 5-fold cross-validation. For example, with 10000 groups of data, 2000 groups are selected as the validation set each time and the remaining data serve as the training set.
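The 5-fold split described above can be sketched as an index generator. The function name, the shuffling step and the fixed seed are assumptions made for illustration; the text only specifies the 80/20 proportion and the rotation of the held-out group.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: samples are shuffled once
    fold_size = n_samples // k
    for fold in range(k):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_samples is not divisible by k.
        stop = start + fold_size if fold < k - 1 else n_samples
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation


folds = list(k_fold_indices(10000, k=5))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx))
```

Each sample appears in exactly one validation fold, so every group of data is used for both training and testing over the five rounds.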

In S104 provided by the embodiment of the present invention, the asynchronous reinforcement learning algorithm optimizes the asymmetric polarization conversion device structure, specifically:

The structure of the asymmetric light polarization conversion device is designed by adaptive optimization with an asynchronous reinforcement learning algorithm. The algorithm takes the transmittance prediction network as the surrogate function, the structure of the asymmetric light polarization conversion device as the independent variable, and the average of the transmittances T_yy and T_yx as the target value to be optimized. The structure is adjusted continuously so as to maximize the target value, and the optimal adjustment strategy is explored over the iterations. When the asynchronous reinforcement learning algorithm converges, the structure of the asymmetric light polarization conversion device is optimal.

The asynchronous reinforcement learning algorithm specifically comprises the following steps:

The maximum number of iterations and the boundary conditions are initialized, and a limiting condition is added. The structure of the asymmetric light polarization device is randomly initialized; at initialization, the structural parameters must lie within a preset range.
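
A minimal sketch of the bounded random initialization follows. The four parameters (rod length, rod width, dielectric thickness, dielectric refractive index) are those named in the experimental section of this document, but the numeric bounds in `BOUNDS` are hypothetical: the preset range is not stated in the text.

```python
import random

# Hypothetical preset ranges (not given in the text):
# rod length (nm), rod width (nm), dielectric thickness (nm), refractive index.
BOUNDS = [(100.0, 300.0), (20.0, 100.0), (100.0, 500.0), (1.0, 2.0)]


def random_structure(bounds, rng):
    """Draw each structural parameter uniformly inside its preset range."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def within_bounds(structure, bounds):
    """Return True when every parameter lies inside its preset range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(structure, bounds))


rng = random.Random(0)  # fixed seed, illustrative only
structure = random_structure(BOUNDS, rng)
print(structure, within_bounds(structure, BOUNDS))
```

The same `within_bounds` check can be reused during the iterations to detect a structure that has left the allowed range.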

The return value r of the asynchronous reinforcement learning algorithm is then calculated as follows:

when a structural value of the asymmetric light polarization device exceeds the limit, r = -0.5;

when the maximum values of T_yy and T_yx are less than 0.3, r = exp(max(T_yy) + max(T_yx) - 1) - 1;

when the maximum values of T_yy and T_yx are between 0.3 and 0.5, r = 0;

when the maximum value of T_yy is between 0.3 and 0.5 and that of T_yx is 0.5 or greater, or the maximum value of T_yx is between 0.3 and 0.5 and that of T_yy is 0.5 or greater, [formula missing in the source text];

when the maximum values of T_yy and T_yx are both greater than 0.5, r = exp(max(T_yy) + max(T_yx) - 1).

Taking max(T_yy) = 0.15 and max(T_yx) = 0.4 as an example, r = exp(0.15 + 0.4 - 1) - 1 = -0.36. Taking max(T_yy) = 0.55 and max(T_yx) = 0.67 as another example, r = exp(0.55 + 0.67 - 1) = 1.25; here the return value obtained by the asynchronous reinforcement learning is larger, which drives the algorithm toward a better adjustment strategy.
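
The return value can be sketched in code, under two stated assumptions. First, the exp(...) - 1 branch is applied whenever either maximum is below 0.3; this reading is an assumption, chosen because it reproduces the worked example with 0.15 and 0.4. Second, the formula for the mixed case (one maximum between 0.3 and 0.5, the other 0.5 or greater) is not available in this text, so the sketch raises an error there instead of guessing. The function name `reward` and the handling of values exactly equal to 0.5 are likewise illustrative.

```python
import math


def reward(max_tyy, max_tyx, in_bounds=True):
    """Return value r for one candidate structure (illustrative sketch)."""
    if not in_bounds:
        return -0.5  # structure outside the preset limits
    lo, hi = 0.3, 0.5
    if max_tyy > hi and max_tyx > hi:
        return math.exp(max_tyy + max_tyx - 1.0)
    if lo <= max_tyy <= hi and lo <= max_tyx <= hi:
        return 0.0
    if max_tyy < lo or max_tyx < lo:
        # Assumption: applies when EITHER maximum is below 0.3.
        return math.exp(max_tyy + max_tyx - 1.0) - 1.0
    # Mixed case: the formula is missing from the text, so nothing is invented.
    raise NotImplementedError("mixed-case formula not available in the text")


print(round(reward(0.15, 0.40), 2))  # first worked example
print(round(reward(0.55, 0.67), 2))  # second worked example
```

The exponential makes the return value grow quickly once both transmittances exceed 0.5, while the subtraction of 1 keeps it negative for poor structures.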

The self-adaptive termination condition of the asynchronous reinforcement learning algorithm is designed as follows: after finding the optimal value in the current iteration of the algorithm, comparing the optimal value with the optimal value found in the previous iteration, and if the optimal value found in the current iteration is more optimal, continuing the iteration; if not more optimal than before, the number of iterations is checked and if the number of iterations is greater than 2500 and no more optimal target value is found in the last 300 iterations, the optimization is ended.
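
The termination rule can be sketched as a small helper plus a driver loop. The names `should_stop` and `run`, and the idea of feeding the loop a precomputed list of per-iteration target values, are assumptions made to keep the example self-contained; only the thresholds 2500 and 300 come from the text.

```python
def should_stop(iteration, last_improvement_iter, min_iters=2500, patience=300):
    """Stop once more than min_iters iterations have run and the best
    target value has not improved during the last `patience` iterations."""
    return (iteration > min_iters
            and iteration - last_improvement_iter >= patience)


def run(values):
    """Walk through per-iteration target values; return (stop_iter, best)."""
    best = float("-inf")
    last_improvement_iter = 0
    for iteration, value in enumerate(values, start=1):
        if value > best:
            best, last_improvement_iter = value, iteration
            continue  # a better value was found, so keep iterating
        if should_stop(iteration, last_improvement_iter):
            return iteration, best
    return len(values), best


# Improvement stops early: the run ends right after iteration 2500.
early = run(list(range(1, 101)) + [100] * 5000)
# Last improvement at iteration 2400: the run ends 300 iterations later.
late = run([0] * 2399 + [1] * 1001)
print(early, late)
```

The two conditions work together: the iteration floor prevents stopping too early in the search, and the patience window prevents wasting iterations after the search has stalled.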

The technical solution of the present invention will be described in detail with reference to the following specific examples.

The asymmetric light polarization device structure design method based on asynchronous reinforcement learning provided by the embodiment of the invention comprises the following steps: the method comprises the steps of simulation data set preprocessing, transmissivity prediction network building and initializing, transmissivity prediction network training and asynchronous reinforcement learning algorithm optimization of the asymmetric polarization conversion device structure.

Simulation dataset preprocessing

The simulation data set preprocessing module collects, for each of the two directions T_yy and T_yx, a total of 83 transmittance values between 430 nm and 550 nm, and then applies a 3:1 down-sampling operation to the 83-dimensional data. Finally, 27 valid data points are retained in each group of data, which facilitates subsequent feature extraction and calculation. At the same time, the data are divided into 5 equal parts (if the data cannot be divided evenly, the size of the last part is reduced or increased as appropriate) to facilitate the later 5-fold validation.
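
One way to obtain 27 points from 83 with a 3:1 ratio is sketched below. The text does not say which sample of each group is kept or how the trailing samples are handled, so this sketch assumes that the first sample of every complete group of three is kept and the incomplete last group is dropped. The evenly spaced wavelength grid is also an assumption.

```python
def downsample(spectrum, factor=3):
    """Keep the first sample of each complete group of `factor` samples."""
    n_groups = len(spectrum) // factor  # the incomplete tail is dropped
    return [spectrum[i * factor] for i in range(n_groups)]


# Assumed grid: 83 evenly spaced wavelengths from 430 nm to 550 nm.
wavelengths = [430.0 + i * (550.0 - 430.0) / 82 for i in range(83)]
kept = downsample(wavelengths)
print(len(wavelengths), "->", len(kept))
```

Note that simple slicing with a step of 3 over all 83 samples would keep 28 points rather than 27, so the treatment of the trailing samples matters for the dimension of the training targets.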

Transmittance prediction network construction and optimization

The transmittance prediction network is shown in FIG. 3, in which BatchNormalization denotes a batch normalization layer, Dense a fully connected layer, 144 a layer with 144 hidden neurons, Relu the Relu activation function, and LeakyRelu the LeakyRelu activation function.

The transmittance prediction network, a deep neural network based on a residual structure, accepts the structural parameters of the asymmetric optical polarization conversion device as input data and predicts the corresponding transmittance values in a very short time. The basic unit used in all modules of the transmittance prediction network is a fully connected layer followed by a Relu activation layer; batch normalization is then applied, and a residual structure is adopted as one path for error back-propagation. As shown in fig. 3, the basic structure of the network is divided into three parts: the serial input SIN, the prediction output PYN for the transmittance T_yy, and the prediction output PXN for the transmittance T_yx. In fig. 3, the input is the structural parameters of the asymmetric optical polarization conversion device and the output is its transmittance properties T_yy and T_yx. In SIN, four fully connected (Dense) layers are used first, each activated by the Relu function (f(x) = max(0, x)), followed by a batch normalization layer. PYN and PXN are likewise built from fully connected layers (FC) and the Relu activation function, with a residual structure added, and finally a LeakyRelu function is used for activation. The residual structure favors error back-propagation and helps avoid vanishing and exploding gradients.
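
The building blocks named above can be illustrated with a small forward-pass sketch. It assumes the standard LeakyRelu form (f(x) = x for x > 0, otherwise alpha * x) with an illustrative slope alpha, since the slope used by the invention is not given in this text, and it assumes an identity skip connection for the residual structure, whose exact wiring is only shown in fig. 3. Plain Python lists are used to keep the example self-contained.

```python
def relu(v):
    """Relu activation: f(x) = max(0, x), applied element-wise."""
    return [max(0.0, x) for x in v]


def leaky_relu(v, alpha=0.01):
    """LeakyRelu: x for x > 0, alpha * x otherwise (alpha is an assumption)."""
    return [x if x > 0 else alpha * x for x in v]


def dense(v, weights, biases):
    """Fully connected layer; weights[i][j] links input i to output j."""
    return [sum(v[i] * weights[i][j] for i in range(len(v))) + biases[j]
            for j in range(len(biases))]


def residual_unit(v, weights, biases):
    """Dense + Relu with the input added back (assumed identity skip).
    The input and output widths must match for the addition."""
    out = relu(dense(v, weights, biases))
    return [a + b for a, b in zip(v, out)]


identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = residual_unit([1.0, -2.0], identity, [0.0, 0.0])
output = leaky_relu(hidden, alpha=0.1)
print(hidden, output)
```

The skip path passes the input through unchanged, so gradients can reach earlier layers even when the Relu branch is inactive, which is the mechanism behind the reduced risk of vanishing gradients.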

Here, after receiving the structure of the asymmetric optical polarization conversion device as input data, the serial input part SIN connects 4 fully connected layers and ends with batch normalization. To better extract the features of the structure data, a larger learning rate can be used, and a larger batch of training data can be used at a time during training. After the batch normalization layer, the extracted feature data is used as the input of both the PYN and PXN parts. PYN is composed of 6 basic units and an output part. Each basic unit comprises a fully connected layer (Dense) and a Relu activation layer; the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the Relu function. The final output part is a fully connected layer of 28 neurons followed by a LeakyRelu activation output. PXN consists of 5 basic units and an output part. Each basic unit comprises a fully connected layer (Dense) and a Relu activation layer; the numbers of hidden-layer neurons in the basic units are 200, 500, 200 and 28, and all activation layers use the Relu function. The final output part is a fully connected layer of 28 neurons followed by a LeakyRelu activation output.

Transmittance prediction network training

In the present invention, the other parameters are randomly initialized. For network training, 80% of the obtained simulation data set is used as the training set and the remaining 20% as the test set, and all data are used in turn for training and testing by means of 5-fold cross validation.

After training, the transmittance prediction network, a deep neural network based on a residual structure, can accurately predict the transmittances in the two directions, T_yy and T_yx, from the structure of the asymmetric optical polarization conversion device. The transmittance prediction network can therefore replace a professional numerical simulation tool for simulating the transmittance of the asymmetric optical polarization conversion device.

Asynchronous reinforcement learning algorithm optimized asymmetric polarization conversion device structure

The embodiment of the invention uses an asynchronous reinforcement learning algorithm to adaptively optimize the structure of the asymmetric light polarization conversion device. The process by which the asynchronous reinforcement learning algorithm designs the structure of the asymmetric optical polarization conversion device with four adjustable parameters is shown in fig. 4. The algorithm takes the transmittance prediction network as a proxy function, the structure of the asymmetric optical polarization conversion device as the independent variable, and the average value of the transmittances T_yy and T_yx as the target value to be optimized. The structure is continuously adjusted so as to maximize the target value, and an optimal adjustment strategy is continuously explored during the iterations. When the asynchronous reinforcement learning algorithm converges, the structure of the asymmetric light polarization conversion device is optimal.

The detailed flowchart of the asynchronous reinforcement learning algorithm is shown in fig. 5, and after initializing the maximum iteration number and the boundary condition and adding the limit condition, the asynchronous reinforcement learning algorithm is executed. The structure of the asymmetric optical polarization device is randomly initialized. At initialization, it is necessary to ensure that the configuration parameters are within a preset range.

As shown in fig. 5, within the asynchronous reinforcement learning algorithm the transmittances T_yy and T_yx are obtained from the structure of the asymmetric light polarization device by the transmittance prediction network, and the return value r of the algorithm is then calculated from them as follows:

when a structural value of the asymmetric light polarization device exceeds the limit, r = -0.5;

when the maximum values of T_yy and T_yx are less than 0.3, r = exp(max(T_yy) + max(T_yx) - 1) - 1;

when the maximum values of T_yy and T_yx are between 0.3 and 0.5, r = 0;

when the maximum value of T_yy is between 0.3 and 0.5 and that of T_yx is 0.5 or greater, or the maximum value of T_yx is between 0.3 and 0.5 and that of T_yy is 0.5 or greater, [formula missing in the source text];

when the maximum values of T_yy and T_yx are both greater than 0.5, r = exp(max(T_yy) + max(T_yx) - 1).

Taking max(T_yy) = 0.15 and max(T_yx) = 0.4 as an example, r = exp(0.15 + 0.4 - 1) - 1 = -0.36. Taking max(T_yy) = 0.55 and max(T_yx) = 0.67 as another example, r = exp(0.55 + 0.67 - 1) = 1.25; here the return value obtained by the asynchronous reinforcement learning is larger, which drives the algorithm toward a better adjustment strategy.

Meanwhile, the adaptive termination condition of the asynchronous reinforcement learning algorithm is designed as follows: after the optimal value of the current iteration is found, it is compared with the optimal value found in previous iterations. If the current value is better, the iteration continues. If it is not better, the number of iterations is checked, and if the number of iterations is greater than 2500 and no better target value has been found in the last 300 iterations, the optimization ends.

Double-layer anisotropic super surface

The novel device structure provided by the embodiment of the invention, namely the double-layer anisotropic super surface, comprises the following structural units: a pair of metal resonance rods and a sub-wavelength metal grating, and a dielectric layer is filled between the two layers of structures.

For normally incident x-linearly-polarized light, the structure exhibits strong polarization conversion and effectively converts it into y-polarized light. For normally incident y-linearly-polarized light, there is only an extremely weak polarization conversion, and most of the incident light passes through the metamaterial structure with its original polarization direction maintained. Asymmetric polarization conversion is thereby realized.

The effects of the present invention will be further described below with reference to specific experimental data.

The invention takes as an example an asymmetric polarization structure consisting of a double-rod resonator (aluminum) and a tangent structure (aluminum) converter structure, in which the two super-surface layers are separated by a layer of dielectric material. The parameters to be optimized are the length and width of the double rods, the thickness of the dielectric layer, and the refractive index of the dielectric material. After the structure and material parameters are optimized by combining the reinforcement learning algorithm with the deep neural network, the optimal structure and material parameters are obtained as {218 nm, 53 nm, 348 nm, 1.314}. The optimal average transmittance reaches 60.5%, a relative improvement of 21% over the conventional 50% baseline. The whole optimization takes about 45 minutes, which is far less than conventional manual optimization. Meanwhile, the prediction accuracy of the transmittance prediction network for T_yy and T_yx reaches 96.6% and 95.5% respectively, and the transmittance properties of the asymmetric light polarization device are predicted within milliseconds on a household computer, without running simulations on an expensive server, so that complicated FDTD simulation is replaced.
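
As a quick arithmetic check, the 21% figure is consistent with reading it as a relative improvement of the 60.5% result over the 50% baseline, rather than as a difference in percentage points. The variable names below are illustrative.

```python
baseline = 0.50    # conventional reference line
optimized = 0.605  # reported optimal value

relative_gain = (optimized - baseline) / baseline
absolute_gain = optimized - baseline

print(f"relative improvement: {relative_gain:.0%}")
print(f"absolute improvement: {absolute_gain * 100:.1f} percentage points")
```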

In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided on a carrier medium such as a disk, CD-or DVD-ROM, programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier, for example. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and the appended claims are not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for designing a structure of an asymmetric light polarization device based on asynchronous reinforcement learning is characterized by comprising the following steps:

firstly, preprocessing a simulation data set of a device structure;

step two, building and initializing a transmissivity prediction network;

step three, training a transmissivity prediction network;

2. The asymmetric light polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 1, wherein in the step one, the simulation data set preprocessing comprises:

3. The asymmetric light polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 2, wherein in the second step, the concrete process of transmission prediction network construction and optimization is as follows:

4. The asymmetric optical polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 3, wherein the basic structure of the transmittance prediction network is divided into three parts: the serial input SIN, the prediction output PYN of the transmittance T_yy, and the prediction output PXN of the transmittance T_yx;

the input is the structural parameters of the asymmetric optical polarization conversion device and the output is the transmittance properties T_yy and T_yx of the asymmetric optical polarization conversion device; in SIN, four fully connected (Dense) layers are used first, each activated by the Relu function (f(x) = max(0, x)), followed by a batch normalization layer; the prediction output PYN of the transmittance T_yy and the prediction output PXN of the transmittance T_yx are likewise composed of fully connected layers (FC) and the Relu activation function, with a residual structure added, and finally a LeakyRelu function is used:

5. The asymmetric optical polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 4, wherein the serial input part SIN connects 4 full connection layers after accepting the structure of the asymmetric optical polarization conversion device as input data, and ends with batch normalization, extracts the feature of the structure data, uses larger training rate parameters, and uses more batch training data at a time during training; the extracted feature data is used as the input of the PYN part and the PXN part after the batch normalization layer;

6. The asymmetric light polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 1, wherein in the third step, the concrete process of the transmittance prediction network training is:

7. The method for designing an asymmetric optical polarization device structure based on asynchronous reinforcement learning according to claim 1, wherein in the fourth step, the asynchronous reinforcement learning algorithm optimizes the asymmetric polarization conversion device structure, specifically:

the structure of the asymmetric light polarization conversion device is adaptively optimized by an asynchronous reinforcement learning algorithm; the asynchronous reinforcement learning algorithm takes the transmittance prediction network as a proxy function, the structure of the asymmetric light polarization conversion device as the independent variable, and the average value of the transmittances T_yy and T_yx as the target value to be optimized; the structure of the asymmetric light polarization conversion device is continuously adjusted so as to maximize the target value, and an optimal adjustment strategy is continuously explored during the iterations; when the asynchronous reinforcement learning algorithm converges, the structure of the asymmetric light polarization conversion device is optimal.

8. The asymmetric light polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 7, wherein the asynchronous reinforcement learning algorithm is specifically:

the specific calculation method is as follows: when a structural value of the asymmetric light polarization device exceeds the limit, the return value r = -0.5; when the maximum values of T_yy and T_yx are less than 0.3, r = exp(max(T_yy) + max(T_yx) - 1) - 1; when the maximum values of T_yy and T_yx are between 0.3 and 0.5, r = 0; when the maximum value of T_yy is between 0.3 and 0.5 and that of T_yx is 0.5 or greater, or the maximum value of T_yx is between 0.3 and 0.5 and that of T_yy is 0.5 or greater,

9. The asymmetric light polarization device structure design method based on asynchronous reinforcement learning as claimed in claim 8, wherein the adaptive termination condition of the asynchronous reinforcement learning algorithm is designed as follows:

if the number of iterations is greater than 2500 and no more optimal target value is found in the last 300 iterations, the optimization is ended.

10. An asymmetric light polarization device structure using the asymmetric light polarization device structure design method based on asynchronous reinforcement learning according to any one of claims 1 to 9, wherein the asymmetric light polarization device structure is provided with a double-layer anisotropic super surface, and the double-layer anisotropic super surface comprises: a pair of metal resonance rods and a sub-wavelength metal grating, and a dielectric layer is filled between the two layers of structures.