CN110826609B - Double-current feature fusion image identification method based on reinforcement learning - Google Patents

Double-current feature fusion image identification method based on reinforcement learning

Info

Publication number
CN110826609B
CN110826609B (application CN201911038698.5A)
Authority
CN
China
Prior art keywords
image
feature
model
reinforcement learning
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911038698.5A
Other languages
Chinese (zh)
Other versions
CN110826609A (en
Inventor
冯镔
唐哲
王豪
李亚婷
朱多旺
刘文予
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911038698.5A priority Critical patent/CN110826609B/en
Publication of CN110826609A publication Critical patent/CN110826609A/en
Application granted granted Critical
Publication of CN110826609B publication Critical patent/CN110826609B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a double-current feature fusion image identification method based on reinforcement learning. The method uses two models, a texture model and a shape model: the texture model classifies according to the texture information of the object in the image, and the shape model classifies according to the object's shape information. Both models use reinforcement learning to let the network find the most discriminative region in the whole image and then classify according to that region. The method is simple to implement and generalizes well; it locates regions that make images easy to distinguish, selects discriminative regions of appropriate size, makes full use of the texture and shape information in the image, and effectively mitigates the problems of under-utilized image information and small differences between images.

Description

Double-flow feature fusion image identification method based on reinforcement learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a double-flow feature fusion image identification method based on reinforcement learning.
Background
Image recognition has many applications in daily life, such as intelligent security, biomedicine, e-commerce shopping, autonomous driving, and smart homes. Image recognition studies how to identify the category to which a sample belongs from a set of candidate categories. Many difficulties remain, such as small differences between images and strong background interference.
Current image recognition methods generally feed the image directly into a convolutional neural network for feature extraction and then perform classification. Although various operations may follow feature extraction, most of the extracted features actually describe the texture of the image. Such approaches share a drawback: shape information is not fully exploited, so information useful for recognizing the image cannot be completely extracted. In addition, to reduce background interference, current approaches generate candidate boxes; however, they generate a large number of candidate boxes, require long computation time, lack a clear target, and cannot find the regions that truly help image classification.
Therefore, there is a need for a double-flow feature fusion image recognition method that can fuse the texture information and shape information of an image while keeping computation efficient.
Disclosure of Invention
The invention aims to provide a double-flow feature fusion image identification method based on reinforcement learning that can effectively find the most informative regions containing texture information and shape information respectively, reduce the influence of background and irrelevant information, and effectively improve recognition accuracy. The method comprises the following steps:
(1) Generating a shape data set:
Each image is input into an image conversion model, which outputs n pictures that are similar in shape but different in texture. n is a preset value; a larger n gives a better learning effect but increases training time, and values of 5-10 are typically tried. The labels of the n converted images are the same as the label of the input image. The shape data set is the data set paired with the original data set in which texture information is reduced and shape information is emphasized. The purpose of generating this paired data set is to let the subsequently trained model learn the shape information of the images.
(2) Training a texture basic model and a shape basic model:
(2.1) Perform data augmentation on each image of the original data set and the shape data set separately: for each image, generate m rectangular boxes at random positions, where m is a preset value. m may range from 4 to 7; too large an m produces too much data and costs too much time. The side length of each box is generally larger than 1/2 of the image side length and no larger than the image side length. The label of each cropped image is the same as the label of the original image.
(2.2) Train a basic model using the cropped images: the texture basic model is trained with the original data set and the shape basic model with the shape data set; the two basic models have the same structure. An adaptive average pooling layer (AdaAvgPool) is added after the last block of the ResNet50 network; this layer pools the feature map to reduce its size. The feature map output by AdaAvgPool is flattened into a one-dimensional Feature vector, and the features before AdaAvgPool are sent to a classifier to obtain the classification prediction probability pred; the classifier may be a fully connected layer. The basic model outputs Feature and pred; its role is to extract features from the input image and predict the classification probability.
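For illustration only, the following Python sketch shows one way this shared base-model structure could be implemented, assuming PyTorch/torchvision, an input of roughly 224x224, and the 251 iFood classes mentioned later in the description; the pooled size and the way the pre-pooling features reach the classifier are assumptions not fixed by the text.

import torch
import torch.nn as nn
import torchvision.models as models

class BaseModel(nn.Module):
    """Shared structure of the texture and shape basic models."""
    def __init__(self, num_classes=251, pooled_size=7):
        super().__init__()
        resnet = models.resnet50(weights=None)
        # Everything up to and including the last residual block of ResNet50.
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])
        # Adaptive average pooling layer (AdaAvgPool) added after the last block.
        self.ada_avg_pool = nn.AdaptiveAvgPool2d(pooled_size)
        # Classifier on the pre-AdaAvgPool features: global pooling followed by
        # a fully connected layer (one possible reading of the text).
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x):
        fmap = self.backbone(x)                       # B x 2048 x H x W
        feature = self.ada_avg_pool(fmap).flatten(1)  # Feature vector
        logits = self.classifier(self.global_pool(fmap).flatten(1))
        pred = torch.softmax(logits, dim=1)           # classification prediction probability
        return feature, pred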
(3) Training a texture reinforcement learning model:
(3.1) Read an image image_global and its category label c. Initialize a rectangular box with the same size as the read image.
(3.2) The position and size of the rectangular box are changed several times during the whole process. If the box size equals the image size, i.e. image_local = image_global, jump to (3.3). If the box is smaller than the image, crop the image to the box and then upsample the crop to the original image size to obtain the processed image_local; bilinear interpolation may be used for upsampling.
(3.3) Input image_local into the texture basic model to obtain the Feature and the classification prediction probability pred.
(3.4) Input the Feature into the texture reinforcement learning model, which consists of several fully connected layers with the ReLU activation function, defined as f(x) = max(0, x). The last layer converts the feature dimension into the number of actions in the action space, and the output is the Q value of each action in the action space. The action space is a set of actions whose purpose is to change the position or size of the rectangular box; the actions may include translation, enlargement, and reduction in each direction. The Q value of an action quantifies the effect, with respect to our goal (classification), of the box moving from one position to another after taking that action. A larger Q value means the moved box improves classification; a lower Q value means the moved box worsens it.
(3.5) To determine the action, this step uses two strategies, exploration and exploitation, and chooses one of them. A value explore_rate ∈ (0, 1) is preset; explore_rate is the probability of choosing exploration, and 1 - explore_rate is the probability of choosing exploitation. Exploration randomly selects one action from all actions; exploitation selects the action with the maximum Q value obtained in (3.4). After choosing between exploration and exploitation, the action is determined, and the box's size or position is changed according to the selected action and a change coefficient α to obtain a new rectangular box box'. The change coefficient α ∈ (0, 1); 0.1 may be chosen. It represents the ratio of each change. For example, if the action is "enlarge to the right", box' is obtained by enlarging box to the right to 1.1 times its size; if the action is "shrink from the left", box' is obtained by reducing box from the left to 0.9 times its size.
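For illustration only, the following Python sketch shows one possible reading of the exploration/exploitation choice and of applying an action to the box, assuming the box is stored as (x, y, w, h) in pixels; the action names and their exact geometric effect are assumptions, since the text only requires translations and scalings in each direction with ratio α (clipping to the image bounds is omitted).

import random
import torch

ACTIONS = ["right_bigger", "left_smaller", "up", "down", "left", "right"]

def select_action(q_values, explore_rate=0.1):
    """q_values: 1-D tensor with one Q value per action in ACTIONS."""
    if random.random() < explore_rate:            # exploration
        return random.randrange(len(ACTIONS))
    return int(torch.argmax(q_values).item())     # exploitation

def apply_action(box, action_idx, alpha=0.1):
    """Change the (x, y, w, h) box according to the chosen action and ratio alpha."""
    x, y, w, h = box
    name = ACTIONS[action_idx]
    if name == "right_bigger":       # enlarge to the right to (1 + alpha) times
        w = w * (1 + alpha)
    elif name == "left_smaller":     # shrink from the left to (1 - alpha) times
        x, w = x + alpha * w, w * (1 - alpha)
    elif name == "right":
        x = x + alpha * w
    elif name == "left":
        x = x - alpha * w
    elif name == "up":
        y = y - alpha * h
    elif name == "down":
        y = y + alpha * h
    return (x, y, w, h)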
(3.6) Using the new rectangular box obtained in (3.5), follow the feature-extraction procedure of (3.2) and (3.3) to obtain another feature Feature' and prediction probability pred'.
(3.8) Based on pred, pred' from (3.6), and c from (3.1), make the following judgment: reward = -1 if the prediction score of pred on category c is higher than that of pred' on category c; reward = 1 if the prediction score of pred on category c is lower than that of pred' on category c.
(3.9) Update the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8). The update rule is Q_target = reward + γ max(Q(s, a)), where Q(s, a) is the Q value after taking action a in state s, i.e. the Feature. γ is a preset coefficient (effectively a discount factor) applied at each Q-value update; γ = 0.9 may be chosen.
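As a small illustrative sketch of this update rule, assuming Q(s, a) is taken as the agent network's output for the Feature of the new box (a standard DQN-style reading; the text itself does not pin down which state is queried):

import torch

def q_target(reward, q_values_next, gamma=0.9):
    # Q_target = reward + gamma * max_a Q(s, a)
    return reward + gamma * torch.max(q_values_next).item()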
(3.10) Store the Feature and the Q_target obtained in (3.9) into an experience pool. The experience pool is a measure for reducing the correlation between samples: the paired Feature and Q_target are first stored in the pool, and once the pool has accumulated a certain amount of data, samples are drawn from it at random to train the model.
(3.11) Take the new rectangular box as the current box (box = box'), the new Feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current one (pred = pred').
(3.12) Repeat the process from (3.4) to (3.11) a certain number of times. This continually adjusts the size and position of the box; the number of repetitions can be set according to the change rate chosen in (3.5): with a large change rate fewer repetitions are needed, and with a small change rate more repetitions are needed.
(3.13) Once the experience pool holds a certain number of samples, randomly select paired Feature and Q_target data from it, denoted Feature_s and Target. Input Feature_s into the texture reinforcement learning model being trained and take the output Q value of the corresponding action, denoted Q_eval. The difference between Target and Q_eval is used as the loss and back-propagated to update the parameters. The mean squared error (MSE) may be chosen as the loss, i.e. loss = (Target - Q_eval)^2.
(4) Training a shape reinforcement learning model:
and (4.1) training the shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structure of the shape reinforcement learning model is the same as that of the texture reinforcement learning model.
(5) Perform double-flow prediction and fusion on the image to be tested using the two trained models, with the following sub-steps:
(5.1) Read the image to be tested, image_global. Initialize a rectangular box with the same size as the read image.
(5.2) Apply the feature extraction of steps (3.2) and (3.3) to the image to obtain the Feature and the classification prediction probability pred at the box's current position.
(5.3) Input the Feature obtained in (5.2) into the texture reinforcement learning model, output the Q values of all actions, select the action with the maximum Q value according to the exploitation strategy, and change the size and position of the box according to the selected action.
(5.4) Repeat (5.2) and (5.3) a number of times; as in (3.12), the number of repetitions is related to the change rate of the rectangular box. The final change yields the feature F_texture.
(5.5) Test the shape reinforcement learning model following a process similar to (5.1) to (5.4) to obtain F_shape.
(5.6) Input the two different features F_texture and F_shape into the fusion model, which outputs the final prediction probability p_mix. The fusion model is a trainable model that fuses F_texture and F_shape and then classifies. For example, the fusion model may concatenate the two features and use a fully connected layer to output the classification probabilities of all classes.
(5.7) The class with the highest probability in p_mix is the predicted class.
Compared with the prior art, the technical scheme of the invention has the following technical effects:
(1) Simple and effective structure: unlike prior art that extracts only texture information with a convolutional neural network, the method extracts texture information and shape information separately through a double-flow design, and uses reinforcement learning to search for the discriminative regions of texture and shape respectively, giving a clear, simple, and effective structure;
(2) High accuracy: unlike methods that generate proposals, the optimal region in the image is found by reinforcement learning rather than by picking the best-performing region from a set of proposals, which reduces the cost of model learning and better matches the process of searching for a discriminative region; and unlike prior art that uses only texture information, using both texture and shape information fully mines the information contained in the image, so the accuracy is higher;
(3) Strong robustness: the texture reinforcement learning model focuses more on texture information and the shape reinforcement learning model focuses more on shape information; by attending to these two kinds of information separately, the network can adapt to different images and its performance is more robust.
Drawings
FIG. 1 is a flow chart of a double-flow feature fusion image recognition method based on reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of a reinforcement learning model implementation framework of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical terms of the present invention are explained and explained first:
iFood dataset: a data set used in a competition held on Kaggle. It contains 251 fine-grained (prepared) food categories, with a total of 120,216 images collected from the web as the training set, 12,170 images as the validation set, and 28,399 images as the test set, with manually verified labels; each image contains a single food category.
ResNet-50: a neural network for classification, consisting mainly of 50 convolutional layers plus pooling layers and shortcut connections. The convolutional layers extract image features; the pooling layers reduce the dimensionality of the feature maps output by the convolutional layers and reduce overfitting; the shortcut connections propagate gradients and alleviate the vanishing and exploding gradient problems. The network parameters are updated by back-propagation;
Image conversion model: a model using the structure of a Generative Adversarial Network (GAN), comprising a generator and a discriminator, which can transform the style of an image without changing its content.
As shown in fig. 1, the present invention provides a double-flow feature fusion image recognition method based on reinforcement learning, which includes the following steps:
(1) Generating a shape data set:
Each image is input into an image conversion model, which outputs n pictures that are similar in shape but different in texture. n is a preset value; a larger n gives a better learning effect but increases training time, and values of 5-10 are typically tried. The labels of the n converted images are the same as the label of the input image. The shape data set is the data set paired with the original data set in which texture information is reduced and shape information is emphasized. The purpose of generating this paired data set is to let the subsequently trained model learn the shape information of the images.
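For illustration only, the following Python sketch shows how the shape data set could be assembled, with style_transfer standing in for the GAN-based image conversion model; the function name, file layout, and use of PIL are assumptions, not an API from the patent.

import os
from PIL import Image

def build_shape_dataset(image_label_pairs, style_transfer, out_dir, n=5):
    """Generate n texture-altered, shape-preserving variants per image,
    each keeping the original label."""
    os.makedirs(out_dir, exist_ok=True)
    shape_samples = []
    for path, label in image_label_pairs:
        img = Image.open(path).convert("RGB")
        for k in range(n):
            converted = style_transfer(img)   # same shape, different texture
            out_path = os.path.join(out_dir, f"{label}_{os.path.basename(path)}_{k}.jpg")
            converted.save(out_path)
            shape_samples.append((out_path, label))   # label unchanged
    return shape_samples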
(2) Training a texture basic model and a shape basic model:
(2.1) Perform data augmentation on each image of the original data set and the shape data set separately, as follows: for each image, generate m rectangular boxes at random positions, where m is a preset value. m may range from 4 to 7; too large an m produces too much data and costs too much time. The side length of each box is generally larger than 1/2 of the image side length and no larger than the image side length. The label of each cropped image is the same as the label of the original image.
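For illustration only, a minimal Python sketch of this random-crop augmentation is given below; square crops and the use of PIL are assumptions, since the text only constrains the side length.

import random
from PIL import Image

def random_crops(img: Image.Image, m=5):
    """Generate m random square crops whose side is between 1/2 and 1 of the
    shorter image side; each crop keeps the original image's label."""
    w, h = img.size
    crops = []
    for _ in range(m):
        side = random.randint(min(w, h) // 2, min(w, h))
        x = random.randint(0, w - side)
        y = random.randint(0, h - side)
        crops.append(img.crop((x, y, x + side, y + side)))
    return crops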
(2.2) Train a basic model using the cropped images: the texture basic model is trained with the original data set and the shape basic model with the shape data set; the two basic models have the same structure. An adaptive average pooling layer (AdaAvgPool) is added after the last block of the ResNet50 network; this layer pools the feature map to reduce its size. The feature map output by AdaAvgPool is flattened into a one-dimensional Feature vector, and the features before AdaAvgPool are sent to a classifier to obtain the classification prediction probability pred; the classifier may be a fully connected layer. The basic model outputs Feature and pred; its role is to extract features from the input image and predict the classification probability.
(3) Training a texture reinforcement learning model:
(3.1) Read an image image_global and its category label c. Initialize a rectangular box with the same size as the read image.
(3.2) The position and size of the rectangular box are changed several times during the whole process. If the box size equals the image size, i.e. image_local = image_global, jump to (3.3). If the box is smaller than the image, crop the image to the box and then upsample the crop to the original image size to obtain the processed image_local; bilinear interpolation may be used for upsampling. The purpose of this step is: if the box is still the initialized box, do nothing and proceed to (3.3); if the box is smaller than the initialized box, crop the box region and upsample it to the original input size, so that all images fed to the neural network have the same size.
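For illustration only, the following Python sketch performs the crop-and-upsample of this step with PIL and bilinear resampling; the (x, y, w, h) box convention is an assumption.

from PIL import Image

def crop_and_upsample(image_global: Image.Image, box):
    """Crop the box region and resample it to the original resolution with
    bilinear interpolation so all inputs to the base model share one size."""
    x, y, w, h = [int(round(v)) for v in box]
    if (w, h) == image_global.size:
        return image_global                       # image_local == image_global
    patch = image_global.crop((x, y, x + w, y + h))
    return patch.resize(image_global.size, resample=Image.BILINEAR)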
(3.3) As shown in FIG. 2, input image_local into the texture basic model to obtain two outputs: the Feature and the classification prediction probability pred.
(3.4) Input the Feature directly into the reinforcement learning model, which through forward propagation (convolution, pooling, and similar operations) outputs the Q value of each action in the action space. Specifically, the Feature is input into the action-selection submodule of the texture reinforcement learning model, which contains an agent network composed of several fully connected layers with the ReLU activation function, defined as f(x) = max(0, x). The last layer converts the feature dimension into the number of actions in the action space, and the output is the Q value of each action. The action space is a set of actions whose purpose is to change the position or size of the rectangular box; the actions may include translation, enlargement, and reduction in each direction. The Q value of an action quantifies the effect, with respect to our goal (classification), of the box moving from one position to another after taking that action. A larger Q value means the moved box improves classification; a lower Q value means the moved box worsens it.
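For illustration only, the agent network described here could be sketched in PyTorch as below; the number of layers and the hidden size are assumptions, as the text only specifies fully connected layers with ReLU and a final layer sized to the action space.

import torch.nn as nn

class AgentNet(nn.Module):
    """Agent network: fully connected layers with ReLU, outputting one Q value per action."""
    def __init__(self, feature_dim, num_actions, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),                       # f(x) = max(0, x)
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions)   # Q value for each action
        )

    def forward(self, feature):
        return self.net(feature)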
(3.5) In the action-selection submodule, to determine the action, this step uses two strategies, exploration and exploitation, and chooses one of them. A value explore_rate ∈ (0, 1) is preset; explore_rate is the probability of choosing exploration, and 1 - explore_rate is the probability of choosing exploitation. Exploration randomly selects one action from all actions; exploitation selects the action with the maximum Q value obtained in (3.4). After choosing between exploration and exploitation, the action is determined, and the box's size or position is changed according to the selected action and a change coefficient α to obtain a new rectangular box box'. The change coefficient α ∈ (0, 1); 0.1 may be chosen. It represents the ratio of each change. For example, if the action is "enlarge to the right", box' is obtained by enlarging box to the right to 1.1 times its size; if the action is "shrink from the left", box' is obtained by reducing box from the left to 0.9 times its size.
(3.6) Using the new rectangular box obtained in (3.5), follow the feature-extraction procedure of (3.2) and (3.3) to obtain another feature Feature' and prediction probability pred'. The purpose of this step is that, after the position of the rectangular box changes, the same feature-extraction operation can be applied: the region corresponding to the box is cropped, upsampled, and input into the basic model to obtain the feature and the prediction probability.
(3.8) Based on pred, pred' from (3.6), and c from (3.1), make the following judgment: reward = -1 if the prediction score of pred on category c is higher than that of pred' on category c; reward = 1 if it is lower. Specifically, the prediction probability pred contains a prediction score for every category, and the score indicates the probability of predicting that category, so a higher probability for c means the model predicts this sample better. The model can therefore be rewarded or punished according to the two prediction probabilities and the label: reward = sign(pred'[c] - pred[c]).
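A minimal sketch of this reward, assuming pred and pred' are indexable probability vectors and treating the tie case as zero (a choice the text does not specify):

def compute_reward(pred, pred_new, c):
    """reward = sign(pred'[c] - pred[c]): +1 if the new box raises the predicted
    probability of the true class c, -1 if it lowers it (0 if unchanged)."""
    diff = pred_new[c] - pred[c]
    return 1 if diff > 0 else (-1 if diff < 0 else 0)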
(3.9) In the Q-value update submodule, update the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8). The update rule is Q_target = reward + γ max(Q(s, a)), where Q(s, a) is the Q value after taking action a in state s, i.e. the Feature. γ is a preset coefficient (effectively a discount factor) applied at each Q-value update; γ = 0.9 may be chosen.
(3.10) Store the Feature and the Q_target obtained in (3.9) into an experience pool. The experience pool is a measure for reducing the correlation between samples: the paired Feature and Q_target are first stored in the pool, and once the pool has accumulated a certain amount of data, samples are drawn from it at random to train the model.
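For illustration only, the experience pool could be sketched as below; storing the chosen action alongside the (Feature, Q_target) pair is an assumption added so that Q_eval of that action can be read out later, and the capacity and batch size are likewise assumptions.

import random
from collections import deque

class ExperiencePool:
    """Store (Feature, Q_target, action) tuples and sample them at random to
    reduce the correlation between consecutive training samples."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, feature, q_target, action):
        self.buffer.append((feature, q_target, action))

    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)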
(3.11) Take the new rectangular box as the current box (box = box'), the new Feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current one (pred = pred').
(3.12) Repeat the process from (3.4) to (3.11) a certain number of times. This continually adjusts the size and position of the box; the number of repetitions can be set according to the change rate chosen in (3.5): with a large change rate fewer repetitions are needed, and with a small change rate more repetitions are needed.
(3.13) In the Q-value evaluation submodule, once the experience pool holds a certain number of samples, randomly select paired Feature and Q_target data from it, denoted Feature_s and Target. Input Feature_s into the texture reinforcement learning model being trained and take the output Q value of the corresponding action, denoted Q_eval. The difference between Target and Q_eval is used as the loss and back-propagated to update the parameters. The mean squared error (MSE) may be chosen as the loss, i.e. loss = (Target - Q_eval)^2.
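For illustration only, one training step could be sketched as below, reusing the ExperiencePool sketch above; the batch size of one and the use of torch.nn.functional.mse_loss are assumptions consistent with the MSE loss named in the text.

import random
import torch
import torch.nn.functional as F

def train_step(agent, optimizer, pool):
    """One update: draw a stored (Feature_s, Target, action) tuple, take the
    agent's Q value for that action as Q_eval, and minimize (Target - Q_eval)^2."""
    feature_s, target, action = random.choice(list(pool.buffer))
    q_eval = agent(feature_s)[action]
    loss = F.mse_loss(q_eval, torch.as_tensor(target, dtype=q_eval.dtype))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()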
(4) Training a shape reinforcement learning model:
and (4.1) training the shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structure of the shape reinforcement learning model is the same as that of the texture reinforcement learning model. And (3) and (4) dividing the information in the data set into texture and shape, and respectively learning two different information of the data set by using two streams of ideas. The model structure of both streams is the same, and the data sets used for training are pairs of data sets that have been pre-processed. The training process is the same.
(5) Perform double-flow prediction and fusion on the image to be tested using the two trained models, with the following sub-steps:
(5.1) Read the image to be tested, image_global. Initialize a rectangular box with the same size as the read image.
(5.2) Apply the feature extraction of steps (3.2) and (3.3) to the image to obtain the Feature and the classification prediction probability pred at the box's current position.
(5.3) Input the Feature obtained in (5.2) into the texture reinforcement learning model, output the Q values of all actions, select the action with the maximum Q value according to the exploitation strategy, and change the size and position of the box according to the selected action.
(5.4) Repeat (5.2) and (5.3) a number of times; as in (3.12), the number of repetitions is related to the change rate of the rectangular box. The final change yields the feature F_texture.
(5.5) Test the shape reinforcement learning model following a process similar to (5.1) to (5.4) to obtain F_shape.
(5.6) Input the two different features F_texture and F_shape into the fusion model, which outputs the final prediction probability p_mix. The fusion model is a trainable model that fuses F_texture and F_shape and then classifies. For example, the fusion model may concatenate the two features and use a fully connected layer to output the classification probabilities of all classes. Feature fusion can be attempted in various ways, and fusion can even be performed directly on the prediction scores, but fusing scores does not perform as well as fusing features. The fusion model is trained on the two input features using the labels corresponding to the images.
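For illustration only, the concatenation-plus-fully-connected variant of the fusion model mentioned here could be sketched as follows; the single linear layer and softmax output are assumptions.

import torch
import torch.nn as nn

class FusionModel(nn.Module):
    """Concatenate F_texture and F_shape and classify with a fully connected layer."""
    def __init__(self, texture_dim, shape_dim, num_classes):
        super().__init__()
        self.fc = nn.Linear(texture_dim + shape_dim, num_classes)

    def forward(self, f_texture, f_shape):
        fused = torch.cat([f_texture, f_shape], dim=-1)
        return torch.softmax(self.fc(fused), dim=-1)   # p_mix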
(5.7) The class with the highest probability in p_mix is the predicted class.
The effectiveness of the invention is demonstrated by the following experiment; the results show that the invention improves the accuracy of image recognition.
The invention is compared on the iFood dataset against the basic network we use. Table 1 shows the accuracy of the proposed method on this data set, where Backbone denotes the basic model ResNet50 we use and DQN denotes the reinforcement learning model we use. A larger value means higher recognition accuracy, and the table shows that the improvement brought by the method is significant.
TABLE 1 precision on iFood dataset
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A double-flow feature fusion image identification method based on reinforcement learning is characterized by comprising the following steps:
(1) Generating a shape data set:
inputting each image into an image conversion model, outputting n images which are similar in shape and different in texture, wherein the labels of the n converted images are the same as those of the input image, and n is a preset value;
(2) Training a texture basic model and a shape basic model:
(2.1) performing data enhancement on each image of the original dataset and the shape dataset separately: for an image, generating m rectangular frames at random positions in the image, wherein the side length of each frame is greater than 1/2 of the side length of the image and less than or equal to the side length of the image, the label of the cut image is consistent with that of the original image, and m is a preset value;
(2.2) training a basic model using the cropped images, wherein the texture basic model is trained using the original data set and the shape basic model is trained using the shape data set; the texture basic model and the shape basic model have the same structure, in which an adaptive average pooling layer AdaAvgPool is newly added after the last block of the ResNet50 network and is used to pool the feature map to reduce its size; the basic model is trained using the images and labels and is subsequently used to extract features from and make predictions on images;
(3) Training a texture reinforcement learning model:
(3.1) reading an image image_global and the corresponding category label c, and initializing a rectangular box with the same size as the read image;
(3.2) if the size of the rectangular box is equal to the size of the image, jumping to (3.3); if the rectangular box is smaller than the image, cropping the image according to the box and then upsampling it to the same size as the original image to obtain the processed image_local;
(3.3) inputting image_local into the texture basic model to obtain the Feature and the classification prediction probability pred;
(3.4) inputting the Feature into a texture reinforcement learning model, wherein the output of the texture reinforcement learning model is the Q value of each action in the action space;
(3.5) determining the action by two strategies, exploration and exploitation, one of which is selected: exploration randomly selects one action from all the actions, and exploitation selects the action corresponding to the maximum Q value obtained in (3.4); after one of exploration and exploitation is selected, the action is determined, and the size or position of the box is changed according to the selected action and the change coefficient α to obtain a new rectangular box box';
(3.6) using the new rectangular box obtained in (3.5), obtaining another feature Feature' and prediction probability pred' according to the feature-extraction process of (3.2) and (3.3);
(3.8) according to pred, pred' from (3.6), and c from (3.1), making the following judgment: reward = -1 if the prediction score of pred on category c is higher than that of pred' on category c, and reward = 1 if the prediction score of pred on category c is lower than that of pred' on category c;
(3.9) updating the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8), the update rule being Q_target = reward + γ max(Q(s, a)), where Q(s, a) represents the Q value after taking an action in state s, i.e. the Feature, and γ is a preset coefficient applied at each Q-value update;
(3.10) storing the Feature and the Q_target obtained in (3.9) into an experience pool;
(3.11) taking the new rectangular box as the current box (box = box'), the new Feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current classification prediction probability (pred = pred'); repeating the process from (3.4) to (3.11) a preset number of times; and after a preset number of samples have been stored in the experience pool, randomly selecting paired Feature and Q_target data from it, denoted Feature_s and Target, inputting Feature_s into the texture reinforcement learning model, taking the output Q value of the obtained action, denoted Q_eval, using the difference between Target and Q_eval as the loss, back-propagating it, and updating the parameters;
(4) Training a shape reinforcement learning model:
(4.1) training a shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structures of the shape reinforcement learning model and the texture reinforcement learning model are the same;
(5) The double-flow prediction and fusion of the test image to be detected by utilizing the two trained reinforced models comprises the following substeps:
(5.1) reading an image to be detected, and initializing a rectangular box with the size same as that of the image to be detected;
(5.2) carrying out the Feature extraction of the steps (3.2) and (3.3) on the image to be detected to obtain the Feature and the classification prediction probability pred of the corresponding position of the frame;
(5.3) inputting the features obtained in (5.2) into the texture reinforcement learning model, outputting the Q values of all actions, selecting the action with the maximum Q value according to the exploitation strategy, and changing the size and position of the box according to the selected action;
(5.4) repeating the process of (5.2) and (5.3) a number of times, the number of repetitions being related to the change rate of the rectangular box as in the repeated process of (3.12), and obtaining the feature F_texture after the last change;
(5.5) testing the shape reinforcement learning model with a process similar to (5.1) to (5.4) to obtain F_shape;
(5.6) inputting the two different features F_texture and F_shape into the fusion model, which outputs the final prediction probability p_mix, wherein the fusion model is a trainable model whose aim is to fuse F_texture and F_shape and then classify;
(5.7) the class with the highest probability in p_mix is the class predicted for the image to be detected.
2. The method for recognizing the double-flow feature fusion image based on the reinforcement learning according to claim 1, wherein the training process of the basic model in the step (2.2) is specifically as follows: compressing the Feature map output by AdaAvgPool to one dimension to obtain Feature vector Feature, sending the features before AdaAvgPool to a classifier to obtain a classification prediction probability pred, and outputting the Feature and the pred by a basic model to extract the features of an input image and predict the classification probability.
3. The reinforcement learning based dual-stream feature fusion image recognition method according to claim 2, wherein the classifier in step (2.2) uses a fully connected layer.
4. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein the step (3.4) is specifically: inputting the Feature into an action-selection submodule of the texture reinforcement learning model, wherein the action-selection submodule contains an agent network composed of several fully connected layers with the ReLU activation function, defined as f(x) = max(0, x), the last layer converting the feature dimension into the number of actions in the action space, and the output being the Q value of each action in the action space; wherein the action space is a set of actions used to change the position or size of the rectangular box, the Q value of an action represents that the rectangular box moves to another position after the action is taken at a certain position, the process being a quantitative evaluation of the effect on the target, a larger Q value indicating that the moved box improves the classification effect and a lower Q value indicating that the moved box worsens it.
5. The method for dual-stream feature fusion image recognition based on reinforcement learning according to claim 1 or 2, wherein the action in step (3.4) comprises translation, zooming in or zooming out in various directions.
6. The reinforcement learning based dual-stream feature fusion image recognition method according to claim 1 or 2, wherein the change coefficient α ∈ (0, 1) in step (3.5).
7. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein the step (3.8) is specifically: the prediction probability pred contains a prediction score for every category, the score representing the probability of predicting that category; a higher probability for c means the model predicts the sample better, so whether to reward or punish the model is judged from the two prediction probabilities and the label, and reward = sign(pred'[c] - pred[c]).
8. The dual-stream feature fusion image recognition method based on reinforcement learning according to claim 1 or 2, wherein the loss in step (3.11) uses the mean squared error (MSE) function, expressed as loss = (Target - Q_eval)^2.
9. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein the fusion model in step (5.6) splices the two features together and then outputs the classification probabilities of all classes using a fully connected layer.
10. The dual-flow feature fusion image recognition method based on reinforcement learning according to claim 1 or 2, wherein the value range of n is 5-10, and the value range of m is 4-7.
CN201911038698.5A 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning Active CN110826609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911038698.5A CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911038698.5A CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110826609A CN110826609A (en) 2020-02-21
CN110826609B true CN110826609B (en) 2023-03-24

Family

ID=69550977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911038698.5A Active CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110826609B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240573B (en) * 2020-10-26 2022-05-13 杭州火烧云科技有限公司 High-resolution image style transformation method and system for local and global parallel learning
CN112597865A (en) * 2020-12-16 2021-04-02 燕山大学 Intelligent identification method for edge defects of hot-rolled strip steel
CN113128522B (en) * 2021-05-11 2024-04-05 四川云从天府人工智能科技有限公司 Target identification method, device, computer equipment and storage medium
TWI801038B (en) * 2021-12-16 2023-05-01 新加坡商鴻運科股份有限公司 Defect detection method, system, electronic device and storage medium
CN114742800B (en) * 2022-04-18 2024-02-20 合肥工业大学 Reinforced learning electric smelting magnesium furnace working condition identification method based on improved converter

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766080A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Image multi-class feature recognizing and pushing method based on electronic commerce
CN108805798A (en) * 2017-05-05 2018-11-13 英特尔公司 Fine granularity for deep learning frame calculates communication and executes
CN109814565A (en) * 2019-01-30 2019-05-28 上海海事大学 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110348355A (en) * 2019-07-02 2019-10-18 南京信息工程大学 Model recognizing method based on intensified learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766080A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Image multi-class feature recognizing and pushing method based on electronic commerce
CN108805798A (en) * 2017-05-05 2018-11-13 英特尔公司 Fine granularity for deep learning frame calculates communication and executes
CN109814565A (en) * 2019-01-30 2019-05-28 上海海事大学 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110348355A (en) * 2019-07-02 2019-10-18 南京信息工程大学 Model recognizing method based on intensified learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fine-grained Image Classification via Combining Vision and Language;Xiangteng He等;《Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition》;20170731;第5994-6002页 *

Also Published As

Publication number Publication date
CN110826609A (en) 2020-02-21

Similar Documents

Publication Publication Date Title
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN110837836B (en) Semi-supervised semantic segmentation method based on maximized confidence
EP3757905A1 (en) Deep neural network training method and apparatus
CN110555433B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN115858847B (en) Combined query image retrieval method based on cross-modal attention reservation
CN112070040A (en) Text line detection method for video subtitles
CN115222998B (en) Image classification method
CN111274981A (en) Target detection network construction method and device and target detection method
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
Fan et al. A novel sonar target detection and classification algorithm
CN110287981B (en) Significance detection method and system based on biological heuristic characterization learning
CN113128564B (en) Typical target detection method and system based on deep learning under complex background
CN117217807B (en) Bad asset estimation method based on multi-mode high-dimensional characteristics
US20220301106A1 (en) Training method and apparatus for image processing model, and image processing method and apparatus
CN116229104A (en) Saliency target detection method based on edge feature guidance
CN115512207A (en) Single-stage target detection method based on multipath feature fusion and high-order loss sensing sampling
CN115082840A (en) Action video classification method and device based on data combination and channel correlation
CN111582057B (en) Face verification method based on local receptive field
CN114842488A (en) Image title text determination method and device, electronic equipment and storage medium
CN113743497A (en) Fine granularity identification method and system based on attention mechanism and multi-scale features
CN114170460A (en) Multi-mode fusion-based artwork classification method and system
CN113989671A (en) Remote sensing scene classification method and system based on semantic perception and dynamic graph convolution
CN114202765A (en) Image text recognition method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant