CN110826609A - Double-flow feature fusion image identification method based on reinforcement learning - Google Patents

Double-flow feature fusion image identification method based on reinforcement learning Download PDF

Info

Publication number
CN110826609A
CN110826609A (application CN201911038698.5A; granted as CN110826609B)
Authority
CN
China
Prior art keywords
image
model
feature
reinforcement learning
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911038698.5A
Other languages
Chinese (zh)
Other versions
CN110826609B (en)
Inventor
冯镔
唐哲
王豪
李亚婷
朱多旺
刘文予
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911038698.5A priority Critical patent/CN110826609B/en
Publication of CN110826609A publication Critical patent/CN110826609A/en
Application granted granted Critical
Publication of CN110826609B publication Critical patent/CN110826609B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a double-flow feature fusion image identification method based on reinforcement learning. The method uses two models, a texture model and a shape model: the texture model classifies according to the texture information of the object in the image, and the shape model classifies according to the shape information of the object. Both models use reinforcement learning to let the network find the most discriminative region in the whole image and then classify according to that region. The method is simple to implement and generalizes well; it locates regions that make images easy to distinguish, makes full use of the texture and shape information in the image, and effectively overcomes the problems of under-utilized image information and small differences between images.

Description

Double-flow feature fusion image identification method based on reinforcement learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a double-flow feature fusion image identification method based on reinforcement learning.
Background
Image recognition has many applications in daily life, such as intelligent security, biomedicine, e-commerce shopping, automatic driving and smart homes. Image recognition studies how to identify, among many categories, the category to which a sample belongs. Several difficulties remain, such as small differences between images and strong background interference.
Current image recognition methods generally feed the image directly into a convolutional neural network for feature extraction and then classify. Although various operations follow feature extraction, most of the extracted features describe the texture of the image. Such approaches share a drawback: shape information cannot be fully exploited, so information useful for recognizing the image is not completely extracted. In addition, to reduce background interference, the common approach is to generate candidate boxes, but this produces a large number of candidates, takes a long time to compute, lacks a clear target, and cannot find the regions that really help image classification.
Therefore, there is a need for a dual-stream feature fusion image recognition method that fuses the texture information and the shape information of an image while keeping computation efficient.
Disclosure of Invention
The invention aims to provide a double-flow feature fusion image identification method based on reinforcement learning, which can effectively find the most informative regions containing texture information and shape information respectively, reduces the influence of background and irrelevant information, and effectively improves identification accuracy. The method comprises the following steps:
(1) generating a shape data set:
Input each image into an image conversion model and output n pictures that are similar in shape but different in texture. n is a preset value: the larger n is, the better the learning effect, but the longer model training takes; in general, values of 5-10 can be tried. The labels of the n converted images are all the same as the label of the input image. The shape data set is the data set, paired with the original data set, in which texture information is reduced and shape information is emphasized. The purpose of generating this paired data set is to let the subsequently trained model learn the shape information of the image.
(2) Training a texture basic model and a shape basic model:
(2.1) Perform data enhancement on each image of the original data set and the shape data set separately: for one image, generate m rectangular boxes at random positions in the image, where m is a preset value. m may range from 4 to 7; too large an m produces too much data and costs too much time. The box side length is generally greater than 1/2 of the image side length and less than or equal to the image side length. The label of each cropped image is the same as the label of the original image.
(2.2) Train the basic models with the cropped images: the texture basic model is trained with the original data set and the shape basic model with the shape data set; the two basic models have the same structure. An adaptive average pooling layer AdaAvgPool is added after the last block of the ResNet50 network; this layer pools the feature map to reduce its size. The feature map output by AdaAvgPool is flattened to one dimension to obtain the Feature vector Feature, and the features before AdaAvgPool are sent to a classifier to obtain the classification prediction probability pred; the classifier may be a fully connected layer. The basic model outputs Feature and pred; its role is to extract features from the input image and predict the classification probability.
(3) Training a texture reinforcement learning model:
(3.1) Read an image image_global and its corresponding category label c. Initialize a rectangular box with the same size as the read image.
(3.2) The position and size of the rectangular box change several times over the whole process. If the size of the rectangular box equals the size of the image, i.e. image_local = image_global, jump to (3.3). If the rectangular box is smaller than the image, crop the image according to the box and then upsample the crop to the same size as the original image to obtain the processed image_local; bilinear interpolation may be used for the upsampling.
(3.3) Input image_local into the texture basic model to obtain the Feature and the classification prediction probability pred.
(3.4) Input the Feature into the texture reinforcement learning model, which consists of several fully connected layers with the ReLU activation function, defined as ReLU(x) = max(0, x).
The last layer converts the feature dimension into the number of actions in the action space, and the output is the Q value of each action in the action space. The action space is a set of actions whose purpose is to change the position or size of the rectangular box; the actions may include translation in each direction, enlargement, reduction, and so on. The Q value of an action is a quantitative assessment of how moving the rectangular box from one position to another by taking that action affects our goal (i.e., classification). A larger Q value means the box after the change will make the classification better; conversely, a lower Q value means the box after the change will make the classification worse.
(3.5) To obtain a definite action, this step chooses between two strategies, exploration and exploitation. A value explore_rate ∈ (0,1) is preset; explore_rate is the probability of choosing exploration, and correspondingly 1 − explore_rate is the probability of choosing exploitation. Exploration randomly selects one action among all actions; exploitation selects the action with the maximum Q value obtained in (3.4). After one of the two strategies determines the action, the size or position of the box is changed according to the selected action and a change coefficient α ∈ (0,1), giving a new rectangular box box'. α, which may be set to 0.1, is the ratio of each change: for example, if the action "enlarge to the right" is selected, box' is 1.1 times the original box extended to the right; if the action "shrink to the left" is selected, box' is 0.9 times the original box shrunk toward the left.
(3.6) With the new rectangular box obtained in (3.5), follow the feature-extraction procedure of (3.2) and (3.3) to obtain another feature Feature' and prediction probability pred'.
(3.8) Based on pred, pred' from (3.6), and c from (3.1), make the following judgment: if the prediction score of pred on category c is higher than that of pred' on category c, the reward is -1; correspondingly, if the prediction score of pred on category c is lower than that of pred' on category c, the reward is 1.
(3.9) Update the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8). The update rule is Q_target = reward + γ·max_a Q(s, a), where Q(s, a) is the Q value of taking action a in state s (i.e., the Feature), and γ is a preset discount coefficient applied at each Q-value update; γ may be set to 0.9.
(3.10) Store the Feature and the Q_target obtained in (3.9) as a pair in an experience pool. The experience pool is a measure for reducing the correlation between samples: paired Feature and Q_target entries are first accumulated in the pool, and once the pool holds a certain amount, data are randomly drawn from it to train the model.
(3.11) Take the new rectangular box as the current box (box = box'), the new feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current one (pred = pred').
(3.12) Repeat the process from (3.4) to (3.11) a certain number of times. This is a continuous adjustment of the size and position of the box, and the number of repetitions can be tuned together with the change coefficient set in (3.5): with a large coefficient, fewer changes are needed; with a small coefficient, more changes are needed.
(3.13) After the experience pool is filled with a certain number of samples, randomly select paired Feature and Q_target data from the pool, denoted Feature_s and Target. Input Feature_s into the texture reinforcement learning model being trained and output the Q value of the corresponding action, denoted Q_eval. Take the difference between Target and Q_eval as the loss, backpropagate, and update the parameters. The loss may use the mean square error (MSE) function, expressed as loss = (Target − Q_eval)².
(4) Training a shape reinforcement learning model:
and (4.1) training the shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structure of the shape reinforcement learning model is the same as that of the texture reinforcement learning model.
(5) Perform double-flow prediction and fusion on the test image using the two trained models, comprising the following substeps:
(5.1) Read the image to be detected, image_global. Initialize a rectangular box with the same size as the read image.
(5.2) Perform the feature extraction of steps (3.2) and (3.3) on the image to obtain the Feature and the classification prediction probability pred at the position of the box.
(5.3) Input the feature obtained in (5.2) into the texture reinforcement learning model, output the Q values of all actions, select the action with the maximum Q value according to the exploitation strategy, and change the size and position of the box according to the selected action.
(5.4) Repeat (5.2) and (5.3) a number of times; as in (3.12), the number of repetitions is related to the change rate of the rectangular box. The last change gives the feature F_texture.
(5.5) Test the shape reinforcement learning model with a procedure similar to (5.1) to (5.4) to obtain F_shape.
(5.6) Input the two different features F_texture and F_shape into the fusion model, whose output is the final prediction probability p_mix. The fusion model is a trainable model intended to fuse F_texture and F_shape and then classify. For example, the fusion model may concatenate the two features and then output the classification probabilities of all classes with a fully connected layer.
(5.7) The class with the highest probability in p_mix is the predicted class.
Through the technical scheme, compared with the prior art, the invention has the following technical effects:
(1) Simple and effective structure: unlike prior art that extracts only texture information with a convolutional neural network, the method extracts texture information and shape information separately through a dual-stream design. Reinforcement learning is used to search for the discriminative regions of texture and shape respectively, so the structure is clear, simple and effective;
(2) High accuracy: different from proposal-generation methods, the optimal region in the image is found by reinforcement learning rather than by selecting the best-performing region among proposals, which lowers the learning cost of the model and better matches the process of searching for discriminative regions; and different from prior art that uses only texture information, using both texture and shape information fully mines the information contained in the image, giving higher accuracy;
(3) Strong robustness: the texture reinforcement learning model of the invention focuses more on texture information, and the shape reinforcement learning model focuses more on shape information; by attending to the two kinds of information separately, the network can adapt to different images and its performance is more robust.
Drawings
FIG. 1 is a flow chart of a double-flow feature fusion image recognition method based on reinforcement learning according to the present invention;
FIG. 2 is a schematic diagram of a reinforcement learning model implementation framework of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The technical terms used in the present invention are explained first:
iFood dataset: a data set used in a competition hosted on Kaggle. It contains 251 fine-grained food categories, with a total of 120,216 web-collected images as the training set, 12,170 images as the validation set and 28,399 images as the test set, with manually verified labels; each image contains food of a single category.
ResNet-50: a neural network for classification mainly comprises 50 convolutional layers, a pooling layer and a short connecting layer. The convolution layer is used for extracting picture characteristics; the pooling layer has the functions of reducing the dimensionality of the feature vector output by the convolutional layer and reducing overfitting; the shortcut connection layer is used for transferring gradient and solving the problems of extinction and explosion gradient. The network parameters can be updated through a reverse conduction algorithm;
an image conversion model: the style of the image can be transformed but the content is not changed using the structure of the generic adaptive Network, including the generator and the discriminator.
As shown in fig. 1, the present invention provides a double-flow feature fusion image recognition method based on reinforcement learning, which includes the following steps:
(1) generating a shape data set:
Input each image into an image conversion model and output n pictures that are similar in shape but different in texture, as sketched in the code below. n is a preset value: the larger n is, the better the learning effect, but the longer model training takes; in general, values of 5-10 can be tried. The labels of the n converted images are all the same as the label of the input image. The shape data set is the data set, paired with the original data set, in which texture information is reduced and shape information is emphasized. The purpose of generating this paired data set is to let the subsequently trained model learn the shape information of the image.
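A minimal sketch of step (1) in Python, assuming a pre-trained style-transfer generator style_transfer(img, style_id) (a hypothetical callable, not named in the patent) that returns an image with the same content and shape but a different texture; each of the n re-styled copies keeps the original label.

import os
from PIL import Image

def build_shape_dataset(samples, style_transfer, n=5, out_dir="shape_dataset"):
    """samples: list of (image_path, label).
    style_transfer(img, style_id) is assumed to re-texture an image while keeping its shape."""
    os.makedirs(out_dir, exist_ok=True)
    shape_samples = []
    for path, label in samples:
        img = Image.open(path).convert("RGB")
        stem = os.path.splitext(os.path.basename(path))[0]
        for k in range(n):                       # n re-textured copies per image
            styled = style_transfer(img, k)      # shape preserved, texture replaced
            out_path = os.path.join(out_dir, f"{label}_{stem}_{k}.jpg")
            styled.save(out_path)
            shape_samples.append((out_path, label))  # label is unchanged
    return shape_samples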
(2) Training a texture basic model and a shape basic model:
(2.1) Perform data enhancement on each image of the original data set and the shape data set separately; the specific process is as follows (see the sketch after this paragraph): for one image, generate m rectangular boxes at random positions in the image, where m is a preset value. m may range from 4 to 7; too large an m produces too much data and costs too much time. The box side length is generally greater than 1/2 of the image side length and less than or equal to the image side length. The label of each cropped image is the same as the label of the original image.
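A sketch of the random-crop augmentation in (2.1); square crops and the use of the shorter image side are assumptions for brevity. Each of the m boxes has a side length greater than half the (shorter) image side and at most the full side, and every crop inherits the label of the source image.

import random
from PIL import Image

def random_box_crops(img, m=5):
    """Return m random crops of a PIL image; each side length s satisfies W/2 < s <= W,
    where W is the shorter image side."""
    W, H = img.size
    base = min(W, H)
    crops = []
    for _ in range(m):
        side = random.randint(base // 2 + 1, base)   # > 1/2 of the side, <= the side
        x = random.randint(0, W - side)
        y = random.randint(0, H - side)
        crops.append(img.crop((x, y, x + side, y + side)))
    return crops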
(2.2) Train the basic models with the cropped images: the texture basic model is trained with the original data set and the shape basic model with the shape data set; the two basic models have the same structure. An adaptive average pooling layer AdaAvgPool is added after the last block of the ResNet50 network; this layer pools the feature map to reduce its size. The feature map output by AdaAvgPool is flattened to one dimension to obtain the Feature vector Feature, and the features before AdaAvgPool are sent to a classifier to obtain the classification prediction probability pred; the classifier may be a fully connected layer. The basic model outputs Feature and pred; its role is to extract features from the input image and predict the classification probability.
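A sketch of the texture/shape basic model of (2.2), assuming a torchvision ResNet-50 backbone. The 1x1 pooled size and the single fully connected classifier head are assumptions; the patent routes the pre-pool features into the classifier, while this sketch classifies from the pooled Feature vector for simplicity.

import torch
import torch.nn as nn
from torchvision.models import resnet50

class BaseModel(nn.Module):
    """Basic model sketch: ResNet-50 backbone + added AdaAvgPool; outputs (Feature, pred)."""
    def __init__(self, num_classes):
        super().__init__()
        backbone = resnet50()
        self.stem = nn.Sequential(*list(backbone.children())[:-2])  # up to the last block
        self.ada_avg_pool = nn.AdaptiveAvgPool2d(1)                 # added AdaAvgPool layer
        self.classifier = nn.Linear(2048, num_classes)              # fully connected classifier (assumed head)

    def forward(self, x):
        fmap = self.stem(x)                                    # feature map of the last block
        feature = torch.flatten(self.ada_avg_pool(fmap), 1)    # Feature vector
        pred = torch.softmax(self.classifier(feature), dim=1)  # classification prediction probability
        return feature, pred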
(3) Training a texture reinforcement learning model:
(3.1) Read an image image_global and its corresponding category label c. Initialize a rectangular box with the same size as the read image.
(3.2) The position and size of the rectangular box change several times over the whole process. If the size of the rectangular box equals the size of the image, i.e. image_local = image_global, jump to (3.3). If the rectangular box is smaller than the image, crop the image according to the box and then upsample the crop to the same size as the original image to obtain the processed image_local; bilinear interpolation may be used for the upsampling. The purpose of this step is as follows: if the box is still the initialized box, no operation is performed and (3.3) is entered; if the box is smaller than the initialized box, the crop is taken and upsampled to the size of the original input image, so that all images fed to the neural network have the same size.
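A sketch of step (3.2): if the current box is smaller than the image, crop the box region and upsample it back to the original resolution; bilinear interpolation follows the text, the tensor layout is an assumption.

import torch.nn.functional as F

def crop_and_resize(image, box):
    """image: float tensor of shape (1, C, H, W); box: (x0, y0, x1, y1) integer pixel coords.
    Returns image_local at the original (H, W) resolution."""
    _, _, H, W = image.shape
    x0, y0, x1, y1 = box
    if (x1 - x0, y1 - y0) == (W, H):           # box covers the whole image
        return image                            # image_local == image_global
    patch = image[:, :, y0:y1, x0:x1]           # crop the box region
    return F.interpolate(patch, size=(H, W), mode="bilinear", align_corners=False)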
(3.3) As shown in FIG. 2, image_local is input into the texture basic model, which yields two outputs: the Feature and the classification prediction probability pred.
(3.4) Input the Feature directly into the reinforcement learning model; through its forward propagation it outputs the Q value of each corresponding action in the action space. Specifically, the Feature is input into the action-selection submodule of the texture reinforcement learning model, which contains an agent network composed of several fully connected layers with the ReLU activation function, defined as ReLU(x) = max(0, x).
The last layer converts the feature dimension into the number of actions in the action space, and the output is the Q value of each action in the action space. The action space is a set of actions whose purpose is to change the position or size of the rectangular box; the actions may include translation in each direction, enlargement, reduction, and so on. The Q value of an action is a quantitative assessment of how moving the rectangular box from one position to another by taking that action affects our goal (i.e., classification). A larger Q value means the box after the change will make the classification better; conversely, a lower Q value means the box after the change will make the classification worse.
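A minimal sketch of the agent network in (3.4): a small fully connected network with ReLU activations whose last layer maps the feature dimension to the number of actions, giving one Q value per action. The layer widths and the number of actions are assumptions.

import torch.nn as nn

class AgentNet(nn.Module):
    """Q network of the reinforcement learning model: Feature -> one Q value per action."""
    def __init__(self, feature_dim=2048, num_actions=6, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),                      # ReLU(x) = max(0, x)
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions)  # Q value for each action in the action space
        )

    def forward(self, feature):
        return self.net(feature)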
(3.5) In the action-selection submodule, to obtain a definite action, this step chooses between two strategies, exploration and exploitation, one of which is selected (see the sketch after this paragraph). Exploration randomly selects one action among all actions; exploitation selects the action with the maximum Q value obtained in (3.4). After one of the two strategies determines the action, the size or position of the box is changed according to the selected action and the change coefficient α ∈ (0,1), which may be set to 0.1, giving a new rectangular box box'. α is the ratio of each change: for example, if the action "enlarge to the right" is selected, box' is 1.1 times the original box extended to the right; if the action "shrink to the left" is selected, box' is 0.9 times the original box shrunk toward the left.
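A sketch of the exploration/exploitation choice and of applying the chosen action with coefficient α, as in (3.5). The concrete action set (four translations plus enlarge/shrink) and the exact geometry of each move are assumptions consistent with the text's example; clipping the box to the image bounds is omitted.

import random

ACTIONS = ["left", "right", "up", "down", "enlarge", "shrink"]  # assumed action space

def select_action(q_values, explore_rate=0.1):
    """Exploration: random action; exploitation: argmax over the Q values."""
    if random.random() < explore_rate:
        return random.randrange(len(ACTIONS))
    return int(q_values.argmax())

def apply_action(box, action_idx, alpha=0.1):
    """box = (x0, y0, x1, y1); each move changes the box by the ratio alpha."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    a = ACTIONS[action_idx]
    if a == "left":    x0, x1 = x0 - alpha * w, x1 - alpha * w
    if a == "right":   x0, x1 = x0 + alpha * w, x1 + alpha * w
    if a == "up":      y0, y1 = y0 - alpha * h, y1 - alpha * h
    if a == "down":    y0, y1 = y0 + alpha * h, y1 + alpha * h
    if a == "enlarge": x1, y1 = x0 + (1 + alpha) * w, y0 + (1 + alpha) * h
    if a == "shrink":  x1, y1 = x0 + (1 - alpha) * w, y0 + (1 - alpha) * h
    return (x0, y0, x1, y1)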
(3.6) With the new rectangular box obtained in (3.5), follow the feature-extraction procedure of (3.2) and (3.3) to obtain another feature Feature' and prediction probability pred'. The purpose of this step is that, after the position of the rectangular box has changed, the same feature-extraction operation can be applied: the region corresponding to the box is cropped, upsampled, and input into the basic model to obtain the feature and the prediction probability.
(3.8) Based on pred, pred' from (3.6), and c from (3.1), make the following judgment: if the prediction score of pred on category c is higher than that of pred' on category c, the reward is -1; correspondingly, if the prediction score of pred on category c is lower than that of pred' on category c, the reward is 1. Specifically, the prediction probability pred contains a prediction score for every category, and each score indicates the probability of predicting that category, so a higher probability for c indicates that the model predicts this sample better. Whether to reward or punish the model can therefore be decided from the two prediction probabilities and the label: reward = sign(pred'[c] − pred[c]).
(3.9) In the Q-value update submodule, update the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8). The update rule is Q_target = reward + γ·max_a Q(s, a), where Q(s, a) is the Q value of taking action a in state s (i.e., the Feature), and γ is a preset discount coefficient applied at each Q-value update; γ may be set to 0.9.
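A sketch of the reward of (3.8) and the Q_target update of (3.9): the reward is the sign of the change of the true-class probability, and Q_target = reward + γ·max_a Q(s, a) with γ = 0.9 as suggested. The 1-D probability-vector layout is an assumption.

def compute_reward(pred, pred_new, c):
    """pred, pred_new: 1-D class-probability vectors; c: true class index.
    +1 if the new box raises the score of class c, otherwise -1."""
    return 1.0 if pred_new[c] > pred[c] else -1.0

def q_target(reward, q_values_next, gamma=0.9):
    """Q_target = reward + gamma * max_a Q(s, a)."""
    return reward + gamma * float(q_values_next.max())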
(3.10) Store the Feature and the Q_target obtained in (3.9) as a pair in an experience pool. The experience pool is a measure for reducing the correlation between samples: paired Feature and Q_target entries are first accumulated in the pool, and once the pool holds a certain amount, data are randomly drawn from it to train the model.
(3.11) Take the new rectangular box as the current box (box = box'), the new feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current one (pred = pred').
(3.12) Repeat the process from (3.4) to (3.11) a certain number of times. This is a continuous adjustment of the size and position of the box, and the number of repetitions can be tuned together with the change coefficient set in (3.5): with a large coefficient, fewer changes are needed; with a small coefficient, more changes are needed.
(3.13) In the Q-value evaluation submodule, after the experience pool is filled with a certain number of samples, randomly select paired Feature and Q_target data from the pool, denoted Feature_s and Target. Input Feature_s into the texture reinforcement learning model being trained and output the Q value of the corresponding action, denoted Q_eval. Take the difference between Target and Q_eval as the loss, backpropagate, and update the parameters. The loss may use the mean square error (MSE) function, expressed as loss = (Target − Q_eval)².
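A sketch of the replay-based training of (3.10) and (3.13): pairs are accumulated in an experience pool, and once enough are stored a random mini-batch is drawn, the agent produces Q_eval for the stored actions, and the MSE between Target and Q_eval is backpropagated. The pool capacity, batch size, and the fact that the chosen action index is stored alongside each pair are assumptions not fixed by the text.

import random
import torch
import torch.nn.functional as F

class ReplayPool:
    def __init__(self, capacity=10000):
        self.data, self.capacity = [], capacity

    def push(self, feature, action, q_target):
        if len(self.data) >= self.capacity:
            self.data.pop(0)                                  # drop the oldest entry
        self.data.append((feature.detach().squeeze(0), action, q_target))

    def sample(self, batch_size):
        return random.sample(self.data, batch_size)

def train_step(agent, pool, optimizer, batch_size=32):
    """One update: MSE between the stored Q_target and the agent's current Q_eval."""
    if len(pool.data) < batch_size:
        return None
    batch = pool.sample(batch_size)
    features = torch.stack([f for f, _, _ in batch])          # (B, feature_dim)
    actions = torch.tensor([a for _, a, _ in batch])
    targets = torch.tensor([t for _, _, t in batch], dtype=torch.float32)
    q_eval = agent(features).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(q_eval, targets)                        # loss = (Target - Q_eval)^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()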
(4) Training a shape reinforcement learning model:
and (4.1) training the shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structure of the shape reinforcement learning model is the same as that of the texture reinforcement learning model. And (3) and (4) dividing the information in the data set into textures and shapes, and respectively learning two different kinds of information in the data set by using two streams of ideas. The model structure of both streams is the same, and the data sets used for training are pairs of data sets that have been pre-processed. The training process is the same.
(5) The double-flow prediction and fusion of the test image to be detected by utilizing the two trained models comprises the following substeps:
(5.1) Read the image to be detected, image_global. Initialize a rectangular box with the same size as the read image.
(5.2) Perform the feature extraction of steps (3.2) and (3.3) on the image to obtain the Feature and the classification prediction probability pred at the position of the box.
(5.3) Input the feature obtained in (5.2) into the texture reinforcement learning model, output the Q values of all actions, select the action with the maximum Q value according to the exploitation strategy, and change the size and position of the box according to the selected action.
(5.4) Repeat (5.2) and (5.3) a number of times; as in (3.12), the number of repetitions is related to the change rate of the rectangular box. The last change gives the feature F_texture.
(5.5) Test the shape reinforcement learning model with a procedure similar to (5.1) to (5.4) to obtain F_shape.
(5.6) Input the two different features F_texture and F_shape into the fusion model, whose output is the final prediction probability p_mix. The fusion model is a trainable model intended to fuse F_texture and F_shape and then classify; for example, it may concatenate the two features and then output the classification probabilities of all classes with a fully connected layer. Feature fusion can be attempted in various ways and can even be performed directly on the prediction scores, but fusing scores is not as effective as fusing features. The fusion model is trained with the two features as input against the label corresponding to the image.
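A sketch of the concatenation-based fusion model mentioned in (5.6): F_texture and F_shape are concatenated and a fully connected layer outputs the class probabilities p_mix. The feature dimension and the 251-class default (matching the iFood dataset mentioned above) are assumptions.

import torch
import torch.nn as nn

class FusionModel(nn.Module):
    """Concatenate F_texture and F_shape, then classify with a fully connected layer."""
    def __init__(self, feature_dim=2048, num_classes=251):
        super().__init__()
        self.fc = nn.Linear(2 * feature_dim, num_classes)

    def forward(self, f_texture, f_shape):
        fused = torch.cat([f_texture, f_shape], dim=1)   # feature-level fusion
        return torch.softmax(self.fc(fused), dim=1)      # p_mix over all classes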
(5.7) The class with the highest probability in p_mix is the predicted class.
The effectiveness of the invention is demonstrated by the following experimental example; the experimental results show that the invention can improve the accuracy of image recognition.
The invention is compared with the base network we use on the iFood dataset. Table 1 shows the accuracy of the method of the invention on the dataset, where Backbone denotes the basic model ResNet50 we use and DQN denotes the reinforcement learning model we use. The larger the value, the higher the image recognition accuracy; the table shows that the improvement brought by the method is very significant.
TABLE 1 precision on iFood dataset
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A double-flow feature fusion image identification method based on reinforcement learning is characterized by comprising the following steps:
(1) generating a shape data set:
for each image, inputting the image into an image conversion model and outputting n corresponding images that are similar in shape but different in texture, the labels of the n converted images being the same as the label of the input image, and n being a preset value;
(2) training a texture basic model and a shape basic model:
(2.1) performing data enhancement on each image of the original dataset and the shape dataset separately: for an image, generating m rectangular frames at random positions in the image, wherein the side length of each frame is greater than 1/2 of the side length of the image and less than or equal to the side length of the image, the label of the cut image is consistent with that of the original image, and m is a preset value;
(2.2) training a basic model using the cropped images, wherein the texture basic model is trained using the original data set and the shape basic model is trained using the shape data set; the texture basic model and the shape basic model have the same structure, in which an adaptive average pooling layer AdaAvgPool is added after the last block of the ResNet50 network and pools the feature map to reduce its size; the basic model is trained with the images and labels, and the basic model is used for feature extraction and prediction on images;
(3) training a texture reinforcement learning model:
(3.1) reading an image image_global and a corresponding category label c, and initializing a rectangular box with the same size as the read image;
(3.2) if the size of the rectangular box equals the size of the image, jumping to (3.3); if the rectangular box is smaller than the image, cropping the image according to the box and then upsampling the crop to the same size as the original image to obtain the processed image_local;
(3.3) inputting image_local into the texture basic model to obtain the Feature and the classification prediction probability pred;
(3.4) inputting the Feature into a texture reinforcement learning model, wherein the output of the texture reinforcement learning model is the Q value of each action in the action space;
(3.5) obtaining a definite action through the two strategies of exploration and exploitation, one of which is selected: exploration randomly selects one action among all actions, and exploitation selects the action with the maximum Q value obtained in (3.4); after the action is determined by one of the two strategies, changing the size or position of the box according to the selected action and a change coefficient α to obtain a new rectangular box box';
(3.6) with the new rectangular box obtained in (3.5), following the feature-extraction procedure of (3.2) and (3.3) to obtain another feature Feature' and prediction probability pred';
(3.8) based on pred, pred' from (3.6), and c from (3.1), making the following judgment: the reward is -1 if the prediction score of pred on category c is higher than the prediction score of pred' on category c, and the reward is 1 if the prediction score of pred on category c is lower than the prediction score of pred' on category c;
(3.9) updating the Q value Q_target of the action selected in (3.5) according to the reward obtained in (3.8), the update rule being Q_target = reward + γ·max_a Q(s, a), where Q(s, a) is the Q value of taking action a in state s (i.e., the Feature), and γ is a preset coefficient applied at each Q-value update;
(3.10) storing the Feature and the Q_target obtained in (3.9) as a pair into an experience pool;
(3.11) taking the new rectangular box as the current box (box = box'), the new feature as the current Feature (Feature = Feature'), and the new classification prediction probability as the current one (pred = pred'), and repeating the process from (3.4) to (3.11) a preset number of times; after the experience pool holds a preset number of samples, randomly selecting paired Feature and Q_target data from the pool, denoted Feature_s and Target, inputting Feature_s into the texture reinforcement learning model, outputting the Q value of the corresponding action, denoted Q_eval, taking the difference between Target and Q_eval as the loss, backpropagating, and updating the parameters;
(4) training a shape reinforcement learning model:
(4.1) training a shape reinforcement learning model by using the shape data set according to the step in (3), wherein the training process of the shape reinforcement learning model is the same as that of the texture reinforcement learning process, and the structures of the shape reinforcement learning model and the texture reinforcement learning model are the same;
(5) performing double-flow prediction and fusion on the test image to be detected using the two trained reinforcement learning models, comprising the following substeps:
(5.1) reading an image to be detected, and initializing a rectangular box with the size same as that of the image to be detected;
(5.2) performing the feature extraction of steps (3.2) and (3.3) on the image to be detected to obtain the Feature and the classification prediction probability pred at the position of the box;
(5.3) inputting the feature obtained in (5.2) into the texture reinforcement learning model, outputting the Q values of all actions, selecting the action with the maximum Q value according to the exploitation strategy, and changing the size and position of the box according to the selected action;
(5.4) repeating the process of (5.2) and (5.3) a number of times, the number of repetitions being related to the change rate of the rectangular box as in (3.12); the last change gives the feature F_texture;
(5.5) testing the shape reinforcement learning model with a procedure similar to (5.1) to (5.4) to obtain F_shape;
(5.6) inputting the two different features F_texture and F_shape into the fusion model, whose output is the final prediction probability p_mix, wherein the fusion model is a trainable model intended to fuse F_texture and F_shape and then classify;
(5.7) the class with the highest probability in p_mix is the class predicted for the image to be detected.
2. The reinforcement learning based double-flow feature fusion image recognition method according to claim 1, wherein the training process of the basic model in step (2.2) is specifically: compressing the feature map output by AdaAvgPool to one dimension to obtain the feature vector Feature, and sending the features before AdaAvgPool to a classifier to obtain the classification prediction probability pred; the basic model outputs Feature and pred, and its role is to extract features from the input image and predict the classification probability.
3. The reinforcement learning based dual-stream feature fusion image recognition method according to claim 1 or 2, wherein the classifier in step (2.2) uses fully connected layers.
4. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein step (3.4) is specifically: inputting the Feature into the action-selection submodule of the texture reinforcement learning model, the action-selection submodule containing an agent network composed of several fully connected layers with the ReLU activation function, defined as ReLU(x) = max(0, x); the last layer converts the feature dimension into the number of actions in the action space, and the output is the Q value of each action in the action space; the action space is a set of actions whose purpose is to change the position or size of the rectangular box; the Q value of an action means that the rectangular box moves to another position after the action is taken at a certain position, and is a quantitative evaluation of the influence of this change on the target; a larger Q value means the box after the change makes the classification better, and conversely a lower Q value means the box after the change makes the classification worse.
5. The method for dual-stream feature fusion image recognition based on reinforcement learning according to claim 1 or 2, wherein the action in step (3.4) comprises translation, zooming in or zooming out in various directions.
6. The reinforcement learning based dual-stream feature fusion image recognition method according to claim 1 or 2, wherein the change coefficient α ∈ (0,1) in step (3.5).
7. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein step (3.8) is specifically: the prediction probability pred contains a prediction score for every category, each score representing the probability of predicting that category; the higher the probability for c, the better the model predicts this sample, so whether to reward or punish the model can be judged from the prediction probabilities and the label, and the reward is sign(pred'[c] − pred[c]).
8. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein the loss in step (3.13) uses the MSE function, expressed as loss = (Target − Q_eval)².
9. The double-flow feature fusion image identification method based on reinforcement learning according to claim 1 or 2, wherein the fusion model in step (5.6) concatenates the two features and then outputs the classification probabilities of all classes using a fully connected layer.
10. The dual-flow feature fusion image recognition method based on reinforcement learning according to claim 1 or 2, wherein the value range of n is 5-10, and the value range of m is 4-7.
CN201911038698.5A 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning Active CN110826609B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911038698.5A CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911038698.5A CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN110826609A true CN110826609A (en) 2020-02-21
CN110826609B CN110826609B (en) 2023-03-24

Family

ID=69550977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911038698.5A Active CN110826609B (en) 2019-10-29 2019-10-29 Double-current feature fusion image identification method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110826609B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597865A (en) * 2020-12-16 2021-04-02 燕山大学 Intelligent identification method for edge defects of hot-rolled strip steel
CN113128522A (en) * 2021-05-11 2021-07-16 四川云从天府人工智能科技有限公司 Target identification method and device, computer equipment and storage medium
CN113240573A (en) * 2020-10-26 2021-08-10 杭州火烧云科技有限公司 Local and global parallel learning-based style transformation method and system for ten-million-level pixel digital image
CN114742800A (en) * 2022-04-18 2022-07-12 合肥工业大学 Reinforced learning fused magnesia furnace working condition identification method based on improved Transformer
TWI801038B (en) * 2021-12-16 2023-05-01 新加坡商鴻運科股份有限公司 Defect detection method, system, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766080A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Image multi-class feature recognizing and pushing method based on electronic commerce
CN108805798A (en) * 2017-05-05 2018-11-13 英特尔公司 Fine granularity for deep learning frame calculates communication and executes
CN109814565A (en) * 2019-01-30 2019-05-28 上海海事大学 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110348355A (en) * 2019-07-02 2019-10-18 南京信息工程大学 Model recognizing method based on intensified learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104766080A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Image multi-class feature recognizing and pushing method based on electronic commerce
CN108805798A (en) * 2017-05-05 2018-11-13 英特尔公司 Fine granularity for deep learning frame calculates communication and executes
CN109814565A (en) * 2019-01-30 2019-05-28 上海海事大学 The unmanned boat intelligence navigation control method of space-time double fluid data-driven depth Q study
CN110135502A (en) * 2019-05-17 2019-08-16 东南大学 A kind of image fine granularity recognition methods based on intensified learning strategy
CN110348355A (en) * 2019-07-02 2019-10-18 南京信息工程大学 Model recognizing method based on intensified learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANGTENG HE等: "Fine-grained Image Classification via Combining Vision and Language", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240573A (en) * 2020-10-26 2021-08-10 杭州火烧云科技有限公司 Local and global parallel learning-based style transformation method and system for ten-million-level pixel digital image
CN112597865A (en) * 2020-12-16 2021-04-02 燕山大学 Intelligent identification method for edge defects of hot-rolled strip steel
CN113128522A (en) * 2021-05-11 2021-07-16 四川云从天府人工智能科技有限公司 Target identification method and device, computer equipment and storage medium
CN113128522B (en) * 2021-05-11 2024-04-05 四川云从天府人工智能科技有限公司 Target identification method, device, computer equipment and storage medium
TWI801038B (en) * 2021-12-16 2023-05-01 新加坡商鴻運科股份有限公司 Defect detection method, system, electronic device and storage medium
CN114742800A (en) * 2022-04-18 2022-07-12 合肥工业大学 Reinforced learning fused magnesia furnace working condition identification method based on improved Transformer
CN114742800B (en) * 2022-04-18 2024-02-20 合肥工业大学 Reinforced learning electric smelting magnesium furnace working condition identification method based on improved converter

Also Published As

Publication number Publication date
CN110826609B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN110322446B (en) Domain self-adaptive semantic segmentation method based on similarity space alignment
CN110837836B (en) Semi-supervised semantic segmentation method based on maximized confidence
CN109840531B (en) Method and device for training multi-label classification model
CN110222770B (en) Visual question-answering method based on combined relationship attention network
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
US20210081695A1 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN115858847B (en) Combined query image retrieval method based on cross-modal attention reservation
CN112257758A (en) Fine-grained image recognition method, convolutional neural network and training method thereof
CN112070040A (en) Text line detection method for video subtitles
CN111274981A (en) Target detection network construction method and device and target detection method
CN115222998B (en) Image classification method
CN115131797A (en) Scene text detection method based on feature enhancement pyramid network
CN112527993A (en) Cross-media hierarchical deep video question-answer reasoning framework
CN116912708A (en) Remote sensing image building extraction method based on deep learning
CN114510594A (en) Traditional pattern subgraph retrieval method based on self-attention mechanism
CN115908806A (en) Small sample image segmentation method based on lightweight multi-scale feature enhancement network
Fan et al. A novel sonar target detection and classification algorithm
CN113435461B (en) Point cloud local feature extraction method, device, equipment and storage medium
CN112148994B (en) Information push effect evaluation method and device, electronic equipment and storage medium
CN114972959B (en) Remote sensing image retrieval method for sample generation and in-class sequencing loss in deep learning
CN116680578A (en) Cross-modal model-based deep semantic understanding method
CN114842488A (en) Image title text determination method and device, electronic equipment and storage medium
CN114202765A (en) Image text recognition method and storage medium
CN113743497A (en) Fine granularity identification method and system based on attention mechanism and multi-scale features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant