CN109190695B - Fish image classification method based on deep convolutional neural network - Google Patents
- Publication number: CN109190695B (application CN201810984130.1A)
- Authority
- CN
- China
- Prior art keywords
- fish
- image
- data set
- network
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the application provides a fish image classification method based on a deep convolutional neural network, and relates to the technical field of underwater computer vision. The method comprises the steps of collecting underwater fish images and constructing a fish image data set; selecting the pre-training set ImageNet and the Fish data set Fish4Knowledge; and constructing a convolutional neural network model based on B-CNNs, sequentially inputting the three data sets into the network model in order of sample size, and obtaining the category labels of the fish images in the fish image data set through training and iterative feedback. The method can achieve a high fish identification accuracy with a data set of small sample size, and provides a basis for further research on fish ecosystems.
Description
Technical Field
The application relates to the technical field of underwater computer vision, in particular to a fish image classification method based on a deep convolutional neural network.
Background
Fish image classification is of great significance for the study of marine ecosystems, particularly fish ecosystems. In fish habitats, it is difficult to obtain a large number of clear fish images with equipment: the complex habitat environment and the protective coloration of fish restrict the development of technologies such as fish image classification.
Correctly classifying fish images with the aid of a limited number of fish image data sets is therefore of great importance for the study of fish ecosystems. However, existing fish image classification methods are few and often require a large training sample size.
Disclosure of Invention
The application provides a fish image classification method based on a deep convolutional neural network, which is used for solving the technical problems that in the prior art, the fish image classification method is few, the requirement on the condition of a data set is high, the high identification accuracy cannot be achieved, and the like.
A first aspect of an embodiment of the present application provides a fish image classification method based on a deep convolutional neural network, including the following steps:
s1: acquiring underwater fish images and constructing a fish image data set;
selecting the pre-training set ImageNet and the Fish data set Fish4Knowledge;
s2: and constructing a convolutional neural network model based on B-CNNS, sequentially inputting the three data sets into the network model according to the number of samples, and obtaining the category labels of the fish images in the fish image data set through training and iterative feedback.
Further, step S2 specifically includes:
s21: inputting the pre-training set ImageNet into the network model for training until convergence, and removing the final fully-connected layer;
s22: inputting the Fish data set Fish4Knowledge into the network model trained in the step S21 to continue training until convergence;
s23: and inputting the fish image data set into the network model trained in the step S22 to fine tune the network parameters.
Further, the network model includes an optimized data enhancement unit and a network structure unit, that is, the three data sets in step S1 are input to the optimized data enhancement unit to obtain a fish image test set with enhanced data, and the fish image test set is input to the network structure unit to obtain a category label of an image in the fish image data set.
Further, the inputting the three data sets in the step S1 into the optimized data enhancement unit to obtain a fish image test set with data enhancement specifically includes: firstly, each image in the three data sets is subjected to super-resolution reconstruction through an SRGAN network, and then the reconstructed image is subjected to data enhancement.
Further, the data enhancement comprises flipping and/or rotating.
Further, the network structure unit includes two identical VGG networks, and the inputting the fish image test set into the network structure unit to obtain the category label of the image in the fish image data set specifically includes:
inputting the fish image test set into two VGG networks for feature extraction, then performing bilinear pooling on the features extracted by the two VGG networks to obtain bilinear vectors, and finally inputting a softmax function to obtain category labels of the images in the fish image data set.
Further, the VGG network comprises five groups of convolutional layers, wherein an optimized SE block is inserted after each of the first four groups of convolutional layers.
Further, the optimized SE block specifically operates as follows:
obtaining a feature map with C1 channels after the input image x passes through the convolutional layer;
each channel is divided into four regions, wherein the k-th channel is characterized by four real numbers z_{k1}, z_{k2}, z_{k3}, z_{k4};
obtaining the weight of each channel through excitation operation learning;
multiplying each region of each channel of the feature map by the learned weight;
the dimensions are adjusted such that the dimensions of the output image are equal to the dimensions of the input image.
Further, the four real numbers are the averages of the feature values over the four regions of the channel:

z_{kn} = \frac{4}{W \times H} \sum_{(i,j) \in R_n} u_k(i,j), \quad n = 1, 2, 3, 4,

where W × H is the spatial dimension of the k-th channel of the feature map input to the SE block, i ranges over (1, H), j ranges over (1, W), R_n is the n-th of the four regions, and u_k(i, j) is the value at position (i, j) of the k-th channel of the feature map input to the SE block.
Further, the weight of each channel is learned through the excitation operation, specifically: s = \sigma(L_2\,\delta(L_1\,r(z))), i.e., a reshape operation r(z) is first performed so that the channel descriptors are reshaped from C_1 \times 4 to 4C_1 \times 1; the result then passes sequentially through the fully-connected layer L_1, the ReLU activation function \delta, the fully-connected layer L_2 and the sigmoid function \sigma, and the weight of each channel is learned after training and iteration.
The fish image classification method based on the deep convolutional neural network is used for solving the technical problems that in the prior art, the fish image classification method is few, the requirement on the condition of a data set is high, the high identification accuracy cannot be achieved, and the like.
Drawings
In order to explain the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly described below; those skilled in the art can obviously obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram provided by an embodiment of the present application;
fig. 2 is a comparison diagram of an SE block and an optimized SE block provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Referring to fig. 1, a schematic flow chart of a fish image classification method based on a deep convolutional neural network according to an embodiment of the present application includes the following steps:
s1: acquiring underwater fish images and constructing a fish image data set;
selecting a pre-training set ImageNet and a Fish data set Fish4 Knowledge;
the constructed Fish image dataset is a Crothian dataset (the Crothian dataset is a Fish dataset collected in Crohn and is a public dataset), ImageNet is a large visual database used for visual object recognition software research, and Fish4Knowledge is a commonly used Fish database (F4K for short).
S2: constructing a B-CNNS (bilinear Convolution Neural network) -based bilinear Convolution Neural network model, sequentially inputting the three data sets into the network model according to the number of samples, and obtaining the category labels of the fish images in the fish image data set through training and iterative feedback.
Specifically, the method comprises the following steps:
s21: inputting the pre-training set ImageNet into the network model for training until convergence, and removing a final full-connection layer;
s22: inputting the Fish data set Fish4Knowledge into the network model trained in the step S21 to continue training until convergence;
s23: and inputting the fish image data set into the network model trained in the step S22 to fine tune the network parameters.
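The three-stage schedule of steps S21-S23 can be sketched as follows. This is an illustrative skeleton, not the patent's implementation: `staged_training`, `train_until_convergence` and `fine_tune` are hypothetical names, and the model is represented by a plain dictionary standing in for a real network.

```python
def staged_training(model, pretrain_set, fish_pretrain_set, target_set,
                    train_until_convergence, fine_tune):
    """Run the S21 -> S22 -> S23 schedule and record the stage order."""
    history = []
    # S21: pre-train on the large generic set (e.g. ImageNet), then drop
    # the final fully-connected layer so only the feature extractor remains.
    train_until_convergence(model, pretrain_set)
    model["final_fc"] = None
    history.append("S21")
    # S22: continue pre-training on the domain set (e.g. Fish4Knowledge).
    train_until_convergence(model, fish_pretrain_set)
    history.append("S22")
    # S23: fine-tune the network parameters on the small target set
    # (e.g. the Croatian fish data set).
    fine_tune(model, target_set)
    history.append("S23")
    return history
```

The schedule moves from the largest, most generic data set to the smallest, most specific one, which is what lets the small target set suffice for the final fine-tuning.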
In this embodiment, the network model includes an optimized data enhancement unit and a network structure unit, that is, the three data sets in step S1 are input to the optimized data enhancement unit to obtain a fish image test set with enhanced data, and the fish image test set is input to the network structure unit to obtain a category label of an image in the fish image data set.
As shown in the left half of fig. 1, an optimized data enhancement flowchart specifically includes the following working principles: firstly, each image in the three data sets is subjected to super-resolution reconstruction through an SRGAN network, and then the reconstructed image is subjected to data enhancement.
Here, SRGAN ("Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network", arXiv, 21 Nov 2016) applies a generative adversarial network (GAN) to the super-resolution (SR) problem; super-resolution reconstructs a corresponding high-resolution image from an observed low-resolution image. Because too low a resolution affects feature extraction from the image, the embodiment of the application performs super-resolution reconstruction with the SRGAN network to improve the resolution of the images.
In the embodiment of the application, the data enhancement of the reconstructed image mainly comprises turning over and/or rotating, wherein the turning over comprises horizontal turning over and vertical turning over, and the rotating comprises clockwise rotating by 90 degrees, 180 degrees and 270 degrees.
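The flip and rotation variants listed above can be sketched with NumPy; `augment` is an illustrative name, not from the patent.

```python
import numpy as np

def augment(image):
    """Return the original image plus the five augmented variants:
    horizontal and vertical flips, and clockwise rotations by 90, 180
    and 270 degrees. `image` is an H x W (x C) array."""
    return [
        image,
        image[:, ::-1],          # horizontal flip
        image[::-1, :],          # vertical flip
        np.rot90(image, k=-1),   # 90 degrees clockwise
        np.rot90(image, k=-2),   # 180 degrees
        np.rot90(image, k=-3),   # 270 degrees clockwise
    ]
```

Applied to every super-resolved image, this multiplies the effective training sample size by six.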
The right half of fig. 1 is a flow chart of a network structure part, the network structure unit includes two identical VGG networks, and the fish image test set is input to the network structure unit to obtain a category label of an image in the fish image data set.
After the data-enhanced images of the fish image test set are input to the network structure part, because the input of the VGG network is fixed at 224 × 224, the input images are first resized to 224 × 224 and then fed into the two VGG networks respectively. The VGG network has strong feature extraction capability; the whole VGG-16 network comprises 16 weight layers, namely five groups of convolutional layers (2 + 2 + 3 + 3 + 3 = 13 convolutional layers) and 3 fully-connected layers, i.e. 16 = 2 + 2 + 3 + 3 + 3 + 3.
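The resize to the fixed 224 × 224 VGG input can be sketched as a minimal nearest-neighbour stand-in; a real pipeline would typically use a library resizer (e.g. PIL or OpenCV), so this function is purely illustrative.

```python
import numpy as np

def resize_nearest(image, out_h=224, out_w=224):
    """Nearest-neighbour resize of an H x W (x C) array to out_h x out_w,
    matching the fixed 224 x 224 entrance of the VGG network."""
    h, w = image.shape[:2]
    # Map each output row/column back to its nearest source row/column.
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows][:, cols]
```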
In order to improve the feature extraction capability of the network, the embodiment of the application inserts an optimized SE block behind the first four convolutional layers.
FIG. 2 compares the original SE block (Squeeze-and-Excitation network structure block) and the optimized SE block (rSE block). In the figure, Conv Layer denotes a convolutional layer, Global Pooling denotes global pooling, FC denotes a fully-connected layer, ReLU denotes the ReLU activation function, Sigmoid denotes the sigmoid function, Scale denotes the dimension change, Reshape denotes straightening, and Quarter & Pooling denotes dividing each channel crosswise into four quarters and then pooling each block.
The optimized SE block specifically operates as follows:
After the input image x passes through the convolutional layer, a feature map with C1 channels is obtained.
The feature map is first squeezed along the spatial dimensions. In the original SE block, each two-dimensional feature channel is converted into a single real number representing the global distribution over that channel; the optimized SE block of the embodiment of the present application instead represents the features of each channel with four real numbers.
Each channel is divided into four regions, wherein the k-th channel is characterized by four real numbers z_{k1}, z_{k2}, z_{k3}, z_{k4}, i.e., all points of each region are summed and then averaged:

z_{kn} = \frac{4}{W \times H} \sum_{(i,j) \in R_n} u_k(i,j), \quad n = 1, 2, 3, 4,

where W × H is the spatial dimension of the k-th channel of the feature map input to the SE block, i ranges over (1, H), j ranges over (1, W), R_n is the n-th of the four regions, and u_k(i, j) is the value at position (i, j) of the k-th channel of the feature map input to the SE block.
Obtaining the weight of each channel through excitation operation learning;
the method specifically comprises the following steps: s ═ σ (L)2δ(L1R (z)), i.e., first perform reshape operation r (z), each channel dimension will be characterized by C14 to 4C 11, then sequentially passing through the full connection layer L1ReLU activation function delta, full connection layer L2And sigma operation of the sigmoid function, and learning to obtain the weight of each channel after training and iteration.
Multiplying each region of each channel of the feature map by the learned weight;
the dimensions are adjusted such that the dimensions of the output image are equal to the dimensions of the input image.
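The optimized SE block described above can be sketched in NumPy as follows. This is an interpretation under stated assumptions: the text leaves the output size of the excitation ambiguous, and since each region of each channel is scaled, the sketch reads it as one weight per region (4·C1 values); the weight shapes of L1 and L2 and the omission of biases are likewise assumptions for illustration, and H and W are assumed even.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def optimized_se_block(x, W1, W2):
    """Sketch of the optimized (rSE) block on a feature map x of shape
    (C, H, W). W1 (hidden x 4C) and W2 (4C x hidden) play the roles of
    the fully-connected layers L1 and L2; biases are omitted."""
    C, H, W = x.shape
    h2, w2 = H // 2, W // 2
    # Quarter & Pooling: split each channel into four spatial regions and
    # average each, giving the descriptors z_k1..z_k4 for channel k.
    quarters = [x[:, :h2, :w2], x[:, :h2, w2:], x[:, h2:, :w2], x[:, h2:, w2:]]
    z = np.stack([q.mean(axis=(1, 2)) for q in quarters], axis=1)  # (C, 4)
    # Excitation: reshape r(z) from C x 4 to 4C, then FC -> ReLU -> FC -> sigmoid.
    r = z.reshape(-1)                                  # (4C,)
    s = sigmoid(W2 @ np.maximum(W1 @ r, 0.0))          # weights in (0, 1), (4C,)
    s = s.reshape(C, 4)
    # Scale: multiply each region of each channel by its learned weight;
    # the output keeps the dimensions of the input.
    out = x.copy()
    out[:, :h2, :w2] *= s[:, 0, None, None]
    out[:, :h2, w2:] *= s[:, 1, None, None]
    out[:, h2:, :w2] *= s[:, 2, None, None]
    out[:, h2:, w2:] *= s[:, 3, None, None]
    return out
```

Compared with the original SE block, the only structural change is that the global pooling is replaced by four regional poolings, so the excitation sees a coarse spatial layout of each channel rather than a single global statistic.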
After the optimized SE block is embedded into each group of convolutions of the VGG network, the feature extraction capability of the network can be effectively improved, and the recognition capability of the network on fish pictures is further improved.
And performing bilinear pooling on the features extracted by the two VGG networks to obtain bilinear vectors, so that the network can notice second-order information of image features, and finally inputting a softmax function to obtain a category label of the image in the fish image data set.
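The bilinear pooling of the two VGG feature maps can be sketched as below. The signed square root and L2 normalisation steps are assumptions drawn from the B-CNN literature rather than stated in this text, and the function names are illustrative.

```python
import numpy as np

def bilinear_pool(fa, fb):
    """Bilinear pooling of two feature maps fa (Ca, H, W) and fb (Cb, H, W),
    as extracted by the two VGG streams: the outer product of the channel
    features is summed over all spatial locations, capturing second-order
    feature interactions."""
    ca, h, w = fa.shape
    cb = fb.shape[0]
    phi = fa.reshape(ca, -1) @ fb.reshape(cb, -1).T    # (Ca, Cb)
    vec = phi.reshape(-1) / (h * w)                    # bilinear vector
    vec = np.sign(vec) * np.sqrt(np.abs(vec))          # signed square root
    return vec / (np.linalg.norm(vec) + 1e-12)         # L2 normalisation

def softmax(logits):
    """Convert classifier logits into category-label probabilities."""
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

The bilinear vector is then fed through a classifier and the softmax to produce the category label of each image.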
In the following, the effectiveness of the present invention is further verified through experiments, covering the effectiveness of the two pre-trainings, the effectiveness of the optimized SE block, and the effectiveness of combining the optimized SE block with the B-CNNs.
1. Effectiveness of two pre-training:
TABLE 1 Comparison of results for several common convolutional networks with and without the two pre-trainings
As shown in table 1, four common convolutional neural networks were selected for this set of experiments: AlexNet, VGG-16, Inception-v4 and ResNet-50, and a detailed set of control experiments was performed with them. Specifically, four experiments were performed for each network, from top to bottom:
(1) Neither pre-training data set is used; the network is trained directly on the target Croatian fish data set.
(2) Only ImageNet data set is adopted for pre-training, an F4K data set is not used, and finally Croatian fish data set is used for fine-tuning of network parameters.
(3) The ImageNet dataset was not used, only the F4K dataset was used for pre-training, and finally the Croatian fish dataset was used for fine tuning of the network parameters.
(4) ImageNet is first used for a first pre-training, F4K is then used for a second pre-training on that basis, and finally the Croatian fish data set is used for fine-tuning of the network parameters.
As can be seen from the accuracy results in the table, compared with using no pre-training set, pre-training with either data set alone improves the accuracy of the network, but the improvement is not obvious enough. Clearly, on each network, a relatively high accuracy is achieved after the two pre-trainings are used: the network acquires the ability to recognize common features through the first pre-training and acquires specialized knowledge of fish characteristics through the second pre-training, so that training on the target Croatian data set then yields higher accuracy.
2. Effectiveness of the optimized SE block.
TABLE 2 comparison of accuracy results for inserting original SE blocks and optimized SE blocks over different networks
To illustrate the effectiveness of the optimized SE block, this example experimented with two datasets that are very common in computer vision, CIFAR-10 and ***'s flower dataset. In order to fully illustrate the optimization effect on the SE block, the embodiment of the present application uses three common networks, Inception-v4, Inception-ResNet-v2 and ResNeXt-v1, and demonstrates the effectiveness through control experiments after inserting the SE block and the optimized SE block (rSE), respectively.
As can be seen from table 2, after the original SE blocks are replaced with the optimized SE blocks in the three networks, the accuracy of the three networks on the two data sets is improved to some extent, which indicates that the optimized SE blocks further improve the feature extraction capability of the networks, thereby improving the classification capability of the networks on fish pictures.
3. Effectiveness of the combination of the optimized SE block and the B-CNNs.
TABLE 3 comparison of different network modules and different data enhancement methods
Having demonstrated the validity of the optimized SE block, and in order to further apply it to the fish classification problem, the embodiment of the application combines it with the B-CNNs, because fish classification is a fine-grained classification problem and the B-CNNs are networks designed precisely for fine-grained classification; the optimized SE block is therefore inserted after each group of convolutions of the B-CNNs. In addition, four sets of control experiments were performed to verify whether the optimization of the data enhancement is effective and to demonstrate the importance of super-resolution.
As can be seen from the table, the accuracy of the B-CNNs is improved to a certain extent after the SE blocks are added, and further, after the original SE blocks are changed into the optimized SE blocks, the classification capability of the network on the fish pictures is improved to a certain extent. In addition, different from the prior method which uses common data enhancement, a method which firstly enhances the image resolution by means of SRGAN and then performs common data enhancement is provided for the problem of low resolution of the target data set. It can be seen from the table that the effect of the network using the optimized data enhancement method is relatively good.
In addition, to illustrate the versatility of combining B-CNNs and optimized SE blocks, the examples of the present application performed the same experiment on the fish data set of QUT, in addition to the Croatian data set:
TABLE 4 comparison of different network Module situations
Because the image resolution of the QUT fish data set is already high, the SRGAN technique is not used and only the general data enhancement method is employed; this experiment mainly verifies the effect of the SE block and of combining the optimized SE block with the B-CNNs. As can be seen from the table, the use of the SE block, and in particular the optimized SE block, improves the network accuracy most obviously.
It should be noted that when the embodiments of the present application refer to the ordinal numbers "first", "second", "third", or "fourth", etc., it should be understood that the terms are used for distinguishing them from each other only, unless they really mean that the order is expressed according to the context. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It is to be understood that the present application is not limited to what has been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (3)
1. A fish image classification method based on a deep convolutional neural network is characterized by comprising the following steps:
s1: acquiring underwater fish images and constructing a fish image data set;
selecting a pre-training set ImageNet and a Fish data set Fish4 Knowledge;
s2: constructing a convolutional neural network model based on B-CNNS, sequentially inputting the three data sets of the pre-training set ImageNet, the Fish data set Fish4Knowledge and the Fish image data set into the network model according to the number of samples, and obtaining a category label of a Fish image in the Fish image data set through training and iterative feedback, wherein the step S2 specifically comprises the following steps:
S21: inputting the pre-training set ImageNet into the network model for training until convergence;
s22: inputting the Fish data set Fish4Knowledge into the network model trained in the step S21 to continue training until convergence;
s23: inputting the fish image data set into the network model trained in the step S22 to fine-tune network parameters;
the network model comprises an optimized data enhancement unit and a network structure unit, namely the three data sets in the step S1 are input into the optimized data enhancement unit to obtain a fish image test set with enhanced data, and the fish image test set is input into the network structure unit to obtain a category label of an image in the fish image data set;
the network structure unit includes two identical VGG networks, and the inputting the fish image test set into the network structure unit to obtain the category label of the image in the fish image data set specifically includes:
inputting the fish image test set into two VGG networks for feature extraction, then performing bilinear pooling on the features extracted by the two VGG networks to obtain bilinear vectors, and finally inputting a softmax function to obtain category labels of the images in the fish image data set;
The VGG network comprises five groups of convolutional layers, wherein an optimized SE block is inserted after each of the first four groups of convolutional layers, and the optimized SE block specifically operates as follows:
obtaining a feature map with C1 channels after the input image x passes through the convolutional layer;
each channel is divided into four regions, wherein the k-th channel is characterized by four real numbers z_{k1}, z_{k2}, z_{k3}, z_{k4};
obtaining the weight of each channel through excitation operation learning;
multiplying each region of each channel of the feature map by the learned weight;
adjusting the dimensionality so that the dimensionality of the output image is equal to the dimensionality of the input image;
wherein the four real numbers are respectively:

z_{kn} = \frac{4}{W \times H} \sum_{(i,j) \in R_n} u_k(i,j), \quad n = 1, 2, 3, 4,

where W × H is the spatial dimension of the k-th channel of the feature map input to the SE block, i ranges over (1, H), j ranges over (1, W), R_n is the n-th of the four regions, and u_k(i, j) is the value at position (i, j) of the k-th channel of the feature map input to the SE block;
the weight of each channel is obtained through excitation operation learning, specifically: s = \sigma(L_2\,\delta(L_1\,r(z))), i.e., a reshape operation r(z) is first performed so that the channel descriptors are reshaped from C_1 \times 4 to 4C_1 \times 1; the result then passes sequentially through the fully-connected layer L_1, the ReLU activation function \delta, the fully-connected layer L_2 and the sigmoid function \sigma, and the weight of each channel is learned after training and iteration.
2. The method for classifying fish images based on deep convolutional neural network of claim 1, wherein the step of inputting the three data sets in step S1 to the optimized data enhancement unit to obtain a data-enhanced fish image test set specifically comprises: firstly, each image in the three data sets is subjected to super-resolution reconstruction through an SRGAN network, and then the reconstructed image is subjected to data enhancement.
3. The deep convolutional neural network-based fish image classification method as claimed in claim 2, wherein the data enhancement comprises flipping and/or rotation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810984130.1A CN109190695B (en) | 2018-08-28 | 2018-08-28 | Fish image classification method based on deep convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109190695A CN109190695A (en) | 2019-01-11 |
CN109190695B true CN109190695B (en) | 2021-08-03 |
Family
ID=64916074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810984130.1A Active CN109190695B (en) | 2018-08-28 | 2018-08-28 | Fish image classification method based on deep convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109190695B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886925A (en) * | 2019-01-19 | 2019-06-14 | 天津大学 | A kind of aluminium material surface defect inspection method that Active Learning is combined with deep learning |
CN110298824A (en) * | 2019-06-17 | 2019-10-01 | 浙江省农业科学院 | Squid automatic classification method based on color image and convolutional neural networks technology |
CN110322418A (en) * | 2019-07-11 | 2019-10-11 | 北京航空航天大学 | A kind of super-resolution image generates the training method and device of confrontation network |
CN110543879A (en) * | 2019-08-20 | 2019-12-06 | 高新兴科技集团股份有限公司 | SSD target detection method based on SE module and computer storage medium |
CN110766013A (en) * | 2019-09-25 | 2020-02-07 | 浙江农林大学 | Fish identification method and device based on convolutional neural network |
US11157811B2 (en) | 2019-10-28 | 2021-10-26 | International Business Machines Corporation | Stub image generation for neural network training |
CN111046967A (en) * | 2019-12-18 | 2020-04-21 | 江苏科技大学 | Underwater image classification method based on convolutional neural network and attention mechanism |
CN111666852A (en) * | 2020-05-28 | 2020-09-15 | 天津大学 | Micro-expression double-flow network identification method based on convolutional neural network |
CN112559791A (en) * | 2020-11-30 | 2021-03-26 | 广东工业大学 | Cloth classification retrieval method based on deep learning |
CN112488003A (en) * | 2020-12-03 | 2021-03-12 | 深圳市捷顺科技实业股份有限公司 | Face detection method, model creation method, device, equipment and medium |
CN113052227A (en) * | 2021-03-22 | 2021-06-29 | 山西三友和智慧信息技术股份有限公司 | Pulmonary tuberculosis identification method based on SE-ResNet |
CN112906002A (en) * | 2021-03-26 | 2021-06-04 | 山西三友和智慧信息技术股份有限公司 | Malicious software identification method based on deep learning |
CN114240686B (en) * | 2022-02-24 | 2022-06-03 | 深圳市旗扬特种装备技术工程有限公司 | Wisdom fishery monitoring system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106096535A (en) * | 2016-06-07 | 2016-11-09 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A kind of face verification method based on bilinearity associating CNN |
CN106372648A (en) * | 2016-10-20 | 2017-02-01 | 中国海洋大学 | Multi-feature-fusion-convolutional-neural-network-based plankton image classification method |
CN106650721A (en) * | 2016-12-28 | 2017-05-10 | 吴晓军 | Industrial character identification method based on convolution neural network |
CN107292333A (en) * | 2017-06-05 | 2017-10-24 | 浙江工业大学 | A kind of rapid image categorization method based on deep learning |
CN108038823A (en) * | 2017-12-06 | 2018-05-15 | 厦门美图之家科技有限公司 | Image-type becomes the training method of network model, image-type becomes method and computing device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7187811B2 (en) * | 2003-03-18 | 2007-03-06 | Advanced & Wise Technology Corp. | Method for image resolution enhancement |
- 2018-08-28: CN application CN201810984130.1A filed; patent CN109190695B, status Active
Non-Patent Citations (6)
Title |
---|
A Bilinear Multi-Scale Convolutional Neural Network for Fine-grained Object Classification; Qinghe Zheng et al.; International Journal of Computer Science; 2018-05-28; pp. 1-13 * |
Bilinear CNN Models for Fine-grained Visual Recognition; Tsung-Yu Lin et al.; 2015 IEEE International Conference on Computer Vision; 2015-12-13; p. 1449 abstract, Fig. 1 * |
Deep Learning for Stress Field Prediction Using Convolutional Neural Networks; Zhenguo Nie et al.; arXiv; 2018-08-27; pp. 1-22 * |
Hierarchical B-CNN model guided by classification errors for fine-grained classification; Shen Haihong et al.; Journal of Image and Graphics; 2017-07; Vol. 22, No. 7, pp. 906-914 * |
Fine-grained image classification based on Xception; Zhang Qian et al.; Journal of Chongqing University; 2018-05; Vol. 41, No. 5, pp. 85-91 * |
Also Published As
Publication number | Publication date |
---|---|
CN109190695A (en) | 2019-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109190695B (en) | Fish image classification method based on deep convolutional neural network | |
Mei et al. | Spatial and spectral joint super-resolution using convolutional neural network | |
Suganuma et al. | Attention-based adaptive selection of operations for image restoration in the presence of unknown combined distortions | |
Lin et al. | Hyperspectral image denoising via matrix factorization and deep prior regularization | |
CN108537192B (en) | Remote sensing image earth surface coverage classification method based on full convolution network | |
Yu et al. | A unified learning framework for single image super-resolution | |
CN109272010B (en) | Multi-scale remote sensing image fusion method based on convolutional neural network | |
Cheong et al. | Deep CNN-based super-resolution using external and internal examples | |
Huang et al. | Deep hyperspectral image fusion network with iterative spatio-spectral regularization | |
CN109376804A (en) | Hyperspectral remote sensing image classification method based on attention mechanism and convolutional neural networks | |
CN112801881B (en) | High-resolution hyperspectral calculation imaging method, system and medium | |
CN110660038A (en) | Multispectral image and panchromatic image fusion method based on generation countermeasure network | |
EP3120322A1 (en) | Method for processing input low-resolution (lr) image to output high-resolution (hr) image | |
CN111598182A (en) | Method, apparatus, device and medium for training neural network and image recognition | |
CN116052016A (en) | Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning | |
CN114418853B (en) | Image super-resolution optimization method, medium and equipment based on similar image retrieval | |
CN113111970B (en) | Method for classifying images by constructing global embedded attention residual network | |
Hu et al. | Image super-resolution with self-similarity prior guided network and sample-discriminating learning | |
Wei et al. | A-ESRGAN: Training real-world blind super-resolution with attention U-Net Discriminators | |
CN117197008A (en) | Remote sensing image fusion method and system based on fusion correction | |
Huang et al. | Learning deep analysis dictionaries for image super-resolution | |
CN114037770A (en) | Discrete Fourier transform-based attention mechanism image generation method | |
Chen et al. | Remote sensing image super-resolution with residual split attention mechanism | |
Kapoor et al. | Multiscale metamorphic vae for 3d brain mri synthesis | |
CN115439849A (en) | Instrument digital identification method and system based on dynamic multi-strategy GAN network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||