Fine-grained fish classification method based on deep learning
Technical field
The present invention relates to the technical fields of image processing, computer vision and neural networks, and more particularly to a fine-grained fish classification method based on deep learning.
Background art
Fine-grained image classification (Fine-Grained Categorization), also called sub-category image classification (Sub-Category Recognition), has in recent years become a very popular research topic in fields such as computer vision and pattern recognition. Its purpose is to divide coarse-grained categories into finer sub-categories. Because the differences between sub-categories are subtle while the differences within a sub-category can be large, fine-grained image classification is more difficult than ordinary image classification. Fine-grained classification of fish has great commercial value in the fishing industry. Compared with traditional manual identification, a fine-grained fish classification method based on deep learning has a clear advantage: it far exceeds human work in both speed and accuracy. In particular, as working time increases and workers become fatigued, not only their classification speed but also their accuracy declines. Humans cannot match machines on mechanical, tedious work, so in pursuit of a reasonable allocation of resources it is imperative to free people from such work so that they can take on other tasks. Fine-grained classification technology arose in response to this need.
Since deep learning took first place in the major image-processing competitions in 2015, the classification accuracy of deep learning models built on convolutional neural networks, from AlexNet, VGG and InceptionNet to ResNet, has risen steadily. Most current image-processing methods use deep learning. A neural network is built layer by layer from many neurons, and its activation layers give the model a strong capacity for nonlinear fitting. The designer only needs to supply images and labels, and the model automatically extracts features and learns the mapping from features to results. A neural network is essentially a combination of matrix multiplications and nonlinearities: many filter kernels pass the features most useful to the result and suppress those that are not, and in this way the network learns and classifies.
When performing fine-grained classification of fish images against complex backgrounds, the prior art mainly uses two approaches: traditional classification algorithms based on prior knowledge, and learning-based classification algorithms. Prior-based algorithms often cannot effectively handle the central difficulty of the fine-grained fish classification task, namely large intra-class differences and small inter-class differences, so their classification accuracy is low and fails to meet application requirements. Learning-based fine-grained classification algorithms can achieve better accuracy, but they still suffer from long recognition times, insufficient recognition accuracy and the need for additional label information, which raises time and labor costs and lowers the efficiency and profitability of the fishing industry.
Therefore, there is an urgent need for a fine-grained fish classification method based on deep learning that solves the above technical problems.
Summary of the invention
The technical problem to be solved by the present invention is to provide a fine-grained fish classification method based on deep learning that can classify fish images against complex backgrounds accurately and at a fine-grained level, with a short recognition time.
To solve the above technical problem, the technical solution adopted by the present invention is to provide a fine-grained fish classification method based on deep learning, comprising the following steps:
S1: preprocess the acquired captured image;
S2: extract features from the preprocessed image using a deep neural network, in which deformable convolution is used to choose suitable sampling points for the convolutional layers of the deep neural network, to obtain a feature map;
S3: construct a feature pyramid network as the region proposal network, set anchors on each feature scale of the network, input the feature map output by step S2 into the feature pyramid network, and compute the information content of the local region corresponding to each anchor;
S4: take at least four local regions with the largest information content, upsample them, and input them into the deep neural network of step S2 to extract features from the extracted regions and output feature maps;
S5: concatenate the feature maps output by step S2 and step S4, predict the class using a fully connected layer, compute the loss function from the label information and backpropagate it, and update the parameters of the region proposal network;
S6: train the network weight parameters of the network model constructed in steps S1 to S5 using a collected training dataset, classify images using the network model with the trained weight parameters, and output the result.
In a preferred embodiment of the present invention, the feature maps output by step S4 are input into a fully connected layer to predict the class of the input image; the information content that the deep neural network assigns to the correct class is computed and sorted in descending order; and the loss function of the region proposal network drives the ordering of the regions in step S3 to agree with this ordering, thereby providing a supervisory signal for the region proposal network.
In a preferred embodiment of the present invention, the specific preprocessing steps in step S1 include:
S1.1: resize the original image to 600×600 using bilinear interpolation;
S1.2: randomly crop a 448×448 image block from the interpolated image;
S1.3: apply z-score standardization to the image obtained in step S1.2.
In a preferred embodiment of the present invention, the specific steps of step S2 include:
S2.1: a convolutional layer is added in front of each of the three convolutional layers a, b and c of the fifth convolution module of the deep neural network resnet-50. The added layer convolves the feature map output by the preceding layer and outputs an offset for each sampling point on the feature map; the sampling points used in the three convolutional layers a, b and c of the fifth convolution module are the original sampling points shifted by these offsets;
S2.2: once the offsets of the sampling points have been determined, the value of each point on the feature maps output by the three convolutional layers a, b and c of the fifth convolution module is computed by the formula
$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$
where $x$ is the input feature map, $y$ is the output feature map, $R$ is the receptive field of the ordinary convolution, $p_0$ is a point on the output feature map, $p_n$ is a sampling point of the ordinary convolution, $\Delta p_n$ is the offset, and $w(p_n)$ is the weight at the sampling point.
In a preferred embodiment of the present invention, the specific steps of step S3 include:
The input image is down-sampled by a factor of 32 in step S2, so the output feature map has size 14×14. In the feature pyramid network, three 3×3 convolutional layers are applied bottom-up with strides of 1, 2 and 2 respectively, producing three feature scales of 14×14, 7×7 and 4×4. Correspondingly, the anchor sizes are 48, 96 and 192 respectively, and on each scale the anchors take aspect ratios of 1:1, 2:3 and 3:2. The lateral connections use 1×1 convolutional layers.
In a preferred embodiment of the present invention, the specific steps of step S4 include:
S4.1: according to the information content of each anchor output by step S3, select at least four local regions with the largest information content, and resize each local image to 224×224 using bilinear interpolation;
S4.2: input the local images output by step S4.1 into the resnet-50 network for feature extraction, and output feature maps.
Further, the specific steps for providing the supervisory signal for the region proposal network include:
Firstly, the feature maps are input into the fully connected layer, which outputs the class and the information content; the outputs whose predicted class agrees with the label are sorted in descending order of information content;
Secondly, the loss function of the region proposal network is defined as
$$L_I(I, C) = \sum_{(i,s):\, C_i < C_s} f(I_s - I_i)$$
where $I$ denotes the information content output by step S3, $C$ denotes the confidence output by the fully connected layer, and $f(x)$ is a non-increasing function, defined as $f(x) = \max(1 - x, 0)$. This drives the regions that the region proposal considers highly informative to be the regions that are classified correctly and with high classification information content, and thus provides a weakly supervised signal for the region proposal network without adding any additional label information.
The beneficial effects of the present invention are:
(1) The present invention solves the problem that existing object classification techniques achieve low accuracy on fine-grained classification tasks because of complex environments, subtle inter-class differences and large intra-class differences. The acquired image is preprocessed, features are extracted using a deep neural network, and a feature pyramid network is constructed to propose regions; the proposed regions are cropped and their features extracted. On the one hand, the extracted features are used for one classification pass, whose accuracy is fed to the region proposal network as a supervisory signal; on the other hand, these features are fused with the whole-image features and fed into a fully connected layer for classification, which outputs the final classification result;
(2) When performing fine-grained classification of fish images against complex backgrounds, the present invention has a short recognition time and high recognition accuracy, requires no additional label information, and is suitable for wide application.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the fine-grained fish classification method based on deep learning according to the present invention;
Fig. 2 is a schematic diagram of the deformable convolution in the deep convolutional network;
Fig. 3 is a structural diagram of the region proposal network.
Specific embodiment
The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more readily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.
Referring to Fig. 1, an embodiment of the present invention comprises:
A fine-grained fish classification method based on deep learning, comprising the following steps:
S1: preprocess the acquired captured image; the specific preprocessing steps include:
S1.1: resize the original image to 600×600 using bilinear interpolation;
S1.2: randomly crop a 448×448 image block from the interpolated image;
S1.3: apply z-score standardization to the image obtained in step S1.2.
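The three preprocessing steps can be sketched as follows. This is a minimal standard-library illustration, not the implementation of the invention: it assumes a single-channel image stored as a list of rows, computes the z-score statistics over the cropped image itself, and the function names (`bilinear_resize`, `random_crop`, `z_score`, `preprocess`) and the `seed` parameter are hypothetical.

```python
import random
import statistics


def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output row to a fractional source row (corners aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def random_crop(img, size, rng):
    """Cut a size x size block at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def z_score(img):
    """Subtract the mean and divide by the standard deviation."""
    pixels = [v for row in img for v in row]
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels) or 1.0  # guard against constant images
    return [[(v - mean) / std for v in row] for row in img]


def preprocess(img, resize_to=600, crop_to=448, seed=0):
    """S1.1 resize, S1.2 random crop, S1.3 z-score standardization."""
    rng = random.Random(seed)
    resized = bilinear_resize(img, resize_to, resize_to)
    return z_score(random_crop(resized, crop_to, rng))
```

In practice the statistics for the z-score step are often computed per channel over the training set; the per-image version above is only one possible reading of step S1.3.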
S2: extract features from the preprocessed image using a deep neural network, in which deformable convolution is used to choose suitable sampling points for the convolutional layers of the deep neural network, to obtain a feature map;
Preferably, the deep neural network is a resnet-50 network into which deformable convolution is introduced. With reference to Fig. 2, the image obtained in step S1.3 is input into the resnet-50 network with deformable convolution for feature extraction, and a feature map is output. The specific steps include:
S2.1: a convolutional layer is added in front of each of the three convolutional layers a, b and c of the fifth convolution module (conv5_a, b, c) of the deep neural network resnet-50. The added layer convolves the feature map output by the preceding layer and outputs an offset for each sampling point on the feature map; the sampling points used in conv5_a, b, c are the original sampling points shifted by these offsets;
S2.2: once the offsets of the sampling points have been determined, the value of each point on the feature maps output by conv5_a, b, c is computed by the formula
$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$
where $x$ is the input feature map, $y$ is the output feature map, $R$ is the receptive field of the ordinary convolution, $p_0$ is a point on the output feature map, $p_n$ is a sampling point of the ordinary convolution, $\Delta p_n$ is the offset, and $w(p_n)$ is the weight at the sampling point.
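To make step S2.2 concrete, the sketch below evaluates one output point of a deformable convolution, that is, a weighted sum over the regular sampling grid in which every sampling point is shifted by its own offset. It is an illustration under assumptions: a single-channel feature map, a 3×3 grid, bilinear interpolation for fractional positions with zeros outside the map, and offsets passed in directly rather than predicted by the added convolutional layer of step S2.1. The names `bilinear_sample`, `GRID` and `deformable_conv_at` are hypothetical.

```python
import math


def bilinear_sample(x, r, c):
    """Read feature map x at a fractional (row, col); outside the map counts as 0."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                # Each of the four neighbours is weighted by its closeness.
                weight = (1 - abs(r - rr)) * (1 - abs(c - cc))
                value += weight * x[rr][cc]
    return value


# Regular 3x3 sampling grid R: positions p_n relative to the output point p_0.
GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def deformable_conv_at(x, w, offsets, p0):
    """Weighted sum of x sampled at p0 + p_n + offset_n for every p_n in GRID."""
    total = 0.0
    for (dr, dc), weight, (odr, odc) in zip(GRID, w, offsets):
        total += weight * bilinear_sample(x, p0[0] + dr + odr, p0[1] + dc + odc)
    return total
```

With all offsets set to zero the function reduces to an ordinary 3×3 convolution at `p0`, which is a convenient sanity check.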
S3: construct a feature pyramid network as the region proposal network, set anchors on each feature scale of the network, input the feature map output by step S2 into the feature pyramid network, and compute the information content of the local region corresponding to each anchor;
With reference to Fig. 3, the input image is down-sampled by a factor of 32 in step S2, so the output feature map has size 14×14. In the feature pyramid network, three 3×3 convolutional layers are applied bottom-up with strides of 1, 2 and 2 respectively, producing three feature scales of 14×14, 7×7 and 4×4. Correspondingly, the anchor sizes are 48, 96 and 192 respectively, and on each scale the anchors take aspect ratios of 1:1, 2:3 and 3:2. The lateral connections use 1×1 convolutional layers. The feature map output by step S2 is input into the constructed feature pyramid network, and the information content of each anchor region is computed and output in descending order.
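The scale and anchor arithmetic of step S3 can be checked with a short sketch. Several details are assumptions not stated in the text: a padding of 1 for the 3×3 convolutions, anchors centred on each feature-map cell projected back onto the 448×448 input, each ratio read as height:width, and anchor area kept equal to the square of the anchor size. The function names are hypothetical.

```python
def conv_out(n, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def pyramid_sizes(n=14, strides=(1, 2, 2)):
    """Feature-map sizes after the three bottom-up 3x3 convolutions."""
    sizes = []
    for s in strides:
        n = conv_out(n, stride=s)
        sizes.append(n)
    return sizes


def make_anchors(image_size=448, anchor_sizes=(48, 96, 192),
                 ratios=((1, 1), (2, 3), (3, 2))):
    """Return anchors as (x0, y0, x1, y1) boxes in input-image pixels."""
    anchors = []
    for grid, size in zip(pyramid_sizes(), anchor_sizes):
        step = image_size / grid  # input pixels covered by one cell
        for i in range(grid):
            for j in range(grid):
                cy, cx = (i + 0.5) * step, (j + 0.5) * step
                for a, b in ratios:
                    scale = (a / b) ** 0.5
                    h, w = size * scale, size / scale  # h / w == a / b
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

Under these assumptions the three scales contribute 14·14·3 + 7·7·3 + 4·4·3 = 783 anchors, each of which would receive one information-content score from the network.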
S4: take at least four local regions with the largest information content, upsample them, and input them into the deep neural network of step S2 to extract features from the extracted regions and output feature maps; in the present embodiment, the four local regions with the largest information content are selected for upsampling. The specific steps include:
S4.1: according to the information content of each anchor output by step S3, select the four local regions with the largest information content, and resize each local image to 224×224 using bilinear interpolation;
S4.2: input the local images output by step S4.1 into the resnet-50 network (sharing network parameters with the resnet-50 of step S2) for feature extraction, and output feature maps.
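The selection in step S4.1 amounts to a top-k choice over the anchor scores. The sketch below assumes anchors given as (x0, y0, x1, y1) boxes with one information-content score each, and clips each selected box to the image before it would be cropped; the clipping, the integer rounding and the name `top_k_regions` are illustrative assumptions. Resizing the crops to 224×224 is left out.

```python
def top_k_regions(anchors, scores, k=4, image_size=448):
    """Pick the k anchors with the largest scores and clip them to the image."""
    # Anchor indices sorted by score, largest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    regions = []
    for i in order:
        x0, y0, x1, y1 = anchors[i]
        box = (max(0, int(x0)), max(0, int(y0)),
               min(image_size, int(x1)), min(image_size, int(y1)))
        regions.append((i, box))
    return regions
```

Each returned box can then be cut out of the preprocessed image and resized with bilinear interpolation before being passed to the shared resnet-50.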
S5: the feature maps output by step S4 are input into a fully connected layer to predict the class of the input image; the information content that the resnet-50 classification network with deformable convolution assigns to the correct class is computed and sorted in descending order; the loss function of the region proposal network drives the ordering of the regions in step S3 (their ordering by information content) to agree with this ordering (the ordering by information content of the correct classification), providing a supervisory signal for the region proposal network. The specific steps include:
S5.1: the feature maps are input into the fully connected layer, which outputs the class and the information content; the outputs whose predicted class agrees with the label are sorted in descending order of information content, wherein the labels are manually defined in advance;
S5.2: the loss function of the region proposal network is defined as
$$L_I(I, C) = \sum_{(i,s):\, C_i < C_s} f(I_s - I_i)$$
where $I$ denotes the information content output by step S3, $C$ denotes the confidence output by the fully connected layer, and $f(x)$ is a non-increasing function, defined as $f(x) = \max(1 - x, 0)$. This drives the regions that the region proposal considers highly informative to be the regions that are classified correctly and with high classification information content, and thus provides a weakly supervised signal for the region proposal network without adding any additional label information.
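One loss that fits the description of step S5.2 is a pairwise hinge ranking loss: for every pair of regions in which one has the higher classification confidence, the loss is zero only if that region's information content also exceeds the other's by a margin of at least 1. The pairwise form below is an assumption chosen to match f(x) = max(1 − x, 0) and the stated goal of making the two orderings agree; the names `hinge` and `ranking_loss` are hypothetical.

```python
def hinge(x):
    """The non-increasing function f(x) = max(1 - x, 0)."""
    return max(1.0 - x, 0.0)


def ranking_loss(info, conf):
    """Sum f(info[s] - info[i]) over all pairs with conf[i] < conf[s]."""
    loss = 0.0
    for i in range(len(conf)):
        for s in range(len(conf)):
            if conf[i] < conf[s]:
                # Region s is classified more confidently, so it should also
                # carry more information content than region i.
                loss += hinge(info[s] - info[i])
    return loss
```

For example, `ranking_loss([3.0, 1.0], [0.9, 0.2])` is zero because the two orderings agree with sufficient margin, while swapping the information contents to `[1.0, 3.0]` gives a loss of 3.0.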
S6: concatenate the feature maps output by step S2 and step S4, predict the class using a fully connected layer, compute the loss function from the label information and backpropagate it, and update the parameters of the region proposal network;
S6.1: the feature map output by step S2 is the feature map of the whole image, and the feature maps output by step S4 are those of the local regions with large information content. These two sets of feature maps are concatenated and input into the fully connected layer for prediction, which outputs the class. The classification loss of the fully connected layer is the cross-entropy loss, in which $C$ denotes the classification function, $R$ denotes the regions selected by the region proposal network, and $X$ denotes the whole image.
S6.2: the total loss function $L_{total} = L_I + \lambda \cdot L_2$ is computed from the label information and the network outputs, where $\lambda$ is a hyperparameter generally set to $\lambda = 1$ based on experimental experience; backpropagation is then performed and the network parameters are updated;
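The fusion and loss computation of step S6 can be sketched on plain feature vectors. The sketch assumes that each feature map has already been flattened to a list of numbers, that the weights and bias of the fully connected layer are given, and that the first term of the total loss is the region proposal loss of step S5.2 while the second is the classification loss of step S6.1; the text does not spell out this mapping. All function names are hypothetical.

```python
import math


def fully_connected(features, weights, bias):
    """One logit per class: dot(weights[k], features) + bias[k]."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, using the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def classification_loss(global_feat, region_feats, weights, bias, label):
    """Concatenate whole-image and region features, then classify."""
    fused = list(global_feat)
    for feat in region_feats:
        fused.extend(feat)
    return cross_entropy(fully_connected(fused, weights, bias), label)


def total_loss(l_rank, l_cls, lam=1.0):
    """Total loss: ranking term plus lam times the classification term."""
    return l_rank + lam * l_cls
```

In a real training loop these quantities would be tensors in an automatic-differentiation framework so that the total loss can be backpropagated; the pure-Python version only shows the arithmetic.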
S7: train the network weight parameters of the network model constructed in steps S1 to S6 using a collected training dataset, classify images using the network model with the trained weight parameters, and output the result.
The above is only an embodiment of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present invention.