CN115272763A - Bird identification method based on fine-grained feature fusion - Google Patents

Bird identification method based on fine-grained feature fusion

Info

Publication number
CN115272763A
Authority
CN
China
Prior art keywords
image
bird
scale
neural network
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210893351.4A
Other languages
Chinese (zh)
Other versions
CN115272763B (en)
Inventor
刘权辉
吕建成
王坚
邬鸿杰
黄树东
叶庆
范锫
刘勇
王海东
郑永康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202210893351.4A
Publication of CN115272763A
Application granted
Publication of CN115272763B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a bird identification method based on fine-grained feature fusion, which relates to the technical field of bird monitoring and comprises the following steps: collecting bird images and species information around a transformer substation as training samples and preprocessing them; pre-training a YoloV3 neural network model on the preprocessed training samples, and feeding the bird image to be recognized into the pre-trained model for multi-scale processing to obtain a multi-scale image block combination; performing feature extraction on the multi-scale image block combination with a convolutional neural network to obtain image features at different scales; fusing the image features of different scales to obtain joint image features; and identifying the bird with a linear multi-class SVM applied to the joint features to obtain the bird species identification result. Because the method determines the bird category with a multi-class linear SVM, it is low in cost, places low demands on hardware equipment, does not consume large amounts of manpower and material resources, and produces more accurate recognition results.

Description

Bird identification method based on fine-grained feature fusion
Technical Field
The invention relates to the technical field of bird monitoring, in particular to a bird identification method based on fine-grained feature fusion.
Background
As urban construction land grows increasingly scarce, transformer substations are sited in ever more remote locations, so the damage that bird activity inflicts on substation operation has become increasingly apparent, especially as unattended substations are promoted. At best, bird damage disturbs the normal operation of equipment; at worst, it causes equipment failures that lead to large-scale power outages, economic losses, and adverse social impact. To comprehensively prevent and control the threat of bird damage to substations, the most commonly adopted approach is to drive the birds away, and its core problems are how to detect and identify birds and how to take corresponding repelling measures. Bird recognition is essentially fine-grained image classification, also called sub-category recognition, a rapidly developing sub-field of object recognition.
Traditional bird detection and identification first requires field research to collect a large amount of data on bird activity in and around the substation, such as bird species, numbers, morphological features, acoustic features, behavioral patterns, breeding cycles, nesting habits, population relationships, and food chains, so as to build a bird database. At present, substations mainly identify birds in two ways: voice recognition and radar recognition. Bird voice recognition studies the calls of different birds; it involves both bird ethology and acoustics and forms a new interdisciplinary frontier. Calls are one of the important biological features of birds, and researchers began related work as early as the 1930s; by now, the calls of most bird species worldwide have been recorded. To study bird calls, multiple characteristic parameters must be extracted from the recorded call waveform, and the detection circuit must be tuned to the corresponding frequency characteristics. In radar identification, a microwave radar module sends detection information through an interface, works in linkage with an image recognition module, and forms an elliptical warning area roughly eighty meters long and six meters high with a fifteen-degree included angle. When birds fly into or pass through the protected area, the radar measures their distance and bearing and sends this information over a communication network to a background control center, which computes the distance and direction between the birds and the radar and, according to this information, controls the repelling equipment at the corresponding position to drive the birds out of the protected area.
However, to achieve bird identification, conventional bird-repelling methods require collecting a large amount of bird activity data in advance, consume considerable manpower and material resources, and place high demands on hardware, yet still deliver poor identification accuracy. Since one key to bird repelling is applying different repelling measures to different birds, bird damage can hardly be prevented if identification accuracy is too low. On this basis, the present application proposes a bird identification method based on fine-grained feature fusion to solve the above problems.
Disclosure of Invention
The invention aims to provide a bird identification method based on fine-grained feature fusion, which can solve the problems.
The technical scheme of the invention is as follows:
the application provides a bird identification method based on fine-grained feature fusion, which comprises the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting the bird image to be recognized into the YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, carrying out feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
Further, the preprocessing in step S1 includes data normalization, with the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
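As an illustration of this normalization, the following is a minimal sketch assuming per-image, per-channel statistics on a (C, H, W) float tensor; PyTorch and the small epsilon guard are implementation assumptions, not specified by the patent.

```python
import torch

def normalize_per_channel(image: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Subtract each channel's mean and divide by its standard deviation."""
    mu = image.mean(dim=(1, 2), keepdim=True)    # mu(x_c) for each channel c
    sigma = image.std(dim=(1, 2), keepdim=True)  # sigma(x_c) for each channel c
    return (image - mu) / (sigma + eps)          # eps guards against flat channels

# x_hat = normalize_per_channel(torch.rand(3, 224, 224))
```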
Further, the method for performing multi-scale processing on the bird image to be recognized in the pre-trained YoloV3 neural network model to obtain the multi-scale image block combination in the step S2 includes:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
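The multi-scale construction can be sketched as follows, assuming the YoloV3 detector has already returned a pixel bounding box (x1, y1, x2, y2); the detector call itself, the 224-pixel target size, and the use of torch.nn.functional.interpolate for bilinear resizing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def build_scale_images(image: torch.Tensor, box, size: int = 224):
    """image: (C, H, W) tensor; box: (x1, y1, x2, y2) bird bounding box.
    Returns the full frame, the bird crop, and the upper half of the crop,
    all resized to (size, size) with bilinear interpolation."""
    x1, y1, x2, y2 = box
    p1 = image                                 # first image: the full frame
    p2 = image[:, y1:y2, x1:x2]                # second image: the bird region
    p3 = p2[:, : max(1, p2.shape[1] // 2), :]  # third image: upper half (head)

    def resize(t: torch.Tensor) -> torch.Tensor:
        return F.interpolate(t.unsqueeze(0), size=(size, size),
                             mode="bilinear", align_corners=False).squeeze(0)

    return resize(p1), resize(p2), resize(p3)

# p1, p2, p3 = build_scale_images(torch.rand(3, 480, 640), (100, 50, 400, 350))
```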
Further, step S3 includes:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three kinds of feature vectors;
splicing the three feature vectors and obtaining the probability of each bird species through a fully connected layer;
training a feature extractor using the error between the predicted probability of each bird species and the true fine-grained category as the supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
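A minimal sketch of this three-branch extractor follows. The resnet18 backbones, the 512-dimensional branch features, and the use of torchvision are assumptions; the patent specifies only three different convolutional neural networks, spliced feature vectors, a fully connected layer, and the error against the true fine-grained category as the supervision signal.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ThreeBranchExtractor(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        def backbone() -> nn.Module:
            net = resnet18(weights=None)
            net.fc = nn.Identity()  # keep the 512-d feature vector
            return net
        self.branches = nn.ModuleList([backbone() for _ in range(3)])
        self.classifier = nn.Linear(3 * 512, num_classes)

    def forward(self, p1, p2, p3):
        feats = [b(p) for b, p in zip(self.branches, (p1, p2, p3))]
        joint = torch.cat(feats, dim=1)       # spliced (joint) feature vector
        return joint, self.classifier(joint)  # features and class logits

# model = ThreeBranchExtractor(num_classes=200)
# joint, logits = model(p1.unsqueeze(0), p2.unsqueeze(0), p3.unsqueeze(0))
# loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))  # supervision signal
```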
Further, in step S5, the linear SVM multi-classification decision used to obtain the bird species identification result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i indexes the i-th element (class), M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
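The decision rule above is what a one-vs-rest linear SVM computes. The sketch below uses scikit-learn's LinearSVC as one possible implementation; the library choice, the feature dimensions, and the random placeholder data are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# placeholder joint features: 100 samples, 1536 dims (3 branches x 512), 10 species
X_train = np.random.rand(100, 1536)
y_train = np.random.randint(0, 10, size=100)

svm = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)

x_new = np.random.rand(1, 1536)        # joint features of the image to identify
scores = svm.decision_function(x_new)  # per-class scores W_i^T x_new + b_i
species = svm.predict(x_new)           # argmax over the M class scores
```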
Compared with the prior art, the invention at least has the following advantages or beneficial effects:
(1) The invention provides a bird identification method based on fine-grained feature fusion in which the deep learning of a YoloV3 neural network model makes the bird identification result more accurate;
(2) By performing position alignment on image blocks of different scales (part levels) of the input bird image, the model learns to perceive different part levels of the image, which improves its feature extraction capability;
(3) The bird image to be recognized is processed at multiple scales by the pre-trained YoloV3 neural network model to obtain image features at several scales, after which feature alignment and position prediction are performed, so the feature extraction capability of the YoloV3 neural network model is strengthened during training and birds can be predicted well;
(4) The invention has low cost, places low demands on hardware equipment, and does not consume large amounts of manpower and material resources.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a step diagram of a bird identification method based on fine-grained feature fusion according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that, in this document, the term "comprises/comprising" or any other variation thereof is intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In the description of the present application, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" are to be interpreted broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal between two elements. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific situation.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.
Examples
Referring to fig. 1, fig. 1 is a schematic structural block diagram of a bird identification method based on fine-grained feature fusion according to an embodiment of the present application.
The bird identification method based on fine-grained feature fusion provided by the embodiment of the application comprises the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting the bird image to be recognized into the YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, performing feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
On the basis of collecting bird images and bird species information around a transformer substation, additional image data can be collected according to the species information to expand the data scale, and all collected bird images are then used as the training data set.
As a preferred embodiment, the preprocessing in step S1 includes data normalization, expressed by the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
As a preferred embodiment, the method for performing multi-scale processing on the bird image to be recognized in the pre-trained YoloV3 neural network model in step S2 to obtain a multi-scale image block combination includes:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
As a preferred embodiment, step S3 includes:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three feature vectors;
splicing the three feature vectors and obtaining the probability of each bird species through a fully connected layer;
training a feature extractor using the error between the predicted probability of each bird species and the true fine-grained category as the supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
The method for extracting features from the multi-scale image block combination further comprises the following two methods:
method (1): aligning the features of the overlapping regions of the first image, the second image and the third image;
method (2): dividing the second image into a number of sub-regions and randomly rearranging them to form a new image that replaces the second image, and then training the network to accurately predict the original positions of the rearranged image blocks.
Specifically, method (1) inputs the first image P1, the second image P2 and the third image P3 into three CNNs of identical structure, so that the i-th layer of each CNN yields a feature map, denoted F1_i, F2_i and F3_i respectively. Since the flow of image processing is P1 → P2 → P3, let P2 = ψ(P1) and P3 = ω(P2), where ψ and ω are region cropping functions that preserve the alignment relationship of spatial locations at the feature level. The loss function for aligning the overlapping regions is therefore expressed as:

L_alg = Σ_{i=1}^{K} ( ‖ψ(F1_i) − F2_i‖² + ‖ω(F2_i) − F3_i‖² )

wherein L_alg represents the loss function of the overlapping regions, K represents the number of convolutional neural network layers used with K = 5, i represents the layer index, F1_i is the feature map of the first image P1, F2_i is the feature map of the second image P2, F3_i is the feature map of the third image P3, and ψ and ω are both region cropping functions. This overlap-alignment loss can thus be used for feature extraction.
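Under the reconstruction above, the alignment loss can be sketched as follows; representing ψ and ω as fractional-box crops followed by bilinear resizing, and measuring the residual with a mean-squared error, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(fmap: torch.Tensor, box_frac) -> torch.Tensor:
    """Crop a feature map (N, C, H, W) by a fractional box, resize back."""
    _, _, h, w = fmap.shape
    x1, y1, x2, y2 = box_frac
    region = fmap[:, :, int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]
    return F.interpolate(region, size=(h, w), mode="bilinear", align_corners=False)

def alignment_loss(F1, F2, F3, bird_box, head_box):
    """F1, F2, F3: lists of K = 5 per-layer feature maps of P1, P2, P3."""
    loss = torch.tensor(0.0)
    for f1, f2, f3 in zip(F1, F2, F3):
        loss = loss + F.mse_loss(crop_and_resize(f1, bird_box), f2)  # psi term
        loss = loss + F.mse_loss(crop_and_resize(f2, head_box), f3)  # omega term
    return loss

# maps = [torch.rand(1, 8, s, s) for s in (56, 28, 14, 7, 7)]
# L_alg = alignment_loss(maps, maps, maps, (0.2, 0.2, 0.8, 0.8), (0.0, 0.0, 1.0, 0.5))
```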
In method (2), the input image, namely the second image P2, is uniformly divided into a square matrix of M × M small image blocks whose position matrix is R, and each image block is denoted R_{i,j}, where i and j respectively represent the lateral and longitudinal indices and 1 ≤ i, j ≤ M. Some adjacent image blocks in the image are then randomly swapped: for the j-th row of R, a random vector q_j of dimension M is generated whose i-th element is q_{j,i} = i + r, where r is drawn from the uniform distribution over [−n, n] and 1 ≤ n < M defines the neighbourhood range. Sorting the array q_j yields the new arrangement of the j-th row, and the resulting row permutation σ_j^row satisfies the condition:

|σ_j^row(i) − i| < 2n, for all 1 ≤ i ≤ M

wherein i denotes the lateral index, M denotes the size of the block grid, σ_j^row(i) denotes the position within the row to which the block at lateral position i of the j-th row is moved, and n denotes the range of the uniform distribution.
The columns are rearranged in the same way, and likewise the rearranged image satisfies the condition:

|σ_i^col(j) − j| < 2n, for all 1 ≤ j ≤ M

wherein j denotes the longitudinal index, M denotes the size of the block grid, σ_i^col(j) denotes the position within the column to which the block at longitudinal position j of the i-th column is moved, and n denotes the range of the uniform distribution.
Therefore, the image block at position (i, j) of the original image is moved by the rearrangement to the following position in the new image:

σ(i, j) = (σ_j^row(i), σ_i^col(j))

wherein σ(i, j) represents the position in the new image of the image block located at (i, j) in the original image, σ_j^row(i) indicates that position i of a row in the square matrix is swapped to a new position within that row, and σ_i^col(j) indicates that position j of a column in the square matrix is swapped to a new position within that column.
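The constrained row-and-column shuffle can be sketched as follows; the grid size M = 7, the jitter range n = 2, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def region_shuffle(image: np.ndarray, M: int = 7, n: int = 2) -> np.ndarray:
    """image: (H, W, C) with H and W divisible by M; returns the rearranged image."""
    rng = np.random.default_rng()
    h, w = image.shape[0] // M, image.shape[1] // M
    blocks = [[image[r*h:(r+1)*h, c*w:(c+1)*w] for c in range(M)] for r in range(M)]

    def jittered_perm() -> np.ndarray:
        # q_i = i + r with r ~ U[-n, n]; sorting q yields a permutation in which
        # every block moves fewer than 2n positions (the |sigma(i) - i| < 2n condition)
        q = np.arange(M) + rng.uniform(-n, n, size=M)
        return np.argsort(q)

    blocks = [[row[i] for i in jittered_perm()] for row in blocks]  # shuffle rows
    cols = [list(col) for col in zip(*blocks)]
    cols = [[col[i] for i in jittered_perm()] for col in cols]      # shuffle columns
    blocks = [list(row) for row in zip(*cols)]
    return np.concatenate([np.concatenate(row, axis=1) for row in blocks], axis=0)

# destroyed = region_shuffle(np.random.rand(224, 224, 3), M=7, n=2)
```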
Thus, the loss function of the classification network can be written as:

L_cls = − Σ_{(P2, l)} l · log C(φ(P2), θ_cls)

wherein L_cls represents the loss of the classification network, l represents the true fine-grained category (as a one-hot label vector), P2 is the second image, φ(P2) represents the "destroyed" version of P2, and C represents the probability distribution vector output by the network. Combining P2 and its "destroyed" version forms the triple <P2, φ(P2), l> for training; the classification network maps an input image to the probability distribution vector C(P2, θ_cls), where θ_cls denotes all learnable parameters of the classification network. This distribution vector can be used to predict block positions and thereby extract features.
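A minimal sketch of training on the <P2, φ(P2), l> pairs follows. The linear classifier standing in for the classification network, the simple width permutation standing in for the block-level φ (the real shuffle is the block version above), and the random tensors are all placeholders.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
criterion = nn.CrossEntropyLoss()

p2 = torch.rand(4, 3, 224, 224)           # a batch of second images P2
destroyed = p2[..., torch.randperm(224)]  # crude stand-in for phi(P2)
labels = torch.randint(0, 10, (4,))       # true fine-grained categories l

# the same network must classify both P2 and its destroyed version correctly,
# forcing it to rely on local, discriminative details
loss = criterion(classifier(p2), labels) + criterion(classifier(destroyed), labels)
loss.backward()
```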
As a preferred embodiment, the linear SVM multi-classification decision used in step S5 to obtain the bird species identification result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i indexes the classes, M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
It will be appreciated that the configuration shown in the figures is merely illustrative and that a method of bird identification based on fine-grained feature fusion may also include more or fewer components than shown in the figures, or have a different configuration than shown in the figures. The components shown in the figures may be implemented in hardware, software, or a combination thereof.
In the embodiments provided in the present application, it should be understood that the disclosed system or method may also be implemented in other manners. The embodiments described above are merely illustrative, and the flowcharts and block diagrams in the figures, for example, illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
To sum up, in the bird identification method based on fine-grained feature fusion provided by the embodiment of the application, on the basis of collecting bird images and bird species information around a transformer substation, additional image data can be collected according to the species information to expand the data scale; all collected bird images are then used as the training data set and preprocessed. The preprocessed training samples are used to pre-train a YoloV3 neural network model, and the bird image to be identified is fed into the pre-trained YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination. Features are extracted from this combination with a convolutional neural network to obtain image features at different scales, the features of different scales are fused into joint image features, and a multi-class linear SVM applied to the joint features yields the bird species identification result.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (5)

1. A bird identification method based on fine-grained feature fusion is characterized by comprising the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting bird images to be recognized into the YoloV3 neural network model after pre-training for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, performing feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
2. The method for identifying birds based on fine-grained feature fusion as claimed in claim 1, wherein the preprocessing in step S1 includes data normalization with the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
3. The bird identification method based on fine-grained feature fusion as claimed in claim 2, wherein the method for performing multi-scale processing on the bird image to be identified in the pre-trained YoloV3 neural network model to obtain the multi-scale image block combination in step S2 comprises:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
4. The bird identification method based on fine-grained feature fusion as claimed in claim 3, wherein the step S3 comprises:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three feature vectors;
splicing the three characteristic vectors, and obtaining the probability of each bird based on a full-connection layer;
training to obtain a feature extractor based on an error between the probability of each bird and the real fine-grained category as a supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
5. The bird recognition method based on fine-grained feature fusion as claimed in claim 1, wherein the linear SVM multi-classification decision used in step S5 to obtain the bird species recognition result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i represents the i-th element, M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
CN202210893351.4A 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion Active CN115272763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210893351.4A CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210893351.4A CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Publications (2)

Publication Number Publication Date
CN115272763A true CN115272763A (en) 2022-11-01
CN115272763B CN115272763B (en) 2023-04-07

Family

ID=83771151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210893351.4A Active CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Country Status (1)

Country Link
CN (1) CN115272763B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751195A (en) * 2019-10-12 2020-02-04 西南交通大学 Fine-grained image classification method based on improved YOLOv3
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN112507904A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Real-time classroom human body posture detection method based on multi-scale features
CN112560675A (en) * 2020-12-15 2021-03-26 三峡大学 Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112668444A (en) * 2020-12-24 2021-04-16 南京泓图人工智能技术研究院有限公司 Bird detection and identification method based on YOLOv5
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
WO2021135499A1 (en) * 2020-06-08 2021-07-08 平安科技(深圳)有限公司 Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
CN113516156A (en) * 2021-04-13 2021-10-19 浙江工业大学 Fine-grained image classification method based on multi-source information fusion
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114067360A (en) * 2021-11-16 2022-02-18 国网上海市电力公司 Pedestrian attribute detection method and device
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN110751195A (en) * 2019-10-12 2020-02-04 西南交通大学 Fine-grained image classification method based on improved YOLOv3
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
WO2021135499A1 (en) * 2020-06-08 2021-07-08 平安科技(深圳)有限公司 Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles
CN112507904A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Real-time classroom human body posture detection method based on multi-scale features
CN112560675A (en) * 2020-12-15 2021-03-26 三峡大学 Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112668444A (en) * 2020-12-24 2021-04-16 南京泓图人工智能技术研究院有限公司 Bird detection and identification method based on YOLOv5
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
CN113516156A (en) * 2021-04-13 2021-10-19 浙江工业大学 Fine-grained image classification method based on multi-source information fusion
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114067360A (en) * 2021-11-16 2022-02-18 国网上海市电力公司 Pedestrian attribute detection method and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
KUIHE YANG ET AL.: "Deep Learning-Based Object Detection Improvement for Fine-Grained Birds" *
朱泽群: "Research on the application of DCNN in bird target recognition" (DCNN在鸟类目标识别中的应用研究) *
李思瑶; 刘宇红; 张荣芬: "Dog breed identification method based on transfer learning and model fusion" (基于迁移学习与模型融合的犬种识别方法) *
汪洋: "Research and *** implementation of fine-grained bird recognition methods based on deep learning" (基于深度学习的细粒度鸟类识别方法研究与***实现) *
谢娟英; 侯琦; 史颖欢; 吕鹏; 景丽萍; 庄福振; 张军平; 谭晓阳; 许升全: "Research on automatic identification of butterfly species" (蝴蝶种类自动识别研究) *
边小勇; 江沛龄; 赵敏; 丁胜; 张晓龙: "Weakly supervised fine-grained image classification method based on a multi-branch neural network model" (基于多分支神经网络模型的弱监督细粒度图像分类方法) *

Also Published As

Publication number Publication date
CN115272763B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Linardos et al. Machine learning in disaster management: recent developments in methods and applications
Goodwin et al. Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook
Guan et al. Deep learning-based tree classification using mobile LiDAR data
Fdez-Riverola et al. CBR based system for forecasting red tides
Yang et al. Fine-grained image classification for crop disease based on attention mechanism
Botella et al. Species distribution modeling based on the automated identification of citizen observations
CN111368886A (en) Sample screening-based label-free vehicle picture classification method
Fu et al. Cloud detection for FY meteorology satellite based on ensemble thresholds and random forests approach
CN106447066A (en) Big data feature extraction method and device
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
Comber et al. Using semantics to clarify the conceptual confusion between land cover and land use: the example of ‘forest’
Wang et al. Hierarchical instance recognition of individual roadside trees in environmentally complex urban areas from UAV laser scanning point clouds
CN110276351A (en) Multilingual scene text detection and recognition methods
Anbananthen et al. An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms
Dunkin et al. A spatially explicit, multi-criteria decision support model for loggerhead sea turtle nesting habitat suitability: a remote sensing-based approach
CN109472733A (en) Image latent writing analysis method based on convolutional neural networks
Zhang et al. Object-based classification framework of remote sensing images with graph convolutional networks
Barnes et al. This looks like that there: Interpretable neural networks for image tasks when location matters
Lunga et al. Resflow: A remote sensing imagery data-flow for improved model generalization
Choi et al. Semi-supervised target classification in multi-frequency echosounder data
Ye et al. Aerial scene classification via an ensemble extreme learning machine classifier based on discriminative hybrid convolutional neural networks features
Jiang et al. Forestry digital twin with machine learning in Landsat 7 data
CN115456166A (en) Knowledge distillation method for neural network classification model of passive domain data
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
Mohamed et al. Improvement of 3D LiDAR point cloud classification of urban road environment based on random forest classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant