CN115272763A - Bird identification method based on fine-grained feature fusion - Google Patents

Bird identification method based on fine-grained feature fusion

Info

Publication number
CN115272763A
Authority
CN
China
Prior art keywords
image
bird
scale
neural network
fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210893351.4A
Other languages
Chinese (zh)
Other versions
CN115272763B (en)
Inventor
刘权辉
吕建成
王坚
邬鸿杰
黄树东
叶庆
范锫
刘勇
王海东
郑永康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University
Priority to CN202210893351.4A
Publication of CN115272763A
Application granted
Publication of CN115272763B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a bird identification method based on fine-grained feature fusion, which relates to the technical field of bird monitoring and comprises the following steps: collecting bird images and species information around a transformer substation as training samples and preprocessing them; pre-training a YoloV3 neural network model on the preprocessed training samples, and feeding the bird image to be recognized into the pre-trained model for multi-scale processing to obtain a multi-scale image block combination; performing feature extraction on the multi-scale image block combination with a convolutional neural network to obtain image features at different scales; fusing the image features of different scales to obtain joint image features; and identifying the bird with a linear multi-class SVM applied to the joint features to obtain the bird species identification result. Because the method determines the bird category with a multi-class linear SVM, it is low in cost, places low demands on hardware equipment, does not consume large amounts of manpower and material resources, and produces more accurate recognition results.

Description

Bird identification method based on fine-grained feature fusion
Technical Field
The invention relates to the technical field of bird monitoring, in particular to a bird identification method based on fine-grained feature fusion.
Background
As urban construction land grows increasingly scarce, transformer substations are sited in ever more remote locations, so the damage that bird activity inflicts on substation operation has become increasingly apparent, especially as unattended substations are promoted. At best, bird damage disturbs the normal operation of equipment; at worst, it causes equipment failures that lead to large-scale power outages, economic losses, and adverse social impact. To comprehensively prevent and control the threat of bird damage to substations, the most commonly adopted approach is to drive the birds away, and its core problems are how to detect and identify birds and how to take corresponding repelling measures. Bird recognition is essentially fine-grained image classification, also called sub-category recognition, a rapidly developing sub-field of object recognition.
Traditional bird detection and identification first requires field research to collect a large amount of data on bird activity in and around the substation, such as bird species, numbers, morphological features, acoustic features, behavioral patterns, breeding cycles, nesting habits, population relationships, and food chains, so as to build a bird database. At present, substations mainly identify birds in two ways: voice recognition and radar recognition. Bird voice recognition studies the calls of different birds; it involves both bird ethology and acoustics and forms a new interdisciplinary frontier. Calls are one of the important biological features of birds, and researchers began related work as early as the 1930s; by now, the calls of most bird species worldwide have been recorded. To study bird calls, multiple characteristic parameters must be extracted from the recorded call waveform, and the detection circuit must be tuned to the corresponding frequency characteristics. In radar identification, a microwave radar module sends detection information through an interface, works in linkage with an image recognition module, and forms an elliptical warning area roughly eighty meters long and six meters high with a fifteen-degree included angle. When birds fly into or pass through the protected area, the radar measures their distance and bearing and sends this information over a communication network to a background control center, which computes the distance and direction between the birds and the radar and, according to this information, controls the repelling equipment at the corresponding position to drive the birds out of the protected area.
However, to achieve bird identification, conventional bird-repelling methods require collecting a large amount of bird activity data in advance, consume considerable manpower and material resources, and place high demands on hardware, yet still deliver poor identification accuracy. Since one key to bird repelling is applying different repelling measures to different birds, bird damage can hardly be prevented if identification accuracy is too low. On this basis, the present application proposes a bird identification method based on fine-grained feature fusion to solve the above problems.
Disclosure of Invention
The invention aims to provide a bird identification method based on fine-grained feature fusion, which can solve the problems.
The technical scheme of the invention is as follows:
the application provides a bird identification method based on fine-grained feature fusion, which comprises the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting the bird image to be recognized into the YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, carrying out feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
Further, the preprocessing in step S1 includes data normalization, with the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
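As an illustration of this normalization, the following is a minimal sketch assuming per-image, per-channel statistics on a (C, H, W) float tensor; PyTorch and the small epsilon guard are implementation assumptions, not specified by the patent.

```python
import torch

def normalize_per_channel(image: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Subtract each channel's mean and divide by its standard deviation."""
    mu = image.mean(dim=(1, 2), keepdim=True)    # mu(x_c) for each channel c
    sigma = image.std(dim=(1, 2), keepdim=True)  # sigma(x_c) for each channel c
    return (image - mu) / (sigma + eps)          # eps guards against flat channels

# x_hat = normalize_per_channel(torch.rand(3, 224, 224))
```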
Further, the method for performing multi-scale processing on the bird image to be recognized in the pre-trained YoloV3 neural network model to obtain the multi-scale image block combination in the step S2 includes:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
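The multi-scale construction can be sketched as follows, assuming the YoloV3 detector has already returned a pixel bounding box (x1, y1, x2, y2); the detector call itself, the 224-pixel target size, and the use of torch.nn.functional.interpolate for bilinear resizing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def build_scale_images(image: torch.Tensor, box, size: int = 224):
    """image: (C, H, W) tensor; box: (x1, y1, x2, y2) bird bounding box.
    Returns the full frame, the bird crop, and the upper half of the crop,
    all resized to (size, size) with bilinear interpolation."""
    x1, y1, x2, y2 = box
    p1 = image                                 # first image: the full frame
    p2 = image[:, y1:y2, x1:x2]                # second image: the bird region
    p3 = p2[:, : max(1, p2.shape[1] // 2), :]  # third image: upper half (head)

    def resize(t: torch.Tensor) -> torch.Tensor:
        return F.interpolate(t.unsqueeze(0), size=(size, size),
                             mode="bilinear", align_corners=False).squeeze(0)

    return resize(p1), resize(p2), resize(p3)

# p1, p2, p3 = build_scale_images(torch.rand(3, 480, 640), (100, 50, 400, 350))
```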
Further, step S3 includes:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three kinds of feature vectors;
splicing the three feature vectors and obtaining the probability of each bird species through a fully connected layer;
training a feature extractor using the error between the predicted probability of each bird species and the true fine-grained category as the supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
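A minimal sketch of this three-branch extractor follows. The resnet18 backbones, the 512-dimensional branch features, and the use of torchvision are assumptions; the patent specifies only three different convolutional neural networks, spliced feature vectors, a fully connected layer, and the error against the true fine-grained category as the supervision signal.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ThreeBranchExtractor(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        def backbone() -> nn.Module:
            net = resnet18(weights=None)
            net.fc = nn.Identity()  # keep the 512-d feature vector
            return net
        self.branches = nn.ModuleList([backbone() for _ in range(3)])
        self.classifier = nn.Linear(3 * 512, num_classes)

    def forward(self, p1, p2, p3):
        feats = [b(p) for b, p in zip(self.branches, (p1, p2, p3))]
        joint = torch.cat(feats, dim=1)       # spliced (joint) feature vector
        return joint, self.classifier(joint)  # features and class logits

# model = ThreeBranchExtractor(num_classes=200)
# joint, logits = model(p1.unsqueeze(0), p2.unsqueeze(0), p3.unsqueeze(0))
# loss = nn.CrossEntropyLoss()(logits, torch.tensor([0]))  # supervision signal
```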
Further, in step S5, the linear SVM multi-classification decision used to obtain the bird species identification result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i indexes the i-th element (class), M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
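The decision rule above is what a one-vs-rest linear SVM computes. The sketch below uses scikit-learn's LinearSVC as one possible implementation; the library choice, the feature dimensions, and the random placeholder data are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

# placeholder joint features: 100 samples, 1536 dims (3 branches x 512), 10 species
X_train = np.random.rand(100, 1536)
y_train = np.random.randint(0, 10, size=100)

svm = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_train)

x_new = np.random.rand(1, 1536)        # joint features of the image to identify
scores = svm.decision_function(x_new)  # per-class scores W_i^T x_new + b_i
species = svm.predict(x_new)           # argmax over the M class scores
```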
Compared with the prior art, the invention at least has the following advantages or beneficial effects:
(1) The invention provides a bird identification method based on fine-grained feature fusion in which the deep learning of a YoloV3 neural network model makes the bird identification result more accurate;
(2) By performing position alignment on image blocks of different scales (part levels) of the input bird image, the model learns to perceive different part levels of the image, which improves its feature extraction capability;
(3) The bird image to be recognized is processed at multiple scales by the pre-trained YoloV3 neural network model to obtain image features at several scales, after which feature alignment and position prediction are performed, so the feature extraction capability of the YoloV3 neural network model is strengthened during training and birds can be predicted well;
(4) The invention has low cost, places low demands on hardware equipment, and does not consume large amounts of manpower and material resources.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a step diagram of a bird identification method based on fine-grained feature fusion according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
It should be noted that, in this document, the term "comprises/comprising" or any other variation thereof is intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In the description of the present application, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" are to be interpreted broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal between two elements. The specific meanings of the above terms in the present application can be understood by those of ordinary skill in the art according to the specific situation.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the individual features of the embodiments can be combined with one another without conflict.
Examples
Referring to fig. 1, fig. 1 is a schematic structural block diagram of a bird identification method based on fine-grained feature fusion according to an embodiment of the present application.
The bird identification method based on fine-grained feature fusion provided by the embodiment of the application comprises the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting the bird image to be recognized into the YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, performing feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
On the basis of collecting bird images and bird species information around a transformer substation, additional image data can be collected according to the species information to expand the data scale, and all collected bird images are then used as the training data set.
As a preferred embodiment, the preprocessing in step S1 includes data normalization, expressed by the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
As a preferred embodiment, the method for performing multi-scale processing on the bird image to be recognized in the pre-trained YoloV3 neural network model in step S2 to obtain a multi-scale image block combination includes:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
As a preferred embodiment, step S3 includes:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three feature vectors;
splicing the three feature vectors and obtaining the probability of each bird species through a fully connected layer;
training a feature extractor using the error between the predicted probability of each bird species and the true fine-grained category as the supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
The method for extracting features from the multi-scale image block combination further comprises the following two methods:
method (1): aligning the features of the overlapping regions of the first image, the second image and the third image;
method (2): dividing the second image into a number of sub-regions and randomly rearranging them to form a new image that replaces the second image, and then training the network to accurately predict the original positions of the rearranged image blocks.
Specifically, method (1) inputs the first image P1, the second image P2 and the third image P3 into three CNNs of identical structure, so that the i-th layer of each CNN yields a feature map, denoted F1_i, F2_i and F3_i respectively. Since the flow of image processing is P1 → P2 → P3, let P2 = ψ(P1) and P3 = ω(P2), where ψ and ω are region cropping functions that preserve the alignment relationship of spatial locations at the feature level. The loss function for aligning the overlapping regions is therefore expressed as:

L_alg = Σ_{i=1}^{K} ( ‖ψ(F1_i) − F2_i‖² + ‖ω(F2_i) − F3_i‖² )

wherein L_alg represents the loss function of the overlapping regions, K represents the number of convolutional neural network layers used with K = 5, i represents the layer index, F1_i is the feature map of the first image P1, F2_i is the feature map of the second image P2, F3_i is the feature map of the third image P3, and ψ and ω are both region cropping functions. This overlap-alignment loss can thus be used for feature extraction.
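Under the reconstruction above, the alignment loss can be sketched as follows; representing ψ and ω as fractional-box crops followed by bilinear resizing, and measuring the residual with a mean-squared error, are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(fmap: torch.Tensor, box_frac) -> torch.Tensor:
    """Crop a feature map (N, C, H, W) by a fractional box, resize back."""
    _, _, h, w = fmap.shape
    x1, y1, x2, y2 = box_frac
    region = fmap[:, :, int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]
    return F.interpolate(region, size=(h, w), mode="bilinear", align_corners=False)

def alignment_loss(F1, F2, F3, bird_box, head_box):
    """F1, F2, F3: lists of K = 5 per-layer feature maps of P1, P2, P3."""
    loss = torch.tensor(0.0)
    for f1, f2, f3 in zip(F1, F2, F3):
        loss = loss + F.mse_loss(crop_and_resize(f1, bird_box), f2)  # psi term
        loss = loss + F.mse_loss(crop_and_resize(f2, head_box), f3)  # omega term
    return loss

# maps = [torch.rand(1, 8, s, s) for s in (56, 28, 14, 7, 7)]
# L_alg = alignment_loss(maps, maps, maps, (0.2, 0.2, 0.8, 0.8), (0.0, 0.0, 1.0, 0.5))
```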
In method (2), the input image, namely the second image P2, is uniformly divided into a square matrix of M × M small image blocks whose position matrix is R, and each image block is denoted R_{i,j}, where i and j respectively represent the lateral and longitudinal indices and 1 ≤ i, j ≤ M. Some adjacent image blocks in the image are then randomly swapped: for the j-th row of R, a random vector q_j of dimension M is generated whose i-th element is q_{j,i} = i + r, where r is drawn from the uniform distribution over [−n, n] and 1 ≤ n < M defines the neighbourhood range. Sorting the array q_j yields the new arrangement of the j-th row, and the resulting row permutation σ_j^row satisfies the condition:

|σ_j^row(i) − i| < 2n, for all 1 ≤ i ≤ M

wherein i denotes the lateral index, M denotes the size of the block grid, σ_j^row(i) denotes the position within the row to which the block at lateral position i of the j-th row is moved, and n denotes the range of the uniform distribution.
The columns are rearranged in the same way, and likewise the rearranged image satisfies the condition:

|σ_i^col(j) − j| < 2n, for all 1 ≤ j ≤ M

wherein j denotes the longitudinal index, M denotes the size of the block grid, σ_i^col(j) denotes the position within the column to which the block at longitudinal position j of the i-th column is moved, and n denotes the range of the uniform distribution.
Therefore, the image block at position (i, j) of the original image is moved by the rearrangement to the following position in the new image:

σ(i, j) = (σ_j^row(i), σ_i^col(j))

wherein σ(i, j) represents the position in the new image of the image block located at (i, j) in the original image, σ_j^row(i) indicates that position i of a row in the square matrix is swapped to a new position within that row, and σ_i^col(j) indicates that position j of a column in the square matrix is swapped to a new position within that column.
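The constrained row-and-column shuffle can be sketched as follows; the grid size M = 7, the jitter range n = 2, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def region_shuffle(image: np.ndarray, M: int = 7, n: int = 2) -> np.ndarray:
    """image: (H, W, C) with H and W divisible by M; returns the rearranged image."""
    rng = np.random.default_rng()
    h, w = image.shape[0] // M, image.shape[1] // M
    blocks = [[image[r*h:(r+1)*h, c*w:(c+1)*w] for c in range(M)] for r in range(M)]

    def jittered_perm() -> np.ndarray:
        # q_i = i + r with r ~ U[-n, n]; sorting q yields a permutation in which
        # every block moves fewer than 2n positions (the |sigma(i) - i| < 2n condition)
        q = np.arange(M) + rng.uniform(-n, n, size=M)
        return np.argsort(q)

    blocks = [[row[i] for i in jittered_perm()] for row in blocks]  # shuffle rows
    cols = [list(col) for col in zip(*blocks)]
    cols = [[col[i] for i in jittered_perm()] for col in cols]      # shuffle columns
    blocks = [list(row) for row in zip(*cols)]
    return np.concatenate([np.concatenate(row, axis=1) for row in blocks], axis=0)

# destroyed = region_shuffle(np.random.rand(224, 224, 3), M=7, n=2)
```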
Thus, the loss function of the classification network can be written as:

L_cls = − Σ_{(P2, l)} l · log C(φ(P2), θ_cls)

wherein L_cls represents the loss of the classification network, l represents the true fine-grained category (as a one-hot label vector), P2 is the second image, φ(P2) represents the "destroyed" version of P2, and C represents the probability distribution vector output by the network. Combining P2 and its "destroyed" version forms the triple <P2, φ(P2), l> for training; the classification network maps an input image to the probability distribution vector C(P2, θ_cls), where θ_cls denotes all learnable parameters of the classification network. This distribution vector can be used to predict block positions and thereby extract features.
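A minimal sketch of training on the <P2, φ(P2), l> pairs follows. The linear classifier standing in for the classification network, the simple width permutation standing in for the block-level φ (the real shuffle is the block version above), and the random tensors are all placeholders.

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
criterion = nn.CrossEntropyLoss()

p2 = torch.rand(4, 3, 224, 224)           # a batch of second images P2
destroyed = p2[..., torch.randperm(224)]  # crude stand-in for phi(P2)
labels = torch.randint(0, 10, (4,))       # true fine-grained categories l

# the same network must classify both P2 and its destroyed version correctly,
# forcing it to rely on local, discriminative details
loss = criterion(classifier(p2), labels) + criterion(classifier(destroyed), labels)
loss.backward()
```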
As a preferred embodiment, the linear SVM multi-classification decision used in step S5 to obtain the bird species identification result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i indexes the classes, M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
It will be appreciated that the configuration shown in the figures is merely illustrative and that a method of bird identification based on fine-grained feature fusion may also include more or fewer components than shown in the figures, or have a different configuration than shown in the figures. The components shown in the figures may be implemented in hardware, software, or a combination thereof.
In the embodiments provided in the present application, it should be understood that the disclosed system or method may also be implemented in other manners. The embodiments described above are merely illustrative, and the flowcharts and block diagrams in the figures, for example, illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
To sum up, in the bird identification method based on fine-grained feature fusion provided by the embodiment of the application, on the basis of collecting bird images and bird species information around a transformer substation, additional image data can be collected according to the species information to expand the data scale; all collected bird images are then used as the training data set and preprocessed. The preprocessed training samples are used to pre-train a YoloV3 neural network model, and the bird image to be identified is fed into the pre-trained YoloV3 neural network model for multi-scale processing to obtain a multi-scale image block combination. Features are extracted from this combination with a convolutional neural network to obtain image features at different scales, the features of different scales are fused into joint image features, and a multi-class linear SVM applied to the joint features yields the bird species identification result.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (5)

1. A bird identification method based on fine-grained feature fusion is characterized by comprising the following steps:
s1, collecting bird images and species information around a transformer substation to serve as training samples, and preprocessing the training samples;
s2, putting the preprocessed training sample into a YoloV3 neural network model for pre-training, and putting bird images to be recognized into the YoloV3 neural network model after pre-training for multi-scale processing to obtain a multi-scale image block combination;
s3, performing feature extraction on the multi-scale image block combination based on the convolutional neural network to obtain different scale image features;
s4, performing feature fusion on the image features of different scales to obtain image joint features;
and S5, identifying by using a linear SVM multi-classification method based on the image joint characteristics to obtain a bird species identification result.
2. The method for identifying birds based on fine-grained feature fusion as claimed in claim 1, wherein the preprocessing in step S1 includes data normalization with the formula:

x̂_c = (x_c − μ(x_c)) / σ(x_c)

wherein x̂_c represents the normalized data, x_c represents the data of the c-th channel of the input image, μ(x_c) represents the mean, and σ(x_c) the standard deviation, of all data under channel c.
3. The bird identification method based on fine-grained feature fusion as claimed in claim 2, wherein the method for performing multi-scale processing on the bird image to be identified in the pre-trained YoloV3 neural network model to obtain the multi-scale image block combination in step S2 comprises:
taking the image of the bird to be recognized as a first image, inputting the first image into the pre-trained YoloV3 neural network model to detect the specific position of the bird, and using the bounding box output by the YoloV3 neural network model to select the image of the area where the bird is located in the first image as a second image;
intercepting the upper half of the second image as a third image, and transforming the first, second and third images, which have different spatial scales, to the same size using bilinear interpolation;
and performing feature fusion on the first image, the second image and the third image to obtain a multi-scale image block combination.
4. The bird identification method based on fine-grained feature fusion as claimed in claim 3, wherein the step S3 comprises:
inputting the first image, the second image and the third image into three different convolutional neural networks respectively to obtain three feature vectors;
splicing the three characteristic vectors, and obtaining the probability of each bird based on a full-connection layer;
training to obtain a feature extractor based on an error between the probability of each bird and the real fine-grained category as a supervision signal;
and performing feature extraction on the multi-scale image block combination based on the feature extractor to obtain different-scale image features.
5. The bird recognition method based on fine-grained feature fusion as claimed in claim 1, wherein the linear SVM multi-classification decision used in step S5 to obtain the bird species recognition result is:

f(x_new) = argmax_{i=1,…,M} (W_i^T · φ(x_new) + b_i)

wherein x_new represents the features of the bird image data to be identified, f(x_new) represents the bird identification result, i represents the i-th element, M represents the number of bird species, W represents the weight matrix, T represents matrix transposition, φ represents the truncation function, and b represents the bias.
CN202210893351.4A 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion Active CN115272763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210893351.4A CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210893351.4A CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Publications (2)

Publication Number Publication Date
CN115272763A true CN115272763A (en) 2022-11-01
CN115272763B CN115272763B (en) 2023-04-07

Family

ID=83771151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210893351.4A Active CN115272763B (en) 2022-07-27 2022-07-27 Bird identification method based on fine-grained feature fusion

Country Status (1)

Country Link
CN (1) CN115272763B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110751195A (en) * 2019-10-12 2020-02-04 西南交通大学 Fine-grained image classification method based on improved YOLOv3
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN112507904A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Real-time classroom human body posture detection method based on multi-scale features
CN112560675A (en) * 2020-12-15 2021-03-26 三峡大学 Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112668444A (en) * 2020-12-24 2021-04-16 南京泓图人工智能技术研究院有限公司 Bird detection and identification method based on YOLOv5
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
WO2021135499A1 (en) * 2020-06-08 2021-07-08 平安科技(深圳)有限公司 Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
CN113516156A (en) * 2021-04-13 2021-10-19 浙江工业大学 Fine-grained image classification method based on multi-source information fusion
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114067360A (en) * 2021-11-16 2022-02-18 国网上海市电力公司 Pedestrian attribute detection method and device
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914599A (en) * 2019-05-09 2020-11-10 四川大学 Fine-grained bird recognition method based on semantic information multi-layer feature fusion
CN110751195A (en) * 2019-10-12 2020-02-04 西南交通大学 Fine-grained image classification method based on improved YOLOv3
CN111104538A (en) * 2019-12-06 2020-05-05 深圳久凌软件技术有限公司 Fine-grained vehicle image retrieval method and device based on multi-scale constraint
WO2021135499A1 (en) * 2020-06-08 2021-07-08 平安科技(深圳)有限公司 Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
WO2022083784A1 (en) * 2020-10-23 2022-04-28 西安科锐盛创新科技有限公司 Road detection method based on internet of vehicles
CN112507904A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Real-time classroom human body posture detection method based on multi-scale features
CN112560675A (en) * 2020-12-15 2021-03-26 三峡大学 Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112668444A (en) * 2020-12-24 2021-04-16 南京泓图人工智能技术研究院有限公司 Bird detection and identification method based on YOLOv5
CN113076861A (en) * 2021-03-30 2021-07-06 南京大学环境规划设计研究院集团股份公司 Bird fine-granularity identification method based on second-order features
CN113516156A (en) * 2021-04-13 2021-10-19 浙江工业大学 Fine-grained image classification method based on multi-source information fusion
CN113989662A (en) * 2021-10-18 2022-01-28 中国电子科技集团公司第五十二研究所 Remote sensing image fine-grained target identification method based on self-supervision mechanism
CN114067360A (en) * 2021-11-16 2022-02-18 国网上海市电力公司 Pedestrian attribute detection method and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
KUIHE YANG ET AL.: "Deep Learning-Based Object Detection Improvement for Fine-Grained Birds" *
朱泽群: "Research on the application of DCNN in bird target recognition" (DCNN在鸟类目标识别中的应用研究) *
李思瑶; 刘宇红; 张荣芬: "Dog breed identification method based on transfer learning and model fusion" (基于迁移学习与模型融合的犬种识别方法) *
汪洋: "Research and *** implementation of fine-grained bird recognition methods based on deep learning" (基于深度学习的细粒度鸟类识别方法研究与***实现) *
谢娟英; 侯琦; 史颖欢; 吕鹏; 景丽萍; 庄福振; 张军平; 谭晓阳; 许升全: "Research on automatic identification of butterfly species" (蝴蝶种类自动识别研究) *
边小勇; 江沛龄; 赵敏; 丁胜; 张晓龙: "Weakly supervised fine-grained image classification method based on a multi-branch neural network model" (基于多分支神经网络模型的弱监督细粒度图像分类方法) *

Also Published As

Publication number Publication date
CN115272763B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Linardos et al. Machine learning in disaster management: recent developments in methods and applications
Goodwin et al. Unlocking the potential of deep learning for marine ecology: overview, applications, and outlook
Guan et al. Deep learning-based tree classification using mobile LiDAR data
Fdez-Riverola et al. CBR based system for forecasting red tides
Yang et al. Fine-grained image classification for crop disease based on attention mechanism
Botella et al. Species distribution modeling based on the automated identification of citizen observations
CN111368886A (en) Sample screening-based label-free vehicle picture classification method
Fu et al. Cloud detection for FY meteorology satellite based on ensemble thresholds and random forests approach
CN106447066A (en) Big data feature extraction method and device
CN111931505A (en) Cross-language entity alignment method based on subgraph embedding
Comber et al. Using semantics to clarify the conceptual confusion between land cover and land use: the example of ‘forest’
Wang et al. Hierarchical instance recognition of individual roadside trees in environmentally complex urban areas from UAV laser scanning point clouds
CN110276351A (en) Multilingual scene text detection and recognition methods
Anbananthen et al. An intelligent decision support system for crop yield prediction using hybrid machine learning algorithms
Dunkin et al. A spatially explicit, multi-criteria decision support model for loggerhead sea turtle nesting habitat suitability: a remote sensing-based approach
CN109472733A (en) Image latent writing analysis method based on convolutional neural networks
Zhang et al. Object-based classification framework of remote sensing images with graph convolutional networks
Barnes et al. This looks like that there: Interpretable neural networks for image tasks when location matters
Lunga et al. Resflow: A remote sensing imagery data-flow for improved model generalization
Choi et al. Semi-supervised target classification in multi-frequency echosounder data
Ye et al. Aerial scene classification via an ensemble extreme learning machine classifier based on discriminative hybrid convolutional neural networks features
Jiang et al. Forestry digital twin with machine learning in Landsat 7 data
CN115456166A (en) Knowledge distillation method for neural network classification model of passive domain data
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
Mohamed et al. Improvement of 3D LiDAR point cloud classification of urban road environment based on random forest classifier

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant