CN110399868B - Coastal wetland bird detection method


Info

Publication number
CN110399868B
CN110399868B (application number CN201810354126.7A)
Authority
CN
China
Prior art keywords
area
foreground
size
bird
feature
Legal status
Active
Application number
CN201810354126.7A
Other languages
Chinese (zh)
Other versions
CN110399868A (en)
Inventor
邹月娴 (Zou Yuexian)
关文婕 (Guan Wenjie)
Current Assignee
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Application filed by Peking University Shenzhen Graduate School
Priority to CN201810354126.7A
Publication of CN110399868A
Application granted
Publication of CN110399868B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a coastal wetland bird detection method. Using a convolutional neural network, the method fuses, through feature fusion, detail information useful for locating small-size targets and high-level semantic information useful for recognition into a single high-resolution feature map; obtains regions of interest through a region-of-interest generation network; obtains foreground regions through a region object network; and then screens the regions of interest against the foreground regions, thereby detecting and recognizing coastal wetland birds. The method addresses the poor detection performance on the many small-size birds in the distant view that limits existing coastal wetland bird detection, and can greatly improve the detection success rate and accuracy for small-size targets.

Description

Coastal wetland bird detection method
Technical Field
The invention relates to target detection technology in computer vision and to coastal wetland bird protection, and in particular to a coastal wetland bird detection method.
Background
Coastal wetlands are habitats for birds. The distribution, abundance, and biodiversity of birds are related to abiotic environmental factors of the wetland ecosystem, such as elevation, soil humidity, nitrogen gradient, and landscape indices, as well as to the biotic integrity of the ecosystem. Bird ecological parameters are therefore often used as evaluation indicators for reserve site selection and for ecosystem integrity and health, and an effective, simple method for evaluating bird diversity is crucial for timely understanding of the quality of, and changes in, the wetland ecological environment. However, existing coastal wetland bird monitoring still relies on the traditional working mode of long-term stakeouts, hidden observation, and periodic nest checks, and the resulting bird data are poor in continuity, credibility, and timeliness. Automatically detecting birds with target detection technology from computer vision, and counting their numbers and species over long periods, automates and digitizes the recording of bird activity; it can greatly reduce labor costs, provides a scientific method for coastal wetland protection and restoration, and has important application value.
With the development of deep learning, target detection algorithms based on deep learning have performed well in many applications. In these algorithms, a convolutional neural network extracts from the original image deep semantic information that reflects the essence of the image, and this information is then classified to obtain the final detection result. In the coastal wetland bird detection task, to avoid disturbing the birds, data acquisition equipment is usually placed far from the birds' habitat, so the collected videos and pictures contain many small-size bird targets in the distant view. Existing target detection technology performs poorly on small targets and is prone to missed detections; its detection of small-size birds in the distant view is poor, making it difficult to apply to the coastal wetland bird detection task.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a coastal wetland bird detection method that addresses the poor detection performance on the many small-size birds in the distant view in existing coastal wetland bird detection, and that can greatly improve the detection success rate and accuracy for small-size targets.
The method adopts the following technical scheme:
a coastal wetland bird detection method utilizes a convolutional neural network to fuse detail information beneficial to small-size target positioning and high-level semantic information beneficial to identification into a high-resolution feature map through feature fusion; obtaining an interested area through an interested area generation network; obtaining a foreground area through a regional object network, and further screening out an interested area in the foreground area; therefore, the detection success rate and the accuracy rate of detecting small and medium-sized targets by the coastal wetland birds are improved; the method comprises the following steps:
A. Through feature fusion, obtain a high-resolution feature map of the whole picture that contains both high-level semantic information and detail information, implemented as follows:
A1. Input the coastal wetland bird picture into a convolutional neural network and obtain a feature map for each of four stages of convolution operations;
A2. Select the third-stage feature map and the fourth-stage feature map from step A1 for feature fusion, obtaining a high-resolution feature map of the whole picture that contains both high-level semantic information and detail information;
B. Use a region-of-interest generation network to extract a number of regions of interest, implemented as follows:
B1. According to the ratio 1/n of the size of the high-resolution feature map obtained in step A to the size of the original image, generate a number of candidate boxes with different sizes and aspect ratios every n pixels in the original image, and establish a mapping between the high-resolution feature map and the candidate boxes;
B2. From the high-resolution feature map obtained in step A, compute the region-of-interest generation network (the network structure is shown in FIG. 3) to obtain, for each position, the score of the candidate box being predicted as foreground (containing a bird target) and the candidate box's translation-scaling parameters.
B3. Apply the translation-scaling parameters to each candidate box to obtain the regions of interest containing bird targets in the picture; a sketch of this decoding step follows.
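A minimal sketch of step B3, assuming the standard box parameterization used with region-proposal-style networks, in which t = (tx, ty, tw, th) shifts the box center and log-scales its width and height; the patent does not spell out the parameterization, so this choice is illustrative:

    import numpy as np

    def apply_translation_scaling(boxes, t):
        """boxes: (N, 4) candidate boxes [x1, y1, x2, y2]; t: (N, 4) predicted
        translation-scaling parameters (tx, ty, tw, th) -- assumed form."""
        w = boxes[:, 2] - boxes[:, 0]
        h = boxes[:, 3] - boxes[:, 1]
        cx = boxes[:, 0] + 0.5 * w
        cy = boxes[:, 1] + 0.5 * h
        cx = cx + t[:, 0] * w              # translate the box center
        cy = cy + t[:, 1] * h
        w = w * np.exp(t[:, 2])            # scale width and height
        h = h * np.exp(t[:, 3])
        return np.stack([cx - 0.5 * w, cy - 0.5 * h,
                         cx + 0.5 * w, cy + 0.5 * h], axis=1)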
B4. Training is performed with a supervised learning method; during training, the classification results of the candidate boxes at all positions and the translation-scaling results of the candidate boxes are evaluated with a cross-entropy loss function and a SmoothL1 loss function. The loss function is shown by the following equation:
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)    (Equation 1)
wherein i is the candidate-box index value, denoting the i-th candidate box; p_i is the predicted probability that the object in the candidate box is a bird; the ground-truth label p_i^* is 1 if the candidate box contains a bird, and 0 otherwise; t_i is the predicted translation-scaling parameter of the candidate box, and t_i^* is the ground-truth translation-scaling parameter. L_{cls} is a cross-entropy loss function that evaluates the difference between the predicted probability and the ground-truth label. The formula is as follows:
L_{cls}(p_i, p_i^*) = -[p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i)]    (Equation 2)
L_{reg} is the SmoothL1 loss function that evaluates the difference between the predicted and ground-truth translation-scaling parameters. L_{reg} is given by:
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*), \quad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}    (Equation 3)
In addition, N_{cls} and N_{reg} are normalization parameters, and \lambda is a balance parameter that balances the two parts of the loss function.
C. Use a region object network to select a number of foreground regions containing target birds, implemented as follows:
C1. From the high-resolution feature map obtained in step A, compute the region object network (the network structure is shown in FIG. 4) to obtain an object map. Each pixel value of the object map lies in (0, 1) and indicates the predicted probability that the corresponding object region contains a bird target (foreground) rather than background; object regions with a probability value greater than 0.5 are taken as foreground object regions.
C2. Determine the size of the object regions: according to the ratio 1/n of the size of the high-resolution feature map obtained in step A to the size of the original image, divide the original image into object regions every n pixels, and establish a mapping between the high-resolution feature map and the object regions;
C3. During training, an object region is considered a foreground region if the area of its overlap with a ground-truth foreground object exceeds 70% of the object region's own area, and a background region otherwise; a sketch of this rule follows the equation below. The loss function evaluating the foreground/background prediction of the object regions is shown by the following formula:
L(\{p_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)    (Equation 4)
wherein i is the object-region index value, denoting the i-th object region; p_i is the predicted probability value of the object region obtained in C1; p_i^* is the ground-truth label of the object region (1 for foreground, 0 for background); N_{cls} denotes the number of object regions in the image; and L_{cls} is the cross-entropy function of Equation 2.
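A minimal sketch of the C3 assignment rule above, checking whether the intersection of an object region with a ground-truth foreground box exceeds 70% of the region's own area (plain axis-aligned boxes assumed; the helper name is illustrative):

    def is_foreground(region, gt_box, thresh=0.7):
        """region, gt_box: boxes [x1, y1, x2, y2]; returns True when the
        intersection covers more than `thresh` of the region's own area."""
        ix1 = max(region[0], gt_box[0])
        iy1 = max(region[1], gt_box[1])
        ix2 = min(region[2], gt_box[2])
        iy2 = min(region[3], gt_box[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area = (region[2] - region[0]) * (region[3] - region[1])
        return inter / area > thresh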
D. Combine the foreground regions of interest obtained in step B3 with the foreground object regions obtained in step C1, retaining only the regions of interest located at foreground-region positions; one illustrative way to do this is sketched below.
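A minimal sketch of the step D screening, under the assumption that a region of interest is kept when the object region containing its center is foreground; the patent states only that regions of interest at foreground positions are retained, so the center-based rule and the stride value n = 16 are illustrative:

    import numpy as np

    def filter_rois(rois, fg_mask, n=16):
        """rois: (N, 4) boxes [x1, y1, x2, y2] in image coordinates;
        fg_mask: (H, W) boolean foreground object map on the feature grid;
        n: image-to-feature-map size ratio denominator (assumed value)."""
        cx = ((rois[:, 0] + rois[:, 2]) / 2 / n).astype(int)
        cy = ((rois[:, 1] + rois[:, 3]) / 2 / n).astype(int)
        cy = np.clip(cy, 0, fg_mask.shape[0] - 1)
        cx = np.clip(cx, 0, fg_mask.shape[1] - 1)
        return rois[fg_mask[cy, cx]]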
E. According to the mapping relation between the regions of interest obtained in step D in the input coastal wetland bird picture and the high-resolution feature map obtained in step A, find the feature boxes corresponding to the regions of interest in the high-resolution feature map and unify them to a fixed size.
F. Pass the feature boxes through several convolution layers and pooling layers to obtain fixed-size feature vectors; from these feature vectors, compute the score of being predicted as a bird and the translation-scaling parameters of each region of interest, and obtain the final identification boxes from those translation-scaling parameters.
G. Apply non-maximum suppression to all recognition results (classification scores and identification boxes) obtained in step F to generate the final coastal wetland bird target detection and recognition results, thereby recognizing the coastal wetland bird targets.
Compared with the prior art, the invention has the following beneficial effects:
In a convolutional network, high-level feature maps contain rich high-level semantic information, which helps classify objects, but small-size targets occupy few pixels in such feature maps and are therefore hard to recognize and locate. By introducing feature fusion, the method fuses the detail information useful for locating small-size bird targets in the distant view of the coastal wetland and the high-level semantic information useful for bird recognition into one feature map, which can greatly improve the detection success rate and accuracy for small-size bird targets in the distant view. In addition, the method obtains foreground (coastal wetland bird) regions through the region object network and screens the regions of interest against them, greatly reducing redundant regions of interest in background areas, alleviating the imbalance between the numbers of background and foreground regions of interest, and improving the generalization ability of the model.
Drawings
FIG. 1 is a flow block diagram of the coastal wetland bird detection method.
FIG. 2 is a schematic diagram of a feature fusion process in an embodiment of the invention;
wherein F′ is the lower-level feature, whose channel dimension is expanded to 1024 by a 1×1 convolution; F is the higher-level feature, whose spatial size is doubled by a 2×2 deconvolution; F_fuse is the final output feature, obtained by adding the transformed F′ and F point to point and applying a 1×1 convolution for fusion.
FIG. 3 is a schematic diagram of the region-of-interest generation network structure in an embodiment of the invention;
wherein (a) is the high-resolution feature map obtained in step A; (b) is the intermediate feature obtained from (a) by a 3×3 convolution; (c) is the classification score (foreground/background) and the 4 translation-scaling parameters of each candidate box, obtained from (b) by two separate 1×1 convolutions; num_anchors is the number of candidate boxes generated at each pixel in step B1; conv denotes a convolution operation.
FIG. 4 is a schematic diagram of the region object network structure in an embodiment of the invention;
wherein (a) is the high-resolution feature map; (b) is the foreground/background object map; (c) is the object map; conv denotes a convolution operation; ReLU is the linear rectification activation function, f(x) = max(0, x).
Detailed Description
The invention will be further described below by way of embodiments, with reference to the accompanying drawings, without in any way limiting the scope of the invention.
The invention provides a coastal wetland bird detection method that uses a convolutional neural network and, through feature fusion, fuses detail information useful for locating small-size targets and high-level semantic information useful for recognition into a high-resolution feature map; obtains regions of interest through a region-of-interest generation network; obtains foreground regions through a region object network, and then screens the regions of interest against the foreground regions; thereby improving the detection success rate and accuracy for small-size targets in coastal wetland bird detection.
FIG. 1 is a flow block diagram of the coastal wetland bird detection method according to an exemplary embodiment of the invention. The method can run on a PC (personal computer) or on mobile terminal devices such as mobile phones and tablet computers, without limitation.
In this embodiment, an RGB 3-channel image of any size is used as input; the image may be a frame captured from a video or a photograph, which is not limited here. As shown in FIG. 1, the embodiment of the invention comprises the following steps:
A. Input the image to be detected and, through a feature fusion module, obtain a feature map of the whole image containing both high-level semantic information and detail information, implemented as follows:
A1. The input coastal wetland bird picture to be detected undergoes four stages of convolution operations, yielding a feature map for each stage.
Convolution operations that generate feature maps of the same size are said to belong to the same stage of convolution; that is, the feature maps generated at different stages have different sizes. The convolution operations of each stage are defined by the particular convolutional neural network structure; the invention does not prescribe the structure of each stage in detail.
It should be noted that different features can be obtained through different combinations of convolution operations. In one possible implementation, the feature maps can be extracted with a convolutional neural network such as a Deep Residual Network (ResNet), which is not limited here. A convolution operation is mathematically a nonlinear mapping, and different convolution parameters yield different computation results; the features obtained by a convolution operation correspond to its computation results. In a convolutional neural network, the parameters of each convolution operation are learned through back-propagation training, and the learned parameters differ across network structures.
A2. A group of feature maps F from the fourth stage and a group of feature maps F′ from the third stage of step A1 are selected as the input of the feature fusion module.
A3. FIG. 2 is a schematic diagram of the feature fusion module. As shown in FIG. 2, the channel dimension of F′ is first expanded to match that of F, then the spatial size of F is expanded to match that of F′; the two groups of feature maps, now identical in size and dimension, are added point to point, and fusion processing yields the final output feature map F_fuse. The feature map dimensions are determined by the particular network structure.
In one possible implementation, the dimension of F′ may be expanded by a convolution operation using a convolution kernel of size 1×1, the size of F may be expanded by a deconvolution operation using a convolution kernel of size 2×2, and the fusion processing after addition may be implemented by a convolution operation using a convolution kernel of size 1×1.
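A minimal sketch of this fusion module, assuming PyTorch and example channel counts (512 for the stage-3 F′ and 1024 for the stage-4 F; the actual dimensions depend on the backbone network):

    import torch.nn as nn

    class FeatureFusion(nn.Module):
        def __init__(self, low_channels=512, high_channels=1024):
            super().__init__()
            # 1x1 convolution expands the channel dimension of F' to match F
            self.lateral = nn.Conv2d(low_channels, high_channels, kernel_size=1)
            # 2x2 deconvolution doubles the spatial size of F to match F'
            self.upsample = nn.ConvTranspose2d(high_channels, high_channels,
                                               kernel_size=2, stride=2)
            # 1x1 convolution applied after the point-to-point addition
            self.post = nn.Conv2d(high_channels, high_channels, kernel_size=1)

        def forward(self, f_low, f_high):
            fused = self.lateral(f_low) + self.upsample(f_high)  # point-to-point add
            return self.post(fused)  # F_fuse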
B. A region-of-interest generation network is used to extract a number of regions of interest; the network structure is shown in FIG. 3. The implementation is as follows:
B1. According to the ratio 1/n of the size of the feature map F_fuse obtained in step A to the size of the original image, a number of candidate boxes with different sizes and aspect ratios are generated every n pixels in the original image.
As a possible implementation, the candidate boxes may take 4 scales {32², 64², 128², 512²}, and the aspect ratios of the candidate boxes may be {1:1, 1:2, 1:0.5}.
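A minimal sketch of this candidate-box generation, assuming a stride of n = 16 pixels (the value of n is not fixed by the patent) and the scales and aspect ratios listed above:

    import numpy as np

    def generate_candidate_boxes(img_h, img_w, n=16,
                                 scales=(32, 64, 128, 512),
                                 ratios=(1.0, 2.0, 0.5)):
        boxes = []
        for cy in range(n // 2, img_h, n):      # one set of boxes every n pixels
            for cx in range(n // 2, img_w, n):
                for s in scales:
                    for r in ratios:            # r = height : width
                        w = s / np.sqrt(r)
                        h = s * np.sqrt(r)
                        boxes.append([cx - w / 2, cy - h / 2,
                                      cx + w / 2, cy + h / 2])
        return np.array(boxes, dtype=np.float32)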
B2. According to the mapping relation between the candidate boxes in the original image and the feature map, the feature boxes corresponding to the candidate boxes are found in the high-resolution feature map F_fuse with high-level semantic information obtained in step A; the background/foreground classification scores of the feature boxes are evaluated with a cross-entropy loss function and the translation-scaling parameters of the regions of interest with a SmoothL1 loss function, so as to obtain the regions of interest.
the loss function is shown by the following equation:
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)    (Equation 1)
wherein i is the candidate-box index value, denoting the i-th candidate box; p_i is the predicted probability that the object in the candidate box is a bird; the ground-truth label p_i^* is 1 if the candidate box contains a bird, and 0 otherwise; t_i is the predicted translation-scaling parameter of the candidate box, and t_i^* is the ground-truth translation-scaling parameter. L_{cls} is a cross-entropy loss function that evaluates the difference between the predicted probability and the ground-truth label:
L_{cls}(p_i, p_i^*) = -[p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i)]    (Equation 2)
L_{reg} is the SmoothL1 loss function that evaluates the difference between the predicted and ground-truth translation-scaling parameters:
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*), \quad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}    (Equation 3)
In addition, N_{cls} and N_{reg} are normalization parameters, and \lambda is a balance parameter that balances the two parts of the loss function.
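A minimal sketch of Equation 1, assuming PyTorch tensors: scores holds the predicted foreground probabilities p_i, labels the ground-truth p_i* in {0, 1}, and deltas and targets the predicted and ground-truth translation-scaling parameters t_i and t_i*:

    import torch.nn.functional as F

    def rpn_loss(scores, labels, deltas, targets, lam=1.0):
        n_cls = scores.numel()                    # normalization N_cls
        n_reg = max(int(labels.sum().item()), 1)  # normalization N_reg
        l_cls = F.binary_cross_entropy(scores, labels.float(),
                                       reduction='sum') / n_cls
        fg = labels.bool()                        # regression only where p_i* = 1
        l_reg = F.smooth_l1_loss(deltas[fg], targets[fg],
                                 reduction='sum') / n_reg
        return l_cls + lam * l_reg                # lam is the balance parameter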
C. A region object network is used to select a number of foreground regions containing objects, implemented as follows:
C1. According to the ratio 1/n of the size of the feature map F_fuse obtained in step A to the size of the original image, the original image is divided into object regions every n pixels;
C2. According to the mapping relation from the object regions in the original image to the feature map, the feature blocks corresponding to the object regions are found in the feature map obtained in step A, and the background/foreground classification scores of these feature blocks are evaluated with a cross-entropy loss function, yielding object regions classified as foreground or background;
The loss function evaluating the foreground/background prediction of the object regions is shown by the following formula:
L(\{p_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)    (Equation 4)
wherein i is the object-region index value, denoting the i-th object region; p_i is the predicted probability value of the object region obtained in C1; p_i^* is the ground-truth label of the object region (1 for foreground, 0 for background); N_{cls} denotes the number of object regions in the image; and L_{cls} is the cross-entropy function of Equation 2.
In a possible implementation, feature learning of the object regions may be implemented by adding a convolution operation with a 1×1 kernel. It should be noted that, depending on the actual situation, the feature extraction may be chosen flexibly, including different convolution structures or hand-crafted features (HOG features, Haar features); the invention is not limited in this respect.
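A minimal sketch of such a 1×1-convolution object-map head, assuming PyTorch; a sigmoid keeps each output pixel in (0, 1) as the foreground probability of its n×n object region, and regions scoring above 0.5 are kept as foreground:

    import torch
    import torch.nn as nn

    class ObjectMapHead(nn.Module):
        def __init__(self, in_channels=1024):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

        def forward(self, f_fuse):
            prob = torch.sigmoid(self.conv(f_fuse))  # object map, values in (0, 1)
            return prob, prob > 0.5                  # map and foreground mask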
D. The results of step B and step C are combined, and the regions of interest in foreground regions are retained; then, according to the mapping relation from the regions of interest in the input data to the feature map obtained in step A, the feature boxes corresponding to the regions of interest are found in the feature map and fixed to the same size.
In one possible implementation, these feature boxes may be fixed to a uniform 7×7 size. The feature boxes then pass through several convolution layers and pooling layers, and the bird-target score and the candidate-box translation-scaling parameters are computed; the number of convolution and pooling layers is determined by the basic network structure. The final bird identification box is obtained by applying the translation-scaling parameters to the candidate box. Targets scoring above 0.5 are considered bird targets.
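A minimal sketch of unifying the region-of-interest features to the 7×7 size, using torchvision's roi_align as one plausible pooling operator; the patent does not name a specific operator, and spatial_scale = 1/n (n = 16 assumed) maps image coordinates onto the feature map:

    from torchvision.ops import roi_align

    def extract_roi_features(f_fuse, rois, n=16):
        """f_fuse: (1, C, H, W) fused feature map; rois: (N, 4) float boxes
        [x1, y1, x2, y2] in image coordinates."""
        return roi_align(f_fuse, [rois], output_size=(7, 7),
                         spatial_scale=1.0 / n)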
E. Non-maximum suppression is applied to the bird identification boxes obtained in step D to obtain the final bird position and category recognition results, thereby locating and recognizing the coastal wetland bird targets.
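A minimal sketch of this final step, using torchvision's nms; the IoU threshold of 0.5 is an assumed value, while the 0.5 score threshold comes from the embodiment above:

    from torchvision.ops import nms

    def final_detections(boxes, scores, iou_thresh=0.5, score_thresh=0.5):
        keep = scores > score_thresh              # bird targets score above 0.5
        boxes, scores = boxes[keep], scores[keep]
        keep = nms(boxes, scores, iou_thresh)     # drop overlapping duplicates
        return boxes[keep], scores[keep]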
By combining the feature fusion module, this embodiment fuses high-level semantic information and low-level fine-grained information into the final feature map, which can greatly improve the detection of small-size bird targets in the distant view. By combining the region object network, regions of interest in background areas are eliminated, which reduces the number of regions of interest and improves the generalization ability of the model. Detection results on example images show that small-size birds in the distant view are detected well.
It is noted that the disclosed embodiments are intended to aid in further understanding of the invention, but those skilled in the art will appreciate that: various substitutions and modifications are possible without departing from the spirit and scope of this disclosure and the appended claims. Therefore, the invention should not be limited by the disclosure of the embodiments, but should be defined by the scope of the appended claims.

Claims (6)

1. A coastal wetland bird detection method, which uses a convolutional neural network and, through feature fusion, fuses detail information useful for locating small-size targets and high-level semantic information useful for recognition into a high-resolution feature map; obtains regions of interest through a region-of-interest generation network; obtains foreground regions through a region object network; and further screens the regions of interest within the foreground regions, thereby improving the detection success rate and accuracy for small-size targets in coastal wetland bird detection; the method comprising the following steps:
A. Obtaining, through feature fusion, a high-resolution feature map of the whole picture containing both high-level semantic information and detail information, by specifically performing the following operations:
A1. Inputting the coastal wetland bird picture into a convolutional neural network and obtaining a feature map for each of four stages of convolution operations;
A2. Selecting the third-stage feature map and the fourth-stage feature map from step A1 for feature fusion, obtaining a high-resolution feature map of the whole picture containing both high-level semantic information and detail information;
B. Extracting a number of regions of interest with a region-of-interest generation network, by specifically performing operations B1-B4:
B1. Setting the ratio of the size of the high-resolution feature map obtained in step A to the size of the original image to 1/n, generating a number of candidate boxes with different sizes and aspect ratios every n pixels in the original image, and establishing a mapping between the high-resolution feature map and the candidate boxes;
B2. From the high-resolution feature map obtained in step A, computing the region-of-interest generation network to obtain, for each position, the score of the candidate box being predicted as foreground and the candidate box's translation-scaling parameters, the foreground referring to the bird target;
B3. Applying the translation-scaling parameters to each candidate box to obtain the regions of interest containing bird targets in the picture;
B4. Training with a supervised learning method and, during training, evaluating the classification results of the candidate boxes at each position and the translation-scaling results of the candidate boxes with a cross-entropy loss function and a SmoothL1 loss function;
C. Selecting a number of foreground regions containing target birds with a region object network, by specifically performing the following operations:
C1. From the high-resolution feature map obtained in step A, computing the region object network to obtain an object map, wherein the pixel value of each pixel in the object map represents the predicted probability that the corresponding object region contains a bird target, foreground if it does and background otherwise; the predicted probability value lies in (0, 1), and object regions with a probability value greater than 0.5 are foreground object regions;
C2. Determining the size of the object regions: dividing the original image into object regions every n pixels according to the ratio 1/n of the size of the high-resolution feature map obtained in step A to the size of the original image, and establishing a mapping between the high-resolution feature map and the object regions;
C3. During training, setting an area ratio; if the ratio of the area of the overlap between an object region and a foreground target to the area of the object region exceeds the set area ratio, the object region is considered a foreground region, and otherwise a background region;
D. Combining the foreground regions of interest obtained in step B3 with the foreground object regions obtained in step C1, retaining the regions of interest at foreground-region positions;
E. According to the mapping relation between the regions of interest at foreground-region positions obtained in step D in the input coastal wetland bird picture and the high-resolution feature map obtained in step A, finding the feature boxes corresponding to the regions of interest in the high-resolution feature map and unifying them to a fixed size;
F. Obtaining fixed-size feature vectors by passing the feature boxes through several convolution layers and a pooling layer of a convolutional neural network, computing from the feature vectors the score of being predicted as a bird and the translation-scaling parameters of each region of interest, and obtaining the final identification boxes from those translation-scaling parameters;
G. Applying non-maximum suppression to the classification scores and identification boxes obtained in step F to obtain the coastal wetland bird target detection and recognition results, thereby recognizing the coastal wetland bird targets.
2. The coastal wetland bird detection method of claim 1, wherein the loss function of step B4 is expressed as Equation 1:
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)    (Equation 1)
wherein i is the candidate-box index value, denoting the i-th candidate box; p_i is the probability of predicting that the object within the candidate box is a bird; the ground-truth label p_i^* is set to 1 if the candidate box contains a bird, and to 0 otherwise; t_i is the predicted translation-scaling parameter of the candidate box; t_i^* is the ground-truth translation-scaling parameter; L_{cls} is a cross-entropy loss function used to evaluate the difference between the predicted probability and the ground-truth label, with the formula:
L_{cls}(p_i, p_i^*) = -[p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i)]    (Equation 2)
L_{reg} is a SmoothL1 loss function used to evaluate the difference between the predicted translation-scaling parameter and the ground-truth translation-scaling parameter;
L_{reg} is expressed as Equation 3:
L_{reg}(t_i, t_i^*) = \mathrm{smooth}_{L1}(t_i - t_i^*), \quad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}    (Equation 3)
N_{cls} and N_{reg} are normalization parameters, and \lambda is a balance parameter that balances the two parts of the loss function.
3. The coastal wetland bird detection method of claim 1, wherein in the training of step C3 the area ratio is set to 70%, and the prediction loss function evaluating the foreground or background of an object region is expressed as Equation 4:
L(\{p_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)    (Equation 4)
wherein i is the object-region index value, denoting the i-th object region; p_i is the predicted probability value of the object region obtained in C1;
p_i^* is the ground-truth label of the object region, 1 for foreground and 0 for background; L_{cls} is the cross-entropy function, with the formula:
L_{cls}(p_i, p_i^*) = -[p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i)]    (Equation 2)
N_{cls} denotes the number of object regions in the image.
4. The coastal wetland bird detection method of claim 1, wherein the feature fusion in step A is specifically:
taking a group of feature maps F from the fourth stage and a group of feature maps F′ from the third stage of step A1 as the input of the feature fusion;
expanding the channel dimension of F′ to match that of F, and then expanding the size of F to match that of F′;
adding the two processed feature maps, identical in size and dimension, point to point, and then performing fusion processing to obtain the fused output feature map F_fuse;
the feature map dimensions being specifically determined by the network structure.
5. The coastal wetland bird detection method of claim 4, wherein the dimension of F′ is expanded by a convolution operation using a convolution kernel of size 1×1; the size of F is expanded by a deconvolution operation using a convolution kernel of size 2×2; and the fusion processing after the addition is implemented by a convolution operation using a convolution kernel of size 1×1.
6. The coastal wetland bird detection method of claim 1, wherein the sizes of the candidate boxes are {32², 64², 128², 512²} and the aspect ratios of the candidate boxes are {1:1, 1:2, 1:0.5}.
CN201810354126.7A 2018-04-19 2018-04-19 Coastal wetland bird detection method Active CN110399868B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810354126.7A | 2018-04-19 | 2018-04-19 | Coastal wetland bird detection method (CN110399868B, en)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810354126.7A | 2018-04-19 | 2018-04-19 | Coastal wetland bird detection method (CN110399868B, en)

Publications (2)

Publication Number | Publication Date
CN110399868A (en) | 2019-11-01
CN110399868B (en) | 2022-09-09

Family

ID=68319502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810354126.7A Active CN110399868B (en) 2018-04-19 2018-04-19 Coastal wetland bird detection method

Country Status (1)

Country Link
CN (1) CN110399868B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091127A (en) * 2019-12-16 2020-05-01 腾讯科技(深圳)有限公司 Image detection method, network model training method and related device
CN113076860B (en) * 2021-03-30 2022-02-25 南京大学环境规划设计研究院集团股份公司 Bird detection system under field scene
CN114594106A (en) * 2022-03-08 2022-06-07 苏州菲利达铜业有限公司 Real-time monitoring method and system for copper pipe electroplating process

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on acceleration region convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354159B2 (en) * 2016-09-06 2019-07-16 Carnegie Mellon University Methods and software for detecting objects in an image using a contextual multiscale fast region-based convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN106845430A (en) * 2017-02-06 2017-06-13 东华大学 Pedestrian detection and tracking based on acceleration region convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邹月娴 (Zou Yuexian) et al., "图像分类卷积神经网络的特征选择模型压缩方法" (Feature selection model compression method for image classification convolutional neural networks), 《控制理论与应用》 (Control Theory & Applications), 2017-06-30, full text *

Also Published As

Publication Number | Publication Date
CN110399868A (en) | 2019-11-01

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108399362B (en) Rapid pedestrian detection method and device
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
Zhuo et al. Cloud classification of ground-based images using texture–structure features
CN111178183B (en) Face detection method and related device
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN110110755B (en) Pedestrian re-identification detection method and device based on PTGAN region difference and multiple branches
CN110399868B (en) Coastal wetland bird detection method
Chen et al. Remote sensing image quality evaluation based on deep support value learning networks
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN113205507B (en) Visual question answering method, system and server
CN113435407B (en) Small target identification method and device for power transmission system
CN112396053A (en) Method for detecting object of all-round fisheye image based on cascade neural network
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN113887472A (en) Remote sensing image cloud detection method based on cascade color and texture feature attention
Malav et al. DHSGAN: An end to end dehazing network for fog and smoke
CN116805387B (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN114332457A (en) Image instance segmentation model training method, image instance segmentation method and device
Ke et al. Haze removal from a single remote sensing image based on a fully convolutional neural network
CN110334703B (en) Ship detection and identification method in day and night image
CN116977859A (en) Weak supervision target detection method based on multi-scale image cutting and instance difficulty
CN111815677A (en) Target tracking method and device, terminal equipment and readable storage medium
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof
CN116310323A (en) Aircraft target instance segmentation method, system and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant