CN115170860A - Image classification recognition method, recognition device and storage medium - Google Patents

Image classification recognition method, recognition device and storage medium

Info

Publication number
CN115170860A
Authority
CN
China
Prior art keywords
feature
region
map
boundary
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210640202.7A
Other languages
Chinese (zh)
Inventor
周翊民
吴相栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210640202.7A
Publication of CN115170860A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image classification and recognition method, an image classification and recognition apparatus, and a storage medium. The image classification and recognition method comprises the following steps: performing feature extraction on an image to be recognized to obtain a first feature map; performing a down-sampling operation on the first feature map to obtain a second feature map; rotating the second feature map so that the boundary of the feature region matches the boundary of the target detection region, to obtain a third feature map; and obtaining a classification and recognition result of the image to be recognized according to the third feature map. In this way, the efficiency and accuracy of image classification and recognition can be improved.

Description

Image classification recognition method, recognition device and storage medium
Technical Field
The present application relates to the field of computer image classification and identification, and in particular, to an image classification and identification method, an image classification and identification apparatus, and a storage medium.
Background
With the increasing degree of automation, computer vision can assist monitoring in a variety of settings, including pedestrian-flow monitoring, vehicle monitoring, disaster monitoring, and security monitoring.
In one application scenario, ports are large in scale and freight throughput, and large numbers of transport vessels come and go in the offshore waters. For ship supervision, traditional coastal monitoring facilities are aging, installed at fixed positions with fixed viewing angles, and not flexible enough; they can only monitor coastal conditions from a horizontal viewing angle, and their recognition accuracy is low when ships occlude one another.
Compared with traditional port monitoring equipment, aerial images offer a wider field of view, avoid mutual occlusion between ships, and reveal the orientation distribution of the ships, providing good support for better coastal management. However, when detecting ships in aerial images, the feature map produced in the detection task does not fit the real extent of the target.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide an image classification and recognition method, an image classification and recognition apparatus, and a storage medium, which can improve the efficiency and accuracy of image classification and recognition.
In order to solve the above technical problem, one technical solution adopted by the present application is to provide an image classification and recognition method, the method comprising: performing feature extraction on an image to be recognized to obtain a first feature map; performing a down-sampling operation on the first feature map to obtain a second feature map; rotating the second feature map so that the boundary of the feature region matches the boundary of the target detection region, to obtain a third feature map; and obtaining a classification and recognition result of the image to be recognized according to the third feature map.
The rotating of the second feature map so that the boundary of the feature region matches the boundary of the target detection region, to obtain the third feature map, includes: determining a target feature region in the second feature map; rotating the target feature region by each of a plurality of set rotation angles to obtain a plurality of corresponding candidate feature regions; determining, among the plurality of candidate feature regions, a rotated feature region that matches the boundary of the target detection region; and determining the third feature map based on the plurality of rotated feature regions corresponding to the plurality of target feature regions in the second feature map.
The rotating of the target feature region by each of the plurality of set rotation angles to obtain the plurality of corresponding candidate feature regions includes: determining a plurality of angle channels, wherein each angle channel corresponds to an angle interval; rotating the target feature region based on each of the plurality of angle channels to obtain the plurality of corresponding candidate feature regions; and, for each candidate feature region, performing pixel interpolation processing on the candidate feature region using the boundary of the corresponding angle channel.
The target feature region is processed using the following formula:
$$F_{O_n} = \mathrm{Int}\big(\mathrm{SA}(Y(p)\cdot R^{T}(\theta),\ C_n),\ \theta\big), \qquad n = 0, 1, \dots, N-1;$$
where $C_n$ denotes the $n$-th angle channel, $F_{O_n}$ denotes the candidate feature region under angle channel $C_n$, $\mathrm{Int}$ denotes the interpolation function, $\mathrm{SA}$ denotes the angle-channel switching function, $Y(p)$ denotes the feature function corresponding to the second feature map, $R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ is the rotation matrix controlling the switching of the rotation angle, and $N$ is a positive integer.
The target detection region is down-sampled using the following formula:
$$Y(p) = \sum_{k \in K} W(k) \cdot F(p + k);$$
where $F$ is the feature function corresponding to the first feature map, $W(k)$ denotes the weight at each position of the convolution kernel used, $K$ denotes the position range of the convolution calculation, and the value of $k$ controls the traversal of the pixel points.
Determining, among the plurality of candidate feature regions, the rotated feature region matched with the boundary of the target detection region includes: constructing a score map of each candidate feature region corresponding to the plurality of angle channels; determining a response value corresponding to each score map; and determining the rotated feature region matched with the boundary of the target detection region according to the score map and the response value of each candidate feature region.
Constructing the score map of each candidate feature region corresponding to the plurality of angle channels includes: dividing each candidate feature region into $d^{2}$ sub-block regions, wherein d is a positive integer; performing a pooling operation on each sub-block region separately; and splicing the results of the pooling operations on the $d^{2}$ sub-block regions to form the score map of the candidate feature region corresponding to one angle channel.
The sub-block regions are pooled using a pooling formula in which the pooled output of the $(i, j)$-th sub-block region $B_{i,j}$ on angle channel $C_n$ is computed from one of the $N \times d^{2}$ score maps, where $w$ is a learnable parameter, $p$ is the number of pixels in sub-block region $B_{i,j}$, and $(u, v)$ are the global coordinates of feature point $P_{i,j}$. [The pooling formula is given only as an image in the source.]
The feature point $P_{i,j}$ can be defined in terms of its local coordinates $(u', v')$ and the coordinates $(u_0, v_0)$ of the upper-left corner point of the aligned feature region; when $(u, v) \in B_{i,j}$, its range is bounded in terms of $w$ and $h$, the length and width of the conventional convolution region, respectively. [The defining formulas are given only as images in the source.]
The results of the pooling operations on the sub-block regions are spliced to form the score map of the candidate feature region, and the response value of each angle channel is then calculated from its score map. [The splicing and response-value formulas are given only as images in the source.]
the method for determining the rotating characteristic region matched with the boundary of the target detection region according to the score map and the response value of each candidate characteristic region comprises the following steps: obtaining a third characteristic diagram by adopting the following formula:
Figure BDA0003681959950000041
wherein F A For the selected rotated feature region that best matches the boundary of the target detection region, F OM For obtaining the angle channel C of the maximum product n Of the area.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an image classification recognition apparatus comprising a processor and a memory, the memory storing program data, the processor being configured to execute the program data to implement the image classification recognition method as described above.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium storing program data for implementing the image classification recognition method described above when the program data is executed.
The beneficial effects of the present application are as follows: the image classification and recognition method is applied to an image classification and recognition apparatus, which performs feature extraction on the image to be recognized to obtain a first feature map; performs a down-sampling operation on the first feature map to obtain a second feature map; rotates the second feature map so that the boundary of the feature region matches the boundary of the target detection region, to obtain a third feature map; and then uses the third feature map to obtain the classification and recognition result of the image to be recognized. In this way, compared with the processing region of a conventional convolution operation, the angle-channel-switching method fits the original convolution processing region to the target detection region through angular rotation, and selects the feature region that best matches the target region by constructing a feature score map, thereby solving the severe misalignment between the feature region obtained by conventional convolution and the target detection region and classifying and recognizing images more efficiently.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. Wherein:
FIG. 1 is a flowchart illustrating an embodiment of a method for classifying and identifying images provided herein;
FIG. 2 is a schematic illustration of rotated feature misalignment (left graph) and rotated feature alignment effect (right graph) in a conventional convolution;
FIG. 3 is a flowchart illustrating an embodiment of step 13 of the image classification and identification method provided by the present application;
FIG. 4 is a flowchart illustrating an embodiment of step 132 of the image classification and identification method provided by the present application;
FIG. 5 is a schematic diagram of the rotation feature alignment module based on angle channel switching in the image classification and identification method provided by the present application;
FIG. 6 is a flowchart illustrating an embodiment of step 133 of the image classification and identification method provided by the present application;
FIG. 7 is a flowchart illustrating an embodiment of step 1331 of the image classification and identification method provided by the present application;
FIG. 8 is a schematic diagram of a feature score map construction process in the image classification and identification method provided by the present application;
FIG. 9 is a schematic structural diagram of an image classification recognition model provided in the present application;
FIG. 10 is a schematic structural diagram of an embodiment of an image classification and identification apparatus provided in the present application;
FIG. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of a method for classifying and identifying an image provided by the present application, where the method includes:
step 11: and performing feature extraction on the image to be recognized to obtain a first feature map.
The image to be recognized comprises a target detection area.
Optionally, the image to be recognized is an image captured by a camera, and in an embodiment of port ship monitoring, the image to be recognized may be an image captured by a camera on the drone.
Specifically, target detection is performed on the image to be recognized, and the detection result includes a corresponding target detection region (or target detection frame). The target detection region is generally a rectangle, usually the circumscribed rectangle of the target to be detected, which approximately represents the position, orientation, and size of the target. Taking a ship as an example, ships berth at a harbor in various positions, so the resulting target detection rectangles also take different orientations. In this embodiment, the angle between the long side of the rectangle and the x-axis of the image may be defined as the rotation angle of the target detection region.
Specifically, feature extraction uses a computer to extract image information such as brightness, edges, texture, and color, so as to classify the image. In this embodiment, features of the image to be recognized are extracted mainly to obtain the target region to be detected. Taking an aerial ship image as an example, specific data in the image are extracted to classify the objects in the image and obtain the region where the ship is located.
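As a minimal sketch only (the patent does not specify a particular backbone), feature extraction of this kind can be expressed as a small convolutional network; the layer sizes, channel counts, and the PyTorch framework used below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Minimal convolutional feature extractor (illustrative stand-in only)."""
    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H, W) -> first feature map: (B, 64, H, W)
        return self.features(image)

first_feature_map = TinyBackbone()(torch.randn(1, 3, 256, 256))
```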
Step 12: and performing downsampling operation on the first feature map to obtain a second feature map.
Wherein the second feature map includes a feature region.
Specifically, the down-sampling refers to a process of reducing an image. The down-sampling may cause the image to conform to the size of the display area, generating a thumbnail of the corresponding image.
In one embodiment, taking a 3×3 convolution kernel with position set
K = {(-1,-1), (0,-1), ..., (0,1), (1,1)}
as an example, the input feature map is down-sampled and the reduced feature map is output.
Optionally, in one embodiment, the target detection region is down-sampled using the following formula:
$$Y(p) = \sum_{k \in K} W(k) \cdot F(p + k);$$
where $F$ is the feature function corresponding to the first feature map, $W(k)$ denotes the weight at each position of the convolution kernel used, $K$ denotes the position range of the convolution calculation, and the value of $k$ controls the traversal of the pixel points.
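A minimal sketch of this down-sampling convolution, written out as the explicit sum over the kernel offsets; the single-channel input, the stride of 2, and the uniform kernel weights are assumptions chosen only for illustration.

```python
import numpy as np

def downsample_conv(F: np.ndarray, W: np.ndarray, stride: int = 2) -> np.ndarray:
    """Plain strided convolution written as Y(p) = sum over k in K of W(k) * F(p + k)."""
    kh, kw = W.shape
    H, W_in = F.shape
    out_h = (H - kh) // stride + 1
    out_w = (W_in - kw) // stride + 1
    offsets = [(dy, dx) for dy in range(kh) for dx in range(kw)]  # the position set K
    Y = np.zeros((out_h, out_w), dtype=F.dtype)
    for i in range(out_h):
        for j in range(out_w):
            p = (i * stride, j * stride)  # pixel position p traversed with the stride
            Y[i, j] = sum(W[dy, dx] * F[p[0] + dy, p[1] + dx] for dy, dx in offsets)
    return Y

# second feature map: a reduced version of the first feature map
second_feature_map = downsample_conv(np.random.rand(32, 32), np.ones((3, 3)) / 9.0)
```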
Step 13: and rotating the second characteristic diagram to enable the boundary of the characteristic region to be matched with the boundary of the target detection region, so as to obtain a third characteristic diagram.
Wherein the third feature map is a feature region that best matches the boundary of the target detection region.
Specifically, conventional convolution kernels are square and operate only with horizontal and vertical translations, so they often cannot perfectly match the boundary of the target detection region. As shown in fig. 2 (left), frame a is the convolution feature region extracted according to the feature extraction range and frame b is the actual region of the target object; a clear feature misalignment exists between the two. The feature region therefore needs to be aligned with the actual target region by rotating the convolved region, with the effect shown in fig. 2 (right).
Optionally, in an embodiment, as shown in fig. 3, fig. 3 is a schematic flow chart of an embodiment of step 13, where step 13 may specifically include:
step 131: and determining a target characteristic region in the second characteristic diagram.
The target feature region is a down-sampled feature region that conforms to the size of the target detection region. For example, if the size of the second feature map is 3 × 3, the target feature region may be any one of the 9 rectangular (or square) regions; each feature region in the second feature map may also be determined in turn by traversal, with the subsequent steps performed for each feature region in turn.
Step 132: and respectively rotating the target characteristic region based on the set plurality of rotation angles to respectively obtain a plurality of corresponding candidate characteristic regions.
Specifically, the rotation rotates the original feature frame according to the division of the angle channels, using an angle classification approach; the rotation angle value is switched cyclically among the given angle channels.
It is understood that the number of the plurality of set rotation angles (angle channels) determines the matching degree of the boundary of the feature region after rotation determined subsequently and the boundary of the target detection region, and therefore, as many angle channels as possible can be set.
Optionally, in an embodiment, as shown in fig. 4, fig. 4 is a schematic flowchart of an embodiment of step 132, where step 132 may specifically include:
step 1321: a plurality of angle channels are determined, and each angle channel corresponds to an angle interval.
Optionally, in one embodiment, 1° to 90° may be divided into 10 angle channels, which may include, in order of angle: 1°-10°, 11°-20°, 21°-30°, ..., 81°-90°.
It will be appreciated that the rotation centre may be defined as the centre of the rectangle of the feature region, i.e. the intersection of its two diagonals, about which the feature region is rotated, and that each angle channel corresponds to a 10° sector with a boundary.
Of course, the above is merely an example, and in other embodiments, a plurality of angular channels based on 360 ° may be determined.
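A minimal sketch of dividing an angle range into angle channels and locating the channel that contains a given rotation angle; splitting 0°-90° into ten equal intervals is an assumption for illustration (the example above uses the slightly different partition 1°-10°, 11°-20°, ..., 81°-90°).

```python
import numpy as np

def make_angle_channels(n_channels: int = 10, max_angle: float = 90.0):
    """Split (0, max_angle] into n_channels equal angle intervals."""
    edges = np.linspace(0.0, max_angle, n_channels + 1)
    return list(zip(edges[:-1], edges[1:]))

def channel_of(angle: float, channels) -> int:
    """Index of the angle channel whose interval contains the given angle."""
    for n, (lo, hi) in enumerate(channels):
        if lo < angle <= hi:
            return n
    raise ValueError("angle outside the covered range")

channels = make_angle_channels()   # [(0.0, 9.0), (9.0, 18.0), ..., (81.0, 90.0)]
print(channel_of(37.5, channels))  # -> 4
```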
Step 1322: and respectively rotating the target characteristic region based on the plurality of angle channels to respectively obtain a plurality of corresponding candidate characteristic regions.
Step 1323: and for each candidate characteristic region, carrying out pixel interpolation processing on the candidate characteristic region by using the boundary of the corresponding angle channel.
Specifically, interpolation complements a continuous function on the basis of discrete data so that the continuous curve passes through all the given discrete data points; during image transformation it fills the gaps between pixels. Because an angle channel is an interval rather than an exact angle, the angle classification introduces errors into the computation of the feature region, so the pixel-point offsets that arise after rotating the feature region need to be resolved by interpolating over the region.
Optionally, in one embodiment, the input target feature region is rotated using the following formula:
$$F_{O_n} = \mathrm{Int}\big(\mathrm{SA}(Y(p)\cdot R^{T}(\theta),\ C_n),\ \theta\big), \qquad n = 0, 1, \dots, N-1;$$
where $C_n$ denotes the $n$-th angle channel, $F_{O_n}$ denotes the candidate feature region under angle channel $C_n$, $\mathrm{Int}$ denotes the interpolation function, $\mathrm{SA}$ denotes the angle-channel switching function, $Y(p)$ denotes the feature function corresponding to the second feature map, $R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ is the rotation matrix controlling the switching of the rotation angle, and $N$ is a positive integer.
Specifically, as shown in fig. 5, for the input candidate feature region, the boundaries of the angle intervals are selected according to the divided angle channels, and the pixel interpolation calculation is performed on the region to realize the rotation of the feature region.
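A minimal sketch of rotating a feature region by the boundary angle of an angle channel and filling the resulting pixel gaps by bilinear interpolation; the use of scipy.ndimage.rotate with order=1 and mode="nearest" is an assumed stand-in for the Int and SA functions above, not the patent's implementation.

```python
import numpy as np
from scipy import ndimage

def rotate_feature_region(region: np.ndarray, channel_angle_deg: float) -> np.ndarray:
    """Rotate a feature region about its centre and resample it with bilinear
    interpolation (order=1) so that the rotated pixel grid has no gaps."""
    return ndimage.rotate(region, angle=channel_angle_deg, reshape=False,
                          order=1, mode="nearest")

# one candidate feature region per angle-channel boundary (here: 30 degrees)
candidate_region = rotate_feature_region(np.random.rand(16, 16), channel_angle_deg=30.0)
```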
Step 133: and determining a rotating characteristic region matched with the boundary of the target detection region in the candidate characteristic regions.
The purpose of this step is to determine a final required feature region among a plurality of candidate feature regions.
Specifically, a plurality of candidate feature regions may be evaluated by a certain method. For example, the similarity between each candidate feature region and the original image to be detected may be calculated, or the overlap ratio between the boundary of each candidate feature region and the target region of the original image to be detected may be calculated. One way to score multiple candidate feature regions is as follows:
optionally, in an embodiment, as shown in fig. 6, fig. 6 is a schematic flowchart of an embodiment of step 133, where step 133 may specifically include:
step 1331: and constructing a score map of each candidate feature region corresponding to a plurality of angle channels.
Optionally, in an embodiment, as shown in fig. 7, fig. 7 is a schematic flow chart of an embodiment of step 1331, where the step 1331 may specifically include:
step 13311: dividing each candidate feature region into d 2 An individual block region.
Step 13312: pooling is performed separately for each subblock region.
Specifically, pooling statistically summarizes the feature values at a position and its neighbouring positions in a plane, and uses the summarized result as the value at that position.
Specifically, as shown in fig. 8, taking d =3 as an example, the feature region corresponding to each angular channel is divided into 3 × 3 sub-block regions, and the score of each sub-block region is calculated to construct a score map.
Optionally, in one embodiment, the sub-block regions are pooled using a pooling formula in which the pooled output of the $(i, j)$-th sub-block region $B_{i,j}$ on angle channel $C_n$ is computed from one of the $N \times d^{2}$ score maps, where $w$ is a learnable parameter, $p$ is the number of pixels in sub-block region $B_{i,j}$, and $(u, v)$ are the global coordinates of feature point $P_{i,j}$. [The pooling formula is given only as an image in the source.]
Specifically, the feature point $P_{i,j}$ can be defined in terms of its local coordinates $(u', v')$ and the coordinates $(u_0, v_0)$ of the upper-left corner point of the aligned feature region; when $(u, v) \in B_{i,j}$, its range is bounded in terms of $w$ and $h$, the length and width of the conventional convolution region, respectively. [The defining formulas are given only as images in the source.]
Step 13313: splicing the results of the pooling operations on the $d^{2}$ sub-block regions to form the score map of the candidate feature region corresponding to one angle channel.
Optionally, in one embodiment, the score map of the candidate feature region is formed by splicing the results of the sub-block pooling operations according to a splicing formula. [The formula is given only as an image in the source.]
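A minimal sketch of the score-map construction for one angle channel: the candidate feature region is divided into d x d sub-blocks, each sub-block is pooled, and the pooled values are stitched into a d x d score map. Because the exact pooling rule is given only as an image in the source, the weighted average pooling below (learnable scale w times the block mean) is an assumption.

```python
import numpy as np

def score_map(region: np.ndarray, d: int = 3, w: float = 1.0) -> np.ndarray:
    """Divide a candidate feature region into d*d sub-blocks, pool each one,
    and stitch the pooled values into a d*d score map for one angle channel."""
    H, W = region.shape
    scores = np.zeros((d, d), dtype=np.float64)
    for i in range(d):
        for j in range(d):
            block = region[i * H // d:(i + 1) * H // d,
                           j * W // d:(j + 1) * W // d]   # sub-block region B_ij
            scores[i, j] = w * block.mean()               # assumed pooling: w * mean
    return scores

s = score_map(np.random.rand(12, 12))   # 3 x 3 score map for one candidate region
```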
step 1332: and determining a response value corresponding to each score map.
Specifically, the response value refers to a matching degree of each sub-block region with a boundary of the target detection region.
Optionally, in one embodiment, the response value corresponding to each score map is determined using a response-value formula. [The formula is given only as an image in the source.]
step 1333: and determining a rotating characteristic region matched with the boundary of the target detection region according to the score map and the response value of each candidate characteristic region.
Step 134: and determining a third feature map based on a plurality of rotation feature areas corresponding to the plurality of target feature areas in the second feature map.
Optionally, in one embodiment, the third feature map is obtained using the following formula:
$$F_A = F_{O_M}, \qquad M = \arg\max_{n}\big(\mathrm{score}(C_n)\cdot \mathrm{response}(C_n)\big);$$
where $F_A$ is the selected rotated feature region that best matches the boundary of the target detection region, and $F_{O_M}$ is the region feature of the angle channel that yields the maximum product of accumulated score and response value.
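A minimal sketch of the channel-selection rule described above: for each angle channel, accumulate its score map, multiply by its response value, and keep the channel with the largest product. The array shapes and the simple sum used as the accumulated score are assumptions for illustration.

```python
import numpy as np

def select_best_channel(score_maps: np.ndarray, responses: np.ndarray) -> int:
    """score_maps: (N, d, d), one score map per angle channel; responses: (N,).
    Returns the index M of the channel whose accumulated score times response
    is largest; the rotated feature of that channel is taken as F_A."""
    accumulated = score_maps.reshape(score_maps.shape[0], -1).sum(axis=1)
    return int(np.argmax(accumulated * responses))

best = select_best_channel(np.random.rand(10, 3, 3), np.random.rand(10))
```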
Step 14: and obtaining a classification recognition result of the image to be recognized according to the third feature map.
The angle and coordinates of the target object in the image are predicted based on the third feature map, so that the target object can be classified.
In particular, the result of the classification may be the kind of ship, such as a cargo ship, an engineering ship, a fishery ship, a military ship, a civil ship, and the like.
Specifically, the loss function is a way to measure the gap between the predicted value output by the neural network and the actual value. Here, a cross-entropy loss function is used as the classification loss function so that the class information of the target object can be obtained more accurately. Entropy expresses the uncertainty over the possible values of a random variable, and the cross-entropy loss function estimates the distance between two sample distributions.
It can be understood that cross entropy is used to evaluate the difference between the target class probability distribution obtained by the current training and the true distribution; that is, the cross-entropy loss function characterizes the distance between the actual output (probability) and the expected output (probability), and the smaller the cross entropy, the closer the two probability distributions are.
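A minimal sketch of the cross-entropy classification loss for a single sample; the probability vector and class index are illustrative values.

```python
import numpy as np

def cross_entropy(predicted_probs: np.ndarray, true_class: int) -> float:
    """Cross entropy between the predicted class distribution and a one-hot
    ground truth; smaller values mean the two distributions are closer."""
    eps = 1e-12  # avoid log(0)
    return float(-np.log(predicted_probs[true_class] + eps))

loss = cross_entropy(np.array([0.1, 0.7, 0.2]), true_class=1)  # about 0.357
```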
Specifically, in the design of the anchor boxes for target detection, an adaptive anchor box calculation method is adopted: the optimal anchor box values for different data sets are computed adaptively at each training run, which improves the quality of the preset anchor boxes.
Specifically, the target bounding-box loss function uses CIoU_Loss, which simultaneously considers the bounding-box overlap, the distance between the bounding-box centre points, and the bounding-box aspect-ratio information.
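A minimal sketch of the standard CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2); this follows the usual published definition of CIoU and is not taken from the patent text.

```python
import numpy as np

def ciou_loss(box_p, box_g):
    """CIoU loss: 1 - IoU + normalised centre distance + aspect-ratio penalty."""
    # intersection and IoU
    x1, y1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    x2, y2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + 1e-12)
    # squared distance between box centres
    cx_p, cy_p = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cx_g, cy_g = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    # squared diagonal of the smallest enclosing box
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = cw ** 2 + ch ** 2 + 1e-12
    # aspect-ratio consistency term
    w_p, h_p = box_p[2] - box_p[0], box_p[3] - box_p[1]
    w_g, h_g = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / np.pi ** 2) * (np.arctan(w_g / h_g) - np.arctan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + 1e-12)
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 10, 10), (2, 2, 12, 12)))  # about 0.56
```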
Specifically, in screening the target boxes, a weighted NMS method is used to output the final predicted box information. During the rectangle elimination process, weighted NMS weights the overlapping rectangles according to the confidence of the network predictions to obtain a new rectangle, uses that rectangle as the final predicted rectangle, and eliminates the remaining ones.
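A minimal sketch of one common reading of weighted NMS: overlapping boxes are not simply discarded but averaged with confidence weights to form the output box. The IoU threshold and the exact weighting scheme are assumptions, not the patent's specification.

```python
import numpy as np

def iou(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-12)

def weighted_nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5):
    """Group boxes that overlap the current best box and replace the group by a
    confidence-weighted average box instead of discarding the suppressed ones."""
    order = scores.argsort()[::-1]
    used = np.zeros(len(boxes), dtype=bool)
    kept = []
    for idx in order:
        if used[idx]:
            continue
        group = [idx] + [j for j in order
                         if not used[j] and j != idx and iou(boxes[idx], boxes[j]) > iou_thr]
        used[group] = True
        w = scores[group][:, None]
        kept.append((boxes[group] * w).sum(axis=0) / w.sum())
    return np.array(kept)

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(weighted_nms(boxes, np.array([0.9, 0.6, 0.8])))  # first two boxes merged
```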
Different from the prior art, the image classification and recognition method provided by this embodiment is applied to an image classification and recognition apparatus, which performs feature extraction on the image to be recognized to obtain a first feature map; performs a down-sampling operation on the first feature map to obtain a second feature map; rotates the second feature map so that the boundary of the feature region matches the boundary of the target detection region, to obtain a third feature map; and then uses the third feature map to obtain the classification and recognition result of the image to be recognized. In this way, compared with the processing region of a conventional convolution operation, the angle-channel-switching method fits the original convolution processing region to the target detection region through angular rotation, and selects the feature region that best matches the target region by constructing a feature score map, thereby solving the severe misalignment between the feature region obtained by conventional convolution and the target detection region and classifying and recognizing images more efficiently.
The method of the above embodiment may be implemented using a network model.
Referring to fig. 9, fig. 9 is a schematic structural diagram of the image classification recognition model provided in the present application. The image classification recognition model 90 may include a feature extraction module 91, a rotation feature alignment module 92, a full connection module 93, and a prediction result module 94.
The feature extraction module 91 is configured to perform feature extraction on an image to be identified to obtain a first feature map, where the image to be identified includes a target detection area; further, performing downsampling operation on the first feature map to obtain a second feature map, wherein the second feature map comprises feature areas; the rotating feature alignment module 92 is configured to rotate the second feature map, so that the boundary of the feature region matches with the boundary of the target detection region, to obtain a third feature map; the full-connection module 93 is configured to perform full-connection operation on the third feature map, and the prediction result module 94 obtains a classification recognition result of the image to be recognized according to the third feature map.
Optionally, the rotating feature alignment module 92 may specifically include a switching angle channel module 921, an angle interpolation module 922, and a feature score map module 923, where the switching angle channel module 921 is configured to set a plurality of angle channels, and enable the feature area to rotate based on an angle range of the angle channels; the angle interpolation module 922 is configured to perform interpolation processing on the feature region obtained by rotation based on the boundary of the angle channel; the feature score map module 923 is configured to score, among a plurality of candidate feature regions, the candidate feature regions by a method of constructing a feature score map to determine a final required feature region.
Specifically, for the initial target image passed in from the preceding network, the feature region where the target is located is determined, and the offset of the feature region is then computed by angle classification, with the angle value switched cyclically among the set angle channels. Since the angle classification introduces errors into the computation of the feature region, the rotated feature region is interpolated at the boundary of each angle interval. After the rotation-aligned features corresponding to the different angle channels are obtained, a position-sensitive score map is constructed for the aligned features of each channel; the score map is computed in a simplified manner using block processing, and the response value of the corresponding angle channel is calculated. For each angle channel, the accumulated score value and the response value are computed, and their product is taken as the basis for selecting the angle channel: the channel with the maximum product is the one containing the selected feature region that best matches the target region. The score map corresponding to that channel then undergoes a full-connection operation. Inserted into an existing target detection model, this optimizes the extraction of rotated features, helps reduce the error of the angle classification processing, and improves the classification effect.
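A minimal sketch of how the modules of fig. 9 chain together; every stage is a caller-supplied placeholder callable, and the 2x2 average-pooling stand-in for the down-sampling step is an assumption, not the patent's concrete network.

```python
import numpy as np

def downsample2x(feature_map: np.ndarray) -> np.ndarray:
    """2x2 average pooling as a stand-in for the down-sampling step."""
    h, w = feature_map.shape[0] // 2 * 2, feature_map.shape[1] // 2 * 2
    f = feature_map[:h, :w]
    return f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def classify_image(image, feature_extractor, rotate_align, classifier_head):
    """End-to-end flow: feature extraction -> down-sampling -> rotation feature
    alignment -> full connection / prediction head -> class index."""
    first_map = feature_extractor(image)              # first feature map
    second_map = downsample2x(first_map)              # second feature map
    third_map = rotate_align(second_map)              # rotation-aligned features
    logits = classifier_head(third_map.reshape(-1))   # FC + prediction
    return int(np.argmax(logits))

# placeholder callables just to show the data flow
label = classify_image(np.random.rand(64, 64),
                       feature_extractor=lambda x: x,
                       rotate_align=lambda x: x,
                       classifier_head=lambda v: np.random.rand(5))
```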
It can be understood that specific implementation steps and implementation principles of this embodiment may refer to the embodiment of fig. 1, which are not described herein again.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of an image classification recognition apparatus 100 provided in the present application, where the image classification recognition apparatus 100 includes a memory 101 and a processor 102, the memory 101 is used for storing program data, and the processor 102 is used for executing the program data to implement the following method:
performing feature extraction according to an image to be recognized to obtain a first feature map; performing downsampling operation based on the first feature map to obtain a second feature map; rotating the second characteristic diagram to enable the boundary of the characteristic region to be matched with the boundary of the target detection region to obtain a third characteristic diagram; and further utilizing the third feature map to obtain a classification recognition result of the image to be recognized.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application, where the computer-readable storage medium 110 stores program data 111, and when the program data 111 is executed by a processor, the program data is used to implement the following methods:
performing feature extraction on an image to be recognized to obtain a first feature map; performing downsampling operation based on the first feature map to obtain a second feature map; rotating the second characteristic diagram to enable the boundary of the characteristic region to be matched with the boundary of the target detection region to obtain a third characteristic diagram; and further utilizing the third feature map to obtain a classification recognition result of the image to be recognized.
Embodiments of the present application may be implemented in software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (12)

1. An image classification recognition method, characterized in that the method comprises:
performing feature extraction on an image to be recognized to obtain a first feature map, wherein the image to be recognized comprises a target detection area;
performing downsampling operation on the first feature map to obtain a second feature map, wherein the second feature map comprises feature areas;
rotating the second feature map to enable the boundary of the feature region to be matched with the boundary of the target detection region, so as to obtain a third feature map;
and obtaining a classification recognition result of the image to be recognized according to the third feature map.
2. The method of claim 1,
the rotating the second feature map to match the boundary of the feature region with the boundary of the target detection region to obtain a third feature map includes:
determining a target feature region in the second feature map;
respectively rotating the target feature region based on a plurality of set rotation angles to respectively obtain a plurality of corresponding candidate feature regions;
determining a rotated feature region matching the boundary of the target detection region among the plurality of candidate feature regions;
and determining a third feature map based on a plurality of rotation feature areas corresponding to the plurality of target feature areas in the second feature map.
3. The method of claim 2,
the rotating the target feature region based on the set plurality of rotation angles to obtain a plurality of corresponding candidate feature regions respectively includes:
determining a plurality of angle channels, wherein each angle channel corresponds to an angle interval;
respectively rotating the target feature region based on the plurality of angle channels to respectively obtain a plurality of corresponding candidate feature regions;
and for each candidate characteristic region, carrying out pixel interpolation processing on the candidate characteristic region by utilizing the boundary of the corresponding angle channel.
4. The method of claim 3, wherein the target feature region is processed using the following formula:
$$F_{O_n} = \mathrm{Int}\big(\mathrm{SA}(Y(p)\cdot R^{T}(\theta),\ C_n),\ \theta\big), \qquad n = 0, 1, \dots, N-1;$$
wherein $C_n$ denotes the $n$-th angle channel, $F_{O_n}$ denotes the candidate feature region under angle channel $C_n$, $\mathrm{Int}$ denotes the interpolation function, $\mathrm{SA}$ denotes the angle-channel switching function, $Y(p)$ denotes the feature function corresponding to the second feature map, $R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ is the rotation matrix controlling the switching of the rotation angle, and $N$ is a positive integer.
5. The method of claim 4, wherein the target detection region is down-sampled using the following formula:
$$Y(p) = \sum_{k \in K} W(k) \cdot F(p + k);$$
wherein $F$ is the feature function corresponding to the first feature map, $W(k)$ denotes the weight at each position of the convolution kernel used, $K$ denotes the position range of the convolution calculation, and the value of $k$ controls the traversal of the pixel points.
6. The method of claim 2,
the determining, in the plurality of candidate feature regions, a rotated feature region matching a boundary of the target detection region includes:
constructing a score map of each candidate characteristic region corresponding to a plurality of angle channels;
determining a response value corresponding to each score map;
and determining a rotating characteristic region matched with the boundary of the target detection region according to the score map and the response value of each candidate characteristic region.
7. The method of claim 6,
the constructing of the score maps of a plurality of angle channels corresponding to each candidate feature region comprises:
dividing each candidate feature region into $d^{2}$ sub-block regions, wherein d is a positive integer;
performing a pooling operation on each sub-block region separately;
and splicing the results of the pooling operations on the $d^{2}$ sub-block regions to form the score map of the candidate feature region corresponding to one angle channel.
8. The method of claim 7, wherein the sub-block regions are pooled using a pooling formula in which the pooled output of the $(i, j)$-th sub-block region $B_{i,j}$ on angle channel $C_n$ is computed from one of the $N \times d^{2}$ score maps, $w$ is a learnable parameter, $p$ is the number of pixels in sub-block region $B_{i,j}$, and $(u, v)$ are the global coordinates of feature point $P_{i,j}$; [the pooling formula is given only as an image in the source]
the characteristic point P i,j Can be defined by the following formula:
Figure FDA0003681959940000034
wherein (u ', v') represents a feature point P i,j (u) local coordinates of 0 ,v 0 ) The coordinates of the upper left corner points of the alignment feature region are expressed when (u, v) belongs to B i,j Its range is defined by the following equation:
Figure FDA0003681959940000035
Figure FDA0003681959940000036
where w and h are the length and width of the conventional convolution region, respectively.
9. The method according to claim 8, wherein the score map of the candidate feature region is constructed by splicing the results of the sub-block region pooling operations according to a splicing formula, and the response value of the angle channel is calculated using a response-value formula. [Both formulas are given only as images in the source.]
10. The method of claim 6, wherein the determining, according to the score map and the response value of each candidate feature region, the rotated feature region matched with the boundary of the target detection region comprises: obtaining the third feature map using the following formula:
$$F_A = F_{O_M}, \qquad M = \arg\max_{n}\big(\mathrm{score}(C_n)\cdot \mathrm{response}(C_n)\big);$$
wherein $F_A$ is the selected rotated feature region that best matches the boundary of the target detection region, and $F_{O_M}$ is the region feature of the angle channel that yields the maximum product of accumulated score and response value.
11. An image classification recognition apparatus, characterized in that the image classification recognition apparatus comprises a processor and a memory, the memory storing program data, the processor being configured to execute the program data to implement the image classification recognition method according to any one of claims 1 to 10.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program data for implementing the image classification recognition method according to any one of claims 1 to 10 when the program data is executed.
CN202210640202.7A 2022-06-07 2022-06-07 Image classification recognition method, recognition device and storage medium Pending CN115170860A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210640202.7A CN115170860A (en) 2022-06-07 2022-06-07 Image classification recognition method, recognition device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210640202.7A CN115170860A (en) 2022-06-07 2022-06-07 Image classification recognition method, recognition device and storage medium

Publications (1)

Publication Number Publication Date
CN115170860A true CN115170860A (en) 2022-10-11

Family

ID=83486298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210640202.7A Pending CN115170860A (en) 2022-06-07 2022-06-07 Image classification recognition method, recognition device and storage medium

Country Status (1)

Country Link
CN (1) CN115170860A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination