CN111104538A - Fine-grained vehicle image retrieval method and device based on multi-scale constraint - Google Patents


Publication number
CN111104538A
CN111104538A (application CN201911245009.8A)
Authority
CN
China
Prior art keywords
fine-grained
data set
training data
neural network
Prior art date
Legal status
Pending
Application number
CN201911245009.8A
Other languages
Chinese (zh)
Inventor
张斯尧
罗茜
王思远
蒋杰
张�诚
李乾
谢喜林
黄晋
Current Assignee
Shenzhen Jiuling Software Technology Co ltd
Original Assignee
Shenzhen Jiuling Software Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Jiuling Software Technology Co ltd filed Critical Shenzhen Jiuling Software Technology Co ltd
Priority to CN201911245009.8A
Publication of CN111104538A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained vehicle image retrieval method and device based on multi-scale constraint, wherein the method comprises the following steps: performing multi-scale regional information annotation on a first training data set to obtain a second training data set; processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set; training the parameter values of the target parameters of a fine-grained neural network model with the third training data set to obtain a trained fine-grained neural network model; and inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, obtaining a target vehicle image of the same type as the vehicle image to be recognized. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency.

Description

Fine-grained vehicle image retrieval method and device based on multi-scale constraint
Technical Field
The invention relates to the technical field of image retrieval, in particular to a fine-grained vehicle image retrieval method and device based on multi-scale constraint, terminal equipment and a computer readable medium.
Background
Vehicle image retrieval is a technology for retrieving similar images from an input image, and mainly involves two parts: vehicle feature extraction from images and similarity analysis of those vehicle features. Fine-grained image recognition aims to find local regional features with subtle differences in images, enabling the recognition of different subclasses within a large class. Applying fine-grained image recognition technology to vehicle image retrieval makes it possible to extract fine-grained image features and analyze their similarity.
Meanwhile, with the continuous advancement of smart cities, urban road traffic safety receives more and more attention. However, existing vehicle image retrieval algorithms have a low recognition rate for vehicles of the same type, and their information extraction is imprecise.
Disclosure of Invention
In view of the above, the invention provides a fine-grained vehicle image retrieval method and device based on multi-scale constraint, a computer device and a storage medium, to address the problems in the prior art that the vehicle image to be retrieved is not accurately localized and its information is not clearly extracted.
The first aspect of the embodiment of the invention provides a fine-grained vehicle image retrieval method based on multi-scale constraint, which comprises the following steps:
carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set;
training parameter values of target parameters of the fine-grained neural network model by adopting the third training data set to obtain a trained fine-grained neural network model;
and inputting the vehicle image to be recognized into a trained fine-grained neural network model for recognition, and obtaining a target vehicle image of the same type as the vehicle image to be recognized.
Further, before the step of labeling the multi-scale region information of the first training data set to obtain the second training data set, the method further includes:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing the fully-connected layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
Further, the step of processing the second training data set based on the improved bounding box constraint algorithm and the Heron constraint algorithm includes:
optimizing the detection results with the bounding box constraint algorithm according to the mutual inclusion relationship of the multi-scale regions in the pictures of the second training data set, and screening out the detection boxes in each picture that contain the target object and the multi-scale target centers;
and using Faster R-CNN to extract the detection boxes, for the target object and the center part at each scale, whose object score probabilities rank highest and which satisfy the Heron detection constraint.
Further, the step of training parameter values of target parameters of the fine-grained neural network model using the third training data set includes:
inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the model, and outputting n two-dimensional feature maps, where each feature map represents, in distributed form, feature saliency regions with multiple activation responses;
superimposing the n two-dimensional feature maps, setting a threshold, and selecting the regions whose superimposed activation response is higher than the threshold to obtain a mask map;
adjusting the size of the mask map by bicubic interpolation so that it matches the size of the input image, and overlaying the mask map on the input image;
and selecting the region of the mask map with the largest area whose activation response is higher than the threshold; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
A second aspect of an embodiment of the present invention provides an apparatus for retrieving a fine-grained vehicle image based on a multi-scale constraint, where the apparatus includes:
the first acquisition module is used for carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
the processing module is used for processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
the training module is used for training the parameter values of the target parameters of the fine-grained neural network model by adopting the third training data set to obtain the trained fine-grained neural network model;
and the recognition module is used for inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
Further, the apparatus further comprises:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, replacing the fully-connected layer with a global average pooling layer;
and the pre-training module is used for pre-training the fine-grained neural network model with the ImageNet data set.
Further, the processing module comprises:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and the Heron constraint module is used for extracting, with Faster R-CNN, the detection boxes of the target object and of the center parts at each scale whose object score probabilities rank highest and which satisfy the Heron detection constraint.
Further, the training module comprises:
an extraction module, configured to input the images in the third training data set into the fine-grained neural network model, extract image features through the last activated convolution layer of the model, and output n two-dimensional feature maps, where each feature map represents, in distributed form, feature saliency regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for adjusting the size of the mask map by adopting a bicubic interpolation method to make the size of the mask map the same as that of the input image and covering the mask map on the input image;
and the judging module is used for selecting the region of the mask map with the largest area whose activation response is higher than the threshold; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for fine-grained vehicle image retrieval based on multi-scale constraints when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable medium storing a computer program which, when executed by a processor, implements the steps of the above-mentioned fine-grained vehicle image retrieval method based on multi-scale constraints.
In the embodiment of the invention, the fine-grained neural network model automatically localizes the image object, fine-grained image features are extracted through deep learning, and their similarity to the reference image features is compared, so that images of the same type as the reference image object are identified. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency. Moreover, replacing the fully-connected layer of the neural network model with global average pooling reduces the number of model parameters and improves the running speed, making the method better suited to retrieval and recognition over large-scale image sets.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a method for fine-grained vehicle image retrieval based on multi-scale constraints according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the Heron constraint in a method for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a flowchart of a method for retrieving a fine-grained vehicle image based on a multi-scale constraint according to an embodiment of the present invention. As shown in fig. 1, the fine-grained vehicle image retrieval method based on multi-scale constraint of the embodiment includes the following steps:
step S102, carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
further, before the step of performing multi-scale region information labeling on the first training data set to obtain the second training data set, the method further includes:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing a full-connection layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
Specifically, a fine-grained neural network model based on VGG-16 is constructed, and global average pooling is adopted in place of the fully-connected layer; the image features extracted from the activated convolution feature maps are fused directly, which reduces the number of parameters and improves the running speed.
The whole fine-grained neural network model is then pre-trained on the ImageNet data set.
In order to fine-tune the multi-region-scale MA-CNN network model, the training data must be annotated with multi-scale target regions. Each detected target region is divided into 3 scales: the innermost scale region is p0, the intermediate scale region is p1, and the outermost region is the complete target region. The target region contains the p1 and p0 regions, and the p1 region contains the p0 region. The corner coordinates of p0 are given by equation (1) and those of the intermediate part p1 by equation (2), where x1, y1, x2 and y2 are the horizontal and vertical coordinates of the lower-left and upper-right corners of the outermost region of the target object; equations (1) and (2) give, respectively, the horizontal and vertical coordinates of the lower-left and upper-right corners of the annotation of the center parts p0 and p1. The specific labeling effect is shown in fig. 2.
[Equations (1) and (2): corner coordinates of p0 and p1; rendered as images in the original.]
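Equations (1) and (2) appear only as images in the original, so the exact scale factors of p0 and p1 are not recoverable. As an illustrative assumption, the sketch below derives p1 as the centered half-size box and p0 as the centered quarter-size box of the outermost region; the function name `multiscale_boxes` and both fractions are hypothetical:

```python
def multiscale_boxes(x1, y1, x2, y2):
    """Derive nested label boxes p1 and p0 from the outermost target box.

    (x1, y1) and (x2, y2) are the lower-left and upper-right corners of the
    outermost region.  ASSUMPTION: p1 is the centered box at half the outer
    size and p0 at a quarter of it (the patent's exact formulas are images).
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # shared center point
    w, h = x2 - x1, y2 - y1

    def centered(frac):
        # Box of size frac*(w, h) centered on (cx, cy).
        return (cx - frac * w / 2, cy - frac * h / 2,
                cx + frac * w / 2, cy + frac * h / 2)

    p1 = centered(0.5)    # intermediate scale region
    p0 = centered(0.25)   # innermost scale region
    return p0, p1
```

Either way, the three boxes nest around a common center, which is what the later bounding box and Heron constraints check.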
Step S104, processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set;
and processing the vehicle data image based on the vehicle image data labeled by the multi-scale regional information by using an improved bounding box constraint algorithm and combining a Helen constraint algorithm, so that the final information of the labeled vehicle is positioned more accurately, and the processed vehicle image is arranged into a data set.
The specific steps are as follows: according to the mutual inclusion relationship of the multi-scale regions, the detection results are optimized with the bounding box constraint algorithm, which screens out the detection boxes containing the target object and the multi-scale target centers more effectively, increases the number of valid detection boxes, and improves the accuracy of the detected positions. The p1 region is contained within the target region of the object, and the p0 region is contained within the p1 region; that is, the detections must satisfy the constraints of equations (3) and (4), with the value of equation (4) nonzero and epsilon set to 10. The detection boxes satisfying this relationship are selected, and among them the boxes whose object score probabilities detected by MA-CNN rank highest are preferred, sorted in descending order.
[Equations (3) and (4): the multi-scale containment constraints; rendered as images in the original.]
After the annotated target vehicle images are bounded by bounding boxes, they are classified to form the corresponding vehicle image data set.
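Equations (3) and (4) are images in the original, so the exact form of the containment test is an assumption; the sketch below checks the described nesting (p0 inside p1 inside the target box), relaxing each side by the patent's epsilon = 10. The helper names `contains` and `valid_detection` are hypothetical:

```python
def contains(outer, inner, eps=10):
    """Check that `inner` lies inside `outer`, with a tolerance of eps pixels.

    Boxes are (x1, y1, x2, y2) with (x1, y1) the lower-left corner and
    (x2, y2) the upper-right corner.  ASSUMPTION: eps relaxes each side
    independently (the patent only states that epsilon is set to 10).
    """
    ox1, oy1, ox2, oy2 = outer
    ix1, iy1, ix2, iy2 = inner
    return (ix1 >= ox1 - eps and iy1 >= oy1 - eps and
            ix2 <= ox2 + eps and iy2 <= oy2 + eps)

def valid_detection(target_box, p1_box, p0_box):
    # The multi-scale boxes must nest: p0 inside p1 inside the target box.
    return contains(target_box, p1_box) and contains(p1_box, p0_box)
```

Detections passing this test would then be ranked by their MA-CNN object score probabilities, as the text describes.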
In addition, since the bounding box constraint can only determine the inclusion relationship of the scale regions, it cannot constrain the center points of the scale detection boxes to lie near the same center point. The invention therefore constrains the center points of the scale detection boxes toward a common center, yielding detection boxes with more accurate positions: three non-collinear points in the plane uniquely determine a circumscribed circle, from which the spread of the multi-scale region centers can be measured. Equation (5) gives the detection center coordinates of the target object, equation (6) the center coordinates of the detection box of the center part p1, and equation (7) those of the center part p0; equations (8), (9) and (10) give the pairwise distances between these center coordinates. The area of the circumscribed circle of the triangle connecting the center coordinates is computed by equations (11) and (12), using Heron's formula. The maximum value of the area S is set to 120; that is, the area computed by equation (12) must not exceed 120. Among the candidate boxes extracted by Faster R-CNN, the detection boxes of the target object and of the center parts at each scale with the highest object score probabilities that satisfy this Heron detection constraint are kept. FIG. 2 is a schematic diagram of the Heron detection constraint.
[Equations (5)-(12): center coordinates, pairwise distances, triangle area via Heron's formula, and circumcircle area; rendered as images in the original.]
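The Heron constraint above can be sketched directly: given the three detection-box centers, compute the pairwise distances (eqs. (8)-(10)), the triangle area by Heron's formula (eq. (11)), the circumradius R = abc / (4 * area), and the circumcircle area pi * R^2 (eq. (12)), then compare against the maximum S = 120. Function names are illustrative:

```python
import math

def circumcircle_area(c_t, c_p1, c_p0):
    """Area of the circumscribed circle of the triangle formed by the three
    detection-box centers (target, p1, p0)."""
    a = math.dist(c_t, c_p1)                    # pairwise distances, eqs. (8)-(10)
    b = math.dist(c_p1, c_p0)
    c = math.dist(c_p0, c_t)
    s = (a + b + c) / 2.0                       # semi-perimeter
    tri = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))  # Heron's formula
    if tri == 0.0:
        # Coincident centers pass trivially; collinear ones have no circumcircle.
        return 0.0 if max(a, b, c) == 0.0 else float('inf')
    r = a * b * c / (4.0 * tri)                 # circumradius
    return math.pi * r * r                      # circumcircle area, eq. (12)

def heron_constraint(c_t, c_p1, c_p0, s_max=120.0):
    # Keep detections whose circumcircle area does not exceed 120.
    return circumcircle_area(c_t, c_p1, c_p0) <= s_max
```

A small circumcircle means the three centers are close together, which is exactly the "common center" condition the bounding box constraint alone could not enforce.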
Step S106, training parameter values of target parameters of the fine-grained neural network model by adopting a third training data set to obtain the trained fine-grained neural network model;
the neural network is trained with the annotated and localized vehicle data set; after its loss function is adjusted, the whole network is trained so that it can accurately recognize fine-grained vehicle image classes and extract the corresponding features;
the constructed neural network model is trained (by supervised learning on the labeled data) with a vehicle data set containing fine-grained image classes of different vehicle attributes, for vehicle feature extraction and vehicle multi-attribute recognition.
Specifically, a first image is input into the trained fine-grained neural network model; after image features are extracted through the model's last activated convolution layer, n two-dimensional feature maps are output, each of which represents, in distributed form, multiple feature saliency regions. Assume a first image of size H x W has a convolution feature of size h x w x d after convolution; that is, the convolution feature comprises a series of two-dimensional feature maps S = {S_n} (n = 1, ..., d), each of size h x w. S_n is the feature map of the n-th channel, i.e., the n-th feature.
Then the n two-dimensional feature maps are superimposed, a threshold is set, and the regions whose superimposed activation response exceeds the threshold are selected to obtain a summary feature mask map;
the feature mapping activation region activated by the activation function can represent a meaningful part of an image semantically, but the activation region of a single channel cannot accurately represent the meaningful semantic part of the image, and the activation region can be determined to be the meaningful part only if the same region of a plurality of channels is the activation region, so that the same regions of the plurality of channels need to be overlapped, and the significance of the activation region is enhanced. Mapping and superposing n two-dimensional features, namely converting the three-dimensional convolution feature of hxwxd into an hxw two-dimensional tensor, adding in the depth direction, and expressing the superposed summarized feature as
Figure BDA0002307278440000071
A threshold alpha is set to construct a mask M of the same size as the summary feature map A; the mask M can be expressed as
M(x, y) = 1 if A(x, y) > alpha, and 0 otherwise.
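The channel summation and thresholding above reduce to a few array operations. In this sketch the default threshold is the mean of A, an assumption made only because the patent does not state a numeric alpha; the function name `saliency_mask` is also hypothetical:

```python
import numpy as np

def saliency_mask(features, alpha=None):
    """Aggregate the d channel feature maps and threshold the result.

    `features` is the h x w x d activation of the last conv layer.  Summing
    over the depth axis gives the h x w summary map A; the mask M is 1 where
    A exceeds the threshold alpha.  ASSUMPTION: alpha defaults to mean(A),
    since the patent leaves its value unspecified.
    """
    A = features.sum(axis=2)           # h x w summary feature map
    if alpha is None:
        alpha = A.mean()
    M = (A > alpha).astype(np.uint8)   # binary mask, same size as A
    return A, M
```

Regions where many channels respond survive the threshold, which matches the text's point that single-channel activations alone are unreliable.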
Then, the size of the mask map is adjusted by bicubic interpolation to be the same as the size of the input image, and the mask map is overlaid on the input image.
Bicubic interpolation computes each target-image pixel from the pixel values of the 16 pixels of the original image nearest to the corresponding point M(x, y): the 4 x 4 neighborhood points a(x + x_i, y + y_j), where i, j = 0, 1, 2, 3. A BiCubic function is used to compute the weight of each of the 16 pixels, and the pixel value of the target-image pixel (X, Y) is the weighted superposition of the 16 pixels.
The BiCubic function is constructed as
[Equation: the BiCubic kernel W(s); rendered as an image in the original.]
where s is the distance from a pixel point a(x + x_i, y + y_j), among the 16 pixel points, to the target-image pixel it maps to, and W(s) is the corresponding weight. The contribution of that point to the target-image pixel is then a(x + x_i, y + y_j) x W(x_i) x W(y_j), and the pixel value of the target pixel is the sum of these contributions over the 16 neighborhood points.
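The patent's W(s) definition survives only as an image, so the sketch below uses the conventional Keys bicubic kernel with a = -0.5, which is the usual choice; whether the patent uses exactly this kernel is an assumption. The function names are illustrative:

```python
import numpy as np

def bicubic_weight(s, a=-0.5):
    """Conventional Keys bicubic kernel W(s) with a = -0.5 (assumed;
    the patent's exact W(s) is an image in the original)."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def bicubic_pixel(img, x, y):
    """Interpolate img at fractional (x, y): the weighted superposition of the
    4 x 4 neighborhood, a(x + x_i, y + y_j) * W(x_i) * W(y_j)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    val = 0.0
    for j in range(-1, 3):
        for i in range(-1, 3):
            xi = int(np.clip(x0 + i, 0, img.shape[1] - 1))  # clamp at borders
            yj = int(np.clip(y0 + j, 0, img.shape[0] - 1))
            val += (img[yj, xi]
                    * bicubic_weight(x - (x0 + i))
                    * bicubic_weight(y - (y0 + j)))
    return val
```

With a = -0.5 the weights sum to 1 at any offset, so constant images are reproduced exactly and integer coordinates return the original pixel.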
Finally, the region of the mask map with the largest area whose activation response is higher than the threshold, i.e. the largest region of contiguous pixels with value 1, is selected; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
The region with the largest area is selected using a flood fill algorithm.
Specifically, a pixel in the mask map is chosen as a starting point and checked: if its value is 1 it is marked, otherwise it is not; the fill then expands from the starting point to the surrounding pixels until all reachable pixels are marked, after which an unmarked point is chosen as a new starting point. Finally, the region with the most contiguous pixels is selected as the result. The corresponding region in the input image is the location of the image's main target object, and the activation-response features are the feature information of the target object.
The region with the maximum number of contiguous pixels thus localizes the main target object in the input image, and the object's feature information is extracted.
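The flood fill pass above can be sketched as a breadth-first search over the binary mask; 4-connectivity is an assumption (the patent only says "surrounding pixel points"), and the function name is illustrative:

```python
from collections import deque

def largest_connected_region(mask):
    """Flood fill: return the coordinate set of the largest 4-connected
    region of 1-pixels in a binary mask (nested lists or a 2-D array)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited 1-pixel.
            region, queue = set(), deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):   # keep the region with most pixels
                best = region
    return best
```

The returned coordinates map straight back onto the input image to localize the main target object, as the text describes.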
Step S108, inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
And inputting the image to be recognized into a pre-trained fine-grained neural network model to automatically position a main target object, extracting the characteristics of the target object, comparing the characteristics with the characteristics of the target object of the first image, and outputting the image to be recognized containing the objects of the same category as the main target object of the first image.
The way the fine-grained neural network model automatically localizes the main target object of the image to be recognized and extracts its features is the same as in step S106. The extracted feature information of the object in the image to be recognized is compared with the feature information extracted in step S106, and the images to be recognized containing objects of the same category as the main target object of the first image are output. The similarity between the object feature information extracted from the image to be recognized and the target-object feature information of the first image may be computed with a cosine similarity algorithm. The specific formula is as follows:
[Equation: the cosine similarity formula; rendered as an image in the original.]
the smaller the calculated value is, the higher the similarity is. Of course, in specific implementation, the analysis may also be performed according to other image similarity algorithms, which is not limited in this application.
The main-target-object feature information extracted in steps S106 and S108 includes fine local feature information of the object, and images of the same category as the main target object of the first image can be identified from the image set according to this fine-grained local feature information. For example, the vehicle images in the image set of the same model as the vehicle in the reference image are identified according to fine-grained features such as vehicle shape and color.
The main-target-object features extracted in steps S106 and S108 may be further reduced in dimensionality, with redundant information eliminated and computational cost reduced, by one or more of singular value decomposition, whitening, or principal component analysis.
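The principal component analysis option can be sketched via SVD of the centered feature matrix; the function name and the choice of SVD (rather than an eigendecomposition of the covariance) are illustrative, not the patent's prescribed implementation:

```python
import numpy as np

def pca_reduce(features, k):
    """Reduce an (n_samples, dim) feature matrix to k dimensions via SVD,
    one of the dimensionality reduction options the text mentions."""
    mean = features.mean(axis=0)
    centered = features - mean                 # center before decomposition
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T                 # project onto top-k components
```

The first components carry the most variance, so redundant feature dimensions are dropped while most of the discriminative information is kept.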
In the embodiment of the invention, the fine-grained neural network model automatically localizes the image object, fine-grained image features are extracted through deep learning, and their similarity to the reference image features is compared, so that images of the same type as the reference image object are identified. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency. Moreover, replacing the fully-connected layer of the neural network model with global average pooling reduces the number of model parameters and improves the running speed, making the method better suited to retrieval and recognition over large-scale image sets.
Referring to fig. 3, fig. 3 is a block diagram of an apparatus for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention. As shown in fig. 3, the apparatus 20 for fine-grained vehicle image retrieval based on multi-scale constraints of the present embodiment includes a first obtaining module 202, a processing module 204, a training module 206, and a recognition module 208, which are respectively configured to perform the specific methods of S102, S104, S106, and S108 in fig. 1; details can be found in the related description of fig. 1 and are only briefly summarized here:
the first obtaining module 202 is configured to perform multi-scale regional information labeling on the first training data set to obtain a second training data set;
a processing module 204, configured to process the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
a training module 206, configured to train parameter values of target parameters of the fine-grained neural network model by using a third training data set, to obtain a trained fine-grained neural network model;
and the identification module 208 is used for inputting the vehicle image to be identified into the trained fine-grained neural network model for identification, and obtaining a target vehicle image of the same type as the vehicle image to be identified.
Further, the apparatus further comprises:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, with a global average pooling layer adopted in place of the fully connected layer;
and the pre-training module is used for pre-training the fine-grained neural network model by adopting the ImageNet data set.
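To illustrate why global average pooling shrinks the parameter count, the sketch below (a hypothetical minimal example, not the patent's actual network) collapses each final-convolution feature map to a single scalar by averaging; no learned weights are needed where a flattened fully connected layer would require H×W×n of them per output unit:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse each (H, W) feature map to one scalar by averaging.

    feature_maps: (n, H, W) activations from the last convolution layer.
    Returns an n-dimensional descriptor with zero learned parameters,
    in place of a fully connected layer's large weight matrix.
    """
    return feature_maps.mean(axis=(1, 2))

maps = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # two 3x3 maps
print(global_average_pool(maps))  # → [ 4. 13.]
```

The pooled descriptor can then feed a small classifier or be used directly as the retrieval feature.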
Further, the processing module 204 includes:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture and screening out a detection frame containing the target object and the multi-scale target center in the picture;
and the Helen constraint module is used for extracting, by means of FASTER-RCNN, the detection frames of the target object and of the central parts at each scale whose objectness probability scores rank highest and which satisfy the Helen detection constraint condition.
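The patent does not give the algebraic form of the Helen (Heron) constraint. One plausible reading, sketched below under that assumption, is that a coarse-scale detection frame must fully contain the finer-scale frame, while the triangle formed by the multi-scale target centers, whose area Heron's formula computes from its three side lengths, must stay small so the scales agree on the target location. All box coordinates and thresholds here are illustrative:

```python
import math

def box_contains(outer, inner):
    """True if box `outer` fully contains box `inner`; boxes are (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def heron_area(p, q, r):
    """Area of the triangle with vertices p, q, r via Heron's formula."""
    a = math.dist(p, q)
    b = math.dist(q, r)
    c = math.dist(r, p)
    s = (a + b + c) / 2                      # semi-perimeter
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

# Coarse frame must contain the fine frame; the three scale centers must
# form a small triangle (i.e., be nearly coincident).
coarse = (0, 0, 100, 100)
fine = (20, 20, 80, 80)
centers = [(50, 50), (52, 49), (51, 51)]
print(box_contains(coarse, fine), heron_area(*centers) < 10.0)  # → True True
```

Frames failing either check would be discarded before the highest-scoring survivors are kept.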
Further, the training module 206 includes:
the extraction module is used for inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the fine-grained neural network model, and outputting n two-dimensional feature maps, where each feature map represents a distribution of salient feature regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for resizing the mask map by bicubic interpolation so that it is the same size as the input image, and overlaying the mask map on the input image;
and the judging module is used for selecting the region of largest area in the mask map whose activation response is higher than the threshold; the corresponding region in the input image is the position of the main target object of the image, and the activation response features are the features of the target object.
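The superposition-and-threshold steps above can be sketched as follows. This is a minimal numpy illustration: the bicubic resize to input resolution is omitted (a library such as OpenCV would typically provide it), and a full version would label connected components and keep the largest region, whereas here a single activated blob stands in:

```python
import numpy as np

def build_mask(feature_maps, threshold):
    """Sum n two-dimensional feature maps, keep activations above a threshold."""
    summed = feature_maps.sum(axis=0)
    return (summed > threshold).astype(np.uint8)

def largest_true_region_bbox(mask):
    """Bounding box (y1, x1, y2, x2) of the above-threshold region.

    Simplified: assumes one connected blob, so the tight bounding box of
    all active pixels stands in for the largest connected component.
    """
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

maps = np.zeros((3, 6, 6))
maps[:, 2:5, 1:4] = 1.0                 # all n maps activate on one region
mask = build_mask(maps, threshold=2.0)  # region sums to 3.0 > 2.0
print(largest_true_region_bbox(mask))   # → (2, 1, 4, 3)
```

The returned box, mapped back onto the input image, marks the main target object's position; the activations inside it are the target's features.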
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 10 of this embodiment includes: a processor 100, a memory 101 and a computer program 102 stored in said memory 101 and executable on said processor 100, such as a program for training a fine-grained vehicle image retrieval based on a multi-scale constraint. The processor 100, when executing the computer program 102, implements the steps in the above-described method embodiments, e.g., the steps of S102, S104, S106, and S108 shown in fig. 1. Alternatively, the processor 100, when executing the computer program 102, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the first obtaining module 202, the processing module 204, the training module 206 and the recognition module 208 shown in fig. 3.
Illustratively, the computer program 102 may be partitioned into one or more modules/units that are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 102 in the terminal device 10. For example, the computer program 102 may be divided into a first acquisition module 202, a processing module 204, a training module 206, and a recognition module 208 (modules in a virtual device), each of which functions specifically as follows:
the first obtaining module 202 is configured to perform multi-scale regional information labeling on the first training data set to obtain a second training data set;
a processing module 204, configured to process the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
a training module 206, configured to train parameter values of target parameters of the fine-grained neural network model by using a third training data set, to obtain a trained fine-grained neural network model;
and the identification module 208 is used for inputting the vehicle image to be identified into the trained fine-grained neural network model for identification, and obtaining a target vehicle image of the same type as the vehicle image to be identified.
The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. Terminal device 10 may include, but is not limited to, a processor 100, a memory 101. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 10 and does not constitute a limitation of terminal device 10 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 100 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 10. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device. The memory 101 is used for storing the computer program and other programs and data required by the terminal device 10. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A fine-grained vehicle image retrieval method based on multi-scale constraint is characterized by comprising the following steps:
carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
training parameter values of target parameters of the fine-grained neural network model by adopting the third training data set to obtain a trained fine-grained neural network model;
and inputting the vehicle image to be recognized into a trained fine-grained neural network model for recognition, and obtaining a target vehicle image of the same type as the vehicle image to be recognized.
2. The method for fine-grained vehicle image retrieval based on multi-scale constraint according to claim 1, wherein before the step of performing multi-scale regional information labeling on the first training data set to obtain the second training data set, the method further comprises:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing a full-connection layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
3. The method for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 2, wherein the step of processing the second training data set based on the modified bounding box constraint algorithm and the Helen constraint algorithm comprises:
carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and extracting, by using FASTER-RCNN, the detection frames of the target object and of the central parts at each scale which rank highest and meet the Helen detection constraint condition.
4. A method for multi-scale constraint-based fine-grained vehicle image retrieval according to claim 1, wherein the step of training parameter values of target parameters of a fine-grained neural network model using the third training data set comprises:
inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the fine-grained neural network model, and outputting n two-dimensional feature maps, wherein each feature map represents a distribution of salient feature regions with multiple activation responses;
superposing the n two-dimensional feature maps, setting a threshold, and selecting the area whose activation response after superposition is higher than the threshold to obtain a mask map;
adjusting the size of the mask map by adopting a bicubic interpolation method to enable the size of the mask map to be the same as that of the input image, and covering the mask map on the input image;
and selecting a region with the largest area occupied by the mask map and the activation response higher than a threshold value, wherein the region corresponding to the region in the input image is the position of the main target object of the image, and the activation response characteristic is the characteristic of the target object.
5. An apparatus for fine-grained vehicle image retrieval based on multi-scale constraints, comprising:
the first acquisition module is used for carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
the processing module is used for processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
the training module is used for training the parameter values of the target parameters of the fine-grained neural network model by adopting the third training data set to obtain the trained fine-grained neural network model;
and the recognition module is used for inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
6. The apparatus for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 5, further comprising:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, and a global average pooling layer is adopted to replace a full connection layer;
and the pre-training module is used for pre-training the fine-grained neural network model by adopting ImageNet data set.
7. The apparatus for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 5, wherein the processing module comprises:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and the Helen constraint module is used for extracting, by means of FASTER-RCNN, the detection frames of the target object and of the central parts at each scale whose objectness probability scores rank highest and which satisfy the Helen detection constraint condition.
8. The apparatus for multi-scale constraint-based fine-grained vehicle image retrieval of claim 5, wherein the training module comprises:
an extraction module, configured to input the images in the third training data set into the fine-grained neural network model, extract image features through the last activated convolution layer of the fine-grained neural network model, and output n two-dimensional feature maps, where each feature map represents a distribution of salient feature regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for adjusting the size of the mask map by adopting a bicubic interpolation method to make the size of the mask map the same as that of the input image and covering the mask map on the input image;
and the judging module is used for selecting a region with the largest occupied area in the mask map and the activation response higher than a threshold value, the region corresponding to the region in the input image is the position of the main target object of the image, and the activation response characteristic is the characteristic of the target object.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the computer program.
10. A computer-readable medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911245009.8A 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint Pending CN111104538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911245009.8A CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911245009.8A CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Publications (1)

Publication Number Publication Date
CN111104538A true CN111104538A (en) 2020-05-05

Family

ID=70421623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911245009.8A Pending CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Country Status (1)

Country Link
CN (1) CN111104538A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737512A (en) * 2020-06-04 2020-10-02 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN112085106A (en) * 2020-09-10 2020-12-15 江苏提米智能科技有限公司 Image identification method and device applied to multi-image fusion, electronic equipment and storage medium
CN112200004A (en) * 2020-09-15 2021-01-08 深圳市优必选科技股份有限公司 Training method and device of image detection model and terminal equipment
CN112348112A (en) * 2020-11-24 2021-02-09 深圳市优必选科技股份有限公司 Training method and device for image recognition model and terminal equipment
CN113658101A (en) * 2021-07-19 2021-11-16 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN113762266A (en) * 2021-09-01 2021-12-07 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113936195A (en) * 2021-12-16 2022-01-14 云账户技术(天津)有限公司 Sensitive image recognition model training method and device and electronic equipment
CN114155495A (en) * 2022-02-10 2022-03-08 西南交通大学 Safety monitoring method, device, equipment and medium for vehicle operation in sea-crossing bridge
CN114419349A (en) * 2022-03-30 2022-04-29 中国科学技术大学 Image matching method and device
CN115272763A (en) * 2022-07-27 2022-11-01 四川大学 Bird identification method based on fine-grained feature fusion
CN115393846A (en) * 2022-10-28 2022-11-25 成都西交智汇大数据科技有限公司 Blood cell identification method, device, equipment and readable storage medium
CN116189231A (en) * 2022-12-06 2023-05-30 吉林省吉林祥云信息技术有限公司 AI visual portrait identification escape method, system, equipment and storage medium based on countermeasure network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN109857889A (en) * 2018-12-19 2019-06-07 苏州科达科技股份有限公司 A kind of image search method, device, equipment and readable storage medium storing program for executing
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN109857889A (en) * 2018-12-19 2019-06-07 苏州科达科技股份有限公司 A kind of image search method, device, equipment and readable storage medium storing program for executing
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HELIANG ZHENG: "Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition", GVF, pages 5209 - 5216 *
XIONG CHANGZHEN, JIANG JIE: "Research on fine-grained classification algorithms based on multi-scale regional features", vol. 51, no. 51, pages 55 - 60 *
WEI XIUSHEN: "Research on visual analysis of fine-grained images under deep learning", China Master's Theses Full-text Database, Information Science and Technology, pages 1 - 116 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737512B (en) * 2020-06-04 2021-11-12 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN111737512A (en) * 2020-06-04 2020-10-02 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN112085106A (en) * 2020-09-10 2020-12-15 江苏提米智能科技有限公司 Image identification method and device applied to multi-image fusion, electronic equipment and storage medium
CN112200004A (en) * 2020-09-15 2021-01-08 深圳市优必选科技股份有限公司 Training method and device of image detection model and terminal equipment
CN112200004B (en) * 2020-09-15 2024-01-16 深圳市优必选科技股份有限公司 Training method and device for image detection model and terminal equipment
CN112348112A (en) * 2020-11-24 2021-02-09 深圳市优必选科技股份有限公司 Training method and device for image recognition model and terminal equipment
CN112348112B (en) * 2020-11-24 2023-12-15 深圳市优必选科技股份有限公司 Training method and training device for image recognition model and terminal equipment
CN113658101A (en) * 2021-07-19 2021-11-16 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113658101B (en) * 2021-07-19 2023-06-30 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113762266B (en) * 2021-09-01 2024-04-26 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113762266A (en) * 2021-09-01 2021-12-07 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN113936195A (en) * 2021-12-16 2022-01-14 云账户技术(天津)有限公司 Sensitive image recognition model training method and device and electronic equipment
CN114155495A (en) * 2022-02-10 2022-03-08 西南交通大学 Safety monitoring method, device, equipment and medium for vehicle operation in sea-crossing bridge
CN114419349A (en) * 2022-03-30 2022-04-29 中国科学技术大学 Image matching method and device
CN114419349B (en) * 2022-03-30 2022-07-15 中国科学技术大学 Image matching method and device
CN115272763B (en) * 2022-07-27 2023-04-07 四川大学 Bird identification method based on fine-grained feature fusion
CN115272763A (en) * 2022-07-27 2022-11-01 四川大学 Bird identification method based on fine-grained feature fusion
CN115393846A (en) * 2022-10-28 2022-11-25 成都西交智汇大数据科技有限公司 Blood cell identification method, device, equipment and readable storage medium
CN116189231A (en) * 2022-12-06 2023-05-30 吉林省吉林祥云信息技术有限公司 AI visual portrait identification escape method, system, equipment and storage medium based on countermeasure network

Similar Documents

Publication Publication Date Title
CN111104538A (en) Fine-grained vehicle image retrieval method and device based on multi-scale constraint
US20210158699A1 (en) Method, device, readable medium and electronic device for identifying traffic light signal
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
CN112052839A (en) Image data processing method, apparatus, device and medium
Alam et al. Indian traffic sign detection and recognition
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN107679531A (en) Licence plate recognition method, device, equipment and storage medium based on deep learning
US20230076266A1 (en) Data processing system, object detection method, and apparatus thereof
CN107545263B (en) Object detection method and device
JP2016062610A (en) Feature model creation method and feature model creation device
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN104200228B (en) Recognizing method and system for safety belt
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN113537180B (en) Tree obstacle identification method and device, computer equipment and storage medium
CN113095152A (en) Lane line detection method and system based on regression
CN110704652A (en) Vehicle image fine-grained retrieval method and device based on multiple attention mechanism
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN115661522A (en) Vehicle guiding method, system, equipment and medium based on visual semantic vector
Chen et al. Contrast limited adaptive histogram equalization for recognizing road marking at night based on YOLO models
CN114168768A (en) Image retrieval method and related equipment
CN111104539A (en) Fine-grained vehicle image retrieval method, device and equipment
Ghasemi et al. Optimizing Sector Ring Histogram of Oriented Gradients for human injured detection from drone images
CN111797704B (en) Action recognition method based on related object perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505