CN111104538A - Fine-grained vehicle image retrieval method and device based on multi-scale constraint - Google Patents


Publication number
CN111104538A
CN111104538A (application CN201911245009.8A)
Authority
CN
China
Prior art keywords
fine-grained
data set
training data
neural network
Prior art date
Legal status
Pending
Application number
CN201911245009.8A
Other languages
Chinese (zh)
Inventor
张斯尧
罗茜
王思远
蒋杰
张�诚
李乾
谢喜林
黄晋
Current Assignee
Shenzhen Jiuling Software Technology Co ltd
Original Assignee
Shenzhen Jiuling Software Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Jiuling Software Technology Co ltd filed Critical Shenzhen Jiuling Software Technology Co ltd
Priority to CN201911245009.8A
Publication of CN111104538A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fine-grained vehicle image retrieval method and device based on multi-scale constraint, wherein the method comprises the following steps: performing multi-scale regional information annotation on a first training data set to obtain a second training data set; processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set; training the parameter values of the target parameters of a fine-grained neural network model with the third training data set to obtain a trained fine-grained neural network model; and inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, obtaining a target vehicle image of the same type as the vehicle image to be recognized. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency.

Description

Fine-grained vehicle image retrieval method and device based on multi-scale constraint
Technical Field
The invention relates to the technical field of image retrieval, in particular to a fine-grained vehicle image retrieval method and device based on multi-scale constraint, terminal equipment and a computer readable medium.
Background
Vehicle image retrieval is a technology for retrieving similar images from an input image, and mainly involves two parts: vehicle feature extraction from images and similarity analysis of those vehicle features. Fine-grained image recognition aims to find local regional features with subtle differences in images, enabling the recognition of different subclasses within a large class. Applying fine-grained image recognition technology to vehicle image retrieval makes it possible to extract fine-grained image features and analyze their similarity.
Meanwhile, with the continuous advancement of smart cities, urban road traffic safety receives more and more attention. However, existing vehicle image retrieval algorithms have a low recognition rate for vehicles of the same type, and their information extraction is imprecise.
Disclosure of Invention
In view of the above, the invention provides a fine-grained vehicle image retrieval method and device based on multi-scale constraint, a computer device and a storage medium, to address the problems in the prior art that the vehicle image to be retrieved is not accurately localized and its information is not clearly extracted.
The first aspect of the embodiment of the invention provides a fine-grained vehicle image retrieval method based on multi-scale constraint, which comprises the following steps:
carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set;
training parameter values of target parameters of the fine-grained neural network model by adopting the third training data set to obtain a trained fine-grained neural network model;
and inputting the vehicle image to be recognized into a trained fine-grained neural network model for recognition, and obtaining a target vehicle image of the same type as the vehicle image to be recognized.
Further, before the step of labeling the multi-scale region information of the first training data set to obtain the second training data set, the method further includes:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing the fully-connected layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
Further, the step of processing the second training data set based on the improved bounding box constraint algorithm and the Heron constraint algorithm includes:
optimizing the detection results with the bounding box constraint algorithm according to the mutual inclusion relationship of the multi-scale regions in the pictures of the second training data set, and screening out the detection boxes in each picture that contain the target object and the multi-scale target centers;
and using Faster R-CNN to extract the detection boxes, for the target object and the center part at each scale, whose object score probabilities rank highest and which satisfy the Heron detection constraint.
Further, the step of training parameter values of target parameters of the fine-grained neural network model using the third training data set includes:
inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the model, and outputting n two-dimensional feature maps, where each feature map represents, in distributed form, feature saliency regions with multiple activation responses;
superimposing the n two-dimensional feature maps, setting a threshold, and selecting the regions whose superimposed activation response is higher than the threshold to obtain a mask map;
adjusting the size of the mask map by bicubic interpolation so that it matches the size of the input image, and overlaying the mask map on the input image;
and selecting the region of the mask map with the largest area whose activation response is higher than the threshold; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
A second aspect of an embodiment of the present invention provides an apparatus for retrieving a fine-grained vehicle image based on a multi-scale constraint, where the apparatus includes:
the first acquisition module is used for carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
the processing module is used for processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
the training module is used for training the parameter values of the target parameters of the fine-grained neural network model by adopting the third training data set to obtain the trained fine-grained neural network model;
and the recognition module is used for inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
Further, the apparatus further comprises:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, replacing the fully-connected layer with a global average pooling layer;
and the pre-training module is used for pre-training the fine-grained neural network model with the ImageNet data set.
Further, the processing module comprises:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and the Heron constraint module is used for extracting, with Faster R-CNN, the detection boxes of the target object and of the center parts at each scale whose object score probabilities rank highest and which satisfy the Heron detection constraint.
Further, the training module comprises:
an extraction module, configured to input the images in the third training data set into the fine-grained neural network model, extract image features through the last activated convolution layer of the model, and output n two-dimensional feature maps, where each feature map represents, in distributed form, feature saliency regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for adjusting the size of the mask map by adopting a bicubic interpolation method to make the size of the mask map the same as that of the input image and covering the mask map on the input image;
and the judging module is used for selecting the region of the mask map with the largest area whose activation response is higher than the threshold; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for fine-grained vehicle image retrieval based on multi-scale constraints when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable medium storing a computer program which, when executed by a processor, implements the steps of the above-mentioned fine-grained vehicle image retrieval method based on multi-scale constraints.
In the embodiment of the invention, the fine-grained neural network model automatically localizes the image object, fine-grained image features are extracted through deep learning, and their similarity to the reference image features is compared, so that images of the same type as the reference image object are identified. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency. Moreover, replacing the fully-connected layer of the neural network model with global average pooling reduces the number of model parameters and improves the running speed, making the method better suited to retrieval and recognition over large-scale image sets.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flowchart of a method for fine-grained vehicle image retrieval based on multi-scale constraints according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the Heron constraint in a method for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Referring to fig. 1, fig. 1 is a flowchart of a method for retrieving a fine-grained vehicle image based on a multi-scale constraint according to an embodiment of the present invention. As shown in fig. 1, the fine-grained vehicle image retrieval method based on multi-scale constraint of the embodiment includes the following steps:
step S102, carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
further, before the step of performing multi-scale region information labeling on the first training data set to obtain the second training data set, the method further includes:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing a full-connection layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
Specifically, a fine-grained neural network model based on VGG-16 is constructed, and global average pooling is adopted in place of the fully-connected layer; the image features extracted from the activated convolution feature maps are fused directly, which reduces the number of parameters and improves the running speed.
The whole fine-grained neural network model is then pre-trained on the ImageNet data set.
In order to fine-tune the multi-region-scale MA-CNN network model, the training data must be annotated with multi-scale target regions. Each detected target region is divided into 3 scales: the innermost scale region is p0, the intermediate scale region is p1, and the outermost region is the complete target region. The target region contains the p1 and p0 regions, and the p1 region contains the p0 region. The corner coordinates of p0 are given by equation (1) and those of the intermediate part p1 by equation (2), where x1, y1, x2 and y2 are the horizontal and vertical coordinates of the lower-left and upper-right corners of the outermost region of the target object; equations (1) and (2) give, respectively, the horizontal and vertical coordinates of the lower-left and upper-right corners of the annotation of the center parts p0 and p1. The specific labeling effect is shown in fig. 2.
[Equations (1) and (2): corner coordinates of p0 and p1; rendered as images in the original.]
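Equations (1) and (2) appear only as images in the original, so the exact scale factors of p0 and p1 are not recoverable. As an illustrative assumption, the sketch below derives p1 as the centered half-size box and p0 as the centered quarter-size box of the outermost region; the function name `multiscale_boxes` and both fractions are hypothetical:

```python
def multiscale_boxes(x1, y1, x2, y2):
    """Derive nested label boxes p1 and p0 from the outermost target box.

    (x1, y1) and (x2, y2) are the lower-left and upper-right corners of the
    outermost region.  ASSUMPTION: p1 is the centered box at half the outer
    size and p0 at a quarter of it (the patent's exact formulas are images).
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # shared center point
    w, h = x2 - x1, y2 - y1

    def centered(frac):
        # Box of size frac*(w, h) centered on (cx, cy).
        return (cx - frac * w / 2, cy - frac * h / 2,
                cx + frac * w / 2, cy + frac * h / 2)

    p1 = centered(0.5)    # intermediate scale region
    p0 = centered(0.25)   # innermost scale region
    return p0, p1
```

Either way, the three boxes nest around a common center, which is what the later bounding box and Heron constraints check.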
Step S104, processing the second training data set based on an improved bounding box constraint algorithm and a Heron constraint algorithm to obtain a third training data set;
and processing the vehicle data image based on the vehicle image data labeled by the multi-scale regional information by using an improved bounding box constraint algorithm and combining a Helen constraint algorithm, so that the final information of the labeled vehicle is positioned more accurately, and the processed vehicle image is arranged into a data set.
The specific steps are as follows: according to the mutual inclusion relationship of the multi-scale regions, the detection results are optimized with the bounding box constraint algorithm, which screens out the detection boxes containing the target object and the multi-scale target centers more effectively, increases the number of valid detection boxes, and improves the accuracy of the detected positions. The p1 region is contained within the target region of the object, and the p0 region is contained within the p1 region; that is, the detections must satisfy the constraints of equations (3) and (4), with the value of equation (4) nonzero and epsilon set to 10. The detection boxes satisfying this relationship are selected, and among them the boxes whose object score probabilities detected by MA-CNN rank highest are preferred, sorted in descending order.
[Equations (3) and (4): the multi-scale containment constraints; rendered as images in the original.]
After the annotated target vehicle images are bounded by bounding boxes, they are classified to form the corresponding vehicle image data set.
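Equations (3) and (4) are images in the original, so the exact form of the containment test is an assumption; the sketch below checks the described nesting (p0 inside p1 inside the target box), relaxing each side by the patent's epsilon = 10. The helper names `contains` and `valid_detection` are hypothetical:

```python
def contains(outer, inner, eps=10):
    """Check that `inner` lies inside `outer`, with a tolerance of eps pixels.

    Boxes are (x1, y1, x2, y2) with (x1, y1) the lower-left corner and
    (x2, y2) the upper-right corner.  ASSUMPTION: eps relaxes each side
    independently (the patent only states that epsilon is set to 10).
    """
    ox1, oy1, ox2, oy2 = outer
    ix1, iy1, ix2, iy2 = inner
    return (ix1 >= ox1 - eps and iy1 >= oy1 - eps and
            ix2 <= ox2 + eps and iy2 <= oy2 + eps)

def valid_detection(target_box, p1_box, p0_box):
    # The multi-scale boxes must nest: p0 inside p1 inside the target box.
    return contains(target_box, p1_box) and contains(p1_box, p0_box)
```

Detections passing this test would then be ranked by their MA-CNN object score probabilities, as the text describes.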
In addition, since the bounding box constraint can only determine the inclusion relationship of the scale regions, it cannot constrain the center points of the scale detection boxes to lie near the same center point. The invention therefore constrains the center points of the scale detection boxes toward a common center, yielding detection boxes with more accurate positions: three non-collinear points in the plane uniquely determine a circumscribed circle, from which the spread of the multi-scale region centers can be measured. Equation (5) gives the detection center coordinates of the target object, equation (6) the center coordinates of the detection box of the center part p1, and equation (7) those of the center part p0; equations (8), (9) and (10) give the pairwise distances between these center coordinates. The area of the circumscribed circle of the triangle connecting the center coordinates is computed by equations (11) and (12), using Heron's formula. The maximum value of the area S is set to 120; that is, the area computed by equation (12) must not exceed 120. Among the candidate boxes extracted by Faster R-CNN, the detection boxes of the target object and of the center parts at each scale with the highest object score probabilities that satisfy this Heron detection constraint are kept. FIG. 2 is a schematic diagram of the Heron detection constraint.
[Equations (5)-(12): center coordinates, pairwise distances, triangle area via Heron's formula, and circumcircle area; rendered as images in the original.]
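The Heron constraint above can be sketched directly: given the three detection-box centers, compute the pairwise distances (eqs. (8)-(10)), the triangle area by Heron's formula (eq. (11)), the circumradius R = abc / (4 * area), and the circumcircle area pi * R^2 (eq. (12)), then compare against the maximum S = 120. Function names are illustrative:

```python
import math

def circumcircle_area(c_t, c_p1, c_p0):
    """Area of the circumscribed circle of the triangle formed by the three
    detection-box centers (target, p1, p0)."""
    a = math.dist(c_t, c_p1)                    # pairwise distances, eqs. (8)-(10)
    b = math.dist(c_p1, c_p0)
    c = math.dist(c_p0, c_t)
    s = (a + b + c) / 2.0                       # semi-perimeter
    tri = math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))  # Heron's formula
    if tri == 0.0:
        # Coincident centers pass trivially; collinear ones have no circumcircle.
        return 0.0 if max(a, b, c) == 0.0 else float('inf')
    r = a * b * c / (4.0 * tri)                 # circumradius
    return math.pi * r * r                      # circumcircle area, eq. (12)

def heron_constraint(c_t, c_p1, c_p0, s_max=120.0):
    # Keep detections whose circumcircle area does not exceed 120.
    return circumcircle_area(c_t, c_p1, c_p0) <= s_max
```

A small circumcircle means the three centers are close together, which is exactly the "common center" condition the bounding box constraint alone could not enforce.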
Step S106, training parameter values of target parameters of the fine-grained neural network model by adopting a third training data set to obtain the trained fine-grained neural network model;
the neural network is trained with the annotated and localized vehicle data set; after its loss function is adjusted, the whole network is trained so that it can accurately recognize fine-grained vehicle image classes and extract the corresponding features;
the constructed neural network model is trained (by supervised learning on the labeled data) with a vehicle data set containing fine-grained image classes of different vehicle attributes, for vehicle feature extraction and vehicle multi-attribute recognition.
Specifically, a first image is input into the trained fine-grained neural network model; after image features are extracted through the model's last activated convolution layer, n two-dimensional feature maps are output, each of which represents, in distributed form, multiple feature saliency regions. Assume a first image of size H x W has a convolution feature of size h x w x d after convolution; that is, the convolution feature comprises a series of two-dimensional feature maps S = {S_n} (n = 1, ..., d), each of size h x w. S_n is the feature map of the n-th channel, i.e., the n-th feature.
Then the n two-dimensional feature maps are superimposed, a threshold is set, and the regions whose superimposed activation response exceeds the threshold are selected to obtain a summary feature mask map;
the feature mapping activation region activated by the activation function can represent a meaningful part of an image semantically, but the activation region of a single channel cannot accurately represent the meaningful semantic part of the image, and the activation region can be determined to be the meaningful part only if the same region of a plurality of channels is the activation region, so that the same regions of the plurality of channels need to be overlapped, and the significance of the activation region is enhanced. Mapping and superposing n two-dimensional features, namely converting the three-dimensional convolution feature of hxwxd into an hxw two-dimensional tensor, adding in the depth direction, and expressing the superposed summarized feature as
Figure BDA0002307278440000071
A threshold alpha is set to construct a mask M of the same size as the summary feature map A; the mask M can be expressed as
M(x, y) = 1 if A(x, y) > alpha, and 0 otherwise.
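The channel summation and thresholding above reduce to a few array operations. In this sketch the default threshold is the mean of A, an assumption made only because the patent does not state a numeric alpha; the function name `saliency_mask` is also hypothetical:

```python
import numpy as np

def saliency_mask(features, alpha=None):
    """Aggregate the d channel feature maps and threshold the result.

    `features` is the h x w x d activation of the last conv layer.  Summing
    over the depth axis gives the h x w summary map A; the mask M is 1 where
    A exceeds the threshold alpha.  ASSUMPTION: alpha defaults to mean(A),
    since the patent leaves its value unspecified.
    """
    A = features.sum(axis=2)           # h x w summary feature map
    if alpha is None:
        alpha = A.mean()
    M = (A > alpha).astype(np.uint8)   # binary mask, same size as A
    return A, M
```

Regions where many channels respond survive the threshold, which matches the text's point that single-channel activations alone are unreliable.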
Then, the size of the mask map is adjusted by bicubic interpolation to be the same as the size of the input image, and the mask map is overlaid on the input image.
Bicubic interpolation computes each target-image pixel from the pixel values of the 16 pixels of the original image nearest to the corresponding point M(x, y): the 4 x 4 neighborhood points a(x + x_i, y + y_j), where i, j = 0, 1, 2, 3. A BiCubic function is used to compute the weight of each of the 16 pixels, and the pixel value of the target-image pixel (X, Y) is the weighted superposition of the 16 pixels.
The BiCubic function is constructed as
[Equation: the BiCubic kernel W(s); rendered as an image in the original.]
where s is the distance from a pixel point a(x + x_i, y + y_j), among the 16 pixel points, to the target-image pixel it maps to, and W(s) is the corresponding weight. The contribution of that point to the target-image pixel is then a(x + x_i, y + y_j) x W(x_i) x W(y_j), and the pixel value of the target pixel is the sum of these contributions over the 16 neighborhood points.
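The patent's W(s) definition survives only as an image, so the sketch below uses the conventional Keys bicubic kernel with a = -0.5, which is the usual choice; whether the patent uses exactly this kernel is an assumption. The function names are illustrative:

```python
import numpy as np

def bicubic_weight(s, a=-0.5):
    """Conventional Keys bicubic kernel W(s) with a = -0.5 (assumed;
    the patent's exact W(s) is an image in the original)."""
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def bicubic_pixel(img, x, y):
    """Interpolate img at fractional (x, y): the weighted superposition of the
    4 x 4 neighborhood, a(x + x_i, y + y_j) * W(x_i) * W(y_j)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    val = 0.0
    for j in range(-1, 3):
        for i in range(-1, 3):
            xi = int(np.clip(x0 + i, 0, img.shape[1] - 1))  # clamp at borders
            yj = int(np.clip(y0 + j, 0, img.shape[0] - 1))
            val += (img[yj, xi]
                    * bicubic_weight(x - (x0 + i))
                    * bicubic_weight(y - (y0 + j)))
    return val
```

With a = -0.5 the weights sum to 1 at any offset, so constant images are reproduced exactly and integer coordinates return the original pixel.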
Finally, the region of the mask map with the largest area whose activation response is higher than the threshold, i.e. the largest region of contiguous pixels with value 1, is selected; the corresponding region in the input image is the location of the image's main target object, and its activation-response features are the features of the target object.
The region with the largest area is selected using a flood fill algorithm.
Specifically, a pixel in the mask map is chosen as a starting point and checked: if its value is 1 it is marked, otherwise it is not; the fill then expands from the starting point to the surrounding pixels until all reachable pixels are marked, after which an unmarked point is chosen as a new starting point. Finally, the region with the most contiguous pixels is selected as the result. The corresponding region in the input image is the location of the image's main target object, and the activation-response features are the feature information of the target object.
The region with the maximum number of contiguous pixels thus localizes the main target object in the input image, and the object's feature information is extracted.
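The flood fill pass above can be sketched as a breadth-first search over the binary mask; 4-connectivity is an assumption (the patent only says "surrounding pixel points"), and the function name is illustrative:

```python
from collections import deque

def largest_connected_region(mask):
    """Flood fill: return the coordinate set of the largest 4-connected
    region of 1-pixels in a binary mask (nested lists or a 2-D array)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from an unvisited 1-pixel.
            region, queue = set(), deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):   # keep the region with most pixels
                best = region
    return best
```

The returned coordinates map straight back onto the input image to localize the main target object, as the text describes.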
Step S108, inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
And inputting the image to be recognized into a pre-trained fine-grained neural network model to automatically position a main target object, extracting the characteristics of the target object, comparing the characteristics with the characteristics of the target object of the first image, and outputting the image to be recognized containing the objects of the same category as the main target object of the first image.
The way the fine-grained neural network model automatically localizes the main target object of the image to be recognized and extracts its features is the same as in step S106. The extracted feature information of the object in the image to be recognized is compared with the feature information extracted in step S106, and the images to be recognized containing objects of the same category as the main target object of the first image are output. The similarity between the object feature information extracted from the image to be recognized and the target-object feature information of the first image may be computed with a cosine similarity algorithm. The specific formula is as follows:
[Equation: the cosine similarity formula; rendered as an image in the original.]
the smaller the calculated value is, the higher the similarity is. Of course, in specific implementation, the analysis may also be performed according to other image similarity algorithms, which is not limited in this application.
The main-target-object feature information extracted in steps S106 and S108 includes fine local feature information of the object, and images of the same category as the main target object of the first image can be identified from the image set according to this fine-grained local feature information. For example, the vehicle images in the image set of the same model as the vehicle in the reference image are identified according to fine-grained features such as vehicle shape and color.
The main-target-object features extracted in steps S106 and S108 may be further reduced in dimensionality, with redundant information eliminated and computational cost reduced, by one or more of singular value decomposition, whitening, or principal component analysis.
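The principal component analysis option can be sketched via SVD of the centered feature matrix; the function name and the choice of SVD (rather than an eigendecomposition of the covariance) are illustrative, not the patent's prescribed implementation:

```python
import numpy as np

def pca_reduce(features, k):
    """Reduce an (n_samples, dim) feature matrix to k dimensions via SVD,
    one of the dimensionality reduction options the text mentions."""
    mean = features.mean(axis=0)
    centered = features - mean                 # center before decomposition
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:k].T                 # project onto top-k components
```

The first components carry the most variance, so redundant feature dimensions are dropped while most of the discriminative information is kept.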
In the embodiment of the invention, the fine-grained neural network model automatically localizes the image object, fine-grained image features are extracted through deep learning, and their similarity to the reference image features is compared, so that images of the same type as the reference image object are identified. Compared with the prior art, the method and device reduce the workload of labeling image categories or drawing boxes to annotate object positions in advance, saving cost and improving efficiency. Moreover, replacing the fully-connected layer of the neural network model with global average pooling reduces the number of model parameters and improves the running speed, making the method better suited to retrieval and recognition over large-scale image sets.
Referring to fig. 3, fig. 3 is a block diagram of an apparatus for fine-grained vehicle image retrieval based on multi-scale constraint according to an embodiment of the present invention. As shown in fig. 3, the apparatus 20 for fine-grained vehicle image retrieval based on multi-scale constraints of the present embodiment includes a first obtaining module 202, a processing module 204, a training module 206, and a recognition module 208, which are respectively configured to perform the specific methods of S102, S104, S106, and S108 in fig. 1; details can be found in the related description of fig. 1 and are only briefly summarized here:
the first obtaining module 202 is configured to perform multi-scale regional information labeling on the first training data set to obtain a second training data set;
a processing module 204, configured to process the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
a training module 206, configured to train parameter values of target parameters of the fine-grained neural network model by using a third training data set, to obtain a trained fine-grained neural network model;
and the identification module 208 is used for inputting the vehicle image to be identified into the trained fine-grained neural network model for identification, and obtaining a target vehicle image of the same type as the vehicle image to be identified.
Further, the apparatus further comprises:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, with a global average pooling layer adopted in place of the fully connected layer;
and the pre-training module is used for pre-training the fine-grained neural network model by adopting the ImageNet data set.
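To illustrate why global average pooling shrinks the parameter count, the sketch below (a hypothetical minimal example, not the patent's actual network) collapses each final-convolution feature map to a single scalar by averaging; no learned weights are needed where a flattened fully connected layer would require H×W×n of them per output unit:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Collapse each (H, W) feature map to one scalar by averaging.

    feature_maps: (n, H, W) activations from the last convolution layer.
    Returns an n-dimensional descriptor with zero learned parameters,
    in place of a fully connected layer's large weight matrix.
    """
    return feature_maps.mean(axis=(1, 2))

maps = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)  # two 3x3 maps
print(global_average_pool(maps))  # → [ 4. 13.]
```

The pooled descriptor can then feed a small classifier or be used directly as the retrieval feature.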
Further, the processing module 204 includes:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture and screening out a detection frame containing the target object and the multi-scale target center in the picture;
and the Helen constraint module is used for extracting, by means of FASTER-RCNN, the detection frames of the target object and of the central parts at each scale whose objectness probability scores rank highest and which satisfy the Helen detection constraint condition.
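The patent does not give the algebraic form of the Helen (Heron) constraint. One plausible reading, sketched below under that assumption, is that a coarse-scale detection frame must fully contain the finer-scale frame, while the triangle formed by the multi-scale target centers, whose area Heron's formula computes from its three side lengths, must stay small so the scales agree on the target location. All box coordinates and thresholds here are illustrative:

```python
import math

def box_contains(outer, inner):
    """True if box `outer` fully contains box `inner`; boxes are (x1, y1, x2, y2)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def heron_area(p, q, r):
    """Area of the triangle with vertices p, q, r via Heron's formula."""
    a = math.dist(p, q)
    b = math.dist(q, r)
    c = math.dist(r, p)
    s = (a + b + c) / 2                      # semi-perimeter
    return math.sqrt(max(s * (s - a) * (s - b) * (s - c), 0.0))

# Coarse frame must contain the fine frame; the three scale centers must
# form a small triangle (i.e., be nearly coincident).
coarse = (0, 0, 100, 100)
fine = (20, 20, 80, 80)
centers = [(50, 50), (52, 49), (51, 51)]
print(box_contains(coarse, fine), heron_area(*centers) < 10.0)  # → True True
```

Frames failing either check would be discarded before the highest-scoring survivors are kept.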
Further, the training module 206 includes:
the extraction module is used for inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the fine-grained neural network model, and outputting n two-dimensional feature maps, where each feature map represents a distribution of salient feature regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for resizing the mask map by bicubic interpolation so that it is the same size as the input image, and overlaying the mask map on the input image;
and the judging module is used for selecting the region of largest area in the mask map whose activation response is higher than the threshold; the corresponding region in the input image is the position of the main target object of the image, and the activation response features are the features of the target object.
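The superposition-and-threshold steps above can be sketched as follows. This is a minimal numpy illustration: the bicubic resize to input resolution is omitted (a library such as OpenCV would typically provide it), and a full version would label connected components and keep the largest region, whereas here a single activated blob stands in:

```python
import numpy as np

def build_mask(feature_maps, threshold):
    """Sum n two-dimensional feature maps, keep activations above a threshold."""
    summed = feature_maps.sum(axis=0)
    return (summed > threshold).astype(np.uint8)

def largest_true_region_bbox(mask):
    """Bounding box (y1, x1, y2, x2) of the above-threshold region.

    Simplified: assumes one connected blob, so the tight bounding box of
    all active pixels stands in for the largest connected component.
    """
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

maps = np.zeros((3, 6, 6))
maps[:, 2:5, 1:4] = 1.0                 # all n maps activate on one region
mask = build_mask(maps, threshold=2.0)  # region sums to 3.0 > 2.0
print(largest_true_region_bbox(mask))   # → (2, 1, 4, 3)
```

The returned box, mapped back onto the input image, marks the main target object's position; the activations inside it are the target's features.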
Fig. 4 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 4, the terminal device 10 of this embodiment includes: a processor 100, a memory 101 and a computer program 102 stored in said memory 101 and executable on said processor 100, such as a program for training a fine-grained vehicle image retrieval based on a multi-scale constraint. The processor 100, when executing the computer program 102, implements the steps in the above-described method embodiments, e.g., the steps of S102, S104, S106, and S108 shown in fig. 1. Alternatively, the processor 100, when executing the computer program 102, implements the functions of the modules/units in the above-mentioned device embodiments, such as the functions of the first obtaining module 202, the processing module 204, the training module 206 and the recognition module 208 shown in fig. 3.
Illustratively, the computer program 102 may be partitioned into one or more modules/units that are stored in the memory 101 and executed by the processor 100 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 102 in the terminal device 10. For example, the computer program 102 may be divided into a first acquisition module 202, a processing module 204, a training module 206, and a recognition module 208 (modules in a virtual device), each of which functions specifically as follows:
the first obtaining module 202 is configured to perform multi-scale regional information labeling on the first training data set to obtain a second training data set;
a processing module 204, configured to process the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
a training module 206, configured to train parameter values of target parameters of the fine-grained neural network model by using a third training data set, to obtain a trained fine-grained neural network model;
and the identification module 208 is used for inputting the vehicle image to be identified into the trained fine-grained neural network model for identification, and obtaining a target vehicle image of the same type as the vehicle image to be identified.
The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. Terminal device 10 may include, but is not limited to, a processor 100, a memory 101. Those skilled in the art will appreciate that fig. 4 is merely an example of a terminal device 10 and does not constitute a limitation of terminal device 10 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the terminal device may also include input-output devices, network access devices, buses, etc.
The processor 100 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 10. Further, the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device. The memory 101 is used for storing the computer program and other programs and data required by the terminal device 10. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A fine-grained vehicle image retrieval method based on multi-scale constraint is characterized by comprising the following steps:
carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
training parameter values of target parameters of the fine-grained neural network model by adopting the third training data set to obtain a trained fine-grained neural network model;
and inputting the vehicle image to be recognized into a trained fine-grained neural network model for recognition, and obtaining a target vehicle image of the same type as the vehicle image to be recognized.
2. The method for fine-grained vehicle image retrieval based on multi-scale constraint according to claim 1, wherein before the step of performing multi-scale regional information labeling on the first training data set to obtain the second training data set, the method further comprises:
constructing a fine-grained neural network model based on VGG-m or Alex-Net, and replacing a full-connection layer with a global average pooling layer;
and pre-training the fine-grained neural network model by adopting an ImageNet data set.
3. The method for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 2, wherein the step of processing the second training data set based on the modified bounding box constraint algorithm and the Helen constraint algorithm comprises:
carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and extracting, by using FASTER-RCNN, the detection frames of the target object and of the central parts at each scale which rank highest and meet the Helen detection constraint condition.
4. A method for multi-scale constraint-based fine-grained vehicle image retrieval according to claim 1, wherein the step of training parameter values of target parameters of a fine-grained neural network model using the third training data set comprises:
inputting the images in the third training data set into the fine-grained neural network model, extracting image features through the last activated convolution layer of the fine-grained neural network model, and outputting n two-dimensional feature maps, wherein each feature map represents a distribution of salient feature regions with multiple activation responses;
superposing the n two-dimensional feature maps, setting a threshold, and selecting the area whose activation response after superposition is higher than the threshold to obtain a mask map;
adjusting the size of the mask map by adopting a bicubic interpolation method to enable the size of the mask map to be the same as that of the input image, and covering the mask map on the input image;
and selecting a region with the largest area occupied by the mask map and the activation response higher than a threshold value, wherein the region corresponding to the region in the input image is the position of the main target object of the image, and the activation response characteristic is the characteristic of the target object.
5. An apparatus for fine-grained vehicle image retrieval based on multi-scale constraints, comprising:
the first acquisition module is used for carrying out multi-scale regional information labeling on the first training data set to obtain a second training data set;
the processing module is used for processing the second training data set based on an improved bounding box constraint algorithm and a Helen constraint algorithm to obtain a third training data set;
the training module is used for training the parameter values of the target parameters of the fine-grained neural network model by adopting the third training data set to obtain the trained fine-grained neural network model;
and the recognition module is used for inputting the vehicle image to be recognized into the trained fine-grained neural network model for recognition, and obtaining the target vehicle image of the same type as the vehicle image to be recognized.
6. The apparatus for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 5, further comprising:
the model construction module is used for constructing a fine-grained neural network model based on VGG-m or Alex-Net, and a global average pooling layer is adopted to replace a full connection layer;
and the pre-training module is used for pre-training the fine-grained neural network model by adopting ImageNet data set.
7. The apparatus for fine-grained vehicle image retrieval based on multi-scale constraints according to claim 5, wherein the processing module comprises:
the bounding box constraint module is used for carrying out bounding box constraint algorithm optimization on the detection result according to the mutual inclusion relationship of the multi-scale regions in the second training data set picture, and screening out a detection frame containing a target object and a multi-scale target center in the picture;
and the Helen constraint module is used for extracting, by means of FASTER-RCNN, the detection frames of the target object and of the central parts at each scale whose objectness probability scores rank highest and which satisfy the Helen detection constraint condition.
8. The apparatus for multi-scale constraint-based fine-grained vehicle image retrieval of claim 5, wherein the training module comprises:
an extraction module, configured to input the images in the third training data set into the fine-grained neural network model, extract image features through the last activated convolution layer of the fine-grained neural network model, and output n two-dimensional feature maps, where each feature map represents a distribution of salient feature regions with multiple activation responses;
the superposition module is used for superposing the n two-dimensional feature maps, setting a threshold value, and selecting an area with the activation response higher than the threshold value after superposition to obtain a mask map;
the adjusting module is used for adjusting the size of the mask map by adopting a bicubic interpolation method to make the size of the mask map the same as that of the input image and covering the mask map on the input image;
and the judging module is used for selecting a region with the largest occupied area in the mask map and the activation response higher than a threshold value, the region corresponding to the region in the input image is the position of the main target object of the image, and the activation response characteristic is the characteristic of the target object.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-4 when executing the computer program.
10. A computer-readable medium, in which a computer program is stored which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
CN201911245009.8A 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint Pending CN111104538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911245009.8A CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911245009.8A CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Publications (1)

Publication Number Publication Date
CN111104538A true CN111104538A (en) 2020-05-05

Family

ID=70421623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911245009.8A Pending CN111104538A (en) 2019-12-06 2019-12-06 Fine-grained vehicle image retrieval method and device based on multi-scale constraint

Country Status (1)

Country Link
CN (1) CN111104538A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737512A (en) * 2020-06-04 2020-10-02 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN112085106A (en) * 2020-09-10 2020-12-15 江苏提米智能科技有限公司 Image identification method and device applied to multi-image fusion, electronic equipment and storage medium
CN112200004A (en) * 2020-09-15 2021-01-08 深圳市优必选科技股份有限公司 Training method and device of image detection model and terminal equipment
CN112348112A (en) * 2020-11-24 2021-02-09 深圳市优必选科技股份有限公司 Training method and device for image recognition model and terminal equipment
CN113658101A (en) * 2021-07-19 2021-11-16 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN113762266A (en) * 2021-09-01 2021-12-07 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113936195A (en) * 2021-12-16 2022-01-14 云账户技术(天津)有限公司 Sensitive image recognition model training method and device and electronic equipment
CN114155495A (en) * 2022-02-10 2022-03-08 西南交通大学 Safety monitoring method, device, equipment and medium for vehicle operation in sea-crossing bridge
CN114419349A (en) * 2022-03-30 2022-04-29 中国科学技术大学 Image matching method and device
CN115272763A (en) * 2022-07-27 2022-11-01 四川大学 Bird identification method based on fine-grained feature fusion
CN115393846A (en) * 2022-10-28 2022-11-25 成都西交智汇大数据科技有限公司 Blood cell identification method, device, equipment and readable storage medium
CN116189231A (en) * 2022-12-06 2023-05-30 吉林省吉林祥云信息技术有限公司 AI visual portrait identification escape method, system, equipment and storage medium based on countermeasure network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN109857889A (en) * 2018-12-19 2019-06-07 苏州科达科技股份有限公司 A kind of image search method, device, equipment and readable storage medium storing program for executing
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481188A (en) * 2017-06-23 2017-12-15 珠海经济特区远宏科技有限公司 A kind of image super-resolution reconstructing method
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN109857889A (en) * 2018-12-19 2019-06-07 苏州科达科技股份有限公司 A kind of image search method, device, equipment and readable storage medium storing program for executing
CN110009679A (en) * 2019-02-28 2019-07-12 江南大学 A kind of object localization method based on Analysis On Multi-scale Features convolutional neural networks
CN110084139A (en) * 2019-04-04 2019-08-02 长沙千视通智能科技有限公司 A kind of recognition methods again of the vehicle based on multiple-limb deep learning
CN110533024A (en) * 2019-07-10 2019-12-03 杭州电子科技大学 Biquadratic pond fine granularity image classification method based on multiple dimensioned ROI feature

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HELIANG ZHENG: "Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition", GVF, pages 5209 - 5216 *
XIONG CHANGZHEN, JIANG JIE: "Research on fine-grained classification algorithms based on multi-scale regional features", vol. 51, no. 51, pages 55 - 60 *
WEI XIUSHEN: "Research on visual analysis of fine-grained images under deep learning", China Master's Theses Full-text Database, Information Science and Technology, pages 1 - 116 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737512B (en) * 2020-06-04 2021-11-12 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN111737512A (en) * 2020-06-04 2020-10-02 东华大学 Silk cultural relic image retrieval method based on depth feature region fusion
CN112085106A (en) * 2020-09-10 2020-12-15 江苏提米智能科技有限公司 Image identification method and device applied to multi-image fusion, electronic equipment and storage medium
CN112200004A (en) * 2020-09-15 2021-01-08 深圳市优必选科技股份有限公司 Training method and device of image detection model and terminal equipment
CN112200004B (en) * 2020-09-15 2024-01-16 深圳市优必选科技股份有限公司 Training method and device for image detection model and terminal equipment
CN112348112A (en) * 2020-11-24 2021-02-09 深圳市优必选科技股份有限公司 Training method and device for image recognition model and terminal equipment
CN112348112B (en) * 2020-11-24 2023-12-15 深圳市优必选科技股份有限公司 Training method and training device for image recognition model and terminal equipment
CN113658101A (en) * 2021-07-19 2021-11-16 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113658101B (en) * 2021-07-19 2023-06-30 南方科技大学 Method and device for detecting landmark points in image, terminal equipment and storage medium
CN113762266B (en) * 2021-09-01 2024-04-26 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113762266A (en) * 2021-09-01 2021-12-07 北京中星天视科技有限公司 Target detection method, device, electronic equipment and computer readable medium
CN113704537A (en) * 2021-10-28 2021-11-26 南京码极客科技有限公司 Fine-grained cross-media retrieval method based on multi-scale feature union
CN113936195A (en) * 2021-12-16 2022-01-14 云账户技术(天津)有限公司 Sensitive image recognition model training method and device and electronic equipment
CN114155495A (en) * 2022-02-10 2022-03-08 西南交通大学 Safety monitoring method, device, equipment and medium for vehicle operation in sea-crossing bridge
CN114419349A (en) * 2022-03-30 2022-04-29 中国科学技术大学 Image matching method and device
CN114419349B (en) * 2022-03-30 2022-07-15 中国科学技术大学 Image matching method and device
CN115272763B (en) * 2022-07-27 2023-04-07 四川大学 Bird identification method based on fine-grained feature fusion
CN115272763A (en) * 2022-07-27 2022-11-01 四川大学 Bird identification method based on fine-grained feature fusion
CN115393846A (en) * 2022-10-28 2022-11-25 成都西交智汇大数据科技有限公司 Blood cell identification method, device, equipment and readable storage medium
CN116189231A (en) * 2022-12-06 2023-05-30 吉林省吉林祥云信息技术有限公司 AI visual portrait identification escape method, system, equipment and storage medium based on countermeasure network

Similar Documents

Publication Publication Date Title
CN111104538A (en) Fine-grained vehicle image retrieval method and device based on multi-scale constraint
US20210158699A1 (en) Method, device, readable medium and electronic device for identifying traffic light signal
CN106951830B (en) Image scene multi-object marking method based on prior condition constraint
CN112052839A (en) Image data processing method, apparatus, device and medium
Alam et al. Indian traffic sign detection and recognition
CN111126459A (en) Method and device for identifying fine granularity of vehicle
CN107679531A (en) Licence plate recognition method, device, equipment and storage medium based on deep learning
US20230076266A1 (en) Data processing system, object detection method, and apparatus thereof
CN107545263B (en) Object detection method and device
JP2016062610A (en) Feature model creation method and feature model creation device
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN104200228B (en) Recognizing method and system for safety belt
CN110852311A (en) Three-dimensional human hand key point positioning method and device
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN113537180B (en) Tree obstacle identification method and device, computer equipment and storage medium
CN113095152A (en) Lane line detection method and system based on regression
CN110704652A (en) Vehicle image fine-grained retrieval method and device based on multiple attention mechanism
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN115661522A (en) Vehicle guiding method, system, equipment and medium based on visual semantic vector
Chen et al. Contrast limited adaptive histogram equalization for recognizing road marking at night based on YOLO models
CN114168768A (en) Image retrieval method and related equipment
CN111104539A (en) Fine-grained vehicle image retrieval method, device and equipment
Ghasemi et al. Optimizing Sector Ring Histogram of Oriented Gradients for human injured detection from drone images
CN111797704B (en) Action recognition method based on related object perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200505