CN108197326B - Vehicle retrieval method and device, electronic equipment and storage medium


Info

Publication number
CN108197326B
Authority
CN
China
Prior art keywords
vehicle
sample
image
feature
attribute
Prior art date
Legal status
Active
Application number
CN201810119550.3A
Other languages
Chinese (zh)
Other versions
CN108197326A (en)
Inventor
彭湃
郭晓威
张有才
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810119550.3A
Publication of CN108197326A
Application granted
Publication of CN108197326B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583: Retrieval characterised by using metadata automatically derived from the content
    • G06F 16/5838: Retrieval characterised by using metadata automatically derived from the content, using colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems


Abstract

The present disclosure provides a vehicle retrieval method and apparatus, an electronic device, and a computer-readable storage medium. The scheme includes: acquiring a vehicle image to be retrieved; extracting features of the vehicle image to be retrieved through a convolutional neural network to obtain a feature vector; extracting, from the feature vector, an attribute feature and a depth feature learned from positive and negative samples; combining the attribute feature and the depth feature to obtain a vehicle visual feature of the vehicle image to be retrieved; and performing a matching calculation of vehicle visual features between the images in a vehicle retrieval database and the vehicle image to be retrieved, obtaining a target image matched with the vehicle image to be retrieved. According to the technical scheme provided by the invention, vehicle retrieval accuracy is higher, and because no manual inspection is needed, retrieval efficiency is also high.

Description

Vehicle retrieval method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of image recognition, and in particular to a vehicle retrieval method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Searching for a queried vehicle in massive surveillance video has important applications in public security systems. A monitoring system typically stores hundreds or even thousands of surveillance videos, and searching such a volume of video for the queried vehicle requires a large amount of computation and a long time; both grow rapidly as the number of videos and the time range to be searched increase.
In the prior art, vehicle retrieval across many surveillance videos generally adopts one of the following two schemes:
Scheme one: manual search. Whether the queried vehicle appears is checked by human eyes. This achieves high retrieval accuracy but low retrieval efficiency; it is feasible when only a few videos must be searched, but consumes a great deal of time when there are many.
Scheme two: vehicle retrieval based on license plate information, or on one or more manually defined markers (such as annual inspection marks, ornaments, and hanging decorations) recognized through vehicle identification technology, retrieving by the marker features. However, a suspect vehicle is quite likely to lack license plate information, and plates may also be cloned (fake duplicate plates), so retrieval cannot be completed when the plate information is missing. Retrieval based on manually defined features has its own limitations: the trim of a vehicle changes frequently, so retrieving a vehicle by such local features is inaccurate.
In summary, the vehicle retrieval schemes provided in the prior art suffer from low retrieval efficiency and low retrieval accuracy.
Disclosure of Invention
In order to solve the problems of low retrieval efficiency and low retrieval accuracy of the vehicle retrieval scheme provided by the prior art, the present disclosure provides a vehicle retrieval method and device.
In one aspect, the present invention provides a vehicle retrieval method, the method comprising:
acquiring a vehicle image to be retrieved;
extracting features of the vehicle image to be retrieved through a convolutional neural network to obtain a feature vector;
extracting, from the feature vector, an attribute feature and a depth feature learned from positive and negative samples;
combining the attribute feature and the depth feature to obtain a vehicle visual feature of the vehicle image to be retrieved;
and performing a matching calculation of vehicle visual features between the images in a vehicle retrieval database and the vehicle image to be retrieved, obtaining a target image matched with the vehicle image to be retrieved.
In an exemplary embodiment, extracting, from the feature vector, the attribute feature and the depth feature learned from positive and negative samples includes:
predicting the attribute category of the vehicle image to be retrieved from the feature vector, through a category prediction branch model constructed from sample vehicle images and the attribute categories marked on them, so as to obtain the attribute feature.
In an exemplary embodiment, extracting, from the feature vector, the attribute feature and the depth feature learned from positive and negative samples includes:
performing depth feature extraction on the feature vector through a positive and negative constraint branch model constructed from sample vehicle images and their corresponding positive and negative samples.
In an exemplary embodiment, before the attribute feature extraction and the depth feature extraction, the method further includes:
acquiring a sample vehicle image and the attribute category marked on the sample vehicle image;
and performing neural network learning on the sample vehicle image and the marked attribute category to obtain a category prediction branch model for the attribute feature extraction.
In an exemplary embodiment, performing neural network learning on the sample vehicle image and the marked attribute category to obtain a category prediction branch model for the attribute feature extraction includes:
extracting a sample feature vector of the sample vehicle image through the convolutional neural network;
performing convolution calculation on the sample feature vector through constructed multi-layer fully connected layers, and extracting the sample attribute feature corresponding to the sample feature vector;
and optimizing the weight parameters of the fully connected layers according to the attribute category marked on the sample vehicle image, so that the difference between the sample attribute feature and the marked attribute category is minimized, the optimized fully connected layers forming the category prediction branch model.
In an exemplary embodiment, before the attribute feature extraction and the depth feature extraction, the method further includes:
acquiring a sample vehicle image and the positive and negative samples corresponding to the sample vehicle image, to form a sample triplet;
and performing neural network learning of the positive and negative samples on the sample triplet using a loss function adapted to the matching calculation, to obtain a positive and negative constraint branch model for the depth feature extraction.
In an exemplary embodiment, performing neural network learning of the positive and negative samples on the sample triplet using a loss function adapted to the matching calculation, to obtain a positive and negative constraint branch model for the depth feature extraction, includes:
performing neural network learning of the depth feature extraction on the sample triplet, and obtaining the positive and negative constraint branch model by optimizing the feature distances between the sample vehicle image and the positive and negative samples.
In an exemplary embodiment, performing the matching calculation of vehicle visual features between the images in the vehicle retrieval database and the vehicle image to be retrieved, to obtain a target image matched with the vehicle image to be retrieved, includes:
comparing the vehicle visual features of the images in the vehicle retrieval database with that of the vehicle image to be retrieved;
and searching the vehicle retrieval database for the image whose vehicle visual feature has the highest similarity to that of the vehicle image to be retrieved, obtaining the target image matched with the vehicle image to be retrieved.
In another aspect, the present invention also provides a vehicle search device, including:
the image acquisition module is used for acquiring the vehicle image to be retrieved;
the vector obtaining module is used for extracting the characteristics of the vehicle image to be searched through a convolutional neural network to obtain characteristic vectors;
the feature extraction module is used for extracting, from the feature vector, the attribute feature and the depth feature learned from positive and negative samples;
the feature merging module is used for merging the attribute features and the depth features to obtain the vehicle visual features of the vehicle images to be searched;
and the vehicle retrieval module is used for carrying out matching calculation of the visual characteristics of the vehicle between the image in the vehicle retrieval database and the image of the vehicle to be retrieved, and obtaining a target image matched with the image of the vehicle to be retrieved.
In an exemplary embodiment, the feature extraction module includes:
an attribute feature extraction unit for performing attribute category prediction on the feature vector through a category prediction branch model constructed from sample vehicle images and the attribute categories marked on them, so as to obtain the attribute feature.
In an exemplary embodiment, the feature extraction module includes:
a depth feature extraction unit for performing depth feature extraction on the feature vector through a positive and negative constraint branch model constructed from sample vehicle images and their corresponding positive and negative samples.
In an exemplary embodiment, the apparatus further comprises:
the sample image acquisition module is used for acquiring a sample vehicle image and attribute categories marked by the sample vehicle image;
and the category branch training module is used for carrying out neural network learning through the sample vehicle image and the marked attribute category to obtain a category prediction branch model for extracting the attribute characteristics.
In an exemplary embodiment, the category branch training module includes:
a sample vector extraction unit for extracting a sample feature vector of the sample vehicle image through the convolutional neural network;
a sample feature extraction unit for performing convolution calculation on the sample feature vector through constructed multi-layer fully connected layers, and extracting the sample attribute feature corresponding to the sample feature vector;
and a parameter optimization unit for optimizing the weight parameters of the fully connected layers according to the attribute category marked on the sample vehicle image, so that the difference between the sample attribute feature and the marked attribute category is minimized, the optimized fully connected layers forming the category prediction branch model.
In an exemplary embodiment, the apparatus further comprises:
the triplet acquisition module is used for acquiring a sample vehicle image and positive and negative samples corresponding to the sample vehicle image to form a sample triplet;
and a positive and negative constraint branch training module for performing neural network learning of the positive and negative samples on the sample triplets using a loss function adapted to the matching calculation, to obtain a positive and negative constraint branch model for the depth feature extraction.
In one exemplary embodiment, the positive and negative constraint branch training module comprises:
and the feature distance optimizing unit is used for carrying out neural network learning of the depth feature extraction on the sample triplet, and obtaining a positive and negative constraint branch model for carrying out the depth feature extraction by optimizing the feature distance between the sample vehicle image and the positive and negative samples.
In one exemplary embodiment, a vehicle retrieval module includes:
the feature comparison unit is used for comparing the vehicle visual features of the images in the vehicle retrieval database with the vehicle images to be retrieved;
and the image searching unit is used for searching an image with highest similarity to the vehicle visual characteristics of the vehicle image to be searched in the vehicle searching database, and obtaining a target image matched with the vehicle image to be searched.
In another aspect, the present invention also provides an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform any one of the vehicle retrieval methods described above.
In another aspect, the present invention also provides a computer readable storage medium storing a computer program executable by a processor to perform any one of the above-described vehicle retrieval methods.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
according to the technical scheme, the depth features and the attribute features of the vehicle image to be searched are extracted, and the target image matched with the vehicle image to be searched is searched from the vehicle searching database based on the vehicle visual features comprising the depth features and the attribute features. Compared with the prior art, the vehicle visual features contain more feature information of the vehicle images to be searched, so that the degree of distinction between the vehicle visual features and other vehicle images is higher, different vehicle images with the same attribute features can be distinguished, the target image similar to the vehicle images to be searched is searched based on the vehicle visual features, the accuracy is higher, and the vehicle searching efficiency is improved because no manual inquiry is needed.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic illustration of an implementation environment in accordance with the present disclosure;
FIG. 2 is a block diagram of a server shown in accordance with an exemplary embodiment;
FIG. 3 is a flowchart illustrating a method of vehicle retrieval according to an exemplary embodiment;
FIG. 4 is a flow chart of a vehicle retrieval method shown in another exemplary embodiment based on the corresponding embodiment of FIG. 3;
FIG. 5 is a detailed flow chart of step 402 of the corresponding embodiment of FIG. 4;
FIG. 6 is a schematic diagram of a model architecture for vehicle retrieval, shown in an exemplary embodiment;
FIG. 7 is a flow chart of a vehicle retrieval method according to another exemplary embodiment shown on the basis of the corresponding embodiment of FIG. 3;
FIG. 8 is a schematic diagram of a model architecture for vehicle retrieval, shown in an exemplary embodiment;
FIG. 9 is a detailed flow chart of step 350 of the corresponding embodiment of FIG. 3;
FIG. 10 is a schematic view of the effect of searching for a vehicle to be retrieved from a vehicle retrieval database using the vehicle retrieval method provided by the present invention;
FIG. 11 is a block diagram of a vehicle retrieval device, according to an exemplary embodiment;
FIG. 12 is a block diagram of a vehicle retrieval device shown in another exemplary embodiment on the basis of the corresponding embodiment of FIG. 11;
FIG. 13 is a detailed block diagram of the category branch training module in the corresponding embodiment of FIG. 12;
FIG. 14 is a block diagram of a vehicle retrieval device shown in another exemplary embodiment on the basis of the corresponding embodiment of FIG. 11;
FIG. 15 is a detailed block diagram of the vehicle retrieval module in the corresponding embodiment of FIG. 11.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention; rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the appended claims.
FIG. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment. The implementation environment involved in the present invention includes a server 110. A vehicle retrieval database can be deployed on the server 110, so that for a given vehicle image to be retrieved, the server 110 can find the image matching it in the vehicle retrieval database using the vehicle retrieval method provided by the invention.
In a specific application, the vehicle retrieval method provided by the invention can be used to find, among massive road-monitoring pictures, the pictures matching the vehicle image to be retrieved, enabling track tracing of a suspect vehicle.
The implementation environment also includes the data sources that supply the required data, namely the vehicle images in the vehicle retrieval database and the vehicle images to be retrieved. In this implementation environment, the data source may be a camera 120 or a mobile terminal 130 with a camera function. The server 110 connects to the camera 120 or the mobile terminal 130 through a wireless network to acquire the vehicle images they collect, forming the vehicle retrieval database. For a vehicle image to be retrieved collected by the camera 120 or the mobile terminal 130, the server 110 can likewise find the matching vehicle image in the database using the scheme provided by the invention.
It should be noted that the processing logic of the vehicle retrieval method provided by the invention is not limited to deployment on the server 110; it may also be deployed on other machines, for example on a terminal device with sufficient computing capability.
FIG. 2 is a block diagram of a server 110 according to an exemplary embodiment. The server 110 may vary considerably in configuration or performance, and may include at least one central processing unit (CPU) 222 (e.g., at least one processor), memory 232, and at least one storage medium 230 (e.g., at least one mass storage device) storing application programs 242 or data 244. The memory 232 and the storage medium 230 may be transitory or persistent. The programs stored in the storage medium 230 may include at least one module (not shown in the drawing), each of which may include a series of instruction operations for the server. Further, the central processing unit 222 may be configured to communicate with the storage medium 230 and execute on the server 110 the series of instruction operations in the storage medium 230. The server 110 may also include at least one power supply 226, at least one wired or wireless network interface 250, at least one input/output interface 258, and/or at least one operating system 241, such as Windows Server™, Mac OS X™, Unix™, Linux™, or FreeBSD™. The steps performed by the server 110 in the embodiments of FIG. 3, 4, 5, 7, and 9 below may be based on the server structure shown in FIG. 2.
FIG. 3 is a flowchart of a vehicle retrieval method according to an exemplary embodiment. The vehicle retrieval method may be performed by the server 110 in the implementation environment shown in FIG. 1. As shown in FIG. 3, the vehicle retrieval method may include the following steps.
In step 310, a vehicle image to be retrieved is acquired.
The vehicle image to be retrieved may be collected by the camera 120 and sent to the server 110, or may be stored in advance in a local database of the server 110. The invention aims to find, among a large number of vehicle images, the image with the highest similarity to the specific vehicle, not merely images of the same vehicle type or the same color. The scheme provided by the invention allows a traffic management department to query the monitoring video for pictures of a suspect vehicle in various scenes, assisting in judging the vehicle's travel track.
In step 320, feature extraction of the vehicle image to be retrieved is performed through a convolutional neural network, so as to obtain a feature vector.
The convolutional neural network may be a pre-trained deep convolutional neural network model with its classification module removed, such as an Inception V3 model. Its weight parameters may use weights pre-trained on the ImageNet data set (currently the largest image-recognition database in the world), and these weights may be frozen rather than updated in later training. As needed, the convolutional neural network can also be another trained image classification network, such as ResNet, VGG, or DenseNet.
The feature vector is a vector generated by convolution operations on the vehicle image to be retrieved; it contains the feature information of that image.
Specifically, convolution operations are performed on the pixel values of all pixels of the vehicle image to be retrieved using the pre-trained weight parameters of the convolutional neural network. The network can include an input layer, convolutional layers, and pooling layers: the input layer receives the vehicle image to be retrieved, the convolutional layers perform the convolution calculations with the weight parameters, and the pooling layers reduce the dimensionality of the image features output by the convolutional layers. In one embodiment of the invention, a 2048-dimensional feature vector of the vehicle image to be retrieved is output at the last pooling layer of the convolutional neural network.
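To make the backbone concrete, the following is a minimal sketch (not the patent's reference implementation) in PyTorch, assuming an Inception V3 backbone whose classifier is replaced by an identity mapping so that the 2048-dimensional pooled feature is exposed; the 299x299 input size matches the scaling mentioned in the triplet-sampling embodiment below.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Pretrained Inception V3; swapping the final classifier for an identity
    # layer makes the network output the 2048-d feature of the last pooling
    # stage (requires a recent torchvision with the `weights` argument).
    backbone = models.inception_v3(weights="IMAGENET1K_V1")
    backbone.fc = nn.Identity()
    backbone.eval()  # evaluation mode: auxiliary logits are not returned

    with torch.no_grad():
        image = torch.randn(1, 3, 299, 299)  # stand-in for a vehicle image tensor
        feature_vector = backbone(image)     # shape: (1, 2048)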
In step 330, an attribute feature and a depth feature learned from positive and negative samples are extracted from the feature vector.
The attribute features include features of the vehicle to be retrieved such as the vehicle face, windows, vehicle type, body color, license plate background color, annual inspection marks, pendants, and objects placed in the vehicle. In an exemplary embodiment, the attribute features may be a vehicle type feature and a body color feature. The vehicle type may be car, off-road vehicle, minibus, bus, small truck, large truck, tank truck, and so on. For example, assume there are 5 possible body colors in total: red, silver, gray, black, and white. If the body color in the vehicle image to be retrieved is silver, the body color feature extracted from the 2048-dimensional feature vector under the pre-trained mapping rule may be [0,1,0,0,0]. If the vehicle type of the image to be retrieved is minibus, the vehicle type feature [0,0,1,0,0,0,0] can be extracted from the 2048-dimensional feature vector. Combining the body color feature and the vehicle type feature yields the attribute feature [0,1,0,0,0,0,0,1,0,0,0,0].
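The one-hot construction in this example can be written out in a few lines of Python; the color and vehicle-type vocabularies below are the ones assumed in the text, not definitions from the patent.

    COLORS = ["red", "silver", "gray", "black", "white"]          # 5 colors
    TYPES = ["car", "off-road vehicle", "minibus", "bus",
             "small truck", "large truck", "tank truck"]          # 7 types

    def one_hot(value, vocabulary):
        """Indicator vector marking the position of `value` in the vocabulary."""
        return [1 if item == value else 0 for item in vocabulary]

    color_feature = one_hot("silver", COLORS)   # [0, 1, 0, 0, 0]
    type_feature = one_hot("minibus", TYPES)    # [0, 0, 1, 0, 0, 0, 0]
    attribute_feature = color_feature + type_feature
    # [0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0], as in the example above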
The positive and negative samples are the positive samples and negative samples used in deep learning: a positive sample denotes a sample vehicle image that is similar, and a negative sample one that is dissimilar. The depth feature of the vehicle image to be retrieved can express its similarity to the positive and negative samples provided in the deep-learning phase. If the Euclidean distance between the depth feature of the vehicle image to be retrieved and that of a certain sample vehicle image is very small, the two images are similar; the depth feature of the vehicle image to be retrieved can therefore be regarded as a similarity feature between that image and the sample vehicle images used in deep learning.
Specifically, the depth feature extraction performs convolution operations on the feature vector with the parameters trained during deep learning on the positive and negative samples, extracting the depth feature of the vehicle image to be retrieved. In an exemplary embodiment, three fully connected layers are attached after the last pooling layer of the convolutional neural network; convolution calculations are applied to the 2048-dimensional feature vector in sequence, using the trained network parameters and bias values of each fully connected layer, and a 256-dimensional depth feature is finally output.
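A sketch of the three fully connected layers described here; the 2048 -> 1024 -> 1024 -> 256 layer sizes come from the triplet-training embodiment of FIG. 8, while everything else (activations, module layout) is an assumption.

    import torch
    import torch.nn as nn

    # Positive/negative-constraint branch: three fully connected layers mapping
    # the 2048-d backbone feature vector to a 256-d depth feature.
    depth_branch = nn.Sequential(
        nn.Linear(2048, 1024), nn.ReLU(inplace=True),
        nn.Linear(1024, 1024), nn.ReLU(inplace=True),
        nn.Linear(1024, 256),
    )

    feature_vector = torch.randn(1, 2048)         # from the backbone sketch above
    depth_feature = depth_branch(feature_vector)  # shape: (1, 256)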
In step 340, the attribute features and the depth features are combined to obtain vehicle visual features of the vehicle image to be retrieved.
Merging the attribute feature and the depth feature may mean concatenating them. For example, with attribute feature [1,0,0,0] and depth feature [1,2,1,2], the concatenation may form [1,0,0,0,1,2,1,2] or [1,2,1,2,1,0,0,0]. As needed, the merge may instead be an addition or subtraction between the vectors.
The vehicle visual feature is the feature generated by merging the attribute feature and the depth feature. In an exemplary embodiment, a 1024-dimensional attribute feature is extracted from the 2048-dimensional feature vector by the attribute prediction branch model, a 256-dimensional depth feature is extracted from the same feature vector by the positive and negative constraint branch model, and merging the two yields a 1280-dimensional vehicle visual feature, which is then matched against the vehicle visual features of the images in the vehicle retrieval database. Compared with the prior art, the vehicle visual feature carries more feature information of the vehicle image to be retrieved, so the distinction between the visual features of different vehicle images is stronger, and different vehicle images can be told apart accurately by comparing their visual features.
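Under the dimensions just given (a 1024-dimensional attribute feature and a 256-dimensional depth feature), the merge by concatenation is a single call; this is an illustrative sketch, not the patent's code.

    import torch

    attribute_feature = torch.randn(1, 1024)  # output of the class-prediction branch
    depth_feature = torch.randn(1, 256)       # output of the constraint branch

    # Vehicle visual feature = concatenation of the two feature vectors.
    visual_feature = torch.cat([attribute_feature, depth_feature], dim=1)
    print(visual_feature.shape)  # torch.Size([1, 1280])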
It should be noted that searching for the target image based only on the body color feature or the vehicle type feature of the image to be retrieved generally finds no more than vehicle images of the same body color or the same vehicle type; it cannot accurately find images of the specified vehicle in different scenes. For example, if the vehicle to be retrieved is a black car, searching by the vehicle type and body color features returns all images of black cars; the amount of retrieved data is still large, and the images of a suspect vehicle in other scenes still cannot be located accurately.
In step 350, a matching calculation of the vehicle visual characteristics between the image in the vehicle searching database and the vehicle image to be searched is performed, and a target image matched with the vehicle image to be searched is obtained.
A large number of images are stored in the vehicle retrieval database in advance. They may be monitoring pictures collected by the road monitoring cameras 120, which photograph the many vehicles passing a road section; the pictures are stored in the vehicle retrieval database of the server 110.
The matching calculation compares the similarity between the vehicle visual feature of the image to be retrieved and that of each image in the vehicle retrieval database, and then finds in the database the image whose visual feature is most similar. For example, the similarity between the vehicle image to be retrieved and the database images may be determined by calculating the Euclidean distance between their vehicle visual features.
The target image is the image found in the vehicle retrieval database that is most similar to the vehicle image to be retrieved.
Specifically, the vehicle visual feature of each image in the vehicle retrieval database may be calculated in advance and stored in the database. When a vehicle image needs to be retrieved, its vehicle visual feature is compared for similarity with that of each database image, and the image with the highest similarity is taken as the target image.
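One way to realize this matching calculation is a nearest-neighbor search over precomputed gallery features; the sketch below uses Euclidean distance, and the array sizes and variable names are assumptions.

    import torch

    gallery = torch.randn(10000, 1280)  # precomputed visual features of database images
    query = torch.randn(1, 1280)        # visual feature of the image to retrieve

    distances = torch.cdist(query, gallery)        # (1, 10000) Euclidean distances
    target_index = torch.argmin(distances).item()  # database image with highest similarity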
According to the technical scheme provided by the above exemplary embodiments of the invention, the depth feature and the attribute feature of the vehicle image to be retrieved are extracted, and the target image matched with the vehicle image to be retrieved is found in the vehicle retrieval database based on a vehicle visual feature that comprises both. Compared with the prior art, the vehicle visual feature contains more feature information of the vehicle image to be retrieved, so it discriminates better against other vehicle images and can distinguish different vehicle images that share the same attribute features; searching for the target image based on this visual feature is therefore more accurate, and because no manual inspection is needed, retrieval efficiency is also improved.
In an exemplary embodiment, step 330 specifically includes:
predicting the attribute category of the vehicle image to be retrieved from the feature vector, through a category prediction branch model constructed from sample vehicle images and the attribute categories marked on them, so as to obtain the attribute feature.
Wherein the sample vehicle image refers to a vehicle image of a known attribute class for model training. The sample vehicle image may be a vehicle image stored in the vehicle search database or may be another vehicle image. The sample vehicle image may also be collected by the road monitoring camera 120 and stored in the server 110. The attribute categories include a body color category (e.g., one of red, silver, black, etc.) of the vehicle in the sample vehicle image, a model category (e.g., one of a car, van, passenger car), or other category.
The class prediction branch model is the network model following the convolutional neural network in step 320. Specifically, neural network learning of attribute feature extraction can be performed according to a plurality of sample vehicle images and corresponding attribute categories thereof, so as to generate a category prediction branch model. And inputting the feature vector of the vehicle image to be searched into the category prediction branch model, and outputting the attribute feature of the vehicle image to be searched.
In an exemplary embodiment, step 330 may further include:
performing depth feature extraction on the feature vector through a positive and negative constraint branch model constructed from sample vehicle images and their corresponding positive and negative samples.
The positive and negative samples include positive samples and negative samples. A positive sample is a vehicle image containing the same vehicle as the sample vehicle image, where "the same vehicle" means a specific vehicle, not merely a vehicle of the same body color or vehicle type. A negative sample is a vehicle image containing a different vehicle from the sample vehicle image, for example a vehicle of the same series but a different color, of the same color but a different vehicle type, or of another vehicle type and color altogether. It should be noted that positive and negative samples with marked attribute categories can also be used as sample vehicle images to train the category prediction branch model.
The positive and negative constraint branch model is a network model attached after the convolutional neural network of step 320; it and the category prediction branch model are the two branches built on the convolutional neural network for multi-task learning. The positive and negative constraint branch model can be constructed in parallel with the category prediction branch model, or after it; the parameters of the category prediction branch model can then be further optimized based on newly added positive and negative samples.
It should be noted that, using the fact that the sample vehicle image is correlated with the positive sample and uncorrelated with the negative sample, ranking learning is performed on the sample vehicle image and its positive and negative samples to construct the positive and negative constraint branch model. The feature vector of the vehicle image to be retrieved is input into this model, which outputs the depth feature of the image. The depth features express the similarity relations among the samples that took part in training. Assuming the depth feature has 256 dimensions, the depth feature of each sample vehicle image maps to a point in a 256-dimensional feature space, and the depth feature of the vehicle image to be retrieved maps to a corresponding point in the same space; the distances between these points then reflect the similarity between the corresponding vehicle images.
In an exemplary embodiment, as shown in FIG. 4, the method provided by the present invention may further include the following steps before step 330:
in step 401, a sample vehicle image and a property category marked by the sample vehicle image are acquired.
The sample vehicle image refers to a vehicle image of a known attribute class for class prediction branch model training. The attribute category marked by the sample vehicle image may be obtained from tag information carried by the sample vehicle image.
In step 402, neural network learning is performed through the sample vehicle image and the labeled attribute categories, and a category prediction branch model for performing the attribute feature extraction is obtained.
The neural network learning builds a neural network architecture and learns attribute features from the sample vehicle images and their corresponding attribute categories, so that the difference between the learned attribute features and the marked categories is minimized; the category prediction branch model is then obtained from the trained network parameters of this architecture. It should be noted that the neural network architecture includes the backbone convolutional neural network (input layer, convolutional layers, and pooling layers) and multiple fully connected layers; the optimized fully connected layers are the category prediction branch model.
In an exemplary embodiment, as shown in FIG. 5, step 402 specifically includes:
in step 501, sample feature vectors of the sample vehicle image are extracted by the convolutional neural network.
The convolutional neural network here is the deep convolutional neural network model without a classification module, i.e., the backbone network comprising at least an input layer, convolutional layers, and pooling layers. Its network parameters can be trained in advance on an image data set, and the trained parameters are used both when extracting the feature vector of the vehicle image to be retrieved and when extracting the sample feature vector of a sample vehicle image.
The sample feature vectors of the sample vehicle image are similar to the feature vectors of the vehicle image to be retrieved, and all contain feature information of the corresponding vehicle image. Sample attribute features and sample depth features of the sample vehicle image may also be extracted from the sample feature vector.
In step 502, convolution calculations are performed on the sample feature vector through the constructed multi-layer fully connected network, and the sample attribute features corresponding to the sample feature vector are extracted.
As shown in FIG. 6, a feature vector is extracted from each input sample vehicle image through the convolutional neural network, which outputs a 2048-dimensional sample feature vector at its last layer. The feature vector then feeds two branches: the category prediction branch model and the positive and negative constraint branch model.
The category prediction branch model may include multiple fully connected layers. As shown in FIG. 6, the 2048-dimensional feature vector is fully connected to 1024 neurons, after which the network splits into two branches: a vehicle model prediction branch 601 and a body color prediction branch 602. In the model prediction branch 601, the 1024 neurons are fully connected to n subsequent neurons and classified with a softmax activation layer, where n is the total number of vehicle model categories; for example cars, off-road vehicles, minibuses, buses, small trucks, large trucks, and tank trucks make n = 7. In the body color prediction branch 602, the 1024 neurons are fully connected to m subsequent neurons and classified with a softmax activation layer, where m is the number of body color categories; for example red, gray, silver, black, and white make m = 5.
The sample feature vector of the sample vehicle image is convolved through the network parameters of these fully connected layers, and sample attribute features are extracted; they may include the vehicle model feature output by the model prediction branch 601 and the body color feature output by the body color prediction branch 602.
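The two-branch head of FIG. 6 might be sketched as follows, with the shared 1024-neuron layer and the n-way and m-way outputs described above (n = 7 vehicle types, m = 5 body colors). Applying softmax inside the loss rather than the module is a common idiom; the class name and other details are assumptions.

    import torch
    import torch.nn as nn

    class ClassPredictionBranch(nn.Module):
        """Shared 1024-d fully connected layer feeding a vehicle-model head (601)
        and a body-color head (602), as in FIG. 6."""
        def __init__(self, n_models: int = 7, m_colors: int = 5):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(2048, 1024), nn.ReLU(inplace=True))
            self.model_head = nn.Linear(1024, n_models)  # softmax applied in the loss
            self.color_head = nn.Linear(1024, m_colors)

        def forward(self, feature_vector):
            hidden = self.shared(feature_vector)
            return self.model_head(hidden), self.color_head(hidden)

    branch = ClassPredictionBranch()
    model_logits, color_logits = branch(torch.randn(8, 2048))  # a batch of 8 samples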
In step 503, the weight parameters of the fully connected layers are optimized according to the attribute category marked on the sample vehicle image, so that the difference between the sample attribute feature and the marked attribute category is minimized; the optimized fully connected layers form the category prediction branch model.
Specifically, the weight parameters of the multi-layer fully connected network can be adjusted according to the attribute categories marked on the sample vehicle image, for example the body color category and the vehicle model category, so that the difference between the finally output sample attribute features and the marked categories is minimized. The adjusted fully connected layers, together with the softmax activation layers, form the category prediction branch model.
The loss function of the category prediction branch model may be the cross entropy

J = -\sum_{k=1}^{C} y_k \log \hat{y}_k

where C is the number of categories (for example, C = 7 for the vehicle model categories and C = 5 for the body color categories), \hat{y}_k is the predicted probability that a sample belongs to class k, and y_k is the true probability that the sample belongs to class k. Training reduces the error between the predicted result and the true result, i.e., J is made as small as possible.
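Training the two heads against the marked categories with this cross entropy is the standard supervised loop. Continuing the ClassPredictionBranch sketch above; the optimizer choice, learning rate, and batch contents are assumptions.

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss()  # computes J = -sum_k y_k * log(y_hat_k)
    optimizer = torch.optim.Adam(branch.parameters(), lr=1e-4)

    sample_vectors = torch.randn(8, 2048)     # sample feature vectors
    model_labels = torch.randint(0, 7, (8,))  # marked vehicle-type categories
    color_labels = torch.randint(0, 5, (8,))  # marked body-color categories

    model_logits, color_logits = branch(sample_vectors)
    loss = criterion(model_logits, model_labels) + criterion(color_logits, color_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # adjusts the fully connected weights to minimize J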
In an exemplary embodiment, as shown in FIG. 7, the method provided by the present invention may further include the following steps before the above step 330.
In step 701, a sample vehicle image and positive and negative samples corresponding to the sample vehicle image are obtained to form a sample triplet;
the positive and negative samples are referred to as positive samples of the same vehicle as the sample vehicle image, and the negative samples of different vehicles are referred to as negative samples of the sample vehicle image. It should be emphasized that the same vehicle as referred to in the present invention refers to a specific vehicle, and not to a general type of vehicle having the same color or type of vehicle. The positive and negative samples are a relative concept, and when the attribute prediction branch model is constructed, the positive and negative samples with marked attribute classifications can also be used as sample vehicle images to train the attribute prediction branch model.
It should be noted that, in order to ensure that there are enough positive and negative samples, the sampling method may be to first take a vehicle picture (Anchor), then ensure that there are at least 2 colors in the same vehicle system to which the anchor belongs, and the number of vehicles in the same vehicle system and the same color corresponding to the anchor is not less than 2, so as to obtain as many negative samples as possible. And the positive sample may be a picture taken by the vehicle in the anchor in other scenes. The positive sample, negative sample, and sample vehicle images constitute a sample triplet. Before the convolution operation of the convolutional neural network is performed on the sample triplets, all images can be scaled to a proper size, such as 299 pixels wide and 299 pixels high, and the scaling size of the images can be determined according to a subsequent convolutional neural network model.
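A hypothetical sampler following these constraints is sketched below; the record fields (vehicle_id, series, color, path) are assumed annotations, not part of the patent, and the dataset curation described above is assumed to guarantee that positives exist for every anchor.

    import random

    def sample_triplet(images):
        """images: list of dicts with keys 'vehicle_id', 'series', 'color', 'path'.
        Returns (anchor, positive, negative) following the constraints above."""
        anchor = random.choice(images)
        positives = [im for im in images
                     if im["vehicle_id"] == anchor["vehicle_id"] and im is not anchor]
        negatives = [im for im in images if im["vehicle_id"] != anchor["vehicle_id"]]
        # Prefer hard negatives: same series and color as the anchor but a
        # different vehicle, which the sampling constraints make available.
        hard = [im for im in negatives
                if im["series"] == anchor["series"] and im["color"] == anchor["color"]]
        return anchor, random.choice(positives), random.choice(hard or negatives)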
In step 702, neural network learning of the positive and negative samples is performed on the sample triplets by using a loss function adapted to match calculation, so as to obtain positive and negative constraint branch models for extracting the depth features.
A loss function adapted to the matching calculation means that the loss function and the matching calculation rest on the same principle: the features of similar images should be close. For example, the loss function may be designed so that the distance between the depth features of the sample vehicle image and the positive sample is as small as possible, while the distance between the depth features of the sample vehicle image and the negative sample is as large as possible. The matching calculation, in turn, computes the distances between the vehicle visual features of the database images and the image to be retrieved, and takes the database image with the minimum distance as the target image with the highest similarity. The loss function can then be considered adapted to the matching calculation.
Specifically, as shown in FIG. 8, the sample vehicle image 801, the positive sample 802, and the negative sample 803 of a sample triplet are passed through the convolutional neural network, which outputs the 2048-dimensional sample feature vector of each sample. Three fully connected layers follow the convolutional neural network; convolving the sample feature vector through them yields feature vectors of 1024, 1024, and 256 dimensions in turn. By adjusting the weight parameters of these fully connected layers, the difference between the 256-dimensional feature vector of the sample vehicle image 801 and that of the positive sample 802 is made as small as possible, while the difference between the 256-dimensional feature vector of the sample vehicle image 801 and that of the negative sample 803 is made as large as possible. The optimized fully connected layers give the positive and negative constraint branch model, and the finally output 256-dimensional feature vector is the depth feature of the corresponding input sample.
In an exemplary embodiment, step 702 specifically includes:
and performing neural network learning of the depth feature extraction on the sample triplet, and obtaining a positive and negative constraint branch model for the depth feature extraction by optimizing feature distances between a sample vehicle image and positive and negative samples.
The feature distance comprises a first distance, between the depth feature learned from the sample vehicle image and the depth feature learned from the positive sample, and a second distance, between the depth feature learned from the sample vehicle image and the depth feature learned from the negative sample. When optimizing the neural network, the first distance should be as small as possible and the second as large as possible, matching the fact that the positive sample is more similar to the sample vehicle image and the negative sample less similar. In an exemplary embodiment, the feature distance may be a Euclidean distance or a cosine distance.
In one exemplary embodiment, the loss function of the positive and negative constraint branch model may be the triplet loss

L = \sum_i \max\left(0,\; \left\| f(x_i^a) - f(x_i^p) \right\|_2^2 - \left\| f(x_i^a) - f(x_i^n) \right\|_2^2 + \alpha \right)

where x_i^a is a sample vehicle image (the anchor), f(x_i^a) is the depth feature of the sample vehicle image finally output by the positive and negative constraint branch model, f(x_i^p) is the depth feature of the positive sample, and f(x_i^n) is the depth feature of the negative sample; \| f(x_i^a) - f(x_i^p) \|_2^2 is the Euclidean distance between the sample vehicle image and the positive sample in the learned depth features, \| f(x_i^a) - f(x_i^n) \|_2^2 is that between the sample vehicle image and the negative sample, and \alpha is a margin separating positive and negative pairs. The loss function ensures that the feature distance between the sample vehicle image and the positive sample is smaller than the feature distance between the sample vehicle image and the negative sample, i.e., positive pairs are pulled as close as possible and negative pairs pushed as far as possible, achieving the goal of ranking learning. The invention adjusts the network parameters of the neural network that performs depth feature extraction on the sample triplets, specifically the weight parameters of the multi-layer fully connected network of the positive and negative constraint branch model, so that the loss function reaches its minimum, giving the positive and negative constraint branch model formed by the optimized fully connected layers.
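A sketch of this loss in PyTorch, assuming the 256-dimensional depth features of the anchor, positive, and negative images all come from one shared embedding branch; the margin value is an assumption.

    import torch
    import torch.nn.functional as F

    def triplet_loss(f_a, f_p, f_n, margin=0.2):
        """L = sum_i max(0, ||f_a - f_p||^2 - ||f_a - f_n||^2 + margin)."""
        d_ap = (f_a - f_p).pow(2).sum(dim=1)  # squared distance anchor-positive
        d_an = (f_a - f_n).pow(2).sum(dim=1)  # squared distance anchor-negative
        return F.relu(d_ap - d_an + margin).sum()

    # Depth features of a batch of 8 triplets, all from the same branch weights.
    f_a, f_p, f_n = (torch.randn(8, 256) for _ in range(3))
    loss = triplet_loss(f_a, f_p, f_n)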
In an exemplary embodiment, as shown in FIG. 9, step 350 specifically includes:
in step 351, the image in the vehicle search database is compared with the vehicle image to be searched for to perform vehicle visual feature comparison.
Specifically, the visual features of the vehicle in the images in the vehicle search database may be extracted with reference to steps 320-340 of the present invention as described above. And then comparing the vehicle visual characteristics of the images in the vehicle retrieval database with the vehicle visual characteristics of the images of the vehicle to be retrieved. The comparison may be performed by calculating the Euclidean distance between the visual features of the vehicle being compared, the smaller the distance, the higher the similarity, and the closer the image of the vehicle to be retrieved.
In step 352, an image with highest similarity to the vehicle visual characteristics of the vehicle image to be searched is searched in the vehicle searching database, and a target image matched with the vehicle image to be searched is obtained.
Specifically, by comparing the distances between the visual features of the database images and that of the image to be retrieved, the database image with the minimum distance is selected; it has the highest similarity to the vehicle image to be retrieved, i.e., it is the target image matched with the vehicle image to be retrieved.
FIG. 10 is a schematic view of the effect of retrieving a vehicle image 1001 from the vehicle retrieval database 1002 with the vehicle retrieval method provided by the present invention. As shown in FIG. 10, pictures found in the vehicle retrieval database 1002 that contain the same vehicle as the image to be retrieved 1001 are labeled with a check mark, and the others with a question mark. With the vehicle retrieval method provided by the invention, the server 110 can find a specified vehicle among the massive monitoring pictures provided by the cameras 120, realizing search-by-image for vehicles and track tracing of the specified vehicle, reducing the workload of traffic managers and improving the efficiency of suspect-vehicle investigation.
In conclusion, the vehicle retrieval method provided by the present invention has high retrieval accuracy and can help traffic managers quickly obtain pictures of a specified vehicle in other scenes: a user only needs to supply one picture of the vehicle to be retrieved to find closely matching vehicles in the vehicle retrieval database, so retrieval efficiency is high and labor cost is reduced. In addition, the model can be further optimized on continuously growing data, improving the retrieval accuracy still further.
The following are embodiments of the apparatus of the present disclosure that may be used to perform embodiments of the vehicle retrieval method performed by the server 110 of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method for searching vehicles of the present disclosure.
Fig. 11 is a block diagram illustrating a vehicle retrieval device that may be used in the server 110 of the implementation environment shown in fig. 1 to perform all or part of the steps of the vehicle retrieval method shown in any one of figs. 3, 4, 5, 7, and 9, according to an exemplary embodiment. As shown in fig. 11, the vehicle retrieval device includes, but is not limited to: an image acquisition module 1110, a vector acquisition module 1120, a feature extraction module 1130, a feature merging module 1140, and a vehicle retrieval module 1150.
an image acquisition module 1110, configured to acquire a vehicle image to be retrieved;
a vector acquisition module 1120, configured to extract features of the vehicle image to be retrieved through a convolutional neural network to obtain a feature vector;
a feature extraction module 1130, configured to extract attribute features from the feature vector and to extract depth features of the feature vector relative to the positive and negative samples;
a feature merging module 1140, configured to merge the attribute features and the depth features to obtain the vehicle visual feature of the vehicle image to be retrieved (a minimal sketch of this merging follows the module list);
a vehicle retrieval module 1150, configured to perform matching calculation of vehicle visual features between the images in a vehicle retrieval database and the vehicle image to be retrieved, and obtain a target image matching the vehicle image to be retrieved.
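As a minimal sketch, assuming the merging performed by the feature merging module 1140 is a simple concatenation of the two feature vectors (one natural reading of "merge"; the actual combination may differ):

```python
import numpy as np

def merge_features(attribute_feature, depth_feature):
    # concatenate the attribute feature and the depth feature into
    # a single vehicle visual feature vector
    return np.concatenate([attribute_feature, depth_feature])
```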
The implementation of the functions and roles of each module in the above device is detailed in the implementation of the corresponding steps in the vehicle retrieval method described above, and will not be repeated here.
The image acquisition module 1110 may be implemented, for example, by a physical structure such as the wired/wireless network interface 250 of fig. 2.
The vector acquisition module 1120, the feature extraction module 1130, the feature merging module 1140, and the vehicle retrieval module 1150 may also be functional modules for performing the corresponding steps in the vehicle retrieval method described above. It is to be understood that these modules may be implemented in hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application-specific integrated circuits. When implemented in software, the modules may be implemented as one or more computer programs executing on one or more processors, such as the program stored in the memory 232 and executed by the central processor 222 of fig. 2.
In an exemplary embodiment, the feature extraction module 1130 includes:
and the attribute feature extraction unit is used for performing attribute category prediction on the feature vector through a category prediction branch model constructed from a sample vehicle image and the attribute category marked by the sample vehicle image, so as to obtain the attribute features.
In an exemplary embodiment, the feature extraction module 1130 includes:
and the depth feature extraction unit is used for extracting the depth features of the feature vector through a positive and negative constraint branch model constructed from a sample vehicle image and its corresponding positive and negative samples.
In an exemplary embodiment, as shown in fig. 12, the apparatus further includes:
a sample image acquisition module 1210 for acquiring a sample vehicle image and an attribute category marked by the sample vehicle image;
the category branch training module 1220 is configured to perform neural network learning through the sample vehicle image and the marked attribute category, and obtain a category prediction branch model for performing the attribute feature extraction.
In an exemplary embodiment, as shown in fig. 13, the category branch training module 1220 includes:
a sample vector extraction unit 1221, configured to extract a sample feature vector of the sample vehicle image through the convolutional neural network;
a sample feature extraction unit 1222, configured to perform convolution calculation on the sample feature vector by constructing a plurality of fully connected layers, and extract the sample attribute features corresponding to the sample feature vector;
and a parameter optimization unit 1223, configured to optimize the weight parameters of the fully connected layers according to the attribute category marked by the sample vehicle image, so as to minimize the difference between the sample attribute features and the marked attribute category, the optimized fully connected layers forming the category prediction branch model.
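For illustration, a minimal sketch of such a category prediction branch follows: a small stack of fully connected layers whose weight parameters are optimized so that the predicted attribute category matches the marked one. The layer sizes, the category count, and the use of cross-entropy as the "difference" being minimized are assumptions of this sketch, not details fixed by the description.

```python
import torch
import torch.nn as nn

# Hypothetical category prediction branch: fully connected layers mapping
# the backbone feature vector to attribute-category logits.
class CategoryBranch(nn.Module):
    def __init__(self, feature_dim=2048, num_categories=250):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(feature_dim, 512), nn.ReLU(),
            nn.Linear(512, num_categories),
        )

    def forward(self, feature_vector):
        return self.layers(feature_vector)

# Optimize the weight parameters so the prediction matches the marked
# attribute category (cross-entropy stands in for the "difference").
branch = CategoryBranch()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(branch.parameters(), lr=0.01)

features = torch.randn(8, 2048)        # sample feature vectors from the CNN
labels = torch.randint(0, 250, (8,))   # marked attribute categories
loss = criterion(branch(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```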
In an exemplary embodiment, as shown in fig. 14, the apparatus further includes:
a triplet acquisition module 1410, configured to acquire a sample vehicle image and the positive and negative samples corresponding to the sample vehicle image, so as to form a sample triplet;
and a positive and negative constraint branch training module 1420, configured to perform neural network learning on the sample triplets by using a loss function adapted to the matching calculation, to obtain a positive and negative constraint branch model for performing the depth feature extraction.
In one exemplary embodiment, the positive and negative constraint branch training module 1420 includes:
and the feature distance optimization unit is used for performing the neural network learning of depth feature extraction on the sample triplet, and obtaining a positive and negative constraint branch model for performing the depth feature extraction by optimizing the feature distance between the sample vehicle image and the positive and negative samples.
In one exemplary embodiment, as shown in FIG. 15, the vehicle retrieval module 1150 includes:
a feature comparison unit 1151, configured to compare the vehicle visual features of the images in the vehicle retrieval database with those of the vehicle image to be retrieved;
an image searching unit 1152, configured to search the vehicle retrieval database for the image with the highest similarity to the vehicle visual feature of the vehicle image to be retrieved, and obtain a target image matching the vehicle image to be retrieved.
Optionally, the present disclosure further provides an electronic device, which may be used in the server 110 of the implementation environment shown in fig. 1, to perform all or part of the steps of the vehicle retrieval method shown in any one of fig. 3, 4, 5, 7, and 9. The electronic device includes:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the vehicle retrieval method described in the above exemplary embodiment.
The specific manner in which the processor of the electronic device performs the operations in this embodiment has been described in detail in relation to the embodiments of the vehicle retrieval method and will not be described in detail herein.
In an exemplary embodiment, a storage medium is also provided, which is a computer-readable storage medium, such as a transitory or non-transitory computer-readable storage medium including instructions. The storage medium stores a computer program executable by the central processor 222 of the server 110 to perform the vehicle retrieval method described above.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings and described above, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (12)

1. A vehicle retrieval method, the method comprising:
acquiring a vehicle image to be retrieved;
extracting features of the vehicle image to be retrieved through a convolutional neural network to obtain a feature vector;
acquiring a sample vehicle image and positive and negative samples corresponding to the sample vehicle image to form a sample triplet;
performing neural network learning of depth feature extraction on the sample triplet, and obtaining a positive and negative constraint branch model for depth feature extraction by optimizing the feature distance between the sample vehicle image and the positive and negative samples; wherein the positive and negative samples include a positive sample and a negative sample, and the optimizing the feature distance between the sample vehicle image and the positive and negative samples comprises: optimizing the neural network with the aim of reducing the feature distance between the depth features of the sample vehicle image and the depth features of the positive sample, and with the aim of increasing the feature distance between the depth features of the sample vehicle image and the depth features of the negative sample;
extracting attribute features of the feature vector, and extracting depth features of the feature vector through the positive and negative constraint branch model; wherein the extracting depth features of the feature vector through the positive and negative constraint branch model comprises: performing a convolution operation on the feature vector through the positive and negative constraint branch model to obtain the depth features, the depth features being similarity features between the vehicle image to be retrieved and the sample vehicle image during deep learning;
combining the attribute features and the depth features to obtain the vehicle visual feature of the vehicle image to be retrieved;
and performing matching calculation of vehicle visual features between images in a vehicle retrieval database and the vehicle image to be retrieved, and obtaining a target image matching the vehicle image to be retrieved.
2. The method according to claim 1, wherein the extracting attribute features of the feature vector and extracting depth features relative to the positive and negative samples respectively comprises:
performing attribute category prediction on the feature vector through a category prediction branch model constructed from a sample vehicle image and the attribute category marked by the sample vehicle image, so as to obtain the attribute features.
3. The method of claim 1, wherein before the extracting attribute features and the extracting depth features relative to the positive and negative samples respectively, the method further comprises:
acquiring a sample vehicle image and an attribute category marked by the sample vehicle image;
and performing neural network learning through the sample vehicle image and the marked attribute category to obtain a category prediction branch model for extracting the attribute features.
4. The method according to claim 3, wherein the performing neural network learning through the sample vehicle image and the marked attribute category to obtain the category prediction branch model for the attribute feature extraction comprises:
extracting a sample feature vector of the sample vehicle image through the convolutional neural network;
performing convolution calculation on the sample feature vector by constructing a plurality of fully connected layers, and extracting sample attribute features corresponding to the sample feature vector;
and optimizing weight parameters of the fully connected layers according to the attribute category marked by the sample vehicle image, so that the difference between the sample attribute features and the marked attribute category is minimized, the optimized fully connected layers forming the category prediction branch model.
5. The method according to any one of claims 1-4, wherein the performing matching calculation of vehicle visual features between images in the vehicle retrieval database and the vehicle image to be retrieved to obtain the target image matching the vehicle image to be retrieved comprises:
comparing the vehicle visual features of the images in the vehicle retrieval database with those of the vehicle image to be retrieved;
searching the vehicle retrieval database for the image with the highest similarity to the vehicle visual feature of the vehicle image to be retrieved, and obtaining the target image matching the vehicle image to be retrieved.
6. A vehicle retrieval device, the device comprising:
the image acquisition module is used for acquiring the vehicle image to be retrieved;
the vector acquisition module is used for extracting features of the vehicle image to be retrieved through a convolutional neural network to obtain a feature vector;
the triplet acquisition module is used for acquiring a sample vehicle image and positive and negative samples corresponding to the sample vehicle image to form a sample triplet;
the positive and negative constraint branch training module is used for performing neural network learning of depth feature extraction on the sample triplet, and obtaining a positive and negative constraint branch model for depth feature extraction by optimizing the feature distance between the sample vehicle image and the positive and negative samples; wherein the positive and negative samples include a positive sample and a negative sample, and the optimizing the feature distance between the sample vehicle image and the positive and negative samples comprises: optimizing the neural network with the aim of reducing the feature distance between the depth features of the sample vehicle image and the depth features of the positive sample, and with the aim of increasing the feature distance between the depth features of the sample vehicle image and the depth features of the negative sample;
the feature extraction module is used for extracting attribute features of the feature vector and extracting depth features of the feature vector through the positive and negative constraint branch model; wherein the extracting depth features of the feature vector through the positive and negative constraint branch model comprises: performing a convolution operation on the feature vector through the positive and negative constraint branch model to obtain the depth features, the depth features being similarity features between the vehicle image to be retrieved and the sample vehicle image during deep learning;
the feature merging module is used for merging the attribute features and the depth features to obtain the vehicle visual features of the vehicle images to be retrieved;
and the vehicle retrieval module is used for carrying out matching calculation of the visual characteristics of the vehicle between the image in the vehicle retrieval database and the image of the vehicle to be retrieved, and obtaining a target image matched with the image of the vehicle to be retrieved.
7. The apparatus of claim 6, wherein the feature extraction module comprises:
and the attribute feature extraction unit is used for performing attribute category prediction on the feature vector through a category prediction branch model constructed from a sample vehicle image and the attribute category marked by the sample vehicle image, so as to obtain the attribute features.
8. The apparatus of claim 6, wherein the apparatus further comprises:
the sample image acquisition module is used for acquiring a sample vehicle image and attribute categories marked by the sample vehicle image;
and the category branch training module is used for performing neural network learning through the sample vehicle image and the marked attribute category to obtain a category prediction branch model for extracting the attribute features.
9. The apparatus of claim 8, wherein the category branch training module comprises:
a sample vector extraction unit for extracting a sample feature vector of the sample vehicle image through the convolutional neural network;
the sample feature extraction unit is used for performing convolution calculation on the sample feature vector by constructing a plurality of fully connected layers, and extracting sample attribute features corresponding to the sample feature vector;
and the parameter optimization unit is used for optimizing the weight parameters of the fully connected layers according to the attribute category marked by the sample vehicle image, so that the difference between the sample attribute features and the marked attribute category is minimized, the optimized fully connected layers forming the category prediction branch model.
10. The apparatus of any one of claims 6-9, wherein the vehicle retrieval module comprises:
the feature comparison unit is used for comparing the vehicle visual features of the images in the vehicle retrieval database with those of the vehicle image to be retrieved;
and the image searching unit is used for searching the vehicle retrieval database for the image with the highest similarity to the vehicle visual feature of the vehicle image to be retrieved, and obtaining a target image matching the vehicle image to be retrieved.
11. An electronic device, the electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the vehicle retrieval method of any one of claims 1-5.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program executable by a processor to perform the vehicle retrieval method according to any one of claims 1-5.
CN201810119550.3A 2018-02-06 2018-02-06 Vehicle retrieval method and device, electronic equipment and storage medium Active CN108197326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810119550.3A CN108197326B (en) 2018-02-06 2018-02-06 Vehicle retrieval method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810119550.3A CN108197326B (en) 2018-02-06 2018-02-06 Vehicle retrieval method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108197326A CN108197326A (en) 2018-06-22
CN108197326B true CN108197326B (en) 2023-05-26

Family

ID=62593045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810119550.3A Active CN108197326B (en) 2018-02-06 2018-02-06 Vehicle retrieval method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108197326B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984644B (en) * 2018-06-25 2021-04-23 高新兴科技集团股份有限公司 Fake-licensed vehicle retrieval method and system
CN109165589B (en) * 2018-08-14 2021-02-23 北京颂泽科技有限公司 Vehicle weight recognition method and device based on deep learning
CN109325420A (en) * 2018-08-27 2019-02-12 广州烽火众智数字技术有限公司 A kind of vehicle characteristics search method and system based on big data
CN109035779B (en) * 2018-08-30 2021-01-19 南京邮电大学 DenseNet-based expressway traffic flow prediction method
CN111008294B (en) * 2018-10-08 2023-06-20 阿里巴巴集团控股有限公司 Traffic image processing and image retrieval method and device
CN111062400B (en) * 2018-10-16 2024-04-30 浙江宇视科技有限公司 Target matching method and device
CN109558823B (en) * 2018-11-22 2020-11-24 北京市首都公路发展集团有限公司 Vehicle identification method and system for searching images by images
CN109711443A (en) * 2018-12-14 2019-05-03 平安城市建设科技(深圳)有限公司 Floor plan recognition methods, device, equipment and storage medium neural network based
CN109902732B (en) * 2019-02-22 2021-08-27 哈尔滨工业大学(深圳) Automatic vehicle classification method and related device
CN111611414B (en) * 2019-02-22 2023-10-24 杭州海康威视数字技术股份有限公司 Vehicle searching method, device and storage medium
CN110378237B (en) * 2019-06-21 2021-06-11 浙江工商大学 Facial expression recognition method based on depth measurement fusion network
CN110766077A (en) * 2019-10-24 2020-02-07 上海眼控科技股份有限公司 Method, device and equipment for screening sketch in evidence chain image
CN110837572B (en) * 2019-11-15 2020-10-13 北京推想科技有限公司 Image retrieval method and device, readable storage medium and electronic equipment
CN111310844B (en) * 2020-02-26 2022-09-16 广州华工邦元信息技术有限公司 Vehicle identification model construction method and device and identification method and device
CN112085034A (en) * 2020-09-11 2020-12-15 北京埃福瑞科技有限公司 Rail transit train positioning method and system based on machine vision

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105354273A (en) * 2015-10-29 2016-02-24 浙江高速信息工程技术有限公司 Method for fast retrieving high-similarity image of highway fee evasion vehicle
CN106776943A (en) * 2016-12-01 2017-05-31 中科唯实科技(北京)有限公司 A kind of vehicle retrieval method based on AutoEncoder and attribute tags
CN106649688A (en) * 2016-12-16 2017-05-10 深圳市华尊科技股份有限公司 Image retrieval method and terminal

Also Published As

Publication number Publication date
CN108197326A (en) 2018-06-22

Similar Documents

Publication Publication Date Title
CN108197326B (en) Vehicle retrieval method and device, electronic equipment and storage medium
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN108960141B (en) Pedestrian re-identification method based on enhanced deep convolutional neural network
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN111539370A (en) Image pedestrian re-identification method and system based on multi-attention joint learning
CN113221911B (en) Vehicle weight identification method and system based on dual attention mechanism
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN112069868A (en) Unmanned aerial vehicle real-time vehicle detection method based on convolutional neural network
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
CN109033107A (en) Image search method and device, computer equipment and storage medium
US20210019593A1 (en) Efficient inferencing with piecewise pointwise convolution
CN113610144A (en) Vehicle classification method based on multi-branch local attention network
CN114170516B (en) Vehicle weight recognition method and device based on roadside perception and electronic equipment
CN111582178B (en) Vehicle weight recognition method and system based on multi-azimuth information and multi-branch neural network
Chen et al. Learning capsules for vehicle logo recognition
CN113361549A (en) Model updating method and related device
CN112861840A (en) Complex scene character recognition method and system based on multi-feature fusion convolutional network
CN114782997A (en) Pedestrian re-identification method and system based on multi-loss attention adaptive network
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN116362294B (en) Neural network searching method and device and readable storage medium
CN115761268A (en) Pole tower key part defect identification method based on local texture enhancement network
CN112487927B (en) Method and system for realizing indoor scene recognition based on object associated attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant