CN111008294A - Traffic image processing and image retrieval method and device

Publication number: CN111008294A; granted as CN111008294B
Application number: CN201811168510.4A
Authority: CN (China)
Original language: Chinese (zh)
Applicant and current assignee: Alibaba Group Holding Ltd
Inventors: 赵一儒, 金仲明, 黄建强, 华先胜
Legal status: Granted; active
Prior art keywords: image, training, extraction model, feature extraction, traffic
Classification: Image Analysis


Abstract

The application discloses a traffic image processing method, comprising: determining a traffic image to be queried; obtaining, according to video traffic information, an image data set to be retrieved for retrieving the traffic image to be queried; obtaining an image feature extraction model for extracting image retrieval features; extracting the image retrieval features of the traffic image to be queried using the image feature extraction model; and retrieving, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features, so that the traffic image can be retrieved more accurately.

Description

Traffic image processing and image retrieval method and device
Technical Field
The present application relates to the field of image processing, and in particular to a traffic image processing method and apparatus, an electronic device, and a storage device. The application also relates to a commodity image processing method, to an image retrieval method, and to a training method for an image feature extraction model.
Background
With the rapid development of computer vision and deep learning, retrieving semantically similar images from an image library with a deep-learning-based image feature extraction model has become increasingly important in traffic image processing. At present, traffic images are processed with models obtained by ordinary training; the intra-class distance of the extracted image retrieval features is large, so the similar images retrieved according to those features are not accurate enough.
Disclosure of Invention
The present application provides a traffic image processing method that aims to retrieve traffic images more accurately.
The method comprises the following steps:
determining a traffic image to be queried;
obtaining, according to video traffic information, an image data set to be retrieved for retrieving the traffic image to be queried;
obtaining an image feature extraction model for extracting image retrieval features;
extracting the image retrieval features of the traffic image to be queried using the image feature extraction model; and
retrieving, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features.
Optionally, obtaining the image data set to be retrieved according to the video traffic information includes:
obtaining video data containing the video traffic information;
acquiring at least one video frame from the video data; and
determining the at least one video frame as the image data set to be retrieved.
Optionally, obtaining the image feature extraction model for extracting image retrieval features includes:
determining a network structure for training the model, the network structure comprising an image feature extraction model and a hard sample generator;
acquiring image features of the image data used for training as training samples, and perturbing the training samples with the hard sample generator to obtain adversarial samples of the training samples;
training an initial image feature extraction model and an initial hard sample generator according to the training samples and the adversarial samples to obtain a trained image feature extraction model; and
taking the trained image feature extraction model as the image feature extraction model for extracting image retrieval features.
Optionally, the hard sample generator perturbs the training samples in at least one of the following ways:
enlarging the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
Optionally, the method further includes:
classifying the training samples and the adversarial samples with a discriminator; and
determining the accuracy of the classification result according to the preset category of the training samples and the preset category of the adversarial samples.
Training the initial image feature extraction model and the initial hard sample generator according to the training samples and the adversarial samples then includes:
training an initial discriminator, the initial image feature extraction model, and the initial hard sample generator according to the training samples, the adversarial samples, and the accuracy of the classification result, to obtain the trained image feature extraction model.
Optionally, training the initial discriminator, the initial image feature extraction model, and the initial hard sample generator according to the training samples, the adversarial samples, and the accuracy of the classification result includes: keeping the category of the adversarial samples consistent with the category of the training samples during training.
Optionally, training the initial discriminator, the initial image feature extraction model, and the initial hard sample generator according to the training samples, the adversarial samples, and the accuracy of the classification result includes:
determining an interference cost function for training the initial hard sample generator;
training the initial hard sample generator by minimizing the interference cost function;
determining a target cost function for training the initial image feature extraction model; and
training the initial image feature extraction model by minimizing the target cost function to obtain the trained image feature extraction model.
Optionally, the method further includes:
extracting image features of the image to be queried with the trained image feature extraction model, where the intra-class distance of the image features is smaller than an intra-class distance threshold, the intra-class distance being the similarity distance between an anchor and an element of the same category as the anchor.
Optionally, the method further includes:
determining a classification cost function for training the initial discriminator; and
training the initial discriminator by minimizing the classification cost function.
Optionally, the classification cost function includes:
a true-class judgment loss value for the discriminator classifying the training samples, and a pseudo-class judgment loss value for the discriminator judging the adversarial samples.
Optionally, the target cost function includes:
the true-class judgment loss value for the discriminator classifying the training samples, and a target loss value that involves the intra-class spacing of the adversarial samples, the inter-class spacing of the adversarial samples, and a similarity distance threshold, the inter-class spacing being the similarity distance between an anchor and elements of a different category from the anchor.
Optionally, the interference cost function includes an adversarial loss value and a category-consistency loss value, where the adversarial loss value involves the intra-class spacing, the inter-class spacing, and the similarity distance threshold of the adversarial samples generated by the hard sample generator, and the category-consistency loss value is a softmax loss that keeps the category of the adversarial samples consistent with that of the training samples during training.
Optionally, the method further includes:
acquiring image features of the image to be queried with the trained image feature extraction model; and
retrieving images associated with the image to be queried from the image set to be retrieved according to those image features.
Optionally, the image data is an image triplet containing an anchor, an element of the same category as the anchor, and an element of a different category from the anchor;
the training sample is an image-feature triplet containing the image features of the anchor, of the same-category element, and of the different-category element; and
the adversarial sample is a perturbed-feature triplet containing the perturbed features of the anchor, of the same-category element, and of the different-category element.
The application also provides a commodity image processing method, comprising:
determining a commodity image to be queried;
obtaining an image data set to be retrieved for retrieving the commodity image to be queried;
obtaining an image feature extraction model for extracting image retrieval features;
extracting the image retrieval features of the commodity image to be queried using the image feature extraction model; and
retrieving, from the image data set to be retrieved, images similar to the commodity image to be queried according to its image retrieval features.
The application also provides an image retrieval method, comprising:
determining an image to be queried;
determining an image set to be retrieved;
obtaining an image feature extraction model for extracting image retrieval features;
extracting the image retrieval features of the image to be queried using the image feature extraction model; and
retrieving, from the image set to be retrieved, images similar to the image to be queried according to the image retrieval features.
The application also provides a training method for an image feature extraction model, comprising:
obtaining image data for training the image feature extraction model;
extracting image features of the image data as training samples, and perturbing the training samples with a hard sample generator to obtain adversarial samples of the training samples; and
training an initial image feature extraction model and an initial hard sample generator according to the training samples and the adversarial samples to obtain a trained image feature extraction model.
The present application further provides a traffic image processing apparatus, including:
a query image determining unit for determining the traffic image to be queried;
a retrieval image set determining unit for obtaining, according to video traffic information, the image data set to be retrieved for retrieving the traffic image to be queried;
an image feature extraction model acquisition unit for acquiring the image feature extraction model for extracting image retrieval features;
a retrieval feature extraction unit for extracting the image retrieval features of the traffic image to be queried using the image feature extraction model; and
a retrieval unit for retrieving, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features.
The present application further provides an electronic device, comprising:
a memory and a processor;
the memory stores computer-executable instructions, and the processor executes the computer-executable instructions to:
determine a traffic image to be queried;
obtain, according to video traffic information, an image data set to be retrieved for retrieving the traffic image to be queried;
obtain an image feature extraction model for extracting image retrieval features;
extract the image retrieval features of the traffic image to be queried using the image feature extraction model; and
retrieve, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features.
The present application further provides a storage device storing instructions that can be loaded by a processor to perform the following steps:
determining a traffic image to be queried;
obtaining, according to video traffic information, an image data set to be retrieved for retrieving the traffic image to be queried;
obtaining an image feature extraction model for extracting image retrieval features;
extracting the image retrieval features of the traffic image to be queried using the image feature extraction model; and
retrieving, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features.
Compared with the prior art, the present application has the following advantages:
In the traffic image processing method, the image retrieval features of the traffic image to be queried are extracted with the image feature extraction model, and the intra-class distance of those features is small, so the traffic image can be retrieved more accurately from the image data set obtained from video traffic information, solving the problem that traffic image retrieval is not accurate enough.
In the training method for the image feature extraction model, adversarial samples are obtained by perturbing the training samples with the hard sample generator, and the initial image feature extraction model and the initial hard sample generator are trained with the training samples and the adversarial samples; hard samples no longer need to be selected from the training set, which avoids the prolonged training time caused by sample selection.
Drawings
FIG. 1 is a processing flowchart of a traffic image processing method according to the first embodiment of the present application;
FIG. 2 is a schematic diagram of an image feature extraction model training method according to the first embodiment;
FIG. 3 is a flowchart of a training method for an image feature extraction model according to the first embodiment;
FIG. 4 is a schematic diagram of the optimization effect of extracting image features with an image feature extraction model trained by the method of the first embodiment;
FIG. 5 is a processing flowchart of a commodity image processing method according to the second embodiment;
FIG. 6 is a flowchart of an image retrieval method according to the third embodiment;
FIG. 7 is a schematic diagram of a traffic image processing apparatus according to the fifth embodiment;
FIG. 8 is a schematic diagram of an electronic device provided herein.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited to the specific implementations disclosed below.
The application provides a traffic image processing method and apparatus, an electronic device, and a storage device; it also provides a commodity image processing method, an image retrieval method, and a training method for an image feature extraction model. Each is described in detail in the embodiments below.
The first embodiment of the present application provides a traffic image processing method, described below with reference to FIG. 1 to FIG. 4.
The traffic image processing method shown in FIG. 1 includes steps S101 to S105.
Step S101: determine the traffic image to be queried.
The method of this embodiment retrieves traffic images with a deep-learning-based image feature extraction model obtained through adversarial training. Image retrieval takes an input image to be queried and searches an image library or image set to be retrieved for semantically identical or similar images. In traffic image processing scenarios it can be used, for example, to perform same-instance matching on images of pedestrians, vehicles, and non-motor vehicles in video traffic information, so as to obtain vehicle and pedestrian movement-track information.
This step determines the traffic image to be queried, which may be an image of a vehicle, a person, or a non-motor vehicle.
Step S102: obtain, according to video traffic information, the image data set to be retrieved for retrieving the traffic image to be queried.
This step determines the image data set used for retrieving the image to be queried.
At present, large numbers of video acquisition devices are widely deployed in road traffic systems, particularly urban road networks, to collect road-network information. For example, dome cameras that can rotate 360 degrees are arranged at intersections to capture every direction of the intersection, while fixed bullet cameras are generally arranged on ordinary road sections to monitor traffic in a specific direction. The video traffic information collected by these devices can be transmitted over a network to a traffic control center, so that traffic management departments can grasp road-network traffic conditions in real time; such video traffic information is also typically stored on a data storage device.
In this embodiment, the sources of the image data set to be retrieved include the cameras covering the road network: the data set for retrieving the image to be queried can be obtained by analyzing the video traffic information collected by a city's high-coverage cameras. Specifically, the image data set to be retrieved is obtained as follows (a minimal implementation sketch follows the list):
obtaining video data containing the video traffic information;
acquiring at least one video frame from the video data; and
determining the at least one video frame as the image data set to be retrieved.
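By way of a non-limiting sketch only (the patent prescribes no particular implementation), the frame-extraction step above could be realized with OpenCV; the sampling interval `every_n` is a hypothetical parameter, and `video_path` is assumed to point to stored video traffic data:

```python
import cv2  # OpenCV, assumed available for decoding stored traffic video


def extract_frames(video_path: str, every_n: int = 25) -> list:
    """Decode video data containing video traffic information and keep
    every `every_n`-th frame as part of the image data set to be retrieved."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()  # returns (success flag, BGR frame)
        if not ok:  # end of stream or decode failure
            break
        if idx % every_n == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```

Sampling every N-th frame is only one possible policy; any subset of decoded frames satisfies "at least one video frame" in the sense of the step above.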
Step S103: obtain the image feature extraction model for extracting image retrieval features.
The image feature extraction model adopted in this embodiment is based on adversarial training: a generative adversarial network is constructed for the initial deep-learning-based image feature extraction model, the initial model is trained in an adversarial manner, and the trained deep-learning-based model serves as the image feature extraction model for extracting image retrieval features. In one implementation, the model is obtained as follows:
determining a network structure for training the model, the network structure comprising an image feature extraction model and a hard sample generator;
acquiring image features of the image data used for training as training samples, and perturbing the training samples with the hard sample generator to obtain adversarial samples of the training samples;
training an initial image feature extraction model and an initial hard sample generator according to the training samples and the adversarial samples to obtain a trained image feature extraction model; and
taking the trained image feature extraction model as the image feature extraction model for extracting image retrieval features.
Specifically, a hard sample generator is introduced into the training of the image feature extraction model to form a Generative Adversarial Network (GAN); the initial image feature extraction model and the initial hard sample generator are trained adversarially to obtain the trained model, which is then used to extract the image retrieval features of the traffic image to be queried, so that semantically identical or similar images can be retrieved from the image data set to be retrieved according to those features. Retrieving images similar to the image to be queried includes searching the image library or image set for images whose similarity distance to the query's image features satisfies a distance threshold.
For ease of understanding, the similarity distance is described first. The similarity distance is an index measuring the similarity of features or feature vectors and may be the Euclidean distance. For example, for vectors x1 and x2, the Euclidean distance can be expressed as:
d(x1, x2) = ||x1 - x2||_2 = sqrt( Σ_i (x1,i - x2,i)^2 )
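A small sketch of this distance, and of the threshold-based retrieval it supports, is given below for illustration only; the gallery layout (one feature vector per row) and the value of `threshold` are assumptions, not values fixed by the patent:

```python
import torch


def euclidean(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    # similarity distance between two feature vectors: ||x1 - x2||_2
    return torch.norm(x1 - x2, p=2)


def retrieve(query_feat: torch.Tensor, gallery_feats: torch.Tensor,
             threshold: float = 0.5) -> list:
    """Return indices of gallery images whose similarity distance to the
    query's image feature satisfies the distance threshold."""
    dists = torch.norm(gallery_feats - query_feat, dim=1)  # one distance per gallery image
    return torch.nonzero(dists <= threshold).flatten().tolist()
```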
In this embodiment, the training of the initial image feature extraction model and the initial hard sample generator constitutes adversarial training. Taking an input x as an example: F denotes the initial image feature extraction model to be trained, and extracting the features of x through F yields a feature vector F(x); G denotes the initial hard sample generator to be trained, and adversarial features G(F(x)) are generated by enlarging the similarity distance between feature vectors of the same category while reducing the similarity distance between feature vectors of different categories.
Furthermore, a discriminator is introduced to constrain the training of the hard sample generator, so that a reliable hard sample generator is obtained and, ultimately, a better image feature extraction model is trained. The discriminator classifies image features: for example, with K+1 categories (K real categories and 1 pseudo category), F(x) is classified into its corresponding real category and G(F(x)) into the pseudo category.
In a specific implementation of this embodiment, the network structure for training the image feature extraction model includes a hard sample generator and a discriminator. FIG. 2 shows a schematic block diagram of a system containing both. The input image 201 represents the image data used for training, which may be an image triplet <a, p, n>, where a is the anchor, p belongs to the same category as a (positive), and n belongs to a different category (negative). The image feature extraction model 202 may be a deep convolutional network that maps an input image to a feature vector, i.e., it extracts the image features of the triplet: denoting the model by F, the triplet <a, p, n> yields the image features <F(a), F(p), F(n)>. Denoting the hard sample generator 203 by G, the adversarial features are <G(F(a)), G(F(p)), G(F(n))>. The discriminator 204 is denoted by D. The FC block 205 represents the fully connected layer of the convolutional network, which converts multidimensional vectors into one-dimensional vectors. The deep-learning-based image feature extraction model is trained with a triplet cost function (Triplet Loss) as the loss function: Triplet Loss = max(d(F(a), F(p)) - d(F(a), F(n)) + m, 0), which encourages the similarity distance between features of the same category <a, p> to be smaller than that between features of different categories <a, n>, where m is the distance threshold. During training, G and F form one adversarial pair, learning through competition to extract image features that resist the hard features; D and G form another adversarial pair, training G to generate image features that look genuine while keeping the category consistent.
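The block diagram of FIG. 2 can be sketched roughly as follows in PyTorch. This is a non-limiting illustration: the small convolutional backbone, the feature dimension, the MLP form of the hard sample generator, and all sizes are assumptions made here for concreteness, not choices made by the patent:

```python
import torch
import torch.nn as nn

FEAT_DIM = 128      # assumed feature-vector size
NUM_CLASSES = 100   # assumed number K of real categories


class FeatureExtractor(nn.Module):
    """F (202): deep convolutional network mapping an input image to a feature vector."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(64, FEAT_DIM)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class HardSampleGenerator(nn.Module):
    """G (203): perturbs a feature vector to produce an adversarial (hard) feature."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM, FEAT_DIM), nn.ReLU(),
            nn.Linear(FEAT_DIM, FEAT_DIM))

    def forward(self, f):
        return self.net(f)


class Discriminator(nn.Module):
    """D (204) with FC block (205): classifies a feature into K real classes + 1 pseudo class."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(FEAT_DIM, NUM_CLASSES + 1)

    def forward(self, f):
        return self.fc(f)


def triplet_loss(fa, fp, fn_, m: float = 0.3):
    """Triplet Loss = max(d(F(a), F(p)) - d(F(a), F(n)) + m, 0), averaged over a batch."""
    d_ap = torch.norm(fa - fp, dim=1)   # intra-class similarity distance
    d_an = torch.norm(fa - fn_, dim=1)  # inter-class similarity distance
    return torch.clamp(d_ap - d_an + m, min=0).mean()
```

Any deep convolutional backbone could play the role of F; the description only requires that it map an input image to a feature vector.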
In this embodiment, the image feature extraction model for extracting image retrieval features is obtained through adversarial training. The processing flow is shown in FIG. 3 and includes steps S301 to S303.
Step S301: obtain image data for training the image feature extraction model.
This step acquires the image data used for training.
In this embodiment, the acquired image data is image triplet information: each item is an image triplet containing an anchor, an element of the same category as the anchor, and an element of a different category from the anchor. For example, a triplet is denoted <a, p, n>, where a is the anchor, p belongs to the same category as a (positive), and n belongs to a different category (negative).
Step S302: extract image features of the image data as training samples, and perturb the training samples with the hard sample generator to obtain adversarial samples of the training samples.
This step obtains the adversarial samples of the training samples.
In this embodiment, a training sample is an image-feature triplet containing the image feature of the anchor, that of the same-category element, and that of the different-category element.
An adversarial sample is a perturbed-feature triplet containing the corresponding perturbed features of the anchor, of the same-category element, and of the different-category element.
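In terms of the sketch above (all names are assumed), the two triplets of this step are obtained as:

```python
# a, p, n: image tensors of one triplet; F, G: modules from the earlier sketch (assumed)
fa, fp, fn_ = F(a), F(p), F(n)        # training sample: image-feature triplet
ga, gp, gn = G(fa), G(fp), G(fn_)     # adversarial sample: perturbed-feature triplet
```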
In this embodiment, the hard sample generator perturbs the training samples in at least one of the following ways:
enlarging the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
Without constraints, the image feature extraction model to be trained could be manipulated arbitrarily: for example, a hard sample generator that outputs random vectors could still make the network converge, yet would not train a good image feature extraction model. A discriminator is therefore also introduced in this embodiment: it classifies the training samples and the adversarial samples, and the accuracy of the classification result is determined according to the preset categories of the training samples and of the adversarial samples. The specific training process includes:
training an initial discriminator, the initial image feature extraction model, and the initial hard sample generator according to the training samples, the adversarial samples, and the accuracy of the classification result, to obtain the trained image feature extraction model.
Using the hard sample generator to turn ordinary training samples into hard samples reduces the time spent on sample selection during training and also provides robustness to noise.
Step S303: train the initial image feature extraction model and the initial hard sample generator according to the training samples and the adversarial samples to obtain the trained image feature extraction model.
This step obtains the image feature extraction model through training.
In this embodiment, training cost functions are constructed for the initial discriminator, the initial image feature extraction model, and the initial hard sample generator, and the three training processes are combined to obtain the trained model. When training the image feature extraction model, the adversarial samples produced by the hard sample generator should satisfy the triplet cost function while the discriminator classifies the training samples into their real categories; when training the hard sample generator, the adversarial samples should violate the triplet cost function as far as possible and the discriminator should classify them into the pseudo category; when training the discriminator, training samples are classified into real categories and adversarial samples into the pseudo category. The method specifically includes the following (the full cost functions are written out as Formulas 1 to 8 below):
determining the interference cost function for training the initial hard sample generator;
training the initial hard sample generator by minimizing the interference cost function;
determining the target cost function for training the initial image feature extraction model; and
training the initial image feature extraction model by minimizing the target cost function to obtain the trained image feature extraction model.
An exemplary implementation of this embodiment further includes:
determining a classification cost function for training the initial discriminator; and
training the initial discriminator by minimizing the classification cost function.
The classification cost function includes: the true-class judgment loss value for the discriminator classifying the training samples, and the pseudo-class judgment loss value for the discriminator judging the adversarial samples.
The classification cost function is expressed by the following formula:
L_D = L_D,real + β · L_D,fake    (Formula 1)
where L_D is the classification loss of the discriminator; L_D,real is the true-class judgment loss value for the discriminator classifying the training samples; L_D,fake is the pseudo-class judgment loss value for the discriminator judging the adversarial samples; and β is the pseudo-class judgment loss coefficient.
Further, take the image triplet <a, p, n> as an example. The image feature extraction model extracts the image features <F(a), F(p), F(n)> from <a, p, n>, and the hard sample generator produces the perturbed features <G(F(a)), G(F(p)), G(F(n))>. The discriminator classifies image features and perturbed features; for example, with K+1 categories (K real categories and 1 pseudo category), <F(a), F(p), F(n)> are classified into their corresponding real categories and <G(F(a)), G(F(p)), G(F(n))> into the pseudo category. The category labels of <a, p, n> are <l_a, l_p, l_n>, with l_a = l_p and l_a ≠ l_n. The training goal of the discriminator is to minimize the classification cost function shown in Formula 1.
L_D,real can be expressed as:
L_D,real = L_sm(D(F(a)), l_a) + L_sm(D(F(p)), l_p) + L_sm(D(F(n)), l_n)    (Formula 2)
where L_D,real is the true-class judgment loss;
L_sm(·) is the softmax loss function;
D(x) is the discriminator's classification of x;
F(a) is the image feature of anchor a, and l_a is the category label of a;
F(p) is the image feature of the same-category element p, and l_p is its category label;
F(n) is the image feature of the different-category element n, and l_n is its category label.
L_D,fake can be expressed as:
L_D,fake = L_sm(D(G(F(a))), l_fake) + L_sm(D(G(F(p))), l_fake) + L_sm(D(G(F(n))), l_fake)    (Formula 3)
where L_D,fake is the pseudo-category judgment loss;
L_sm(·) is the softmax loss function;
D(x) is the discriminator's classification of x;
G(F(a)) is the anchor feature of the adversarial sample, obtained after the image feature of anchor a is perturbed by the hard sample generator, and l_fake is the category label of the pseudo category;
G(F(p)) is the same-category feature of the adversarial sample, obtained after the image feature of the same-category element of anchor a is perturbed by the hard sample generator;
G(F(n)) is the different-category feature of the adversarial sample, obtained after the image feature of the different-category element of anchor a is perturbed by the hard sample generator.
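A sketch of Formulas 1 to 3 as code follows; PyTorch's `cross_entropy` plays the role of the softmax loss L_sm, and the convention of stacking the triplet along the batch dimension is an assumption of this sketch:

```python
import torch
import torch.nn.functional as nnf


def classification_cost(real_logits: torch.Tensor, real_labels: torch.Tensor,
                        fake_logits: torch.Tensor, pseudo_class: int,
                        beta: float = 1.0) -> torch.Tensor:
    """L_D = L_D,real + beta * L_D,fake (Formulas 1-3).

    real_logits: D(F(a)), D(F(p)), D(F(n)) stacked along the batch dimension,
    with real_labels = (l_a, l_p, l_n); fake_logits: D(G(F(.))) for the same
    triplet, all assigned the single pseudo class."""
    l_real = nnf.cross_entropy(real_logits, real_labels)  # softmax loss, true classes (Formula 2)
    fake_labels = torch.full((fake_logits.size(0),), pseudo_class, dtype=torch.long)
    l_fake = nnf.cross_entropy(fake_logits, fake_labels)  # softmax loss, pseudo class (Formula 3)
    return l_real + beta * l_fake
```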
The target cost function includes:
the true-class judgment loss value for the discriminator classifying the training samples, and a target loss value that involves the intra-class spacing of the adversarial samples, the inter-class spacing of the adversarial samples, and a similarity distance threshold, the inter-class spacing being the similarity distance between an anchor and elements of a different category from the anchor.
The target cost function is expressed as:
L_F = L_F,tri + μ · L_D,real    (Formula 4)
where L_F is the feature-extraction loss of the image feature extraction model; L_F,tri is the target loss; L_D,real is the true-class judgment loss value for the discriminator classifying the training samples, as given in Formula 2 above; and μ is the true-class judgment loss coefficient.
L_F,tri can be expressed as:
L_F,tri = [ d(G(F(a)), G(F(p))) - d(G(F(a)), G(F(n))) + m ]+    (Formula 5)
where a is the anchor of the image data, p is an element of the same category as the anchor, and n is an element of a different category from the anchor;
F(a), F(p), and F(n) are the image features of a, p, and n;
G(F(a)), G(F(p)), and G(F(n)) are the corresponding perturbed features of the adversarial sample after perturbation by the hard sample generator;
d(G(F(a)), G(F(p))) is the intra-class spacing of the perturbed features and d(G(F(a)), G(F(n))) is the inter-class spacing; and
m is the distance threshold.
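Formulas 4 and 5 as a sketch, under the same assumptions as before (the perturbed features `ga`, `gp`, `gn` stand for G(F(a)), G(F(p)), G(F(n)), and `real_logits`/`real_labels` follow the stacking convention of the previous sketch):

```python
import torch
import torch.nn.functional as nnf


def target_cost(ga, gp, gn, real_logits, real_labels,
                m: float = 0.3, mu: float = 1.0) -> torch.Tensor:
    """L_F = L_F,tri + mu * L_D,real (Formulas 4-5)."""
    d_ap = torch.norm(ga - gp, dim=1)  # intra-class spacing of perturbed features
    d_an = torch.norm(ga - gn, dim=1)  # inter-class spacing of perturbed features
    l_tri = torch.clamp(d_ap - d_an + m, min=0).mean()    # Formula 5, [.]+ hinge
    l_real = nnf.cross_entropy(real_logits, real_labels)  # L_D,real (Formula 2)
    return l_tri + mu * l_real
```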
The interference cost function includes an adversarial loss value and a category-consistency loss value: the adversarial loss value involves the intra-class spacing, the inter-class spacing, and the similarity distance threshold of the adversarial samples generated by the hard sample generator, while the category-consistency loss value is a softmax loss that keeps the category of the adversarial samples consistent with that of the training samples during training.
In this embodiment, when the initial discriminator, the initial image feature extraction model, and the initial hard sample generator are trained according to the training samples, the adversarial samples, and the accuracy of the classification result, the category-consistency loss value acts as a constraint on the hard sample generator: it keeps the category of the adversarial samples consistent with that of the training samples during training, which yields an image feature extraction model with more effective retrieval performance.
The interference cost function is expressed as:
L_G = L_G,tri + γ · L_G,cls    (Formula 6)
where L_G is the interference loss of the hard sample generator; L_G,tri is the adversarial loss value; L_G,cls is the category-consistency loss value; and γ is the category-consistency loss coefficient.
The category-consistency loss value can be expressed as:
L_G,cls = L_sm(D(G(F(a))), l_a) + L_sm(D(G(F(p))), l_p) + L_sm(D(G(F(n))), l_n)    (Formula 7)
where a is the anchor of the image data, p is an element of the same category as the anchor, and n is an element of a different category from the anchor;
F(a), F(p), and F(n) are the image features of a, p, and n;
G(F(a)), G(F(p)), and G(F(n)) are the corresponding perturbed features of the adversarial sample after perturbation by the hard sample generator;
L_sm(·) is the softmax loss function; and
D(x) is the discriminator's classification of x.
The adversarial loss value can be expressed as:
L_G,tri = [ d(G(F(a)), G(F(n))) - d(G(F(a)), G(F(p))) + m ]+    (Formula 8)
where a is the anchor of the image data, p is an element of the same category as the anchor, and n is an element of a different category from the anchor;
F(a), F(p), and F(n) are the image features of a, p, and n;
G(F(a)), G(F(p)), and G(F(n)) are the corresponding perturbed features of the adversarial sample after perturbation by the hard sample generator;
d(G(F(a)), G(F(n))) is the inter-class spacing of the perturbed features and d(G(F(a)), G(F(p))) is the intra-class spacing; and
m is the distance threshold.
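To tie the pieces together, the following sketch writes out Formulas 6 to 8 and one alternating training step. It reuses the modules F, G, D and the cost helpers (classification_cost, target_cost) from the sketches above; those names, the update order, and the repeated forward passes per update are illustrative assumptions of this sketch, not requirements of the patent:

```python
import torch
import torch.nn.functional as nnf


def interference_cost(ga, gp, gn, fake_logits, real_labels,
                      m: float = 0.3, gamma: float = 1.0) -> torch.Tensor:
    """L_G = L_G,tri + gamma * L_G,cls (Formulas 6-8)."""
    d_an = torch.norm(ga - gn, dim=1)  # inter-class spacing of perturbed features
    d_ap = torch.norm(ga - gp, dim=1)  # intra-class spacing of perturbed features
    # Formula 8: the generator is rewarded for enlarging the intra-class
    # spacing and shrinking the inter-class spacing (making samples harder)
    l_tri = torch.clamp(d_an - d_ap + m, min=0).mean()
    # Formula 7: softmax loss keeping the adversarial samples' categories
    # consistent with the training samples' labels (l_a, l_p, l_n)
    l_cls = nnf.cross_entropy(fake_logits, real_labels)
    return l_tri + gamma * l_cls


def train_step(F, G, D, a, p, n, labels, pseudo_class, opts):
    """One alternating update of D, then G, then F (the order is illustrative)."""
    opt_F, opt_G, opt_D = opts
    la, lp, ln_ = labels
    real_labels = torch.cat([la, lp, ln_])

    # 1) discriminator: real features -> true classes, perturbed -> pseudo class
    with torch.no_grad():
        feats = torch.cat([F(a), F(p), F(n)])
        fakes = G(feats)
    loss_D = classification_cost(D(feats), real_labels, D(fakes), pseudo_class)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) generator: adversarial samples should violate the triplet cost while
    #    keeping their categories consistent (minimize L_G, Formula 6)
    with torch.no_grad():
        fa, fp, fn_ = F(a), F(p), F(n)
    ga, gp, gn = G(fa), G(fp), G(fn_)
    loss_G = interference_cost(ga, gp, gn, D(torch.cat([ga, gp, gn])), real_labels)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

    # 3) feature extractor: perturbed features should satisfy the triplet cost
    #    and real features should be classified correctly (minimize L_F, Formula 4)
    fa, fp, fn_ = F(a), F(p), F(n)
    ga, gp, gn = G(fa), G(fp), G(fn_)
    loss_F = target_cost(ga, gp, gn, D(torch.cat([fa, fp, fn_])), real_labels)
    opt_F.zero_grad(); loss_F.backward(); opt_F.step()

    return loss_D.item(), loss_G.item(), loss_F.item()
```

Each sub-step here recomputes its own forward pass so that every backward call operates on a fresh graph; other schedules (e.g., several D updates per G update) are equally compatible with the cost functions above.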
Step S104: extract the image retrieval features of the traffic image to be queried with the image feature extraction model.
This step extracts the image retrieval features of the traffic image to be queried.
In this embodiment, the image features of the image to be queried are obtained with the adversarially trained image feature extraction model, so that images associated with the image to be queried can then be retrieved from the image set to be retrieved according to those features.
Step S105: retrieve, from the image data set to be retrieved, images similar to the traffic image to be queried according to its image retrieval features.
This step retrieves, from the image data set, images semantically identical or similar to the traffic image to be queried.
In this embodiment, the image features of the image to be queried are obtained with the adversarially trained image feature extraction model; the intra-class distance of those features is smaller than an intra-class distance threshold (the intra-class distance being the similarity distance between an anchor and an element of the same category as the anchor), so the traffic image can be retrieved more accurately and retrieval performance is better. Referring to FIG. 4: effect 401 shows features extracted by a model obtained by ordinary training, without adversarial training, whose intra-class spacing is large; effect 402 shows the sample features produced by the hard sample generator during adversarial training, whose perturbation further enlarges the intra-class spacing; effect 403 shows features extracted by the model trained with the adversarial method provided herein, whose intra-class spacing is small, so the traffic image can be retrieved more accurately and retrieval performance is better.
On the basis of the first embodiment, the second embodiment of the present application provides a commodity image processing method, described below with reference to FIG. 5. Since it builds on the first embodiment, the description is relatively brief; for related parts, refer to the corresponding description of the first embodiment.
The commodity image processing method shown in FIG. 5 includes steps S501 to S505.
Step S501: determine the commodity image to be queried.
The commodity image processing method of this embodiment retrieves commodity images with a deep-learning-based image feature extraction model obtained through adversarial training. Searching an e-commerce platform for commodities identical or similar to a given commodity image is a common scenario: similar commodity images can be shown to consumers for selection, improving the online shopping experience.
This step determines the commodity image to be queried. Information about the commodity image, such as name and label information, can also be acquired.
Step S502: obtain the image data set to be retrieved for retrieving the commodity image to be queried.
This step determines the retrieval image data set for the commodity image to be queried.
In this embodiment, the image data set to be retrieved is determined according to the label information of the commodity to be queried. For example, after the commodity image to be queried is determined, the SPU (Standard Product Unit) or SKU (Stock Keeping Unit) of the corresponding commodity is obtained, and similar commodity images are retrieved under the same SPU or SKU, so that information about similar commodities can be obtained subsequently (a small sketch of this narrowing step follows).
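As an illustrative fragment only (the gallery representation and the "sku" metadata field are hypothetical, not defined by the patent), narrowing the retrieval set by SPU/SKU could look like:

```python
def candidate_images(gallery: list, query_sku: str) -> list:
    """Keep only gallery entries sharing the query commodity's SKU, so that
    similar commodity images are retrieved under the same SPU/SKU."""
    return [item for item in gallery if item.get("sku") == query_sku]
```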
Step S503: obtain the image feature extraction model for extracting image retrieval features.
As in the first embodiment, the model is based on adversarial training: a generative adversarial network is constructed for the initial deep-learning-based image feature extraction model, the initial model is trained adversarially, and the trained model serves as the image feature extraction model for extracting image retrieval features. In one implementation, it is obtained by:
determining a network structure for training the model, the network structure comprising an image feature extraction model and a hard sample generator;
acquiring image features of the image data used for training as training samples, and perturbing the training samples with the hard sample generator to obtain adversarial samples;
training an initial image feature extraction model and an initial hard sample generator according to the training samples and the adversarial samples to obtain a trained image feature extraction model; and
taking the trained model as the image feature extraction model for extracting image retrieval features.
In one specific implementation, the network structure is a Generative Adversarial Network (GAN) comprising the image feature extraction model and the hard sample generator. The initial model and the initial generator are trained adversarially, and the trained model extracts the image retrieval features of the commodity image to be queried, so that semantically identical or similar images can be retrieved from the image data set according to those features, i.e., images whose similarity distance to the query's features satisfies the distance threshold. The similarity distance is an index measuring feature similarity and may be the Euclidean distance.
As in the first embodiment, training the initial image feature extraction model against the initial hard sample generator constitutes adversarial training: for an input x, F denotes the initial model, F(x) its feature vector, and G the initial hard sample generator; adversarial features G(F(x)) are generated by enlarging the similarity distance between feature vectors of the same category while reducing that between feature vectors of different categories.
In another specific implementation, the network structure is a generative adversarial network comprising the image feature extraction model, the hard sample generator, and a discriminator. The discriminator constrains the training of the hard sample generator so that a reliable generator, and ultimately a better image feature extraction model, is obtained. The discriminator classifies image features into, for example, K+1 categories (K real categories and 1 pseudo category), classifying F(x) into its corresponding real category and G(F(x)) into the pseudo category.
Step S504: extract the image retrieval features of the commodity image to be queried with the image feature extraction model.
This step extracts the image retrieval features of the commodity image to be queried.
In this embodiment, the image features of the image to be queried are obtained with the adversarially trained model, so that associated images can then be retrieved from the image set to be retrieved according to those features.
Step S505: retrieve, from the image data set to be retrieved, images similar to the commodity image to be queried according to its image retrieval features.
This step retrieves, from the image data set, images semantically identical or similar to the commodity image to be queried.
In this embodiment, the image features of the image to be queried are obtained with the adversarially trained model; the intra-class distance of those features is smaller than an intra-class distance threshold (the intra-class distance being the similarity distance between an anchor and an element of the same category), so the commodity image can be retrieved more accurately and retrieval performance is better.
On the basis of the first embodiment, the third embodiment of the present application provides an image retrieval method, described below with reference to FIG. 6. Since it builds on the first embodiment, the description is relatively brief; for related parts, refer to the corresponding description of the first embodiment.
The image retrieval method shown in FIG. 6 includes steps S601 to S605.
Step S601: determine the image to be queried.
The image retrieval method of this embodiment uses a deep-learning-based image feature extraction model obtained through adversarial training. Image retrieval takes an input image to be queried and searches an image library or image set to be retrieved for semantically identical or similar images; in practice it can be used in traffic image processing, security image processing, commodity image processing, and other scenarios, for example, same-instance matching of pedestrian, vehicle, and non-motor-vehicle images in video traffic information to obtain vehicle and pedestrian movement-track information.
This step determines the image to be queried.
Step S602: determine the image set to be retrieved.
This step determines the image data set used for retrieving the image to be queried. The retrieval range can be determined in different ways depending on the application scenario.
Step S603: obtain the image feature extraction model for extracting image retrieval features.
The image feature extraction model adopted in the embodiment of the application is an image feature extraction model based on countermeasure training, and comprises the following steps: and constructing and generating a confrontation network aiming at the initial image feature extraction model based on deep learning, training the initial image feature extraction model based on deep learning in a confrontation training mode, and taking the trained image feature extraction model based on deep learning as the image feature extraction model for extracting the image retrieval features. In one embodiment, the image feature extraction model for extracting the image retrieval features is obtained by:
determining a network structure for training the image feature extraction model for extracting the image retrieval features; the network structure comprises an image feature extraction model and a difficult sample generator;
acquiring image features of image data used for training the image feature extraction model as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain countermeasure samples of the training samples;
training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model;
and taking the trained image feature extraction model as the image feature extraction model for extracting the image retrieval features.
In an embodiment of the present application, in a specific implementation manner, the Network structure is a generation countermeasure Network (GAN) including an image feature extraction model and a difficulty sample generator. The method comprises the steps of carrying out countermeasure training on an initial image feature extraction model and an initial difficult sample generator to obtain a trained image feature extraction model, using the trained image feature extraction model to extract image retrieval features of an image to be queried, so as to retrieve images similar to the image to be queried in the image data set to be retrieved according to the image retrieval features, wherein the images with similar or similar semantics are retrieved. Searching the image similar to the image to be inquired comprises searching the image library or the image set to be searched for the image with the similarity distance meeting the distance threshold value with the image characteristic as the image similar to or similar to the image to be inquired in a semantic meaning mode. The similarity distance is an index for measuring the similarity of features or feature vectors, and may be a euclidean distance.
In the embodiment of the application, the training of the initial image feature extraction model and the initial difficult sample generator forms confrontation training. Taking input x as an example, taking F as an initial image feature extraction model to be trained, extracting x features through F to obtain a feature vector, expressing the feature vector as F (x), expressing an initial difficult sample generator to be trained through G, and generating countermeasures through G (F (x)) by expanding the similarity distance of the feature vectors of the same category and simultaneously reducing the similarity distance of the feature vectors of different categories.
In another specific implementation manner of the embodiment of the present application, the network structure is a generation countermeasure network comprising an image feature extraction model, a difficult sample generator, and a discriminator. The discriminator constrains the training of the difficult sample generator, so that a reliable difficult sample generator is obtained through training and, finally, a better image feature extraction model is trained. The discriminator is used for classifying the image features; for example, with K+1 classes, that is, K real classes and 1 pseudo class, F(x) is classified into its corresponding real class and G(F(x)) into the pseudo class.
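For illustration only, the following is a minimal sketch of such a network structure in Python (PyTorch); the layer sizes, the residual form of the generator, the input image size, and the module names are assumptions made for exposition and are not the patented implementation:

import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """F: maps an image to a retrieval feature vector F(x)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 64 * 64, 512), nn.ReLU(),  # assumes 3 x 64 x 64 input images
            nn.Linear(512, feat_dim),
        )
    def forward(self, x):
        return self.backbone(x)

class DifficultSampleGenerator(nn.Module):
    """G: interferes with a feature to produce a difficult feature G(F(x))."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
    def forward(self, f):
        return f + self.net(f)  # residual interference on the input feature

class Discriminator(nn.Module):
    """D: classifies features into K real classes plus 1 pseudo class."""
    def __init__(self, feat_dim=128, num_real_classes=10):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_real_classes + 1)
    def forward(self, f):
        return self.cls(f)  # logits over K+1 classes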
Step S604, extracting the image retrieval features of the image to be queried by using the image feature extraction model.
This step extracts the image retrieval features of the image to be queried.
In the embodiment of the application, the image features of the image to be queried are obtained by using the image feature extraction model trained in the countermeasure training mode, so that the image associated with the image to be queried can further be retrieved from the image set to be retrieved according to these image features.
Step S605, retrieving images similar to the image to be queried in the image set to be retrieved according to the image retrieval features.
In this step, images that are semantically close or similar to the traffic image to be queried are retrieved from the image data set.
In the embodiment of the application, the image features of the image to be queried are obtained by using the image feature extraction model trained in the countermeasure training mode. The intra-class distance of the image features is smaller than an intra-class distance threshold, wherein the intra-class distance is the similarity distance between an anchor point and an element belonging to the same category as the anchor point, so that better image retrieval performance is achieved.
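As an illustrative sketch of this retrieval step (the function name, the batch shapes, and the distance threshold value are assumptions; the method only requires that the similarity distance meet a threshold):

import torch

def retrieve_similar(query_image, candidate_images, feature_model, dist_threshold=0.8):
    """Return indices of candidates whose Euclidean feature distance
    to the query is within the threshold."""
    feature_model.eval()
    with torch.no_grad():
        q = feature_model(query_image.unsqueeze(0))   # 1 x d query feature
        c = feature_model(candidate_images)           # N x d candidate features
        dists = torch.cdist(q, c).squeeze(0)          # N similarity distances
    return (dists <= dist_threshold).nonzero(as_tuple=True)[0].tolist()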
Based on the first embodiment of the present application, a fourth embodiment of the present application provides a training method for an image feature extraction model. Referring to fig. 3, fig. 3 shows a processing flow of the image feature extraction model training method. Since the first embodiment is taken as a basis, the fourth embodiment is relatively simple to describe, and please refer to the corresponding description of the first embodiment for related parts.
The image feature extraction model training method provided by the fourth embodiment of the application comprises the following steps: step S301 to step S303.
At present, a triplet cost function (Triplet Loss) is generally adopted to train a deep-learning-based image feature extraction model, and the trained model is used to extract image features of the image to be queried, which serve as retrieval features for querying images semantically close to the image to be queried in an image library. In the existing training scheme, in order to provide enough information for the model, difficult triplets are generally selected for training by a hard sample mining (Hard Sample Mining) technique: for each training triplet, the negative sample closest to the anchor point and the positive sample farthest from the anchor point must be searched among all training samples, so there is a problem of prolonged training time caused by sample selection.
The image feature extraction model training method provided by the embodiment of the application performs countermeasure training on a deep-learning-based image feature extraction model: a difficult sample generator is introduced into the training to form a generation countermeasure network (GAN), and countermeasure training is performed on the initial image feature extraction model and the initial difficult sample generator to obtain a trained image feature extraction model for image retrieval. The method does not need to select difficult samples among the training samples, thereby solving the problem of prolonged training time caused by sample selection.
Image retrieval may consist of inputting an image or picture to be queried and searching an image library or an image set to be retrieved for images with close or similar semantics. In practice it can be used in search-by-image scenarios and traffic security scenarios, for example, matching images of people, motor vehicles, and non-motor vehicles to the same instance to acquire trajectory information of vehicles and people. Specifically, in the embodiment of the application, the trained image feature extraction model is adopted to extract the image features of the image to be queried, and images whose similarity distance to these image features meets a distance threshold are searched in the image library or the image set to be retrieved as images semantically close or similar to the image to be queried.
For ease of understanding, the following similarity distances will be described first. The similarity distance is an index for measuring the similarity of features or feature vectors, and may be a euclidean distance. For example, for vectors x1 and x2, then the euclidean distance may be expressed as:
d(x1, x2) = ||x1 - x2||_2 = sqrt( Σ_i (x1_i - x2_i)^2 )
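For example, a minimal numeric check of this distance in Python (illustrative only):

import numpy as np

x1 = np.array([1.0, 2.0, 2.0])
x2 = np.array([0.0, 0.0, 0.0])
d = np.linalg.norm(x1 - x2)  # sqrt(1 + 4 + 4) = 3.0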
in the embodiment of the application, the joint training of the initial image feature extraction model and the initial difficult sample generator constitutes the countermeasure training. Taking an input x as an example: F denotes the initial image feature extraction model to be trained, and extracting the features of x through F yields a feature vector F(x); G denotes the initial difficult sample generator to be trained, and G(F(x)) generates a countermeasure feature by expanding the similarity distance between feature vectors of the same category while reducing the similarity distance between feature vectors of different categories.
Furthermore, a discriminator is introduced to constrain the training of the difficult sample generator, so that a reliable difficult sample generator is obtained through training and, finally, a better image feature extraction model is trained. The discriminator is used for classifying the image features; for example, with K+1 classes, that is, K real classes and 1 pseudo class, F(x) is classified into its corresponding real class and G(F(x)) into the pseudo class.
The training method of the image feature extraction model shown in fig. 3 includes: step S301 to step S303.
Step S301, image data for training an image feature extraction model is obtained.
This step is to acquire image data for training.
In the embodiment of the application, the acquired image data is image triplet information. The image data is an image triplet comprising an anchor point, an element belonging to the same category as the anchor point, and an element belonging to a different category from the anchor point. For example, an image triplet is denoted by <a, p, n>, where a is the anchor point (anchor), p belongs to the same category as a (positive), and n belongs to a different category from a (negative).
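A minimal sketch of assembling such triplets from a labeled image set (the uniform random sampling policy, the function name, and the data layout are assumptions for illustration, not a requirement of the method):

import random

def sample_triplet(images_by_label):
    """images_by_label: dict mapping category label -> list of images
    (each category is assumed to hold at least two images).
    Returns (a, p, n): anchor, same-category positive, different-category negative."""
    label_a = random.choice(list(images_by_label))
    a, p = random.sample(images_by_label[label_a], 2)   # anchor and positive share a label
    label_n = random.choice([l for l in images_by_label if l != label_a])
    n = random.choice(images_by_label[label_n])         # negative from another label
    return a, p, n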
Step S302, extracting image features of the image data as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain countermeasure samples of the training samples.
This step is to obtain the confrontation sample of the training sample.
In the embodiment of the present application, the training sample is an image feature triple including an image feature of an anchor point, an image feature of an element belonging to the same category as the anchor point, and an image feature of an element belonging to a different category from the anchor point.
The countermeasure sample is an interference feature triplet containing post-interference features of the anchor point, post-interference features of elements belonging to the same category as the anchor point, and post-interference features of elements belonging to a different category from the anchor point.
In the embodiment of the application, the training samples are subjected to interference processing by the difficult sample generator according to at least one of the following modes:
increasing the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
Considering that, without constraints, the training process can be manipulated arbitrarily (for example, a difficult sample generator designed to output random vectors can still make the network converge, but cannot train a better image feature extraction model), the discriminator is also introduced in the embodiment of the application. The discriminator is used to classify the training samples and the countermeasure samples, and the accuracy of the classification result is determined according to the preset category to which the training sample belongs and the preset category to which the countermeasure sample belongs. The specific training process comprises the following steps:
and training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training sample, the confrontation sample and the classification result to obtain a trained image feature extraction model.
Using the difficult sample generator to turn ordinary training samples into difficult samples reduces the time spent selecting samples during training and, at the same time, provides robustness to noise.
Step S303, training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model.
In the step, an image feature extraction model is obtained through training.
In the embodiment of the application, training cost functions are constructed for the initial discriminator, the initial image feature extraction model, and the initial difficult sample generator, and the three training processes are combined to obtain the trained image feature extraction model. When training the image feature extraction model, the countermeasure samples obtained after the interference of the difficult sample generator are made to satisfy the triplet cost function, while the discriminator is ensured to classify the training samples into real categories. When training the difficult sample generator, the countermeasure samples obtained after its interference are made, as far as possible, not to satisfy the triplet cost function, and the discriminator is made, as far as possible, to classify the countermeasure samples into the pseudo category. When training the discriminator, the discriminator is made to classify the training samples into real categories and the countermeasure samples into the pseudo category. The method specifically comprises the following steps (a condensed sketch of the alternating updates follows this list):
determining an interference cost function for training the initial difficult sample generator;
training the initial difficult sample generator by minimizing the interference cost function;
determining a target cost function for training the initial image feature extraction model;
and training the initial image feature extraction model by minimizing the target cost function to obtain a trained image feature extraction model.
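The alternating minimization can be condensed as follows, assuming the three cost functions (classification_cost, interference_cost, target_cost) are implemented as sketched after the corresponding formulas below; the update order and the use of separate optimizers are assumptions:

import torch

def train_step(batch, F, G, D, opt_F, opt_G, opt_D):
    a, p, n, labels = batch  # image triplet and its category labels

    # 1. Train the discriminator D by minimizing the classification cost (formula 1).
    opt_D.zero_grad()
    classification_cost(F, G, D, a, p, n, labels).backward()
    opt_D.step()

    # 2. Train the difficult sample generator G by minimizing the interference cost (formula 6).
    opt_G.zero_grad()
    interference_cost(F, G, D, a, p, n, labels).backward()
    opt_G.step()

    # 3. Train the feature extraction model F by minimizing the target cost (formula 4).
    opt_F.zero_grad()
    target_cost(F, G, D, a, p, n, labels).backward()
    opt_F.step()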
An exemplary implementation of the embodiments of the present application further includes:
determining a classification cost function for training an initial discriminator;
training the initial arbiter by minimizing the classification cost function.
The classification cost function includes: a true-class judgment loss value for the discriminator classifying the training samples, and a pseudo-class judgment loss value for the discriminator judging the countermeasure samples.
The classification cost function is expressed by the following formula:
L_D = L_D,real + β·L_D,fake; (formula 1)

wherein L_D represents the classification loss of the discriminator; L_D,real represents the true-class judgment loss value for the discriminator classifying the training samples; L_D,fake represents the pseudo-class judgment loss value for the discriminator judging the countermeasure samples; and β represents the pseudo-class judgment loss coefficient.
Further, take the image triplet <a, p, n> as an example. The image feature extraction model extracts the image features <F(a), F(p), F(n)> from <a, p, n>, and the difficult sample generator generates the interference features <G(F(a)), G(F(p)), G(F(n))>. The discriminator is used for classifying the image features and the interference features; for example, with K+1 classes, that is, K real classes and 1 pseudo class, <F(a), F(p), F(n)> are classified into their corresponding real classes and <G(F(a)), G(F(p)), G(F(n))> into the pseudo class. The class labels of <a, p, n> are <l_a, l_p, l_n>, where l_a = l_p and l_a ≠ l_n. The training goal of the discriminator is to minimize the classification cost function shown in (formula 1).
Wherein L_D,real can be expressed by the following formula:

L_D,real = L_sm(D(F(a)), l_a) + L_sm(D(F(p)), l_p) + L_sm(D(F(n)), l_n); (formula 2)

wherein L_D,real is the true-class judgment loss;
L_sm() represents the softmax loss function;
D(x) represents the classification of x by the discriminator;
F(a) represents the image feature of the image anchor point a, and l_a the category label of a;
F(p) represents the image feature of the same-category element of the anchor point a, and l_p its category label;
F(n) represents the image feature of the different-category element of the anchor point a, and l_n its category label.
L_D,fake can be expressed by the following formula:

L_D,fake = L_sm(D(G(F(a))), l_fake) + L_sm(D(G(F(p))), l_fake) + L_sm(D(G(F(n))), l_fake); (formula 3)

wherein L_D,fake is the pseudo-class judgment loss;
L_sm() represents the softmax loss function;
D(x) represents the classification of x by the discriminator;
G(F(a)) represents the anchor-point feature of the countermeasure sample obtained after the image feature of the anchor point a is interfered with by the difficult sample generator, and l_fake represents the category label of the countermeasure sample (the pseudo class);
G(F(p)) represents the same-category feature of the countermeasure sample obtained after the image feature of the same-category element of the anchor point a is interfered with by the difficult sample generator;
G(F(n)) represents the different-category feature of the countermeasure sample obtained after the image feature of the different-category element of the anchor point a is interfered with by the difficult sample generator. (A code sketch of formulas 1-3 follows.)
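Under the reconstruction above, formulas 1-3 can be sketched in Python as follows; the cross-entropy realization of the softmax loss, the function name, and the default value of β are assumptions:

import torch
import torch.nn.functional as nnF

def classification_cost(F, G, D, a, p, n, labels, beta=1.0):
    """L_D = L_D,real + beta * L_D,fake (formula 1)."""
    l_a, l_p, l_n = labels
    feats = [F(a), F(p), F(n)]
    pseudo_class = D(feats[0]).shape[1] - 1            # index of the pseudo class (K)
    l_fake = torch.full_like(l_a, pseudo_class)
    # L_D,real (formula 2): classify real features into their true categories.
    L_real = sum(nnF.cross_entropy(D(f), l) for f, l in zip(feats, (l_a, l_p, l_n)))
    # L_D,fake (formula 3): classify interfered features into the pseudo class.
    L_fake = sum(nnF.cross_entropy(D(G(f)), l_fake) for f in feats)
    return L_real + beta * L_fake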
The target cost function includes:
a true-class judgment loss value for the discriminator classifying the training samples, and a target loss value; the target loss value comprises the intra-class spacing of the countermeasure samples, the inter-class spacing of the countermeasure samples, and a similarity distance threshold, wherein the inter-class spacing is the similarity distance between the anchor point and an element belonging to a different category from the anchor point.
The target cost function is expressed by the following formula:
L_F = L_F,tri + μ·L_D,real; (formula 4)

wherein L_F represents the feature extraction loss of the image feature extraction model; L_F,tri represents the target loss; L_D,real represents the true-class judgment loss value for the discriminator classifying the training samples; and μ denotes the true-class judgment loss coefficient.
Wherein L_D,real may be as given in (formula 2) above.
Wherein L_F,tri can be expressed by the following formula:

L_F,tri = [ d(G(F(a)), G(F(p))) - d(G(F(a)), G(F(n))) + m ]_+; (formula 5)

wherein a is the anchor point of the image data, p is an element belonging to the same category as the anchor point, and n is an element belonging to a different category from the anchor point;
F(a) is the image feature of the anchor point a, F(p) is the image feature of p, and F(n) is the image feature of n;
G(F(a)), G(F(p)), and G(F(n)) are the post-interference features of the countermeasure sample corresponding to a, p, and n, respectively, after interference by the difficult sample generator;
d(G(F(a)), G(F(p))) is the intra-class spacing of the post-interference features, and d(G(F(a)), G(F(n))) is their inter-class spacing;
m is a distance threshold. (A code sketch of formulas 4-5 follows.)
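A sketch of formulas 4-5 under the same assumptions as the discriminator sketch above; the default values of m and μ are assumptions:

import torch
import torch.nn.functional as nnF

def target_cost(F, G, D, a, p, n, labels, m=0.5, mu=1.0):
    """L_F = L_F,tri + mu * L_D,real (formula 4)."""
    l_a, l_p, l_n = labels
    fa, fp, fn = F(a), F(p), F(n)
    ga, gp, gn = G(fa), G(fp), G(fn)
    d_ap = torch.norm(ga - gp, dim=1)                     # intra-class spacing after interference
    d_an = torch.norm(ga - gn, dim=1)                     # inter-class spacing after interference
    L_F_tri = torch.clamp(d_ap - d_an + m, min=0).mean()  # formula 5, hinge [.]_+
    # L_D,real (formula 2): the discriminator should classify real features correctly.
    L_D_real = sum(nnF.cross_entropy(D(f), l) for f, l in zip((fa, fp, fn), (l_a, l_p, l_n)))
    return L_F_tri + mu * L_D_real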
The interference cost function comprises a countermeasure loss value and a category-consistency loss value; the countermeasure loss value comprises the intra-class spacing, the inter-class spacing, and the similarity distance threshold of the countermeasure samples generated by the difficult sample generator, and the category-consistency loss value comprises a softmax loss value (softmax loss) for keeping the category of the countermeasure samples consistent with the category of the training samples during training.
In the embodiment of the application, in the process of training the initial discriminator, the initial image feature extraction model, and the initial difficult sample generator according to the training samples, the countermeasure samples, and the accuracy of the classification results, the category-consistency loss value serves as a constraint factor for training the difficult sample generator. Adding this constraint keeps the category of the countermeasure samples consistent with the category of the training samples during training, so that an image feature extraction model with more effective image retrieval performance can be obtained.
The interference cost function is expressed by the following formula:
L_G = L_G,tri + γ·L_G,cls; (formula 6)

wherein L_G represents the interference loss of the difficult sample generator; L_G,tri represents the countermeasure loss value; L_G,cls represents the category-consistency loss value; and γ represents the category-consistency loss coefficient.
The category-consistency loss value can be expressed by the following formula:

L_G,cls = L_sm(D(G(F(a))), l_a) + L_sm(D(G(F(p))), l_p) + L_sm(D(G(F(n))), l_n); (formula 7)

wherein a is the anchor point of the image data, p is an element belonging to the same category as the anchor point, and n is an element belonging to a different category from the anchor point, with category labels l_a, l_p, and l_n;
F(a) is the image feature of the anchor point a, F(p) is the image feature of p, and F(n) is the image feature of n;
G(F(a)), G(F(p)), and G(F(n)) are the post-interference features of the countermeasure sample corresponding to a, p, and n, respectively, after interference by the difficult sample generator;
L_sm() represents the softmax loss function;
D(x) represents the classification of x by the discriminator.
The countermeasure loss value can be expressed by the following formula:

L_G,tri = [ d(G(F(a)), G(F(n))) - d(G(F(a)), G(F(p))) + m ]_+; (formula 8)

wherein a is the anchor point of the image data, p is an element belonging to the same category as the anchor point, and n is an element belonging to a different category from the anchor point;
F(a) is the image feature of the anchor point a, F(p) is the image feature of p, and F(n) is the image feature of n;
G(F(a)), G(F(p)), and G(F(n)) are the post-interference features of the countermeasure sample corresponding to a, p, and n, respectively, after interference by the difficult sample generator;
d(G(F(a)), G(F(n))) is the inter-class spacing of the post-interference features, and d(G(F(a)), G(F(p))) is their intra-class spacing;
m is a distance threshold. (A code sketch of formulas 6-8 follows.)
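A sketch of formulas 6-8 under the same assumptions; the default values of m and γ are assumptions:

import torch
import torch.nn.functional as nnF

def interference_cost(F, G, D, a, p, n, labels, m=0.5, gamma=1.0):
    """L_G = L_G,tri + gamma * L_G,cls (formula 6)."""
    l_a, l_p, l_n = labels
    ga, gp, gn = G(F(a)), G(F(p)), G(F(n))
    # L_G,tri (formula 8): pull different categories together, push the same category apart.
    d_an = torch.norm(ga - gn, dim=1)
    d_ap = torch.norm(ga - gp, dim=1)
    L_G_tri = torch.clamp(d_an - d_ap + m, min=0).mean()
    # L_G,cls (formula 7): interfered features should still classify to the true labels.
    L_G_cls = sum(nnF.cross_entropy(D(g), l) for g, l in zip((ga, gp, gn), (l_a, l_p, l_n)))
    return L_G_tri + gamma * L_G_cls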
In the embodiment of the application, the trained image feature extraction model can be used to obtain the image retrieval features of the image to be queried. The intra-class distance of these image features is smaller than an intra-class distance threshold, the intra-class distance being the similarity distance between an anchor point and an element belonging to the same category as the anchor point. Therefore, retrieving images associated with the image to be queried in the image set to be retrieved by using these image features yields more accurate retrieval results.
Corresponding to the embodiment of the traffic image processing method provided by the application, the fifth embodiment of the application also provides a traffic image processing device.
Referring to fig. 7, there is shown a schematic diagram of an apparatus provided in accordance with a fifth embodiment of the present application. Since the device embodiment is basically similar to the method embodiment, the description is relatively simple, and the relevant portions only need to refer to the corresponding description of the method embodiment.
The traffic image processing apparatus shown in fig. 7 includes:
an inquiry image determining unit 701, configured to determine a traffic image to be inquired;
a retrieval image set determining unit 702, configured to obtain, according to video traffic information, an image data set to be retrieved for retrieving the traffic image to be queried;
an image feature extraction model acquisition unit 703 for acquiring an image feature extraction model for extracting an image retrieval feature;
a retrieval feature extraction unit 704, configured to extract an image retrieval feature of the traffic image to be queried by using the image feature extraction model;
the retrieving unit 705 is configured to retrieve, according to the image retrieval feature of the traffic image to be queried, an image similar to the traffic image to be queried in the image dataset to be retrieved.
The retrieval image set determining unit 702 is specifically configured to:
obtaining video data containing video traffic information;
acquiring at least one video frame from video data;
determining the at least one video frame as the image data set to be retrieved.
The image feature extraction model obtaining unit 703 is specifically configured to:
determining a network structure for training the image feature extraction model for extracting the image retrieval features; the network structure comprises an image feature extraction model and a difficult sample generator;
acquiring image features of image data used for training the image feature extraction model as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain countermeasure samples of the training samples;
training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model;
and taking the trained image feature extraction model as the image feature extraction model for extracting the image retrieval features.
The image feature extraction model obtaining unit 703 includes a countermeasure sample generating subunit, which is configured to perform interference processing on the training samples by using the difficult sample generator in at least one of the following manners:
increasing the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
The image feature extraction model obtaining unit 703 includes a discriminator subunit, where the discriminator subunit is configured to:
classifying the training samples and the confrontation samples using a discriminator;
and determining the accuracy of the classification result according to the preset class to which the training sample belongs and the preset class to which the confrontation sample belongs.
The image feature extraction model obtaining unit 703 includes a training subunit, and the training subunit is configured to:
and training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training sample, the confrontation sample and the classification result to obtain a trained image feature extraction model.
Wherein the training subunit is specifically configured to: keeping the class of the confrontation sample consistent with the class of the training sample in the training process.
Wherein the training subunit is specifically configured to:
determining an interference cost function for training the initial difficult sample generator;
training the initial difficult sample generator by minimizing the interference cost function;
determining a target cost function for training the initial image feature extraction model;
and training the initial image feature extraction model by minimizing the target cost function to obtain a trained image feature extraction model.
Wherein the retrieval feature extraction unit 704 is configured to:
extracting the image characteristics of the image to be inquired by using the trained image characteristic extraction model; wherein the intra-class distance of the image feature is smaller than an intra-class distance threshold, and the intra-class distance is a similarity distance between an anchor point and an element belonging to the same class as the anchor point.
Wherein the training subunit is further configured to:
determining a classification cost function for training an initial discriminator;
training the initial arbiter by minimizing the classification cost function.
Wherein the classification cost function comprises:
a true-class judgment loss value for the discriminator classifying the training samples, and a pseudo-class judgment loss value for the discriminator judging the countermeasure samples.
Wherein the target cost function comprises:
a true-class judgment loss value for the discriminator classifying the training samples, and a target loss value; the target loss value comprises the intra-class spacing of the countermeasure samples, the inter-class spacing of the countermeasure samples, and a similarity distance threshold, wherein the inter-class spacing is the similarity distance between the anchor point and an element belonging to a different category from the anchor point.
Wherein the interference cost function comprises a countermeasure loss value and a category-consistency loss value; the countermeasure loss value comprises the intra-class spacing, the inter-class spacing, and the similarity distance threshold of the countermeasure samples generated by the difficult sample generator, and the category-consistency loss value comprises a softmax loss value for keeping the category of the countermeasure samples consistent with the category of the training samples during training.
Wherein the retrieving unit 705 is further configured to:
acquiring the image characteristics of the image to be inquired by using the trained image characteristic extraction model;
and retrieving images associated with the images to be inquired from the image sets to be retrieved according to the image characteristics of the images to be inquired.
The image data is an image triple containing an anchor point, elements belonging to the same category as the anchor point and elements belonging to different categories as the anchor point; the training sample is an image feature triple comprising the image feature of the anchor point, the image feature of the element belonging to the same category as the anchor point and the image feature of the element belonging to different categories as the anchor point; the countermeasure sample is an interference feature triplet containing post-interference features of the anchor point, post-interference features of elements belonging to the same category as the anchor point, and post-interference features of elements belonging to a different category from the anchor point.
The sixth embodiment of the present application further provides an electronic device for implementing the traffic image processing method, and fig. 8 shows a schematic diagram of an electronic device provided in the sixth embodiment of the present application.
The embodiment of the electronic device provided in the present application is described relatively simply; for relevant parts, please refer to the corresponding description of the first embodiment.
The electronic device shown in fig. 8 includes:
a memory 801, and a processor 802;
the memory 801 is configured to store computer-executable instructions, and the processor 802 is configured to execute the computer-executable instructions to:
determining a traffic image to be inquired;
acquiring an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the traffic image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
obtaining video data containing video traffic information;
acquiring at least one video frame from video data;
determining the at least one video frame as the image data set to be retrieved.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
determining a network structure for training the image feature extraction model for extracting the image retrieval features; the network structure comprises an image feature extraction model and a difficult sample generator;
acquiring image features of image data used for training the image feature extraction model as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain countermeasure samples of the training samples;
training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model;
and taking the trained image feature extraction model as the image feature extraction model for extracting the image retrieval features.
Optionally, the processor 802 is further configured to execute at least one of the following computer-executable instructions:
increasing the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
classifying the training samples and the confrontation samples using a discriminator;
determining the accuracy of a classification result according to the preset class to which the training sample belongs and the preset class to which the confrontation sample belongs;
and training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training sample, the confrontation sample and the classification result to obtain a trained image feature extraction model.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions: keeping the class of the confrontation sample consistent with the class of the training sample in the training process.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
determining an interference cost function for training the initial difficult sample generator;
training the initial difficult sample generator by minimizing the interference cost function;
determining a target cost function for training the initial image feature extraction model;
and training the initial image feature extraction model by minimizing the target cost function to obtain a trained image feature extraction model.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
extracting the image characteristics of the image to be inquired by using the trained image characteristic extraction model; wherein the intra-class distance of the image feature is smaller than an intra-class distance threshold, and the intra-class distance is a similarity distance between an anchor point and an element belonging to the same class as the anchor point.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
determining a classification cost function for training an initial discriminator;
training the initial arbiter by minimizing the classification cost function.
Optionally, the classification cost function includes:
a true-class judgment loss value for the discriminator classifying the training samples, and a pseudo-class judgment loss value for the discriminator judging the countermeasure samples.
Optionally, the target cost function includes:
a true-class judgment loss value for the discriminator classifying the training samples, and a target loss value; the target loss value comprises the intra-class spacing of the countermeasure samples, the inter-class spacing of the countermeasure samples, and a similarity distance threshold, wherein the inter-class spacing is the similarity distance between the anchor point and an element belonging to a different category from the anchor point.
Optionally, the interference cost function includes a countermeasure loss value and a category consistent loss value; wherein the challenge loss values comprise intra-class spacing, inter-class spacing, and similarity distance thresholds for challenge samples generated by the difficult sample generator, the class-consistent loss values comprising softmax loss values for keeping the class of the challenge samples and the class of the training samples consistent during a training process.
Optionally, the processor 802 is further configured to execute the following computer-executable instructions:
acquiring the image characteristics of the image to be inquired by using the trained image characteristic extraction model;
and retrieving images associated with the images to be inquired from the image sets to be retrieved according to the image characteristics of the images to be inquired.
Optionally, the image data is an image triple including an anchor point, an element belonging to the same category as the anchor point, and an element belonging to a different category from the anchor point;
the training sample is an image feature triple comprising the image feature of the anchor point, the image feature of the element belonging to the same category as the anchor point and the image feature of the element belonging to different categories as the anchor point;
the countermeasure sample is an interference feature triplet containing post-interference features of the anchor point, post-interference features of elements belonging to the same category as the anchor point, and post-interference features of elements belonging to a different category from the anchor point.
The seventh embodiment of the present application further provides a storage device for the traffic image processing method, which is described relatively simply, and please refer to the corresponding description of the first embodiment.
A storage device storing instructions that can be loaded by a processor and that perform the steps of:
determining a traffic image to be inquired;
acquiring an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the traffic image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, therefore, the scope of the present application should be determined by the claims that follow.

Claims (20)

1. A traffic image processing method, characterized by comprising:
determining a traffic image to be inquired;
acquiring an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the traffic image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
2. The method of claim 1, wherein obtaining the image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information comprises:
obtaining video data containing video traffic information;
acquiring at least one video frame from video data;
determining the at least one video frame as the image data set to be retrieved.
3. The method of claim 1, wherein obtaining an image feature extraction model for extracting image retrieval features comprises:
determining a network structure for training the image feature extraction model for extracting the image retrieval features; the network structure comprises an image feature extraction model and a difficult sample generator;
acquiring image features of image data used for training the image feature extraction model as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain countermeasure samples of the training samples;
training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model;
and taking the trained image feature extraction model as the image feature extraction model for extracting the image retrieval features.
4. The method of claim 3, wherein the training samples are interference processed using a difficult sample generator in at least one of:
increasing the similarity distance between elements of the same category;
reducing the similarity distance between elements of different categories.
5. The method of claim 3, further comprising:
classifying the training samples and the confrontation samples using a discriminator;
determining the accuracy of a classification result according to the preset class to which the training sample belongs and the preset class to which the confrontation sample belongs;
the training of the initial image feature extraction model and the initial difficult sample generator according to the training sample and the confrontation sample to obtain the trained image feature extraction model comprises the following steps:
and training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training sample, the confrontation sample and the classification result to obtain a trained image feature extraction model.
6. The method of claim 5, wherein the training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training samples, the confrontation samples and the classification results to obtain a trained image feature extraction model comprises: keeping the class of the confrontation sample consistent with the class of the training sample in the training process.
7. The method of claim 5, wherein the training an initial discriminator, an initial image feature extraction model and an initial difficult sample generator according to the accuracy of the training samples, the confrontation samples and the classification results to obtain a trained image feature extraction model comprises:
determining an interference cost function for training the initial difficult sample generator;
training the initial difficult sample generator by minimizing the interference cost function;
determining a target cost function for training the initial image feature extraction model;
and training the initial image feature extraction model by minimizing the target cost function to obtain a trained image feature extraction model.
8. The method of claim 7, further comprising:
extracting the image characteristics of the image to be inquired by using the trained image characteristic extraction model; wherein the intra-class distance of the image feature is smaller than an intra-class distance threshold, and the intra-class distance is a similarity distance between an anchor point and an element belonging to the same class as the anchor point.
9. The method of claim 7, further comprising:
determining a classification cost function for training an initial discriminator;
training the initial arbiter by minimizing the classification cost function.
10. The method of claim 9, wherein the classification cost function comprises:
a true-class judgment loss value for the discriminator classifying the training samples, and a pseudo-class judgment loss value for the discriminator judging the countermeasure samples.
11. The method of claim 7, wherein the target cost function comprises:
a true-class judgment loss value for the discriminator classifying the training samples, and a target loss value; the target loss value comprises the intra-class spacing of the countermeasure samples, the inter-class spacing of the countermeasure samples, and a similarity distance threshold, wherein the inter-class spacing is the similarity distance between the anchor point and an element belonging to a different category from the anchor point.
12. The method of claim 7, wherein the interference cost function comprises a countermeasure loss value and a category-consistency loss value; wherein the countermeasure loss value comprises the intra-class spacing, the inter-class spacing, and the similarity distance threshold of the countermeasure samples generated by the difficult sample generator, and the category-consistency loss value comprises a softmax loss value for keeping the category of the countermeasure samples consistent with the category of the training samples during training.
13. The method of claim 3, further comprising:
acquiring the image characteristics of the image to be inquired by using the trained image characteristic extraction model;
and retrieving images associated with the images to be inquired from the image sets to be retrieved according to the image characteristics of the images to be inquired.
14. The method of claim 3, wherein the image data is an image triplet containing an anchor point, elements belonging to the same class as the anchor point, and elements belonging to a different class than the anchor point;
the training sample is an image feature triple comprising the image feature of the anchor point, the image feature of the element belonging to the same category as the anchor point and the image feature of the element belonging to different categories as the anchor point;
the countermeasure sample is an interference feature triplet containing post-interference features of the anchor point, post-interference features of elements belonging to the same category as the anchor point, and post-interference features of elements belonging to a different category from the anchor point.
15. A commodity image processing method is characterized by comprising the following steps:
determining an image of a commodity to be inquired;
acquiring an image data set to be retrieved for retrieving the commodity image to be queried;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the commodity image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the images of the commodities to be inquired in the image data set to be retrieved according to the image retrieval characteristics of the images of the commodities to be inquired.
16. An image retrieval method, comprising:
determining an image to be queried;
determining an image set to be retrieved;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the image to be inquired by using the image characteristic extraction model;
and searching out images similar to the image to be inquired in the image set to be searched according to the image searching characteristics.
17. A training method of an image feature extraction model is characterized by comprising the following steps:
obtaining image data for training an image feature extraction model;
extracting image features of the image data as training samples, and performing interference processing on the training samples by using a difficult sample generator to obtain confrontation samples of the training samples;
and training an initial image feature extraction model and an initial difficult sample generator according to the training sample and the confrontation sample to obtain a trained image feature extraction model.
18. A traffic image processing apparatus, comprising:
the query image determining unit is used for determining the traffic image to be queried;
the retrieval image set determining unit is used for obtaining an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
an image feature extraction model acquisition unit configured to acquire an image feature extraction model for extracting an image retrieval feature;
the retrieval feature extraction unit is used for extracting the image retrieval features of the traffic image to be inquired by using the image feature extraction model;
and the retrieval unit is used for retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
19. An electronic device, comprising:
a memory, and a processor;
the memory is to store computer-executable instructions, and the processor is to execute the computer-executable instructions to:
determining a traffic image to be inquired;
acquiring an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the traffic image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
20. A storage device having stored thereon instructions capable of being loaded by a processor and performing the steps of:
determining a traffic image to be inquired;
acquiring an image data set to be retrieved for retrieving the traffic image to be queried according to the video traffic information;
obtaining an image feature extraction model for extracting image retrieval features;
extracting image retrieval characteristics of the traffic image to be inquired by using the image characteristic extraction model;
and retrieving images similar to the traffic image to be queried in the image data set to be retrieved according to the image retrieval characteristics of the traffic image to be queried.
CN201811168510.4A 2018-10-08 2018-10-08 Traffic image processing and image retrieval method and device Active CN111008294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811168510.4A CN111008294B (en) 2018-10-08 2018-10-08 Traffic image processing and image retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811168510.4A CN111008294B (en) 2018-10-08 2018-10-08 Traffic image processing and image retrieval method and device

Publications (2)

Publication Number Publication Date
CN111008294A true CN111008294A (en) 2020-04-14
CN111008294B CN111008294B (en) 2023-06-20

Family

ID=70110635

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811168510.4A Active CN111008294B (en) 2018-10-08 2018-10-08 Traffic image processing and image retrieval method and device

Country Status (1)

Country Link
CN (1) CN111008294B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170330054A1 (en) * 2016-05-10 2017-11-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method And Apparatus Of Establishing Image Search Relevance Prediction Model, And Image Search Method And Apparatus
US20180268055A1 (en) * 2017-03-15 2018-09-20 Nec Laboratories America, Inc. Video retrieval system based on larger pose face frontalization
CN107688823A (en) * 2017-07-20 2018-02-13 北京三快在线科技有限公司 A kind of characteristics of image acquisition methods and device, electronic equipment
CN107679465A (en) * 2017-09-20 2018-02-09 上海交通大学 A kind of pedestrian's weight identification data generation and extending method based on generation network
CN107958067A (en) * 2017-12-05 2018-04-24 焦点科技股份有限公司 It is a kind of based on without mark Automatic Feature Extraction extensive electric business picture retrieval system
CN108229434A (en) * 2018-02-01 2018-06-29 福州大学 A kind of vehicle identification and the method for careful reconstruct
CN108197326A (en) * 2018-02-06 2018-06-22 腾讯科技(深圳)有限公司 A kind of vehicle retrieval method and device, electronic equipment, storage medium
CN108416370A (en) * 2018-02-07 2018-08-17 深圳大学 Image classification method, device based on semi-supervised deep learning and storage medium
CN108537152A (en) * 2018-03-27 2018-09-14 百度在线网络技术(北京)有限公司 Method and apparatus for detecting live body

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232360A (en) * 2020-09-30 2021-01-15 上海眼控科技股份有限公司 Image retrieval model optimization method, image retrieval device and storage medium
CN112199543A (en) * 2020-10-14 2021-01-08 哈尔滨工程大学 Confrontation sample generation method based on image retrieval model
CN112199543B (en) * 2020-10-14 2022-10-28 哈尔滨工程大学 Confrontation sample generation method based on image retrieval model

Also Published As

Publication number Publication date
CN111008294B (en) 2023-06-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant