CN111046971A

CN111046971A - Image recognition method, device, equipment and computer readable storage medium

Info

Publication number: CN111046971A
Application number: CN201911345426.XA
Authority: CN
Inventors: 周康明; 戚风亮
Original assignee: Shanghai Eye Control Technology Co Ltd
Current assignee: Shanghai Eye Control Technology Co Ltd
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2020-04-21

Abstract

The invention provides an image recognition method, apparatus, device, and computer-readable storage medium. The method comprises: acquiring a training data set comprising a plurality of training images; for each training image in the training data set, extracting a global feature vector and a background feature vector of the training image; constructing incremental sample data from the global feature vector and the background feature vector; determining a loss function according to the incremental sample data, and training a preset model to be trained with the loss function to obtain an image recognition model; and using the image recognition model to recognize a preset target object. Because the model is trained on incremental sample data, no additional data set needs to be introduced, the demands on the data set are low, and the trained image recognition model achieves high recognition accuracy.

Description

Image recognition method, device, equipment and computer readable storage medium

Technical Field

The present invention relates to the field of image recognition, and in particular, to an image recognition method, an image recognition apparatus, an image recognition device, and a computer-readable storage medium.

Background

Re-identification (Re-ID) is an important branch of computer vision and is widely applied in fields such as smart cities and intelligent transportation. In practice, a moving object, such as a person or a vehicle, may appear in several different cameras one after another, and a Re-ID task is needed to recognize it as the same object across those cameras.

In the prior art, a target object is typically re-identified by extracting global and local features with a preset neural network model and using those features to judge whether two images show the same object.

However, this approach often requires auxiliary operations such as keypoint detection and semantic segmentation, which in turn require additional data sets to be introduced, so model training places high demands on the data.

Disclosure of Invention

The invention provides an image recognition method, apparatus, device, and computer-readable storage medium to solve the technical problem that existing re-identification methods must introduce additional data sets during model training and therefore place high demands on the training data.

A first aspect of the present invention provides an image recognition method, including:

acquiring a training data set comprising a plurality of training images;

for each training image in the training data set, extracting a shallow feature vector, a global feature vector and a background feature vector of the training image;

constructing incremental sample data from the global feature vector and the background feature vector;

determining a loss function according to the incremental sample data, and training a preset model to be trained with the loss function to obtain an image recognition model;

and using the image recognition model to recognize a preset target object.

A second aspect of the present invention provides an image recognition apparatus comprising:

an acquisition module, configured to acquire a training data set comprising a plurality of training images;

an extraction module, configured to extract, for each training image in the training data set, a shallow feature vector, a global feature vector and a background feature vector of the training image;

a sample construction module, configured to construct incremental sample data from the global feature vector and the background feature vector;

a determining module, configured to determine a loss function according to the incremental sample data, and to train a preset model to be trained with the loss function to obtain an image recognition model;

and a recognition module, configured to recognize a preset target object using the image recognition model.

A third aspect of the present invention provides an image recognition device comprising a memory and a processor;

the memory is configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to perform the image recognition method of the first aspect.

A fourth aspect of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the image recognition method of the first aspect.

With the image recognition method, apparatus, device, and computer-readable storage medium provided by the invention, the global feature vector and the background feature vector of each training image are extracted, so the feature vector of the target object can be determined from them; new training samples can then be generated by randomly combining target-object feature vectors with background feature vectors. Because the model to be trained is trained on this incremental sample data, no additional data set needs to be introduced, the demands on the data set are low, and the trained image recognition model achieves high recognition accuracy.

Drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them.

FIG. 1 is a schematic diagram of the system architecture on which the present invention is based;

FIG. 2 is a schematic flowchart of an image recognition method according to an embodiment of the present invention;

FIG. 3 is a network structure diagram of a network model according to an embodiment of the present invention;

FIG. 4 is a schematic flowchart of an image recognition method according to a second embodiment of the present invention;

FIG. 5 is a network structure diagram of another network model according to an embodiment of the present invention;

FIG. 6 is a schematic flowchart of an image recognition method according to a third embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an image recognition apparatus according to a fourth embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an image recognition device according to a fifth embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained on the basis of the embodiments of the present invention fall within the scope of protection of the present invention.

To address the technical problem that conventional re-identification methods must introduce additional data sets during model training and therefore place high demands on the data, the invention provides an image recognition method, apparatus, device, and computer-readable storage medium. To avoid introducing many new data sets during training, incremental data is constructed from the existing data set and the model to be trained is trained on this incremental sample data, which reduces the amount of data required while maintaining recognition accuracy.

It should be noted that the image recognition method, apparatus, device, and computer-readable storage medium provided in the present application may be applied in scenarios in which application software is tested in various applications.

FIG. 1 is a schematic diagram of the system architecture on which the present invention is based. As shown in FIG. 1, the system architecture includes at least an image recognition apparatus 1 and a data server 2. The image recognition apparatus 1 may be written in C/C++, Java, Shell, Python, or a similar language; the data server 2 may be a cloud server or a server cluster that stores a large amount of data. The image recognition apparatus 1 is communicatively connected to the data server 2 so that it can acquire data from the data server 2.

FIG. 2 is a schematic flowchart of an image recognition method according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:

Step 101: acquiring a training data set, wherein the training data set comprises a plurality of training images.

The execution subject of this embodiment is an image recognition apparatus that is communicatively connected to a data server, from which it can acquire the training data set. Before a preset target object can be re-identified with an image recognition model, that model must first be built, which requires acquiring from the data server a training data set containing a plurality of training images. The training data set may be an open-source image data set; optionally, the training images may be annotated with target-object information and background information and may have been captured by a plurality of image acquisition devices.

Optionally, on the basis of the foregoing embodiment, before step 102 the method further includes:

resizing each training image and subtracting the per-channel mean from it.

In this embodiment, after the training images are acquired, they may be preprocessed to make their subsequent processing more efficient; specifically, the training images may be resized and channel de-meaned. Other preprocessing may also be applied according to actual requirements, which is not limited by the invention. For example, to increase the number of training images, data augmentation methods such as adding random noise to the acquired images or randomly flipping them may be used.
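The preprocessing and augmentation described above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the patent's implementation: images are assumed to be nested Python lists indexed `img[row][col][channel]`, resizing is assumed to use nearest-neighbour sampling, the channel means are assumed to be computed over the whole training set, and "randomly flipping" is assumed to mean a horizontal flip. All function names are hypothetical.

```python
import random

# Assumed layout: an image is a nested list img[row][col][channel] of floats.


def resize_nearest(img, out_h, out_w):
    """Resize an H x W x C image to out_h x out_w using nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [list(img[r * in_h // out_h][c * in_w // out_w]) for c in range(out_w)]
        for r in range(out_h)
    ]


def channel_means(images):
    """Mean of each channel over every pixel of every image in the training set."""
    num_channels = len(images[0][0][0])
    sums = [0.0] * num_channels
    count = 0
    for img in images:
        for row in img:
            for pixel in row:
                for ch in range(num_channels):
                    sums[ch] += pixel[ch]
                count += 1
    return [s / count for s in sums]


def subtract_channel_mean(img, means):
    """Channel de-meaning: subtract each channel's mean from every pixel."""
    return [[[v - m for v, m in zip(pixel, means)] for pixel in row] for row in img]


def augment(img, rng, noise_std=0.01, flip_prob=0.5):
    """Optional augmentation: random horizontal flip, then additive Gaussian noise."""
    if rng.random() < flip_prob:
        img = [row[::-1] for row in img]
    return [[[v + rng.gauss(0.0, noise_std) for v in pixel] for pixel in row] for row in img]


# Usage: resize two tiny 2x2 RGB images to 4x4, de-mean them, and augment them.
rng = random.Random(0)
raw = [
    [[[10.0, 20.0, 30.0], [12.0, 22.0, 32.0]], [[14.0, 24.0, 34.0], [16.0, 26.0, 36.0]]],
    [[[20.0, 30.0, 40.0], [22.0, 32.0, 42.0]], [[24.0, 34.0, 44.0], [26.0, 36.0, 46.0]]],
]
resized = [resize_nearest(img, 4, 4) for img in raw]
means = channel_means(resized)
prepared = [augment(subtract_channel_mean(img, means), rng) for img in resized]
```

Computing the means once over the whole set (rather than per image) keeps every image centred on the same reference, which is the usual reading of channel de-meaning; a per-image variant would also be a valid assumption.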

Step 102: for each training image in the training data set, extracting a global feature vector and a background feature vector of the training image.

In this embodiment, after the training data set is acquired, the feature information of its training images must first be extracted before the model can be trained on it. Specifically, a global feature vector and a background feature vector are extracted from each training image. Optionally, several training images may be read at once for feature extraction, the next training image may be read after feature extraction of the current one finishes, or the features of the training images captured by each image acquisition device may be extracted separately; this is not limited by the invention.

Step 103: constructing incremental sample data from the global feature vector and the background feature vector.

In this embodiment, a training image can be regarded as consisting of a target-object part and a background part, so once its global feature vector and background feature vector are obtained, the feature vector of the target object in that image can be determined. Since the same target object can appear against many different backgrounds, combining its feature vector with the background feature vectors of the other images in turn yields incremental samples of that target object against several different backgrounds.

Because the incremental sample data is constructed from the global and background feature vectors, no new training data needs to be introduced during model training, and the demands on data volume are low.

Step 104: determining a loss function according to the incremental sample data, and training a preset model to be trained with the loss function to obtain an image recognition model.

In this embodiment, after the incremental sample data is constructed from the global and background feature vectors, a loss function is determined from the incremental sample data, and the preset model to be trained is trained with that loss function to obtain the image recognition model.

Step 105: recognizing a preset target object using the image recognition model.

In this embodiment, the image recognition model is used to perform a recognition operation on a preset target object. Specifically, an image to be recognized may be acquired, and the position of the target object in that image may be recognized.

In the image recognition method provided by this embodiment, the global feature vector and the background feature vector of each training image are extracted, so the feature vector of the target object can be determined from them, and new training samples can be generated by randomly combining target-object feature vectors with background feature vectors. Because the model is trained on this incremental sample data, no additional data set needs to be introduced, the demands on the data set are low, and the trained image recognition model achieves high recognition accuracy.

FIG. 3 is a network structure diagram of a network model according to an embodiment of the present invention. On the basis of any of the above embodiments, as shown in FIG. 3, step 102 specifically includes:

extracting a shallow feature vector of the training image through a preset first feature extraction layer;

extracting a global feature vector from the shallow feature vector through a preset second feature extraction layer;

extracting a background feature vector from the shallow feature vector through a preset third feature extraction layer;

wherein the second feature extraction layer and the third feature extraction layer are two branches that follow the first feature extraction layer.

In this embodiment, as shown in FIG. 3, the network model includes three feature extraction layers, with the second and third feature extraction layers branching off after the first. The first feature extraction layer extracts the shallow feature vector of the training image; the second extracts the global feature vector from the shallow feature vector; and the third extracts the background feature vector from the shallow feature vector. Because both the global and the background feature vectors are derived from the same shallow feature vector, the accuracy and efficiency of feature extraction can be improved.
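The three-layer structure of FIG. 3 (one shared layer feeding two branches) can be illustrated with the toy model below. The patent does not specify layer types or sizes, so this sketch assumes small fully connected layers, a ReLU after the shared layer, and hypothetical dimensions; `TwoBranchExtractor` and its helpers are illustrative names only. It shows the data flow, not a trainable production network.

```python
import random


def linear(layer, x):
    """Fully connected layer W x + b, where layer = (weights, bias)."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, t) for t in v]


def init_layer(in_dim, out_dim, rng):
    """Small random weights and zero bias (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class TwoBranchExtractor:
    """Shared first layer feeding a global branch and a background branch (cf. FIG. 3)."""

    def __init__(self, in_dim, shallow_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.first = init_layer(in_dim, shallow_dim, rng)     # first feature extraction layer
        self.second = init_layer(shallow_dim, feat_dim, rng)  # second layer: global branch
        self.third = init_layer(shallow_dim, feat_dim, rng)   # third layer: background branch

    def __call__(self, x):
        shallow = relu(linear(self.first, x))          # shallow feature vector
        global_vec = linear(self.second, shallow)      # global feature vector
        background_vec = linear(self.third, shallow)   # background feature vector
        return shallow, global_vec, background_vec


model = TwoBranchExtractor(in_dim=6, shallow_dim=8, feat_dim=4)
shallow, global_vec, background_vec = model([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Both branches read the same shallow vector, so the shared layer is computed once per image; this is the property the paragraph above credits for the gain in efficiency.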

FIG. 4 is a schematic flowchart of an image recognition method according to a second embodiment of the present invention. On the basis of any of the above embodiments, as shown in FIGS. 3 and 4, step 103 specifically includes:

Step 201: calculating the difference between the global feature vector and the background feature vector to obtain the feature vector of the target object;

Step 202: combining the feature vector of the target object with the background feature vector of each training image in the training data set, to obtain incremental sample data of the target object against different backgrounds.

As shown in FIG. 3, once the network has produced the global feature vector from the second feature extraction layer and the background feature vector from the third feature extraction layer, the background feature vector is subtracted from the global feature vector.

Since a training image consists of a target-object part and a background part, this difference can represent the feature vector of the target object in the training image. The target object can appear against many different backgrounds, so combining its feature vector with the background feature vectors of the other images in turn yields incremental sample data of the same target object against several different backgrounds.

FIG. 5 is a network structure diagram of another network model according to an embodiment of the present invention. As shown in FIG. 5, a first training image and a second training image are input into the network model; a first background feature vector and a first target-object feature vector are extracted from the first training image, and a second background feature vector and a second target-object feature vector are extracted from the second training image. Combining the first background feature vector with the second target-object feature vector yields a new training sample of the second target object (from the second training image) against the first background; combining the first target-object feature vector with the second background feature vector yields a new training sample of the first target object (from the first training image) against the second background.
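Steps 201 and 202, together with the two-image exchange of FIG. 5, can be sketched as below. The patent states that the target-object feature is the difference between the global and background feature vectors but does not define the "combining" operation; this sketch assumes element-wise addition, the inverse of that difference. Following the labelling rule this description gives for newly generated samples, each synthetic sample keeps the identifier of the target object it was built from. Function names and the record layout are hypothetical.

```python
def target_feature(global_vec, background_vec):
    """Step 201: target-object feature = global feature - background feature."""
    return [g - b for g, b in zip(global_vec, background_vec)]


def combine(target_vec, background_vec):
    """Assumed combination operator: element-wise sum (inverse of the difference)."""
    return [t + b for t, b in zip(target_vec, background_vec)]


def build_incremental_samples(records):
    """Step 202: pair every target with the background of every other image.

    records: list of (target_id, global_vec, background_vec), one per training image.
    Returns (feature_vec, target_id) pairs; each synthetic sample is labelled with
    the identifier of the target object it was built from.
    """
    samples = []
    for i, (tid, g_i, b_i) in enumerate(records):
        t_i = target_feature(g_i, b_i)
        for j, (_, _, b_j) in enumerate(records):
            if i != j:
                samples.append((combine(t_i, b_j), tid))
    return samples


records = [
    ("car_A", [1.0, 2.0], [0.5, 0.5]),
    ("car_B", [3.0, 1.0], [1.0, 0.0]),
]
samples = build_incremental_samples(records)
```

With exactly two records the loop reproduces the exchange in FIG. 5: each target is paired with the other image's background. With N images it produces N × (N − 1) synthetic samples, which is where the "incremental" data comes from.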

In the image recognition method provided by this embodiment, the global feature vector and the background feature vector of each training image are obtained, from which the feature vector of the target object is derived; incremental sample data is then generated from the target-object feature vectors and the background feature vectors. As a result, few new data sets need to be introduced and the demands on training data are low.

Further, on the basis of any of the above embodiments, the training data set includes an image identifier for each training image and the identifier of the image acquisition device that captured it;

correspondingly, step 104 specifically includes:

performing supervised learning on the background feature vector using the identifier of the image acquisition device that captured the training image, to obtain a first classification loss function;

and performing supervised learning on the feature vector of the target object using a preset identifier of the target object, to obtain a second classification loss function.

In this embodiment, a loss function must be specified so that each of the model's two branches learns the intended features. The third feature extraction layer is meant to produce a background feature vector, and the background depends only on the position of the camera; therefore the third feature extraction layer can be supervised with the identifier of the image acquisition device that captured the training image, from which the first classification loss function is calculated. At the same time, the resulting target-object feature vector is supervised with the preset target-object identifier, from which the second classification loss function is calculated. For the feature vector of a newly generated sample, the identifier of the target object it was built from is used as its ground-truth classification label. The first and second classification loss functions may each be, for example, a contrastive loss or a triplet loss; a classification loss may specifically be a cross-entropy loss or any other loss function, which is not limited by the invention. In addition, using the image acquisition device identifier as the basis for calculating the first classification loss improves the accuracy of model training.
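A minimal sketch of the two classification losses follows, assuming cross-entropy (one of the options named above) and assuming hypothetical linear classifier heads that map the background feature vector to camera logits and the target-object feature vector to identity logits. The logits and labels below are made-up illustrative numbers, not data from the patent.

```python
import math


def softmax_cross_entropy(logits, label):
    """-log softmax(logits)[label], using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average cross-entropy over a batch."""
    return sum(softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)


# Hypothetical head outputs for a batch of two samples (illustrative numbers only).
camera_logits = [[2.0, 0.1, -1.0], [0.0, 1.5, 0.2]]  # from background feature vectors
camera_ids = [0, 1]                                   # identifier of the capturing camera
identity_logits = [[0.3, 2.2], [1.9, -0.4]]          # from target-object feature vectors
target_ids = [1, 0]                                  # preset target-object identifiers

first_loss = mean_cross_entropy(camera_logits, camera_ids)     # camera supervision
second_loss = mean_cross_entropy(identity_logits, target_ids)  # identity supervision
```

Synthetic samples from the incremental data would simply be appended to the identity batch with their source target's identifier as the label.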

Optionally, after the first and second classification loss functions are obtained, the model to be trained may be trained with a target classification loss function that combines them. Specifically, on the basis of any of the above embodiments, step 104 specifically includes:

performing a weighted calculation on the first classification loss function and the second classification loss function to obtain a target classification loss function;

and training the model to be trained with the target classification loss function using a gradient descent optimization method, to obtain the image recognition model.

In this embodiment, to train the model to be trained, a weighted calculation is performed on the first and second classification loss functions to obtain a target classification loss function, and the model is then trained with that target classification loss function to obtain the image recognition model. The training may use gradient descent; other optimization methods may also be chosen, which is not limited by the invention.
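The weighted target loss and gradient descent can be illustrated with a toy problem. In this sketch, which is not the patent's training code, the two classification losses are replaced by hypothetical quadratic stand-ins, L1 = ||θ − a||² and L2 = ||θ − b||², so that the gradient of the weighted sum w1·L1 + w2·L2 can be written exactly; the weights, learning rate, and step count are assumptions.

```python
def weighted_loss_and_grad(theta, a, b, w1, w2):
    """Target loss w1*L1 + w2*L2 and its gradient, with hypothetical quadratic
    stand-ins L1 = ||theta - a||^2 and L2 = ||theta - b||^2."""
    l1 = sum((t - ai) ** 2 for t, ai in zip(theta, a))
    l2 = sum((t - bi) ** 2 for t, bi in zip(theta, b))
    grad = [2.0 * w1 * (t - ai) + 2.0 * w2 * (t - bi) for t, ai, bi in zip(theta, a, b)]
    return w1 * l1 + w2 * l2, grad


def gradient_descent(theta, a, b, w1, w2, lr=0.1, steps=200):
    """Plain gradient descent on the weighted target loss."""
    for _ in range(steps):
        _, grad = weighted_loss_and_grad(theta, a, b, w1, w2)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


a, b = [0.0, 0.0], [4.0, 8.0]
theta = gradient_descent([1.0, -1.0], a, b, w1=0.25, w2=0.75)
```

For these stand-ins the minimiser is (w1·a + w2·b) / (w1 + w2) = [3.0, 6.0], so the weights directly control how far training leans toward each loss; in a real model the same weighted sum would be back-propagated through both branches.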

Optionally, on the basis of any of the above embodiments, the model to be trained may instead be trained with the first and second classification loss functions separately. Specifically, on the basis of any of the above embodiments, step 104 specifically includes:

training the model to be trained with the first classification loss function or the second classification loss function to obtain a pre-trained model;

and training the pre-trained model with the second classification loss function or the first classification loss function, respectively, to obtain the image recognition model.

In this embodiment, the model to be trained is first trained with one of the two classification loss functions to obtain a pre-trained model, and the pre-trained model is then trained further with the other one to obtain the image recognition model. Training proceeds in this alternating manner: if the first stage uses the first classification loss function, the second stage uses the second classification loss function, the third stage the first again, the fourth stage the second again, and so on; if the first stage uses the second classification loss function, the order is reversed (second, first, second, first, and so on). The training may use gradient descent; other optimization methods may also be chosen, which is not limited by the invention.
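The alternating scheme can be sketched as a stage schedule plus a training loop. The schedule follows the order described above; the loop uses hypothetical quadratic stand-ins for the two losses (each of the form ||θ − c||², minimised at its own point c), and the number of stages, steps per stage, and learning rate are assumptions.

```python
def alternating_schedule(num_stages, start_with="first"):
    """Which classification loss each training stage uses, alternating every stage."""
    order = ["first", "second"] if start_with == "first" else ["second", "first"]
    return [order[k % 2] for k in range(num_stages)]


def train_alternating(theta, minimisers, num_stages=20, steps_per_stage=5, lr=0.1,
                      start_with="first"):
    """Toy alternating training: each stage runs gradient descent on one loss only.

    minimisers maps "first"/"second" to the point c at which that hypothetical
    stand-in loss ||theta - c||^2 is minimised.
    """
    for loss_name in alternating_schedule(num_stages, start_with):
        centre = minimisers[loss_name]
        for _ in range(steps_per_stage):
            # Gradient of ||theta - centre||^2 is 2 * (theta - centre).
            theta = [t - lr * 2.0 * (t - c) for t, c in zip(theta, centre)]
    return theta


theta = train_alternating([0.5], {"first": [0.0], "second": [1.0]})
```

Because each stage optimises only one loss, the toy parameters settle between the two minimisers, closer to whichever loss was used last; the weighted scheme above instead reaches a single compromise point.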

In the image recognition method provided by this embodiment, the identifier of the image acquisition device that captured each training image is used to supervise the background feature vector, yielding the first classification loss function, and the preset identifier of the target object is used to supervise the target-object feature vector, yielding the second classification loss function; together these allow the model to be trained. In addition, because only a small number of classification loss functions, of few types, are involved, the model is easier to train and training is more efficient.

FIG. 6 is a schematic flowchart of an image recognition method according to a third embodiment of the present invention. On the basis of any of the foregoing embodiments, step 105 specifically includes:

Step 301: acquiring an image to be recognized;

Step 302: identifying, through the image recognition model, a feature vector of at least one object to be detected in the image to be recognized;

Step 303: calculating in turn the Euclidean distance between the feature vector of each object to be detected and a preset feature vector of the target object;

Step 304: taking the object to be detected with the shortest Euclidean distance as the target object.

In this embodiment, the target object is re-identified with the image recognition model. Specifically, an image to be recognized, which may contain the target object, is input into the image recognition model, which outputs the feature vector of each object to be detected in that image. To enable re-identification, the feature vector of the target object is set in advance. The Euclidean distance between each object's feature vector and the preset target-object feature vector is then calculated in turn, and the object to be detected with the smallest Euclidean distance is taken as the target object.
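Steps 303 and 304 amount to a nearest-neighbour search under Euclidean distance. The sketch below assumes the feature vectors have already been produced by the image recognition model and are plain lists of floats; `match_target` is a hypothetical name. It raises `ValueError` if there are no candidates, since `min` of an empty sequence is undefined.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def match_target(candidate_features, target_feature):
    """Steps 303-304: distance to each candidate in turn, then pick the closest.

    Returns (index of the best candidate, its distance).
    """
    distances = [euclidean(f, target_feature) for f in candidate_features]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]


# Feature vectors of three detected objects and the preset target (illustrative values).
candidates = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.2]]
best_index, best_distance = match_target(candidates, [1.0, 1.0])
```

Returning the distance alongside the index makes it easy to add a rejection threshold when none of the detected objects is close enough to be the target, a common refinement that the steps above do not require.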

The image recognition method provided by this embodiment determines the feature vectors of the objects to be detected with the image recognition model and accurately re-identifies the target object by calculating the Euclidean distance between each of these feature vectors and the preset feature vector of the target object.

FIG. 7 is a schematic structural diagram of an image recognition apparatus according to a fourth embodiment of the present invention. As shown in FIG. 7, the apparatus includes an acquisition module 41, an extraction module 42, a sample construction module 43, a determining module 44, and a recognition module 45. The acquisition module 41 is configured to acquire a training data set comprising a plurality of training images; the extraction module 42 is configured to extract, for each training image in the training data set, a global feature vector and a background feature vector of the training image; the sample construction module 43 is configured to construct incremental sample data from the global feature vector and the background feature vector; the determining module 44 is configured to determine a loss function according to the incremental sample data and to train a preset model to be trained with the loss function to obtain an image recognition model; and the recognition module 45 is configured to perform a recognition operation on a preset target object using the image recognition model.

The image recognition apparatus provided by this embodiment extracts the global feature vector and the background feature vector of each training image, so the feature vector of the target object can be determined from them, and new training samples can be generated by randomly combining target-object feature vectors with background feature vectors. Because the model is trained on this incremental sample data, no additional data set needs to be introduced, the demands on the data set are low, and the trained image recognition model achieves high recognition accuracy.

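Further, on the basis of any of the above embodiments, the extraction module is configured to:

extract a shallow feature vector of the training image through a preset first feature extraction layer;

extract a global feature vector from the shallow feature vector through a preset second feature extraction layer;

extract a background feature vector from the shallow feature vector through a preset third feature extraction layer;

wherein the second feature extraction layer and the third feature extraction layer are two branches that follow the first feature extraction layer.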

Further, on the basis of any of the above embodiments, the apparatus further includes:

a preprocessing module, configured to resize each training image and subtract the per-channel mean from it.

Further, on the basis of any of the above embodiments, the sample construction module is configured to:

calculate the difference between the global feature vector and the background feature vector to obtain the feature vector of the target object;

and combine the feature vector of the target object with the background feature vector of each training image in the training data set, to obtain incremental sample data of the target object against different backgrounds.

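Further, on the basis of any of the above embodiments, the training data set includes an image identifier for each training image and the identifier of the image acquisition device that captured it; accordingly, the determining module is configured to:

perform supervised learning on the background feature vector using the identifier of the image acquisition device that captured the training image, to obtain a first classification loss function;

and perform supervised learning on the feature vector of the target object using a preset identifier of the target object, to obtain a second classification loss function.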

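Further, on the basis of any of the above embodiments, the determining module is configured to:

perform a weighted calculation on the first classification loss function and the second classification loss function to obtain a target classification loss function;

and train the model to be trained with the target classification loss function using a gradient descent optimization method, to obtain the image recognition model.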

Further, on the basis of any of the above embodiments, the identification module is configured to:

acquire an image to be identified;

identify, through the image recognition model, a feature vector of at least one object to be detected in the image to be identified;

calculate, in turn, the Euclidean distance between the feature vector of each object to be detected and a preset feature vector of the target object; and

take the object to be detected with the shortest Euclidean distance as the target object.
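The final matching step is a nearest-neighbour search under Euclidean distance. A minimal sketch, assuming the feature vectors have already been produced by the image recognition model; the function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def find_target(candidates: Sequence[Sequence[float]], target: Sequence[float]) -> int:
    """Return the index of the candidate feature vector closest to `target`.

    Uses Euclidean distance via math.dist, which raises ValueError if the
    vectors differ in dimension.
    """
    if not candidates:
        raise ValueError("no objects to be detected")
    distances = [math.dist(c, target) for c in candidates]
    # The object with the shortest distance is taken as the target object.
    return min(range(len(distances)), key=distances.__getitem__)
```

As described, the closest object is always returned; the text applies no distance threshold, so this sketch does not either.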

Fig. 8 is a schematic structural diagram of an image recognition device according to a fifth embodiment of the present invention. As shown in Fig. 8, the device includes a memory 51 and a processor 52, wherein:

the memory 51 is configured to store instructions executable by the processor 52; and

the processor 52 is configured to execute the image recognition method according to any of the above embodiments.

The memory 51 stores a program. Specifically, the program may include program code, and the program code includes computer operating instructions. The memory 51 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

The processor 52 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.

Optionally, in a specific implementation, if the memory 51 and the processor 52 are implemented independently, they may be connected to each other and communicate with each other through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 8, but this does not mean that there is only one bus or one type of bus.

Optionally, in a specific implementation, if the memory 51 and the processor 52 are integrated on one chip, they may communicate with each other through an internal interface.

The present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the image recognition method according to any one of the above embodiments.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. An image recognition method, comprising:

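acquiring a data set to be trained, wherein the data set to be trained comprises a plurality of images to be trained;

extracting, for each image to be trained in the data set to be trained, a global feature vector and a background feature vector of the image to be trained;

constructing incremental sample data through the global feature vector and the background feature vector;

determining a loss function according to the incremental sample data, and training a preset model to be trained through the loss function to obtain an image recognition model; and

performing a recognition operation on a preset target object by using the image recognition model.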

2. The method according to claim 1, wherein the extracting the global feature vector and the background feature vector of the image to be trained comprises:

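3. The method according to claim 1 or 2, wherein the constructing incremental sample data through the global feature vector and the background feature vector comprises:

calculating the difference between the global feature vector and the background feature vector to obtain the feature vector of the target object; and

combining the feature vector of the target object with the background feature vector of each image to be trained in the data set to be trained, respectively, to obtain incremental sample data of the target object against different backgrounds.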

4. The method according to claim 3, wherein the data set to be trained comprises an image identifier corresponding to the image to be trained and an image acquisition device identifier for shooting the image to be trained;

Accordingly, the determining a loss function according to the incremental sample data comprises:

5. The method according to claim 4, wherein the training of the preset model to be trained by the loss function comprises:

6. The method according to claim 4, wherein the training of the preset model to be trained by the loss function comprises:

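7. The method according to any one of claims 1-2 and 4-6, wherein the performing a recognition operation on the preset target object by using the image recognition model comprises:

acquiring an image to be identified;

identifying, through the image recognition model, a feature vector of at least one object to be detected in the image to be identified;

calculating, in turn, the Euclidean distance between the feature vector of each object to be detected and a preset feature vector of the target object; and

taking the object to be detected with the shortest Euclidean distance as the target object.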

8. An image recognition apparatus, comprising:

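an acquisition module, configured to acquire a data set to be trained, wherein the data set to be trained comprises a plurality of images to be trained;

an extraction module, configured to extract, for each image to be trained in the data set to be trained, a global feature vector and a background feature vector of the image to be trained;

a sample construction module, configured to construct incremental sample data through the global feature vector and the background feature vector;

a determination module, configured to determine a loss function according to the incremental sample data, and to train a preset model to be trained through the loss function to obtain an image recognition model; and

an identification module, configured to perform a recognition operation on a preset target object by using the image recognition model.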

9. An image recognition device, characterized by comprising: a memory and a processor;

the memory is configured to store instructions executable by the processor;

wherein the processor is configured to perform the image recognition method according to any one of claims 1-7.

10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the image recognition method according to any one of claims 1-7.