Image processing method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
Background
In Internet transactions, to make it convenient for a user to log in to a website or an application such as a mobile phone APP, the user may log in by means of face recognition.
When face recognition is performed in the related art, either manually designed features, such as SIFT and HOG, or features extracted by a deep learning model are generally used. When features are extracted by a deep learning model, the method may further include a step of performing feature fusion on different image features to obtain new fused features. For example, a simple model fusion method directly applies elementary operations, such as summation, multiplication, averaging, or taking the maximum or minimum, to the extracted features, or transforms the features before fusing them into new features.
However, in the above methods, feature extraction and model fusion are two separate steps: feature extraction is performed first, and model fusion afterwards. Owing to the limitations of the feature extraction method, the extracted features cannot be guaranteed to be optimal; moreover, because different features are fused by the same method, the fused new features cannot be guaranteed to yield the optimal result, which affects the accuracy of the recognition result and, in turn, the safety and stability of the system.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide an image processing method and apparatus, an electronic device, and a storage medium, which overcome, at least to some extent, the problem of low image processing accuracy due to the limitations and disadvantages of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided an image processing method including: performing feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors, and fusing the plurality of first feature vectors to obtain a second feature vector; matching the second feature vector with preset feature vectors of a plurality of reference images to determine one of the preset feature vectors as a target feature vector; and determining the recognition result of the image to be recognized through the target feature vector.
In an exemplary embodiment of the present disclosure, performing feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors includes: training a plurality of initial feature models according to sample images and initial labels of the sample images to obtain a plurality of feature extraction models; and respectively performing feature extraction on the image to be recognized through the plurality of feature extraction models to obtain a plurality of first feature vectors associated with the respective feature extraction models.
In an exemplary embodiment of the present disclosure, training a plurality of initial feature models according to a sample image and an initial label of the sample image to obtain a plurality of feature extraction models includes: training a plurality of convolutional neural network models according to the sample images and the initial labels of the sample images to obtain a plurality of feature models; selecting one of the feature models as a target model, and superimposing another feature model of the feature models onto the target model for joint training to obtain a preset model; and taking the preset model as the target model and continuing joint training with the remaining models among the plurality of feature models until all the feature models have undergone joint training, the plurality of target models being taken as the plurality of feature extraction models.
In an exemplary embodiment of the present disclosure, training a plurality of convolutional neural network models according to the sample image and the initial label of the sample image to obtain a plurality of feature models includes: inputting the sample image and the initial label into a plurality of convolutional neural network models to obtain a plurality of initial feature vectors; classifying the initial feature vectors to obtain prediction labels; and updating the weight parameters of each convolutional neural network model through the initial label and the prediction label to obtain the plurality of feature models.
In an exemplary embodiment of the present disclosure, updating the weight parameter of each of the convolutional neural network models through the initial label and the prediction label includes: performing backward calculation with the initial label and the prediction label in each convolutional neural network model so as to update the weight parameter of each convolutional neural network model.
In an exemplary embodiment of the present disclosure, selecting one of the feature models as a target model and superimposing another feature model of the feature models onto the target model for joint training to obtain a preset model includes: locking the weight parameters of the target model; adjusting the weight parameter of the other feature model to obtain a target weight parameter; and jointly adjusting the target weight parameter and the weight parameters of the target model to obtain the preset model.
In an exemplary embodiment of the present disclosure, respectively performing feature extraction on the image to be recognized through the plurality of feature extraction models to obtain a plurality of first feature vectors associated with each of the feature extraction models includes: performing forward calculation on the image to be recognized in each feature extraction model to obtain the plurality of first feature vectors of the image to be recognized.
In an exemplary embodiment of the present disclosure, fusing the plurality of first feature vectors to obtain a second feature vector includes: performing a fusion calculation on the plurality of first feature vectors to obtain the second feature vector.
In an exemplary embodiment of the present disclosure, matching the second feature vector with a plurality of preset feature vectors and determining one of the preset feature vectors as a target feature vector includes: calculating the similarity between the second feature vector and a plurality of preset feature vectors in a database; and determining the preset feature vector whose similarity to the second feature vector is greater than a preset threshold as the target feature vector.
In an exemplary embodiment of the present disclosure, determining the preset feature vector whose similarity to the second feature vector is greater than a preset threshold as the target feature vector includes: calculating the Euclidean distance between the second feature vector and the preset feature vector; if the Euclidean distance is smaller than a preset distance, determining that the image to be recognized is the same as a reference image in a database; and taking the preset feature vector corresponding to the reference image as the target feature vector.
According to an aspect of the present disclosure, there is provided an image processing apparatus including: a feature extraction module configured to perform feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors, and to fuse the plurality of first feature vectors to obtain a second feature vector; a matching control module configured to match the second feature vector with preset feature vectors of a plurality of reference images so as to determine one of the preset feature vectors as a target feature vector; and a recognition control module configured to determine the recognition result of the image to be recognized through the target feature vector.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the image processing methods described above via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of any one of the above.
In the image processing method, the image processing apparatus, the electronic device, and the computer-readable storage medium provided in the exemplary embodiments of the present disclosure, on the one hand, the features of the image to be recognized are extracted through the plurality of feature extraction models, and the obtained plurality of first feature vectors are fused to obtain the second feature vector, so that the features of the image to be recognized can be obtained more accurately and the accuracy of feature extraction is improved; on the other hand, by matching the second feature vector with the preset feature vectors, one of the preset feature vectors is taken as the target feature vector, so that the image to be recognized is recognized through the target feature vector, which can improve the accuracy of image processing and ensure the safety and stability of the system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically illustrates a system architecture diagram for implementing an image processing method in an exemplary embodiment of the present disclosure;
FIG. 2 schematically illustrates an image processing method in an exemplary embodiment of the disclosure;
FIG. 3 schematically illustrates a feature extraction process in an exemplary embodiment of the disclosure;
fig. 4 schematically illustrates a block diagram of an image processing apparatus in an exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a block diagram of an electronic device in an exemplary embodiment of the disclosure;
fig. 6 schematically illustrates a program product in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The present exemplary embodiment first provides a system architecture for implementing an image processing method, which can be applied to various image recognition scenes for logging in a website or browsing a page in a face recognition manner. Referring to fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send request instructions or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a photo processing application, a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for shopping-like websites browsed by users using the terminal devices 101, 102, 103. The backend management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the image processing method provided in the embodiments of the present application is generally executed by the server 105, and accordingly, the image processing apparatus is generally disposed in the server 105.
Based on the system architecture 100 described above, an image processing method is provided in the present example, and as shown in fig. 2, the image processing method may include the following steps:
in step S210, performing feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors, and fusing the plurality of first feature vectors to obtain a second feature vector;
in step S220, matching the second feature vector with preset feature vectors of a plurality of reference images to determine one of the preset feature vectors as a target feature vector;
in step S230, the recognition result of the image to be recognized is determined by the target feature vector.
In the image processing method provided in the present exemplary embodiment, on the one hand, the features of the image to be recognized are extracted through the plurality of feature extraction models, and the obtained plurality of first feature vectors are fused to obtain the second feature vector, so that the features of the image to be recognized can be obtained more accurately and the accuracy of feature extraction is improved; on the other hand, by matching the second feature vector with the preset feature vectors, one of the preset feature vectors is taken as the target feature vector, so that the image to be recognized is recognized through the target feature vector, which can improve the accuracy of image processing and ensure the safety and stability of the system.
Next, the image processing method in the present exemplary embodiment is further explained with reference to the drawings.
In step S210, feature extraction is performed on the image to be recognized through the plurality of feature extraction models to obtain a plurality of first feature vectors, and the plurality of first feature vectors are fused to obtain a second feature vector.
In the present exemplary embodiment, the image to be recognized may be, for example, a face image, an animal image, or another image; the size, color, pixels, and the like of the image to be recognized are not particularly limited. An example application scenario of the present exemplary embodiment is as follows: when a user logs in to an application platform or a website, face recognition may be used for login; a camera of the terminal may capture a face image of the user to be logged in, and this face image is compared with the face images of all users who have registered with the application platform or website, so as to control whether the user may log in. In this scenario, the face image of the user to be logged in serves as the image to be recognized. Before the image to be recognized is recognized, it may be preprocessed. The preprocessing here may include face image alignment. Face alignment mainly comprises face detection and face key point positioning; the face key points detected in each image are then made to coincide as closely as possible with preset face key point positions; finally, a face region is cropped out of the image, and the resolution of the face region is adjusted to a preset size, such as 224 × 224. Subsequent operations may then be performed on the preprocessed image to be recognized.
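As a hedged sketch of the crop-and-resize part of this preprocessing (the full landmark-based similarity warp is simplified to a bounding-box crop plus nearest-neighbour resize; all names here are illustrative and not prescribed by the disclosure):

```python
import numpy as np

def align_face(image, landmarks, out_size=(224, 224)):
    """Crop the face region around detected key points and resize it.

    image: H x W x 3 array; landmarks: N x 2 array of (x, y) key points.
    A proper alignment would fit a similarity transform to preset key
    point positions; here only the crop/resize step is illustrated.
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    x0, x1 = int(xs.min()), int(xs.max())
    y0, y1 = int(ys.min()), int(ys.max())
    crop = image[y0:y1, x0:x1]
    # Nearest-neighbour resize to the preset size (e.g. 224 x 224).
    h, w = crop.shape[:2]
    ri = (np.arange(out_size[0]) * h // out_size[0]).clip(0, h - 1)
    ci = (np.arange(out_size[1]) * w // out_size[1]).clip(0, w - 1)
    return crop[ri][:, ci]
```

The aligned crop can then be fed directly to the feature extraction models.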
The plurality of feature extraction models refer to the trained final feature extraction models, which may be obtained by respectively training a plurality of initial feature models. The number of initial feature models may be set according to actual requirements; generally speaking, within a certain range, the more models there are, the more feature vectors are obtained, the more accurate the extracted features are, and the higher the final recognition rate is.
Each initial feature model may be trained by a suitable machine learning algorithm; a neural network algorithm is taken as an example here. Specifically, each initial feature model may be a convolutional neural network model, and the convolutional neural network models may be a plurality of networks having the same structure but different weight parameters, or a plurality of networks having different structures and different weight parameters.
In the present exemplary embodiment, the features of the image to be recognized can be extracted by the plurality of trained feature extraction models respectively, so as to obtain a plurality of first feature vectors, which may be the same as or different from one another. For example, feature extraction model 1 yields a first feature vector f1, feature extraction model 2 yields a first feature vector f2, and so on.
Specifically, a plurality of initial feature models may be trained according to sample images and initial labels of the sample images to obtain a plurality of feature extraction models; feature extraction is then performed on the image to be recognized through the plurality of feature extraction models respectively, obtaining a plurality of first feature vectors associated with the respective feature extraction models. The sample images may be, for example, a plurality of existing face images having labels that distinguish one person from another, such as a data set D = {(X1, Y1), (X2, Y2), …, (Xn, Yn)}, where X1 to Xn are sample images and Y1 to Yn are the corresponding initial labels of the sample images. The plurality of convolutional neural network models may include, for example, convolutional neural networks N1 to Nx. The sample images in the data set D and their corresponding initial labels may be cyclically and sequentially input to the networks N1 to Nx to obtain a feature extraction model for each network; for example, feature extraction model 1 is obtained by training the convolutional neural network N1, feature extraction model 2 is obtained by training the convolutional neural network N2, and so on.
Specifically, the process of training a plurality of initial feature models according to the sample images and their initial labels to obtain a plurality of feature extraction models includes the following steps. First, a plurality of convolutional neural network models are trained according to the sample images and their initial labels to obtain a plurality of feature models: the sample images and the initial labels are input into the plurality of convolutional neural network models to obtain an initial feature vector corresponding to each convolutional neural network model; the initial feature vectors are classified to obtain prediction labels; and the weight parameters of each convolutional neural network model are updated through the initial labels and the prediction labels to obtain the plurality of feature models.
In this exemplary embodiment, the sample image and its initial label may be input to a plurality of convolutional neural network models based on the target task, and the plurality of convolutional neural network models may be trained, so that a plurality of initial feature vectors are obtained. The target task refers to the target loss function determined by the face recognition task. The initial feature vectors may then be classified to obtain a prediction label of the sample image, which may be the same as or different from the initial label. Further, backward calculation may be performed with the initial label and the prediction label in each convolutional neural network model to update its weight parameters, thereby obtaining the plurality of feature models. A feature model refers to a convolutional neural network model after its weight parameters have been updated.
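The forward calculation, classification, and backward update just described can be sketched for the classifier alone (the convolutional backbone is abstracted into a precomputed feature matrix, and the softmax cross-entropy choice is an assumption, since the disclosure does not fix a particular loss function):

```python
import numpy as np

def train_step(F, y, gamma, lr=0.1):
    """One update of the classification weights gamma.

    F: batch x d matrix of initial feature vectors (backbone output);
    y: integer initial labels; gamma: d x k classifier weights.
    Returns the updated weights and the cross-entropy loss.
    """
    logits = F @ gamma                        # classification function c
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)         # prediction label distribution
    n = F.shape[0]
    # Loss between the initial labels and the prediction labels.
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    # Backward calculation: gradient of the loss w.r.t. gamma.
    p[np.arange(n), y] -= 1.0
    grad = F.T @ p / n
    return gamma - lr * grad, loss
```

In the full method the same backward pass would also update the backbone weight parameters, which are omitted here for brevity.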
Secondly, one of the feature models is selected as a target model, and another feature model among the feature models is superimposed onto the target model for joint training to obtain a preset model. The plurality of feature models may include, for example, a feature model 1, a feature model 2, a feature model 3, and a feature model 4, where each feature model i is the convolutional neural network model i after its weight parameters have been updated. Any one of the plurality of feature models may be used as the target model, which is a trained feature extraction model; for example, the feature model 1 may be used as the target model. Then, another feature model of the plurality of feature models may be sequentially superimposed onto the determined target model for joint training to obtain a preset model; the other feature model may be any one of the feature model 2, the feature model 3, and the feature model 4. The preset model refers to the new target model obtained by jointly training the determined target model with another feature model. That is, with each additional iteration, the number of jointly trained models increases by one, and the target model and the preset model are updated accordingly. For example, the feature model 2 may be added to the target model for the first joint training to obtain the preset model.
When the preset model is obtained through joint training, the weight parameters of the target model first need to be locked; the weight parameter of the other feature model is adjusted to obtain a target weight parameter; and the target weight parameter and the weight parameters of the target model are then jointly adjusted to obtain the preset model. That is, the weight parameters of the target model are kept unchanged and only the weight parameters of the feature model 2 are adjusted until convergence; the weight parameters of the target model and the adjusted weight parameters of the feature model 2 are then further adjusted jointly until convergence, so as to obtain the preset model.
Thirdly, the preset model is taken as the target model and joint training continues with the remaining models among the plurality of feature models until all the feature models have undergone joint training. That is, on the basis of the current target model, any one of the remaining feature models may be jointly trained with it until all of the plurality of feature models have been jointly trained. For example, the preset model obtained by the first joint training is used as a new target model, and the feature model 3 among the remaining models is added to the new target model for a second joint training, yielding a new preset model. The remaining feature models are jointly trained in turn according to this method until all the feature models have undergone joint training, whereby a plurality of target models are obtained; these target models may be used as the final feature extraction models for image recognition.
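The staged schedule of the second and third steps can be sketched as follows (the inner train() routine is a hypothetical placeholder for one optimisation run to convergence, and the model names are illustrative):

```python
def joint_training_schedule(models):
    """Illustrate the lock-then-joint schedule for adding feature models.

    models: list of already-trained feature models; models[0] is taken
    as the initial target model.  Returns a log of (trainable, frozen)
    pairs, one per optimisation run.
    """
    log = []

    def train(trainable, frozen):
        # Placeholder for "adjust the listed weights until convergence".
        log.append((sorted(trainable), sorted(frozen)))

    target = [models[0]]                 # initial target model
    for new_model in models[1:]:
        # Phase 1: lock the target model's weights; adjust only the
        # newly added feature model's weights.
        train(trainable=[new_model], frozen=list(target))
        # Phase 2: jointly adjust the new model and the target model.
        train(trainable=target + [new_model], frozen=[])
        target.append(new_model)         # preset model becomes new target
    return log
```

Each iteration thus performs two runs: one with the target model locked, and one joint run over everything trained so far.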
For example, according to the first to third steps above, the sample image X and its label Y in the data set D may be input to the convolutional neural network model N1, whose weight parameter is θ1. X undergoes forward calculation in the network N1 to obtain an initial feature vector F1 = N1(X; θ1). A classification function c with a weight parameter γ is also needed, and the prediction label ŷ is obtained by passing the initial feature vector F1 through the classification function c, as shown in equation (1):

ŷ = c(F1; γ)    (1)

The loss l = L(ŷ, Y) is obtained according to a predetermined loss function L. The gradients ∂l/∂γ and ∂l/∂θ1 of the weight parameters γ and θ1 are then calculated, and the weight parameters γ and θ1 are updated; with learning rate η, the updated weight parameters γ′ and θ1′ are as shown in equation (2):

γ′ = γ − η·∂l/∂γ,    θ1′ = θ1 − η·∂l/∂θ1    (2)

Next, on the basis of the target model, i.e., the convolutional neural network model N1 whose weight parameters have been adjusted, a feature fusion method may be combined and the feature model 2, i.e., the convolutional neural network model N2 whose weight parameters have been adjusted, may be added and trained. During training, all the weight parameters θ1 of the finished target model are first fixed, and only the weight parameter θ2 of the network N2 is adjusted; then the two are adjusted jointly to obtain the currently optimal weight parameters. Specifically, the sample image X and its label Y are input iteratively, the network N1 having weight parameter θ1, the network N2 having weight parameter θ2, and the classification function c having weight parameter γ. The sample image X undergoes forward calculation in the network N1 to obtain the initial feature vector F1, and in the network N2 to obtain the initial feature vector F2, as shown in equation (3):

F2 = N2(X; θ2)    (3)

Furthermore, the initial feature vectors of the sample image may be fused according to a feature fusion algorithm, the prediction label ŷ may be obtained from the fused feature through the classification function c, and the loss l may be obtained from the loss function L. Next, with the weight parameter θ1 of the network N1 kept fixed, the gradients of γ and θ2 are calculated and the weight parameters γ and θ2 are updated until convergence; the parameters γ, θ1 and θ2 are then updated simultaneously until convergence. On the basis of the networks N1, N2, …, Nx whose weight parameters have been continuously updated in this way, a further feature model Nx+1 is added and model training is performed until an optimal result or the maximum number of feature extraction networks is reached. The final flow structure is shown in fig. 3, and the obtained plurality of target models serve as the final plurality of feature extraction models for performing feature extraction on the image to be recognized.
After the plurality of feature extraction models are determined, step S210 may be executed to perform feature extraction on the image to be recognized through the plurality of feature extraction models. Specifically, forward calculation may be performed on the image to be recognized in each of the feature extraction models to obtain a plurality of first feature vectors of the image to be recognized, such as f1, f2, …, fn, and the plurality of first feature vectors are fused to obtain a second feature vector. The feature fusion algorithm may include summing all the first feature vectors, in which case the second feature vector is the fused feature vector f = f1 + f2 + … + fx+1. Besides summation, other fusion algorithms, such as multiplication, averaging, or taking the maximum or minimum, may also be adopted to fuse the plurality of first feature vectors into the second feature vector. It should be noted that the second feature vector can be regarded as an optimal feature vector of the image to be recognized. By training a plurality of feature models and fusing the plurality of first feature vectors thus obtained, the method attains global optimality, can improve the accuracy of the extracted feature vectors, and further improves the image recognition effect.
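The simple fusion operations named above can be sketched as follows (the function and parameter names are illustrative; the disclosure also allows learned fusion, which is not shown here):

```python
import numpy as np

def fuse_features(first_vectors, method="sum"):
    """Fuse first feature vectors f1..fn into one second feature vector.

    first_vectors: list of equal-length 1-D arrays.
    method: one of the elementary fusion operations from the text.
    """
    F = np.stack(first_vectors)          # n x d
    ops = {
        "sum": F.sum(axis=0),
        "product": F.prod(axis=0),
        "mean": F.mean(axis=0),
        "max": F.max(axis=0),
        "min": F.min(axis=0),
    }
    return ops[method]
```

For example, with summation the second feature vector is simply the element-wise sum of all first feature vectors.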
Next, in step S220, the second feature vector is matched with preset feature vectors of a plurality of reference images, so as to determine one of the preset feature vectors as a target feature vector.
In this exemplary embodiment, when a user logs in to an application platform or website using face recognition, the face image of the user to be logged in may be used as the image to be recognized, and the face images of all users who have registered with the application platform or website may be used as reference images. The obtained feature extraction models may likewise be used to extract features from all the reference images to obtain a plurality of preset feature vectors. The second feature vector of the image to be recognized may then be matched against the preset feature vectors of the plurality of reference images, and the preset feature vector of the successfully matched reference image is taken as the target feature vector.
The specific matching process includes the following steps: calculating the similarity between the second feature vector and a plurality of preset feature vectors in a database; and determining the preset feature vector whose similarity with the second feature vector is greater than a preset threshold as the target feature vector. The similarity may be measured by the Euclidean distance, the cosine similarity, or the like. The specific value of the preset threshold may be set according to actual requirements.
If the Euclidean distance between the second feature vector and a preset feature vector is smaller than a preset distance, it is determined that the image to be recognized and the corresponding reference image in the database belong to the same person. In this case, the image to be recognized and the reference image are successfully matched, and the preset feature vector corresponding to the successfully matched reference image is taken as the target feature vector.
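The matching rule above can be sketched as follows. The Euclidean distance criterion and the preset distance come from the description; the vectors, the database contents, and the helper name `match_target` are illustrative assumptions:

```python
import numpy as np

def match_target(second_vector, preset_vectors, max_distance=1.0):
    """Return (index, preset vector) of the closest reference image whose
    Euclidean distance to the second feature vector is below max_distance,
    or None when no reference image matches (recognition fails)."""
    best_idx, best_dist = None, max_distance
    for i, preset in enumerate(preset_vectors):
        dist = np.linalg.norm(second_vector - preset)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    if best_idx is None:
        return None
    return best_idx, preset_vectors[best_idx]

fa = np.array([0.3, 0.5, 0.8])           # second feature vector of an image A
database = [np.array([2.0, 2.0, 2.0]),   # preset vectors of reference images
            np.array([0.35, 0.45, 0.8])]
result = match_target(fa, database, max_distance=1.0)
```

Taking the closest match below the threshold, rather than the first one found, avoids ambiguity when several preset feature vectors fall within the preset distance.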
In step S230, the recognition result of the image to be recognized is determined according to the target feature vector.
After the image to be recognized is successfully matched with a reference image, the image to be recognized can be recognized according to the target feature vector of that reference image, and the recognition result is determined. For example, suppose the second feature vector of the image A to be recognized is fa, the preset feature vector of the reference image B is fb, and the preset distance is 1 unit length. If the Euclidean distance between fa and fb is less than 1 unit length, it is determined that the image A to be recognized and the reference image B belong to the same person, and the user corresponding to the image A can successfully log in to the application platform by face recognition.
Through steps S210 to S230 in the present exemplary embodiment, the accuracy of image processing can be improved, thereby ensuring the security and stability of the system.
The present disclosure also provides an image processing apparatus. Referring to fig. 4, the image processing apparatus 400 may include:
the feature extraction module 401 may be configured to perform feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors, and fuse the plurality of first feature vectors to obtain a second feature vector;
a matching control module 402, configured to match the second feature vector with preset feature vectors of multiple reference images, so as to determine one of the preset feature vectors as a target feature vector;
the identification control module 403 may be configured to determine an identification result of the image to be identified according to the target feature vector.
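A minimal sketch of how the three modules might compose, assuming NumPy arrays as feature vectors and summation as the fusion algorithm; all class and method names are hypothetical stand-ins for modules 401 to 403, and the "models" are toy callables rather than trained networks:

```python
import numpy as np

class FeatureExtractionModule:                       # cf. module 401
    def __init__(self, models):
        self.models = models                         # callables: image -> first feature vector

    def extract(self, image):
        firsts = [model(image) for model in self.models]
        return np.stack(firsts).sum(axis=0)          # fused second feature vector

class MatchingControlModule:                         # cf. module 402
    def __init__(self, preset_vectors, max_distance=1.0):
        self.preset_vectors = preset_vectors
        self.max_distance = max_distance

    def match(self, second_vector):
        dists = [np.linalg.norm(second_vector - p) for p in self.preset_vectors]
        best = int(np.argmin(dists))
        return best if dists[best] < self.max_distance else None

class RecognitionControlModule:                      # cf. module 403
    def recognize(self, target_index):
        return target_index is not None              # e.g. login succeeds on a match

class ImageProcessingApparatus:                      # cf. apparatus 400
    def __init__(self, models, preset_vectors):
        self.feature_extraction = FeatureExtractionModule(models)
        self.matching_control = MatchingControlModule(preset_vectors)
        self.recognition_control = RecognitionControlModule()

    def process(self, image):
        second = self.feature_extraction.extract(image)
        target = self.matching_control.match(second)
        return self.recognition_control.recognize(target)

# Toy usage: two stand-in "models" and one registered user's preset vector.
apparatus = ImageProcessingApparatus(
    models=[lambda img: img * 0.5, lambda img: img * 0.5],
    preset_vectors=[np.array([0.1, 0.2, 0.3])],
)
login_ok = apparatus.process(np.array([0.1, 0.2, 0.3]))
```

Each module mirrors one method step, so the apparatus pipeline is the method of steps S210 to S230 expressed as composed objects.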
It should be noted that, the specific details of each module in the image processing apparatus have been described in detail in the corresponding image processing method, and therefore are not described herein again.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into and embodied by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 that couples various system components including the memory unit 520 and the processing unit 510.
Wherein the storage unit stores program code that is executable by the processing unit 510 to cause the processing unit 510 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 510 may perform the steps as shown in fig. 2: in step S210, performing feature extraction on an image to be recognized through a plurality of feature extraction models to obtain a plurality of first feature vectors, and fusing the plurality of first feature vectors to obtain a second feature vector; in step S220, the second feature vector is matched with preset feature vectors of a plurality of reference images to determine one of the preset feature vectors as a target feature vector, and in step S230, an identification result of an image to be identified is determined according to the target feature vector.
The storage unit 520 may include a readable medium in the form of a volatile storage unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read-only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 6, a program product 700 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.