CN113255441A - Image processing method, image processing apparatus, electronic device, and medium - Google Patents


Info

Publication number
CN113255441A
Authority
CN
China
Prior art keywords
network model
student
network
features
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110396669.7A
Other languages
Chinese (zh)
Inventor
门鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202110396669.7A priority Critical patent/CN113255441A/en
Publication of CN113255441A publication Critical patent/CN113255441A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method, an image processing apparatus, an electronic device, and a medium. The method includes: performing feature extraction on a base map through one of a first network model and a second network model to obtain base map features; performing feature extraction on an image to be compared through the other of the two models to obtain features to be compared; and comparing the features to be compared with the base map features to obtain a comparison result. The first network model and the second network model differ in number of layers and/or network parameters, and satisfy at least one of the following conditions: both are distilled from a third network model; or the second network model is distilled from the first network model. The invention makes the features of the first and second network models comparable, solving the problem that features extracted by different models cannot be compared.

Description

Image processing method, image processing apparatus, electronic device, and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a medium.
Background
Current face recognition technology mainly relies on a deep neural network model to encode face pictures, and judges whether two pictures show the same person by the distance between the encoded features. The neural network model is trained in advance on a large face data set; during training, the output features are constrained so that features of the same face lie closer together. Once trained, the network parameters are fixed, and the network applies a complex nested nonlinear transformation to its input to encode a picture into a feature vector.
An existing deep neural network model can guarantee, under its own unique transformation, that features of the same face lie close together; however, the features that different models produce for the same person do not satisfy this constraint in a common feature space. Therefore, deciding whether two pictures show the same person requires extracting features from both with the same model. When a system's face recognition model is upgraded, this approach requires re-extracting features from all historical data with the upgraded model, which consumes substantial computing power; moreover, the pictures must be transmitted to the machine on which the model is deployed, which introduces heavy network bandwidth consumption in high-density video stream scenarios.
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide an image processing method, apparatus, electronic device, and medium that overcome or at least partially solve the above problems.
According to a first aspect of embodiments of the present invention, there is provided an image processing method, including:
performing feature extraction on the base map through one of the first network model and the second network model to obtain base map features;
performing feature extraction on the image to be compared through the other one of the first network model and the second network model to obtain features to be compared;
comparing the features to be compared with the base map features to obtain a comparison result;
wherein the first network model and the second network model have different numbers of layers and/or network parameters;
the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are distilled from the third network model;
the second network model is distilled from the first network model.
According to a second aspect of embodiments of the present invention, there is provided an image processing apparatus including:
the base map feature extraction module is used for extracting features of the base map through one of the first network model and the second network model to obtain base map features;
the to-be-compared feature extraction module is used for extracting features of the to-be-compared image through the other one of the first network model and the second network model to obtain to-be-compared features;
the characteristic comparison module is used for comparing the characteristics to be compared with the characteristics of the base map to obtain a comparison result;
wherein the first network model and the second network model have different numbers of layers and/or network parameters;
the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are distilled from the third network model;
the second network model is distilled from the first network model.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including: a processor, a memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, implements the image processing method as described in the first aspect.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method according to the first aspect.
The image processing method, apparatus, electronic device, and medium provided by the embodiments of the present invention perform feature extraction on the base map using one of the first network model and the second network model to obtain base map features, and perform feature extraction on an image to be compared using the other model to obtain features to be compared. Because the first and second network models are both distilled from the third network model, or the second network model is distilled from the first network model, the projection spaces of the models are consistent, so features extracted by different network models are comparable. This solves the problem that features extracted by multiple models cannot be compared; when the models before and after an upgrade have such a distillation relationship, features need not be re-extracted from historical data, saving computing power and reducing bandwidth occupation.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of the steps of a training process for a student network model in an embodiment of the invention;
fig. 3 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a flowchart illustrating steps of an image processing method according to an embodiment of the present invention, where as shown in fig. 1, the method may include:
and 101, performing feature extraction on the base map through one of the first network model and the second network model to obtain base map features.
The base map is an image stored in a base library. For each base map, one of the first network model and the second network model can extract features in advance to obtain base map features, which are stored in correspondence with the base map. For example, for face recognition, features may be extracted from the face images in the base library in advance, and each extracted face feature stored together with the corresponding face information (e.g., face image, face ID), for later retrieval or identification against captured images to be compared.
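As a minimal illustration of this precomputation step (the feature dimension, the toy extractor, and the storage layout below are assumptions for the sketch, not details specified by the patent), the base library can be a simple mapping from face ID to a stored feature vector:

```python
# Sketch: precompute base-library features and store them keyed by face ID.
# `extract_features` stands in for the first or second network model; here it
# is a toy stand-in that averages groups of pixel values into a fixed-length
# vector (a real system would run a deep network instead).

def extract_features(image, dim=4):
    # Toy feature extractor (assumption): average `dim` contiguous chunks.
    chunk = max(1, len(image) // dim)
    return [sum(image[i:i + chunk]) / chunk for i in range(0, chunk * dim, chunk)]

def build_base_library(images_by_id):
    """Map each face ID to its precomputed base-map feature."""
    return {face_id: extract_features(img) for face_id, img in images_by_id.items()}

base_library = build_base_library({
    "person_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "person_b": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2],
})
```

Captured images later only need their features compared against this store; the raw base maps themselves are not re-encoded.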
Step 102: extract features of the image to be compared through the other of the first network model and the second network model to obtain features to be compared.
When extracting features from the image to be compared, the other of the first network model and the second network model is used, yielding the features to be compared.
The first network model and the second network model have different numbers of layers and/or network parameters. Because the internal resources of a terminal device are limited, the smaller of the two models can be deployed on the terminal. After the terminal obtains an image to be compared, it uses the smaller model to extract the features to be compared and sends only those features to the server, which compares them with the base map features. This avoids transmitting the image itself and reduces network bandwidth consumption. Of course, the first and second network models may also both be deployed on the server.
Step 103: compare the features to be compared with the base map features to obtain a comparison result.
Wherein the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are distilled from the third network model;
the second network model is distilled from the first network model.
The second network model may be an upgraded or derived version of the first network model. For example, the first network model may be a server-side model with more layers and more parameters, from which a second network model with fewer layers and fewer parameters, suitable for a terminal, is obtained by distillation.
Alternatively, the first and second network models are both upgraded or derived from a third network model. For example, the third network model may be a server-side model with more layers and more parameters, from which the first and second network models, with fewer layers and fewer parameters for terminals, are obtained by distillation.
In the embodiment of the present application, because the first and second network models are distilled from the third network model, or the second network model is distilled from the first, the base map features and the features to be compared extracted by the two models respectively are comparable. Thus, when either the model extracting the features of the image to be compared or the model extracting the base map features is upgraded or derived, the two models can still extract their respective features and produce a comparison result with acceptable accuracy, as long as they satisfy the distillation relationship above (the comparison result may be the base map feature matching the features to be compared, or a finding of no match). In other words, the embodiment makes features extracted by different models comparable, removing the limitation that the features to be compared and the base map features must be extracted by a single model. Conversely, if the first and second network models do not have the above distillation relationship, then even when the image to be compared and the base map show the same object, their features are not comparable, and comparing the features to be compared against all base map features will certainly fail to match.
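The comparison metric itself is not pinned down by the patent; a common choice (an assumption here, as is the threshold value) is cosine similarity against every stored base map feature, accepting the best match that clears a threshold:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def compare(features, base_library, threshold=0.8):
    """Return the best-matching base-map ID, or None if nothing clears the threshold."""
    best_id, best_score = None, threshold
    for face_id, base_features in base_library.items():
        score = cosine_similarity(features, base_features)
        if score >= best_score:
            best_id, best_score = face_id, score
    return best_id

# Toy base library with two 2-dimensional features (assumption for the sketch).
library = {"person_a": [1.0, 0.0], "person_b": [0.0, 1.0]}
```

Because the two models share a projection space after distillation, the same `compare` routine works regardless of which model produced the query features.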
In addition, the embodiment of the present application makes features extracted by different network models comparable, solving the problem that features extracted by multiple models cannot be compared. As a result, when a distillation relationship exists between the pre-upgrade and post-upgrade models, features do not need to be re-extracted from historical data, which saves computing power and reduces the bandwidth that transmitting re-extracted features would occupy.
In one embodiment of the present invention, when both the first network model and the second network model are distilled from a third network model, the first, second, and third network models share a projection basis; when the second network model is distilled from the first network model, the first and second network models share a projection basis. Consequently, the projection coordinates of each dimension of the features extracted by the first and second network models correspond in the feature space, which further improves the degree to which the two models' features are comparable and thus the accuracy of comparing the image to be compared with the base map.
In one embodiment of the present invention, when both the first network model and the second network model are distilled from a third network model, the last layer of each of the first, second, and third network models is a fully-connected layer with the same network parameters; when the second network model is distilled from the first network model, the last layer of each of the first and second network models is a fully-connected layer with the same network parameters. In this embodiment, the models share the projection basis through fully-connected layers with identical parameters, so their projection spaces are consistent; this further improves the comparability of the features of the first and second network models, and in turn the accuracy of comparing the image to be compared with the base map.
On the basis of the technical scheme, the image processing method further comprises a training process of the student network model. Fig. 2 is a flowchart of steps of a training process of a student network model in an embodiment of the present invention, and as shown in fig. 2, the training process of the student network model may include:
step 201, acquiring network parameters of the final full connection layer of the trained teacher network model.
Wherein the teacher network model may be the first network model or the third network model. The teacher network model is a trained network model and can be used as a basis for distilling other network models.
Step 202: assign the network parameters of the last fully-connected layer of the teacher network model to the last fully-connected layer of the student network model.
The teacher network model and the student network model are both feature extraction models; they may, of course, be feature extraction models for objects other than faces. The teacher network model is an existing, already-trained network model. The student network model is the model to be trained; training yields a model whose features are comparable with the teacher's. Both models have a fully-connected layer as their last layer.
When a student network model whose features are comparable with those of an existing teacher network model is required, a new neural network model can first be constructed as the student network model; its structure is not limited by the teacher's and can be a neural network of any size. A network model outputs features through its last fully-connected layer, so that layer performs the feature mapping. To make the student and teacher network models share a feature space, the network parameters of the teacher's last fully-connected layer are assigned to the student's last fully-connected layer, making the two layers' parameters identical; this conveniently yields a student network model with the same feature space as the teacher's.
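A minimal sketch of this assignment step, with the models reduced to plain Python objects (the two-layer structure and the weight values are assumptions for illustration only):

```python
import copy

class FeatureModel:
    """Toy feature extractor: a hidden transform followed by a last
    fully-connected layer that projects into the shared feature space."""

    def __init__(self, hidden_weights, fc_weights):
        self.hidden_weights = hidden_weights  # free to differ between models
        self.fc_weights = fc_weights          # last fully-connected layer

teacher = FeatureModel(hidden_weights=[[0.5, -0.2], [0.1, 0.9]],
                       fc_weights=[[1.0, 0.0], [0.0, 1.0]])
# The student may have a different (e.g. smaller) hidden structure...
student = FeatureModel(hidden_weights=[[0.3, 0.3]],
                       fc_weights=[[0.0, 0.0], [0.0, 0.0]])

# ...but its last fully-connected layer is overwritten with the teacher's
# parameters, which are then kept frozen during distillation training.
student.fc_weights = copy.deepcopy(teacher.fc_weights)
```

In a real deep learning framework this corresponds to copying the last layer's weight tensor from the teacher into the student and disabling gradient updates for it.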
Step 203: distillation-train the student network model from the teacher network model until a trained student network model is obtained, keeping the network parameters of the student's last fully-connected layer unchanged throughout training.
When the first network model and the second network model are distilled from a third network model, the teacher network model is the third network model, and the trained student network models are the first network model and the second network model; when the second network model is obtained by distillation from the first network model, the teacher network model is the first network model, and the trained student network model is the second network model.
After the student network model is constructed, its network parameters other than the last fully-connected layer can be initialized, and the student is then distillation-trained on the teacher network model and the training samples so that it learns the teacher's feature extraction behavior. This increases the similarity between the student's and teacher's feature spaces, giving the two models homology and a degree of compatibility. During training, the network parameters of the student's last fully-connected layer are kept unchanged, so the student's feature projection space stays consistent with the teacher's; after training, the semantics represented by each dimension of the student's extracted features match those of the teacher's.
In an embodiment of the present invention, distillation-training the student network model from the teacher network model until a trained student network model is obtained includes:
inputting the training sample into a teacher network model to obtain a first characteristic;
training the student network model according to the training samples; wherein, training the student network model according to the training sample comprises:
inputting the training sample into a student network model to obtain a second characteristic;
determining a loss function value of the student network model according to the labeled data and the second characteristics of the training sample;
determining a distillation loss value of the student network model relative to the teacher network model according to the first characteristic and the second characteristic;
updating network parameters of the student network model except the last full connection layer according to the loss function value and the distillation loss value;
and repeating the operation of training the student network model on the training samples until the network parameters of the student network model converge, obtaining the trained student network model.
The loss function includes a triplet loss (Triplet Loss) function or a contrastive loss (Contrastive Loss) function. The distillation loss value may be the distance between the first feature and the second feature, for example the L2 distance, though other distances are possible.
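As a sketch of the two quantities involved (pure-Python stand-ins; a real implementation would use a deep learning framework, and the margin value is an assumption), the distillation loss below is the L2 distance the paragraph mentions, and the task loss is a standard triplet loss:

```python
import math

def l2_distance(a, b):
    """L2 (Euclidean) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distillation_loss(teacher_feat, student_feat):
    """Distillation loss: distance between the teacher's (first) and
    the student's (second) feature for the same training sample."""
    return l2_distance(teacher_feat, student_feat)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: the positive must be closer to the anchor than the
    negative by at least `margin`, otherwise a penalty is incurred."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)
```

The task loss drives recognition accuracy, while the distillation loss pulls the student's features toward the teacher's, which is what makes the two models' features comparable.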
When distillation-training the student network model with the teacher network model, one training sample (or one batch) is first selected from the training sample set and fed to both models: forward inference through the teacher network model yields the first feature, and forward inference through the student network model yields the second feature. A loss function value for the student is computed from the sample's labeled data and the second feature, and a distillation loss value of the student relative to the teacher is computed from the first and second features. The student's network parameters, except those of the last fully-connected layer, are then updated from the combined loss function value and distillation loss value, keeping the last fully-connected layer's parameters unchanged. The next sample or batch is then selected and the training operation repeated until the student's network parameters converge, yielding the trained student network model.
Throughout the training of the student network model, the network parameters of its last fully-connected layer are kept unchanged, i.e., always identical to those of the teacher's last fully-connected layer, so the student shares a feature space with the teacher, while the distillation loss value constrains the second feature to stay as close as possible to the first.
In one embodiment of the present invention, updating the student network model's parameters other than the last fully-connected layer according to the loss function value and the distillation loss value comprises: determining an overall loss value from the loss function value and the distillation loss value, and updating the student's parameters other than the last fully-connected layer according to that overall loss value.
The loss function value and the distillation loss value are combined into an overall loss value, from which the student's parameters other than the last fully-connected layer are updated. The loss function value preserves the student's training objective, while the distillation loss keeps the student's second feature as close as possible to the teacher's first feature, so that the trained student and teacher network models have comparable features.
In one possible embodiment, determining an overall loss value from the loss function value and the distillation loss value optionally comprises:
taking the sum of the loss function value and the distillation loss value as the overall loss value; or
performing a weighted summation of the loss function value and the distillation loss value, according to their preset weights, to obtain the overall loss value.
When determining the overall loss value, the sum of the loss function value and the distillation loss value can be used directly; alternatively, a user can preset weights for the two values as needed, in which case the overall loss value is their weighted sum under those preset weights.
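A one-line sketch of this combination (the default weights are assumptions; the patent leaves them to the user):

```python
def overall_loss(task_loss, distill_loss, w_task=1.0, w_distill=1.0):
    """Weighted sum of the task loss and the distillation loss.
    With both weights at 1.0 this degenerates to the plain sum."""
    return w_task * task_loss + w_distill * distill_loss
```

The weights let the user trade off recognition accuracy (task loss) against feature compatibility with the teacher (distillation loss).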
In one scenario, the teacher network model may be a network model that needs to be upgraded and that has already extracted features from the base library pictures. To avoid re-extracting the base library's features with the upgraded model, a student network model can be constructed at upgrade time and distillation-trained from the teacher by the method of this embodiment, keeping the parameters of its last fully-connected layer consistent with the teacher's. The student's and teacher's features are then comparable, so the base library need not be re-encoded: after the student extracts features from a captured image, they can be compared directly with the base library features the teacher extracted.
In another scenario, the teacher network model may be a large, high-precision model whose resource requirements make it unsuitable for deployment on user terminals. In the prior art, a picture acquired by the terminal must be transmitted to a background server for feature extraction, consuming substantial network bandwidth. To solve this, the method of this embodiment distillation-trains a small model that shares a feature space with the teacher. The small model occupies fewer resources and can be deployed on the user terminal, which then extracts features locally and transmits only the features to the background server; the server compares them with the base library features extracted by the large model. This reduces network bandwidth consumption while keeping the two models' features comparable.
When a student network model is trained, the network parameters of the last fully connected layer of the trained teacher network model are obtained and assigned to the last fully connected layer of the student network model; the student network model then undergoes distillation training according to the teacher network model, with the network parameters of its last fully connected layer kept unchanged throughout training. As a result, the student network model and the teacher network model share a projection basis: each dimension of the trained student network model's output features corresponds, in the feature space, to the same dimension of the teacher network model's output features. This makes the features of the student and teacher network models comparable and solves the problem that features extracted by multiple models are not comparable. When a distillation relationship exists between the model before upgrading and the model after upgrading, features need not be re-extracted from historical data, which saves computing power; moreover, the student network model can be deployed on a capture machine to extract features from pictures directly, so only the extracted features need to be transmitted, reducing network bandwidth consumption.
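The assignment-and-freeze step described above can be sketched numerically. The following is a minimal NumPy illustration, not the patented implementation; the dictionary layout, the dimensions, and the `trainable` bookkeeping are hypothetical stand-ins for a real framework's parameter and optimizer machinery:

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, n_cls = 128, 1000          # hypothetical feature / class dimensions

# Trained teacher model: a backbone plus a last fully connected (FC) layer.
teacher = {
    "backbone": rng.standard_normal((feat_dim, feat_dim)),
    "fc_W": rng.standard_normal((n_cls, feat_dim)),
    "fc_b": rng.standard_normal(n_cls),
}

# Freshly initialised student with its own (different) backbone.
student = {
    "backbone": rng.standard_normal((feat_dim, feat_dim)) * 0.01,
    "fc_W": np.zeros((n_cls, feat_dim)),
    "fc_b": np.zeros(n_cls),
}

# Step 1: assign the teacher's last-FC parameters to the student's last FC.
student["fc_W"] = teacher["fc_W"].copy()
student["fc_b"] = teacher["fc_b"].copy()

# Step 2: only the non-FC parameters are handed to the optimizer, so the
# shared projection basis stays unchanged throughout distillation training.
trainable = [k for k in student if not k.startswith("fc_")]
```

In an autodiff framework the same effect is typically achieved by excluding the last layer's parameters from the optimizer (or disabling their gradients) rather than by list bookkeeping.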
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the embodiments of the present invention are not limited by the order of acts described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required to implement the invention.
Fig. 3 is a block diagram of an image processing apparatus according to an embodiment of the present invention, and as shown in fig. 3, the image processing apparatus may include:
the base map feature extraction module 301 is configured to perform feature extraction on a base map through one of the first network model and the second network model to obtain base map features;
a to-be-compared feature extraction module 302, configured to perform feature extraction on the to-be-compared image through the other of the first network model and the second network model to obtain a to-be-compared feature;
a feature comparison module 303, configured to compare the feature to be compared with the base map feature to obtain a comparison result;
wherein the first network model and the second network model have different numbers of layers and/or network parameters;
the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are both distilled from a third network model;
the second network model is distilled from the first network model.
Optionally, when the first network model and the second network model are both distilled from the third network model, the last layers of the first, second, and third network models are fully connected layers with the same network parameters; when the second network model is distilled from the first network model, the last layers of the first and second network models are fully connected layers with the same network parameters.
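The comparison flow that the modules above describe can be sketched as follows. This is a hedged NumPy toy: the two stand-in extractors are untrained random networks of different depths, so their scores only demonstrate the plumbing; in the actual method, comparability between the two models' features comes from distillation training with a shared last fully connected layer:

```python
import numpy as np

rng = np.random.default_rng(1)
in_dim, feat_dim = 16, 8             # hypothetical input / feature sizes

# Stand-in feature extractors with different depths (hypothetical weights).
W1 = rng.standard_normal((in_dim, feat_dim)) * 0.3
W2a = rng.standard_normal((in_dim, 12)) * 0.3
W2b = rng.standard_normal((12, feat_dim)) * 0.3

def model_one(x):                    # e.g. the first (large/teacher) model
    return np.tanh(x @ W1)

def model_two(x):                    # e.g. the second (small/student) model
    return np.tanh(np.tanh(x @ W2a) @ W2b)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Base-library features are extracted once by one model and stored.
base_imgs = rng.standard_normal((5, in_dim))
base_feats = np.array([model_one(x) for x in base_imgs])

# A captured image is embedded by the other model and compared directly
# against the stored base features -- no re-extraction of the base library.
probe = rng.standard_normal(in_dim)
probe_feat = model_two(probe)
scores = np.array([cosine(probe_feat, f) for f in base_feats])
best = int(np.argmax(scores))        # index of the best-matching base image
```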
Optionally, the apparatus further comprises:
the network parameter obtaining module is used for obtaining the network parameters of the last fully connected layer of the trained teacher network model;
the network parameter assignment module is used for assigning the network parameters of the last fully connected layer of the teacher network model to the last fully connected layer of the student network model;
the network model training module is used for performing distillation training on the student network model according to the teacher network model until a trained student network model is obtained, with the network parameters of the last fully connected layer of the student network model kept unchanged during training;
when the first network model and the second network model are distilled from a third network model, the teacher network model is the third network model, and the trained student network models are the first network model and the second network model; when the second network model is obtained by distillation from the first network model, the teacher network model is the first network model, and the trained student network model is the second network model.
Optionally, the network model training module includes:
the first feature obtaining unit is used for inputting training samples into the teacher network model to obtain first features;
the model training unit is used for training the student network model according to the training samples, and includes:
the second feature obtaining subunit is used for inputting the training samples into the student network model to obtain second features;
the loss value determining subunit is used for determining a loss function value of the student network model according to the labeled data of the training samples and the second features;
the distillation loss value determining subunit is used for determining a distillation loss value of the student network model relative to the teacher network model according to the first features and the second features;
the network parameter updating subunit is used for updating the network parameters of the student network model except the last fully connected layer according to the loss function value and the distillation loss value;
and the training control unit is used for cyclically performing the operation of training the student network model according to the training samples until the network parameters of the student network model converge, so as to obtain the trained student network model.
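A compact numerical sketch of the training loop these units implement, under simplifying assumptions: both networks are reduced to single linear layers, the distillation loss is a mean-squared error between first and second features (the patent does not fix a particular distillation loss), and gradients are written by hand instead of using an autodiff framework. Only the student backbone is updated; the copied last fully connected layer stays frozen:

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, feat_dim, n_cls, n = 8, 4, 3, 32   # toy, hypothetical sizes

# Stand-ins: a fixed teacher backbone and the shared, frozen last FC layer.
W_teacher = rng.standard_normal((feat_dim, in_dim)) * 0.1
fc_W = rng.standard_normal((n_cls, feat_dim)) * 0.1
fc_frozen = fc_W.copy()                    # kept to verify the FC never moves

# Student backbone to be trained (its last FC is the copy above).
W_student = rng.standard_normal((feat_dim, in_dim)) * 0.1

X = rng.standard_normal((n, in_dim))       # training samples
y = rng.integers(0, n_cls, size=n)         # labeled data
Y = np.eye(n_cls)[y]                       # one-hot labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

f_teacher = X @ W_teacher.T                # first features, computed once
dist0 = float(np.mean((X @ W_student.T - f_teacher) ** 2))

lr, alpha = 0.01, 10.0                     # alpha: preset distillation weight
for _ in range(200):
    f_student = X @ W_student.T                       # second features
    p = softmax(f_student @ fc_W.T)                   # logits via frozen FC
    g_task = (p - Y) @ fc_W / n                       # CE grad w.r.t. features
    g_distill = 2.0 * (f_student - f_teacher) / n     # MSE grad w.r.t. features
    g_feat = g_task + alpha * g_distill               # weighted overall grad
    W_student -= lr * (g_feat.T @ X)                  # update backbone only

dist = float(np.mean((X @ W_student.T - f_teacher) ** 2))
```

After training, `dist` is far below `dist0`: the student's features have been pulled into the teacher's feature space while the shared projection basis (`fc_W`) was never touched by an update.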
Optionally, the network parameter updating subunit includes:
the integral loss value determining submodule is used for determining an integral loss value according to the loss function value and the distillation loss value;
and the network parameter updating submodule is used for updating the network parameters of the student network model except the last full connection layer according to the overall loss value.
Optionally, the overall loss value determining submodule is specifically configured to:
taking the sum of the loss function value and the distillation loss value as the overall loss value; or
performing a weighted summation of the loss function value and the distillation loss value according to the preset weight of each, to obtain the overall loss value.
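The two combination rules above can be captured in a few lines; `w_task` and `w_distill` stand for the preset weights (the names are illustrative, not the patent's):

```python
def overall_loss(task_loss, distill_loss, w_task=None, w_distill=None):
    """Combine the loss-function value and the distillation loss value.

    With no weights given, the plain sum is used; otherwise a weighted sum
    with the preset weights. Both variants appear in the described method.
    """
    if w_task is None and w_distill is None:
        return task_loss + distill_loss
    return w_task * task_loss + w_distill * distill_loss
```

For example, `overall_loss(1.0, 2.0)` yields the plain sum `3.0`, while `overall_loss(1.0, 2.0, 0.5, 0.25)` yields the weighted sum `1.0`.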
Optionally, the loss function includes a triplet loss function or a contrastive loss function.
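As one concrete possibility for that loss function, a plain (non-batched) triplet loss can be written as below; the squared-Euclidean distance and the margin value are conventional choices, not mandated by the patent:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard triplet formulation: require the positive to sit closer to
    # the anchor than the negative does, by at least `margin`.
    d_ap = float(np.sum((anchor - positive) ** 2))
    d_an = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_ap - d_an + margin)
```

When the negative is already farther than the positive by more than the margin, the loss is zero; otherwise it is positive and pushes the embedding apart.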
The image processing apparatus provided in this embodiment performs feature extraction on the base map through one of the first network model and the second network model to obtain base map features, and performs feature extraction on the image to be compared through the other to obtain features to be compared. Because the first network model and the second network model are both distilled from the third network model, or the second network model is distilled from the first network model, the projection spaces of the first, second, and third network models are consistent, so features extracted by different network models are comparable. This solves the problem that features extracted by multiple models are not comparable: when a distillation relationship exists between the model before upgrading and the model after upgrading, features need not be re-extracted from historical data, saving computing power and reducing bandwidth consumption.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
Further, according to an embodiment of the present invention, there is provided an electronic device, which may be a computer, a server, or the like, including: a processor, a memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, implements the image processing method of the aforementioned embodiments.
According to an embodiment of the present invention, there is also provided a computer readable storage medium including, but not limited to, a disk memory, a CD-ROM, an optical memory, etc., having stored thereon a computer program which, when executed by a processor, implements the image processing method of the foregoing embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
The image processing method, image processing apparatus, electronic device, and medium provided by the present invention are described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the descriptions of the above embodiments are intended only to help in understanding the method and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. An image processing method, comprising:
performing feature extraction on the base map through one of the first network model and the second network model to obtain base map features;
performing feature extraction on the image to be compared through the other one of the first network model and the second network model to obtain features to be compared;
comparing the features to be compared with the base map features to obtain a comparison result;
wherein the first network model and the second network model have different numbers of layers and/or network parameters;
the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are both distilled from a third network model;
the second network model is distilled from the first network model.
2. The method of claim 1, wherein the first network model, the second network model, and the third network model share a projection basis when both the first network model and the second network model are distilled from the third network model; the first network model and the second network model share a projection basis when the second network model is distilled from the first network model.
3. The method of claim 2, wherein when both the first network model and the second network model are distilled from the third network model, the last layers of the first network model, the second network model, and the third network model are fully connected layers having the same network parameters; when the second network model is distilled from the first network model, the last layers of the first network model and the second network model are fully connected layers having the same network parameters.
4. The method of claim 3, further comprising:
obtaining the network parameters of the last fully connected layer of the trained teacher network model;
assigning the network parameters of the last fully connected layer of the teacher network model to the last fully connected layer of the student network model;
performing distillation training on the student network model according to the teacher network model until a trained student network model is obtained, wherein the network parameters of the last fully connected layer of the student network model are kept unchanged during training;
when the first network model and the second network model are distilled from a third network model, the teacher network model is the third network model, and the trained student network models are the first network model and the second network model; when the second network model is obtained by distillation from the first network model, the teacher network model is the first network model, and the trained student network model is the second network model.
5. The method of claim 4, wherein said training said student network model according to said teacher network model until a trained student network model is obtained comprises:
inputting training samples into the teacher network model to obtain first features;
training the student network model according to the training samples, comprising:
inputting the training samples into the student network model to obtain second features;
determining a loss function value of the student network model according to the labeled data of the training samples and the second features;
determining a distillation loss value of the student network model relative to the teacher network model according to the first features and the second features;
updating the network parameters of the student network model except the last fully connected layer according to the loss function value and the distillation loss value;
and circularly executing the operation of training the student network model according to the training samples until the network parameters of the student network model are converged to obtain the trained student network model.
6. The method of claim 5, wherein updating the network parameters of the student network model except for the last fully-connected layer according to the loss function value and the distillation loss value comprises:
determining an overall loss value according to the loss function value and the distillation loss value;
and updating the network parameters of the student network model except the last full connection layer according to the overall loss value.
7. The method of claim 6, wherein determining an overall loss value based on the loss function value and the distillation loss value comprises:
taking the sum of the loss function value and the distillation loss value as the overall loss value; or
performing a weighted summation of the loss function value and the distillation loss value according to the preset weight of each, to obtain the overall loss value.
8. The method of any of claims 5-7, wherein the loss function comprises a triplet loss function or a contrastive loss function.
9. An image processing apparatus characterized by comprising:
the base map feature extraction module is used for extracting features of the base map through one of the first network model and the second network model to obtain base map features;
the to-be-compared feature extraction module is used for extracting features of the to-be-compared image through the other one of the first network model and the second network model to obtain to-be-compared features;
the characteristic comparison module is used for comparing the characteristics to be compared with the characteristics of the base map to obtain a comparison result;
wherein the first network model and the second network model have different numbers of layers and/or network parameters;
the first network model and the second network model satisfy at least one of the following conditions:
the first network model and the second network model are both distilled from a third network model;
the second network model is distilled from the first network model.
10. An electronic device, comprising: processor, memory and computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, implements the image processing method according to any of claims 1 to 8.
11. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 8.
CN202110396669.7A 2021-04-13 2021-04-13 Image processing method, image processing apparatus, electronic device, and medium Pending CN113255441A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110396669.7A CN113255441A (en) 2021-04-13 2021-04-13 Image processing method, image processing apparatus, electronic device, and medium


Publications (1)

Publication Number Publication Date
CN113255441A true CN113255441A (en) 2021-08-13

Family

ID=77220671



Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492793A (en) * 2022-01-27 2022-05-13 北京百度网讯科技有限公司 Model training and sample generating method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination