CN114359649A - Image processing method, apparatus, device, storage medium, and program product

Publication number: CN114359649A (application CN202111386959.XA; granted as CN114359649B)
Original language: Chinese (zh)
Inventor: 郭卉
Applicant and current assignee: Tencent Technology (Shenzhen) Co., Ltd.
Legal status: Active (granted)

Abstract

The present application provides an artificial intelligence-based image processing method, apparatus, electronic device, computer-readable storage medium, and computer program product, relating to artificial intelligence technology. The method comprises: performing hash processing on a first image sample of a first image domain through a first image task model of the first image domain to obtain a first hash sample feature of the first image sample; performing hash processing on the first image sample through a second image task model of a second image domain to obtain a second hash sample feature of the first image sample; performing feature distillation processing on the second hash sample feature based on the first hash sample feature to obtain a distillation feature of the second image task model; performing hash processing on a second image sample of the second image domain through the second image task model to obtain a third hash sample feature of the second image sample; and training the second image task model based on the third hash sample feature and the distillation feature.

Description

Image processing method, apparatus, device, storage medium, and program product
Technical Field
The present application relates to artificial intelligence technology, and in particular, to an image processing method and apparatus based on artificial intelligence, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial Intelligence (AI) is a comprehensive discipline of computer science that studies the design principles and implementation methods of intelligent machines so as to give machines the capabilities of perception, reasoning, and decision making. Artificial intelligence is an interdisciplinary subject covering a wide range of fields, such as natural language processing and machine learning/deep learning; as the technology develops, it will be applied in more fields and deliver increasingly important value.
Transfer learning is one of the important applications in the field of artificial intelligence. It is a machine learning method that takes a model developed for task A as a starting point and reuses it to develop a model for task B.
In the related art, the image task model to be migrated is trained on limited image samples, which easily leads to problems such as poor training results.
Disclosure of Invention
Embodiments of the present application provide an artificial intelligence-based image processing method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, which can make full use of the limited image samples of the second image domain and improve the effect of transfer learning.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides an artificial intelligence-based image processing method, comprising:
performing hash processing on a first image sample of a first image domain through a first image task model of the first image domain to obtain a first hash sample feature of the first image sample;
performing hash processing on the first image sample of the first image domain through a second image task model of a second image domain to obtain a second hash sample feature of the first image sample;
performing feature distillation processing on the second hash sample feature based on the first hash sample feature to obtain a distillation feature of the second image task model;
performing hash processing on a second image sample of the second image domain through the second image task model to obtain a third hash sample feature of the second image sample;
and training the second image task model based on the third hash sample feature and the distillation feature, wherein the trained second image task model is used for extracting the hash feature of an image to be processed in the second image domain, and the hash feature of the image to be processed is used for executing an image task.
An embodiment of the present application provides an artificial intelligence-based image processing apparatus, comprising:
a first hash module, configured to perform hash processing on a first image sample of a first image domain through a first image task model of the first image domain to obtain a first hash sample feature of the first image sample;
a second hash module, configured to perform hash processing on the first image sample of the first image domain through a second image task model of a second image domain to obtain a second hash sample feature of the first image sample;
a distillation module, configured to perform feature distillation processing on the second hash sample feature based on the first hash sample feature to obtain a distillation feature of the second image task model;
a third hash module, configured to perform hash processing on a second image sample of the second image domain through the second image task model to obtain a third hash sample feature of the second image sample;
and a training module, configured to train the second image task model based on the third hash sample feature and the distillation feature, wherein the trained second image task model is used for extracting the hash feature of an image to be processed in the second image domain, and the hash feature of the image to be processed is used for executing an image task.
An embodiment of the present application provides an electronic device for image processing, comprising:
a memory for storing executable instructions;
and a processor for implementing, when executing the executable instructions stored in the memory, the artificial intelligence-based image processing method provided by the embodiments of the present application.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial intelligence-based image processing method provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the artificial intelligence-based image processing method provided by the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
the method comprises the steps of carrying out feature distillation on an image sample in the first image field and a first image task model to obtain distillation features of a second image task model, and training the second image task model based on the distillation features and the image sample in the second image field, so that the transfer learning effect under the limited image sample is improved by combining the feature distillation, the model training efficiency is improved, and related communication resources and computing resources are saved.
Drawings
FIG. 1 is a schematic diagram of an application scenario of an image processing system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIGS. 3-5 are schematic flow diagrams of an artificial intelligence-based image processing method provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a first image task model provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a second image task model provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of image retrieval provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a learning framework provided by an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
In the following description, the terms "first", "second", and the like are used only to distinguish similar objects and do not denote a particular order or importance. Where permissible, the order may be interchanged, so that the embodiments of the present application described herein can be practiced in sequences other than those illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before the embodiments of the present application are described in further detail, the terms and expressions used in the embodiments of the present application are explained as follows.
1) Metric Learning: learning a similarity (distance) metric such that, given some similar samples and some dissimilar samples, the similarity between originally similar samples is increased (their distance decreased) while the similarity between originally dissimilar samples is decreased (their distance increased). In the embodiments of the present application, the image processing model can be trained by metric learning.
2) Image recognition: a practical application of deep learning algorithms in which a computer processes, analyzes, and understands images in order to identify targets and objects in various patterns. Image recognition is category-level recognition: it considers only the category of an object (e.g., person, dog, cat, bird, etc.) and gives the category to which the object belongs, regardless of the particular instance of the object. A typical example is the large-scale generic object recognition task on the ImageNet source data set, which identifies which of 1000 categories a given object belongs to.
3) Binary quantization: for a D-dimensional feature vector (D is a positive integer) whose entries are floating-point numbers in the range [-1, 1] after vector normalization, compressing the feature vector into a binary code with a specified number of bits (e.g., 48 bits) whose values are 0 or 1 is called vector binary quantization or binary encoding.
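As an illustration, the following is a minimal Python sketch of this sign-based binarization (the threshold at 0 and the 48-bit width follow the description above; the use of NumPy and the helper name are assumptions):

```python
import numpy as np

def binarize(vec: np.ndarray) -> np.ndarray:
    # Map each normalized floating-point entry in [-1, 1] to one bit:
    # negative entries become 0, non-negative entries become 1.
    return (vec >= 0).astype(np.uint8)

# A 48-dimensional normalized embedding becomes a 48-bit code.
rng = np.random.default_rng(0)
v = rng.standard_normal(48)
v /= np.linalg.norm(v)   # normalize so the entries lie in [-1, 1]
code = binarize(v)       # e.g. array([1, 0, 1, ...], dtype=uint8)
```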
4) Binary quantization index: the D-dimensional feature vector is mapped by some computation (a model) to a binary vector with a limited number of bits, and this binary vector is used as an index to recall images during retrieval.
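A hypothetical sketch of such index-based recall, using Hamming distance over the binary codes (the linear scan and the radius value are illustrative assumptions; production systems typically use table-based lookups instead):

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # Number of differing bits between two {0, 1} codes.
    return int(np.count_nonzero(a != b))

def recall_by_index(query_code: np.ndarray, index_codes: list, radius: int = 4) -> list:
    # Recall every stored image whose binary index code lies within
    # `radius` bits of the query code.
    return [i for i, c in enumerate(index_codes)
            if hamming_distance(query_code, c) <= radius]
```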
5) Knowledge Distillation: a model compression method. Unlike pruning and quantization in model compression, knowledge distillation constructs a small, lightweight model and trains it using the supervision information of a larger model with better performance, in order to achieve better performance and precision. The large model is called the teacher model, the small model is called the student model, the supervision information from the teacher model is called knowledge, and the process by which the student model learns to migrate the supervision information from the teacher model is called distillation.
In large-scale image retrieval and re-ranking, retrieval is performed using binarized embedding features (binarization features for short, i.e., hash features), which are more efficient than floating-point features. The binarization features are learned by means of a deep hash model: a Convolutional Neural Network (CNN) learns through a quantization loss and a metric loss and requires triplet samples for training. In a service migration scenario, a hash model trained on service A data already exists and needs to be migrated to the domain of service B, where service B is similar to service A in the bulk of its data but introduces a portion of new data domain (compared with service A, service B is mixed-domain data containing a new domain). Since the new service has limited triplet training samples, and the service A data and service B data cannot be shared due to data sensitivity, how to effectively learn the hash features of service B is the problem in this scenario.
In the related art, there are two schemes for such transfer learning (scheme 1 and scheme 2). Scheme 1: a deep-learning-based quantization method that learns directly on the service B data, taking image triplets as input to learn the quantization target. This method cannot learn the embedding features at the same time, so after the quantization codes it trains are used to index and recall samples, the results cannot be further ranked effectively; that is, the embedding features and the quantization features are not homologous, and there will inevitably be cases where a sample can be recalled based on the embedding features but not based on the quantization features. Scheme 2: an A + B learning method that uses the model trained on the service A data as a pre-trained model and fine-tunes it on the service B data.
However, scheme 1 above has the following problem: training a quantization model with only the limited service B data cannot produce an effective joint A-B result, and the limited data brings a series of learning-effect problems, such as poor hash representation and few activated hash bits. Scheme 2 above has the following problems: 1) model learning is difficult; with limited service B data, fine-tuning easily lets the biased service B data drag the parameters trained on the service A data toward a poor local optimum rather than toward a better global optimum; 2) the hash bits are difficult to activate; because few hash bits are needed to represent the limited data, the new model cannot activate all the hash bits as fully as the old model, so the utilization rate of the hash bits of the new model is low.
In order to solve the above problems, embodiments of the present application provide an artificial intelligence-based image processing method and apparatus, an electronic device, and a computer-readable storage medium, which can make full use of the limited image samples of the second image domain to improve the effect of transfer learning.
The artificial intelligence-based image processing method provided by the embodiments of the present application can be implemented by a terminal or a server alone, or cooperatively by a terminal and a server. For example, the terminal alone undertakes the image processing method described below; or the terminal sends an application request for an image to be processed to the server, and the server, according to the received request, executes the image processing method, trains the second image task model, extracts the hash feature of the image to be processed in the second image domain based on the trained second image task model, and executes image tasks such as image recognition, image classification, and image retrieval based on the hash feature of the image to be processed.
The electronic device for image processing provided by the embodiment of the application can be various types of terminals or servers, wherein the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, cloud functions, cloud storage, Network service, cloud communication, middleware service, domain name service, security service, Content Delivery Network (CDN), big data and an artificial intelligence platform; the terminal may be, but is not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker, a smart watch, a smart television, a smart car device, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein.
Taking a server as an example, a server cluster may be deployed in the cloud to open AI as a Service (AIaaS) to users: the AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed mall, and all users can access one or more of the artificial intelligence services provided by the AIaaS platform through application programming interfaces.
For example, one of the artificial intelligence cloud services may be an image processing service, that is, a cloud server encapsulates the image processing program provided by the embodiments of the present application. A user calls the image processing service of the cloud service through a terminal (running a client, such as an image retrieval client or an image recognition client), so that a server deployed in the cloud invokes the encapsulated image processing program. The server performs hash processing on a first image sample (namely, service A data) of a first image domain (namely, domain A) through a first image task model (namely, the trained A model) of the first image domain to obtain a first hash sample feature of the first image sample; performs hash processing on the first image sample through a second image task model (namely, the B model to be trained) of a second image domain (namely, domain B) to obtain a second hash sample feature of the first image sample; performs feature distillation processing on the second hash sample feature based on the first hash sample feature to obtain a distillation feature of the second image task model; performs hash processing on a second image sample (namely, service B data) of the second image domain through the second image task model to obtain a third hash sample feature of the second image sample; and trains the second image task model based on the third hash sample feature and the distillation feature. The hash feature of an image to be processed in the second image domain is then extracted based on the trained second image task model, and the accuracy of image tasks is improved based on that hash feature; for example, the accuracy of similarity-based image retrieval is improved, the accuracy of image recognition is improved, and the efficiency and the later effect of user-behavior-based image recommendation are improved.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of an image processing system 10 provided in an embodiment of the present application, a terminal 200 is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of both.
The terminal (running a client, such as an image retrieval client or an image recognition client) may be used to obtain an application request for an image to be processed. For example, after a user opens the client running on the terminal and inputs an image to be processed, the terminal automatically obtains an application request carrying the image to be processed.
In some embodiments, an image processing plug-in can be embedded in the client running in the terminal, so that the artificial intelligence-based image processing method is implemented locally on the client. For example, the terminal 200 calls the image processing plug-in to implement the method: it trains the second image task model based on the third hash sample feature and the distillation feature, extracts the hash feature of an image to be processed in the second image domain based on the trained second image task model, and improves the accuracy of image tasks based on that hash feature; for example, the accuracy of similarity-based image retrieval is improved, the accuracy of image recognition is improved, and the efficiency and the later effect of user-behavior-based image recommendation are improved.
As an application example, for an image recognition application, the terminal calls the image processing plug-in to implement the artificial intelligence-based image processing method: it trains the second image task model, extracts the hash feature of the image to be processed in the second image domain based on the trained model, performs image recognition based on that hash feature to obtain the category to which the image belongs (for example, in an automatic album classification application), and classifies the image based on that category, thereby improving the efficiency and accuracy of image recognition.
In some embodiments, after the terminal obtains the application request for the image to be processed, it calls an image processing interface of the server 100 (which may be provided in the form of a cloud service, that is, an image processing service). Based on the request, the server 100 implements the artificial intelligence-based image processing method: it trains the second image task model based on the third hash sample feature and the distillation feature, extracts the hash feature of the image to be processed in the second image domain based on the trained model, and improves the accuracy of image tasks based on that hash feature; for example, the accuracy of similarity-based image retrieval is improved, the accuracy of image recognition is improved, and the efficiency and the later effect of user-behavior-based image recommendation are improved.
As an application example, for an image retrieval application, after a retrieval request for an image to be processed is obtained, an image processing interface of the server is called. For this retrieval request, the server implements the artificial intelligence-based image processing method: it trains the second image task model, extracts the hash feature of the image to be processed in the second image domain based on the trained model, and performs image retrieval based on that hash feature to recall images similar to the image to be processed from an image library, thereby improving the accuracy of image retrieval.
In some embodiments, the terminal or the server may implement the artificial intelligence-based image processing method provided by the embodiments of the present application by running a computer program. For example, the computer program may be a native program or software module in an operating system; a native application (APP), i.e., a program that needs to be installed in the operating system to run; a mini program, i.e., a program that only needs to be downloaded into a browser environment to run; or a mini program that can be embedded into any APP. In general, the computer program may be any form of application, module, or plug-in.
In some embodiments, multiple servers may be grouped into a blockchain, and the server 100 is a node on the blockchain, and there may be an information connection between each node in the blockchain, and information transmission between the nodes may be performed through the information connection. Data (for example, logic of image processing and results of image tasks) related to the artificial intelligence based image processing method provided by the embodiment of the present application may be saved on the blockchain.
The following describes the structure of an electronic device provided by an embodiment of the present application. Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device 500 provided by an embodiment of the present application. Taking the electronic device 500 being a server or a terminal as an example, the electronic device 500 shown in fig. 2 includes: at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540. It is understood that the bus system 540 is used to enable connection and communication among these components. In addition to a data bus, the bus system 540 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are all labeled as bus system 540 in fig. 2.
The processor 510 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, where the general-purpose processor may be a microprocessor or any conventional processor.
The memory 550 may comprise volatile memory or nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The non-volatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 550 described in embodiments herein is intended to comprise any suitable type of memory. Memory 550 optionally includes one or more storage devices physically located remote from processor 510.
In some embodiments, memory 550 can store data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to implement various basic services and to process hardware-based tasks;
a network communication module 553 for reaching other electronic devices via one or more (wired or wireless) network interfaces 520, exemplary network interfaces 520 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
in some embodiments, the artificial intelligence based image processing apparatus provided by the embodiments of the present application can be implemented in software, and fig. 2 shows an artificial intelligence based image processing apparatus 555 stored in a memory 550, which can be software in the form of programs and plug-ins, and the like, and includes the following software modules: the first hash module 5551, the second hash module 5552, the distillation module 5553, the third hash module 5554, the training module 5555, which are logical and thus can be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be explained below.
As described above, the artificial intelligence-based image processing method provided by the embodiments of the present application can be implemented by various types of electronic devices. Referring to fig. 3, fig. 3 is a schematic flowchart of an artificial intelligence-based image processing method provided by an embodiment of the present application, which is described with reference to the steps shown in fig. 3.
In the following steps, the first image task model of the first image domain (domain A) is a model trained on image samples of the first image domain (i.e., service A data), referred to as the A model for short, and the second image task model of the second image domain (domain B) is the model to be trained, referred to as the B model for short, where the second image domain may be a mixed domain that includes the first image domain, and the model structure of the first image task model is the same as that of the second image task model. For example, if the first image domain is a human-image domain and the second image domain is an animal-image domain, then the first image sample is a human image sample, the second image sample is an animal image sample, the first image task model is a model for processing human images, and the second image task model is a model for processing animal images.
In step 101, a first image sample of the first image domain is hashed through the first image task model of the first image domain to obtain a first hash sample feature of the first image sample.
For example, the first image sample of the first image domain is hashed through the trained first image task model to obtain the first hash sample feature (i.e., a hash feature or quantization feature) of the first image sample. It should be noted that the first image sample does not refer to a single image but to a class of images, i.e., a plurality of images of the first image domain.
In some embodiments, hashing a first image sample of a first image domain through a first image task model of the first image domain to obtain a first hash sample feature of the first image sample includes: executing the following processing by the first image task model of the first image domain: performing feature extraction processing on the first image sample of the first image domain to obtain a first embedded sample feature of the first image sample; and performing quantization processing on the first embedded sample feature of the first image sample to obtain the first hash sample feature of the first image sample.
For example, the first image task model includes a feature extraction layer and a quantization layer. The feature extraction layer performs feature extraction processing on the first image sample of the first image domain to obtain the first embedded sample feature (i.e., an embedding feature) of the first image sample, and the quantization layer performs quantization processing on the first embedded sample feature to obtain the first hash sample feature (i.e., a hash feature) of the first image sample.
In some embodiments, the first image task model includes a first feature layer and a first embedding layer, and performing the feature extraction processing on the first image sample of the first image domain to obtain the first embedded sample feature includes: performing basic feature extraction processing on the first image sample through the first feature layer to obtain a first basic sample feature of the first image sample; and performing embedding-vector conversion processing on the first basic sample feature through the first embedding layer to obtain the first embedded sample feature of the first image sample.
As shown in fig. 6, the feature extraction layer includes a first feature layer (i.e., the basic feature module, or basic feature layer, in fig. 6) and a first embedding layer (i.e., the embedding layer shown in fig. 6). The first feature layer performs basic feature extraction processing on the first image sample (e.g., the service A data shown in fig. 6) to obtain the first basic sample feature, that is, a basic or depth feature: a low-order feature produced by preliminary feature extraction that represents overall properties of the image (e.g., location information, attribute information, pixel values). The first embedding layer performs embedding-vector conversion processing on the first basic sample feature to obtain the first embedded sample feature (i.e., an embedding feature), and the quantization layer performs quantization processing on the first embedded sample feature to obtain the first hash sample feature (e.g., the hash feature shown in fig. 6) of the first image sample.
It should be noted that, in the embodiments of the present application, the first hash sample feature may also be obtained as follows: performing basic feature extraction processing on the first image sample of the first image domain through the first feature layer to obtain the first basic sample feature of the first image sample; and quantizing the first basic sample feature through the quantization layer of the first image task model to obtain the first hash sample feature of the first image sample.
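To make the layer layout concrete, the following is a minimal PyTorch sketch of the three-stage structure described above (basic feature layer, embedding layer, quantization layer). The ResNet-18 backbone, the layer sizes, and all names are illustrative assumptions rather than the patent's specification:

```python
import torch
import torch.nn as nn
from torchvision import models

class HashTaskModel(nn.Module):
    # Basic feature layer (CNN backbone) -> embedding layer -> quantization layer.
    def __init__(self, embed_dim: int = 128, hash_bits: int = 48):
        super().__init__()
        backbone = models.resnet18(weights=None)  # backbone choice is an assumption
        backbone.fc = nn.Identity()               # expose the 512-d basic features
        self.feature_layer = backbone             # "basic feature layer"
        self.embedding_layer = nn.Linear(512, embed_dim)
        self.quantization_layer = nn.Linear(embed_dim, hash_bits)

    def forward(self, images: torch.Tensor):
        base = self.feature_layer(images)             # basic sample features
        emb = self.embedding_layer(base)              # embedded sample features
        u = torch.tanh(self.quantization_layer(emb))  # pre-binarization outputs in (-1, 1)
        return emb, u

# The binary hash code is read off u by sign: bits = (u >= 0).
```

Both the first and the second image task model would share this structure, consistent with the statement above that the two models have the same model structure.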
In step 102, the first image sample of the first image domain is hashed through a second image task model of a second image domain to obtain a second hash sample feature of the first image sample.
For example, the first image sample of the first image domain is hashed through the second image task model to be trained to obtain the second hash sample feature (i.e., a hash feature or quantization feature) of the first image sample. Again, the first image sample refers not to a single image but to a class of images, i.e., a plurality of images of the first image domain.
In some embodiments, hashing the first image sample of the first image domain through the second image task model of the second image domain to obtain the second hash sample feature of the first image sample includes: performing the following processing by the second image task model of the second image domain: performing feature extraction processing on the first image sample of the first image domain to obtain a second embedded sample feature of the first image sample; and quantizing the second embedded sample feature of the first image sample to obtain the second hash sample feature of the first image sample.
For example, the second image task model includes a feature extraction layer and a quantization layer. The feature extraction layer performs feature extraction processing on the first image sample of the first image domain to obtain the second embedded sample feature (i.e., an embedding feature) of the first image sample, and the quantization layer performs quantization processing on the second embedded sample feature to obtain the second hash sample feature (i.e., a hash feature) of the first image sample.
In some embodiments, the second image task model includes a second feature layer and a second embedding layer, and performing the feature extraction processing on the first image sample of the first image domain to obtain the second embedded sample feature includes: performing basic feature extraction processing on the first image sample through the second feature layer to obtain a second basic sample feature of the first image sample; and performing embedding-vector conversion processing on the second basic sample feature through the second embedding layer to obtain the second embedded sample feature of the first image sample.
As shown in fig. 7, the second image task model includes a feature extraction layer comprising a second feature layer (i.e., the basic feature module, or basic feature layer, shown in fig. 7) and a second embedding layer (i.e., the embedding layer shown in fig. 7). The second feature layer performs basic feature extraction processing on the first image sample (e.g., the service A data shown in fig. 7) to obtain the second basic sample feature (a basic or depth feature: a low-order feature produced by preliminary feature extraction that represents overall properties of the image, such as location information, attribute information, and pixel values). The second embedding layer performs embedding-vector conversion processing on the second basic sample feature to obtain the second embedded sample feature (i.e., an embedding feature), and the quantization layer quantizes it to obtain the second hash sample feature (e.g., the hash feature shown in fig. 7) of the first image sample.
It should be noted that, in the embodiments of the present application, the second hash sample feature may also be obtained as follows: performing basic feature extraction processing on the first image sample of the first image domain through the second feature layer to obtain the second basic sample feature of the first image sample; and quantizing the second basic sample feature through the quantization layer of the second image task model to obtain the second hash sample feature of the first image sample.
In step 103, feature distillation processing is performed on the second hash sample feature based on the first hash sample feature, so as to obtain a distillation feature of the second image task model.
For example, to prevent the parameters of the second image task model under training from being dragged to a poor local optimum by biased sample data, the hash result of the second image task model (i.e., the second hash sample feature) is distilled against the hash result of the first image task model (i.e., the first hash sample feature). In this way a lightweight second image task model is constructed quickly, and the second image task model is trained using the supervision information of the better-performing first image task model so as to achieve better performance and precision.
Referring to fig. 4, fig. 4 is a schematic flowchart of an artificial intelligence-based image processing method provided by an embodiment of the present application; fig. 4 shows that step 103 of fig. 3 can be implemented through steps 1031 to 1033: in step 1031, mutual information between the first hash sample feature and the second hash sample feature is determined; in step 1032, the information entropy of the first hash sample feature is determined; in step 1033, the difference between the mutual information and the information entropy is used as the distillation feature of the second image task model.
For example, because the second image samples of the second image domain are limited, training purely on the second image samples tends to trap the second image task model in a poor local solution. It is therefore desirable to use the first image samples so that, while learning the second image samples, the second image task model retains a hash metric on the first image samples whose capability is equivalent (or close) to that of the first image task model. Distilling on the outputs over these data maintains the distribution of the original outputs.
For example, denote the first hash sample feature by p(x_i) and the second hash sample feature by q(x_i). The mutual information between the first hash sample feature and the second hash sample feature is determined as H(p, q) = -∑_i p(x_i) log q(x_i), and the information entropy of the first hash sample feature is determined as H(p) = -∑_i p(x_i) log p(x_i). The distillation feature of the second image task model is then D_KL(p||q) = H(p, q) - H(p).
When the two vectors are similar enough, the mutual information is very close to the information provided by either vector alone, so minimizing the KL loss during training pulls the two distributions close to each other. The second image task model thus learns the second image samples while its hash metric on the first image samples retains a capability equivalent (or close) to that of the first image task model.
It should be noted that the distillation feature of the embodiments of the present application is not limited to D_KL(p||q) = H(p, q) - H(p); other variant formulas are also possible.
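Under the assumption that the hash outputs are first normalized into distributions with a softmax (the text above fixes only the KL form D_KL(p||q) = H(p, q) - H(p), not the normalization), a sketch of this distillation term in PyTorch:

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_out: torch.Tensor, student_out: torch.Tensor) -> torch.Tensor:
    # D_KL(p || q) = H(p, q) - H(p), where p comes from the first (teacher)
    # model and q from the second (student) model on the same first-domain samples.
    p = F.softmax(teacher_out, dim=-1)
    log_p = F.log_softmax(teacher_out, dim=-1)
    log_q = F.log_softmax(student_out, dim=-1)
    cross_entropy = -(p * log_q).sum(dim=-1)   # H(p, q)
    entropy = -(p * log_p).sum(dim=-1)         # H(p)
    return (cross_entropy - entropy).mean()
```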
In step 104, a second image sample of the second image domain is hashed through the second image task model to obtain a third hash sample feature of the second image sample.
For example, the second image sample of the second image domain is hashed through the second image task model to be trained to obtain the third hash sample feature (i.e., a hash feature or quantization feature) of the second image sample. It should be noted that the second image sample does not refer to a single image but to a class of images, i.e., a plurality of images of the second image domain.
In some embodiments, hashing a second image sample of the second image domain through the second image task model of the second image domain to obtain a third hash sample feature of the second image sample includes: performing the following processing by the second image task model of the second image domain: performing feature extraction processing on the second image sample of the second image domain to obtain a third embedded sample feature of the second image sample; and performing quantization processing on the third embedded sample feature to obtain the third hash sample feature of the second image sample.
For example, the second image task model includes a feature extraction layer and a quantization layer. The feature extraction layer performs feature extraction processing on the second image sample of the second image domain to obtain the third embedded sample feature (i.e., an embedding feature) of the second image sample, and the quantization layer performs quantization processing on the third embedded sample feature to obtain the third hash sample feature (i.e., a hash feature) of the second image sample.
In some embodiments, the second image task model includes a second feature layer and a second embedding layer, and performing the feature extraction processing on the second image sample of the second image domain to obtain the third embedded sample feature includes: performing basic feature extraction processing on the second image sample through the second feature layer to obtain a third basic sample feature of the second image sample; and performing embedding-vector conversion processing on the third basic sample feature through the second embedding layer to obtain the third embedded sample feature of the second image sample.
For example, the second image task model includes a feature extraction layer comprising a second feature layer (i.e., the basic feature module or basic feature layer) and a second embedding layer (i.e., the embedding layer). The second feature layer performs basic feature extraction processing on the second image sample of the second image domain to obtain the third basic sample feature (a basic or depth feature: a low-order feature produced by preliminary feature extraction that represents overall properties of the image, such as location information, attribute information, and pixel values), and the second embedding layer performs embedding-vector conversion processing on the third basic sample feature to obtain the third embedded sample feature (i.e., an embedding feature) of the second image sample.
It should be noted that the third hash sample feature in the embodiments of the present application may also be obtained as follows: performing basic feature extraction processing on the second image sample of the second image domain through the second feature layer to obtain the third basic sample feature of the second image sample; and quantizing the third basic sample feature through the quantization layer of the second image task model to obtain the third hash sample feature of the second image sample.
In step 105, the second image task model is trained based on the third hash sample feature and the distillation feature, wherein the trained second image task model is used for extracting the hash feature of an image to be processed in the second image domain, and the hash feature of the image to be processed is used for executing an image task.
For example, in the learning process, in addition to distilling the hash result of the second image task model (the distillation feature) with the help of the hash result of the first image task model, the metric-learning-based features of the second image task model are maintained, which improves the characterization effect of the transfer learning of the second image task model.
Referring to fig. 5, fig. 5 is a schematic flowchart of an artificial intelligence-based image processing method provided by an embodiment of the present application; fig. 5 illustrates that step 105 of fig. 3 can be implemented through steps 1051 to 1053: in step 1051, a feature loss function of the second image task model is constructed based on the third hash sample feature, and a distillation loss function of the second image task model is constructed based on the distillation feature; in step 1052, weighted summation processing is performed on the feature loss function and the distillation loss function to obtain a target loss function of the second image task model; in step 1053, the parameters of the second image task model are updated based on the target loss function, and the updated parameters are used as the parameters of the trained second image task model.
For example, the distillation feature is used as the distillation loss function of the second image task model, and the weighted sum of the feature loss function and the distillation loss function is used as the target loss function (i.e., the total loss) of the second image task model. After the value of the target loss function is determined based on the third hash sample feature and the distillation feature, it can be judged whether this value exceeds a preset threshold; when it does, an error signal of the second image task model is determined based on the target loss function, the error information is propagated backward through the second image task model, and the model parameters of each layer are updated during the propagation.
Describing back-propagation here: training sample data is fed into the input layer of a neural network model, passes through the hidden layers, and finally reaches the output layer, where the result is output; this is the forward propagation process of the neural network model. Because the output of the model differs from the actual result, the error between the output and the actual value is computed and propagated backward from the output layer through the hidden layers until it reaches the input layer. During back-propagation, the values of the model parameters are adjusted according to the error: a loss function is constructed from the error between the output and the actual value, and its partial derivatives with respect to the model parameters are computed layer by layer, producing the gradient of the loss function with respect to each layer's parameters. Since the direction of the gradient indicates the direction in which the error grows, the gradient of each parameter is negated and summed with the original parameter of each layer, and the result is taken as the updated parameter of that layer, thereby reducing the error caused by the model parameters. This process is iterated until convergence. The second image task model is such a neural network model.
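A minimal sketch of one such update step, reusing the HashTaskModel and distillation_loss sketches above; the SGD optimizer, the loss weights, and the feature_loss_fn placeholder (standing in for the feature loss constructed below) are assumptions:

```python
import torch

teacher = HashTaskModel().eval()   # trained first-domain model, kept frozen
student = HashTaskModel()          # second-domain model being trained
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)

def train_step(images_a, images_b, feature_loss_fn, w_feat=1.0, w_dist=1.0):
    _, u_b = student(images_b)          # third hash sample features (domain B)
    _, u_s = student(images_a)          # second hash sample features (domain A)
    with torch.no_grad():
        _, u_t = teacher(images_a)      # first hash sample features (domain A)
    target = w_feat * feature_loss_fn(u_b) + w_dist * distillation_loss(u_t, u_s)
    optimizer.zero_grad()
    target.backward()    # propagate the error back through every layer
    optimizer.step()     # adjust each parameter against its gradient
    return target.item()
```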
In some embodiments, constructing the feature loss function of the second image task model based on the third hash sample feature includes: determining a fourth hash sample feature of an image sample similar to the second image sample and a fifth hash sample feature of an image sample dissimilar to the second image sample; constructing a triplet loss function of the second image task model based on the third, fourth, and fifth hash sample features; constructing a sign quantization loss function of the second image task model based on the third hash sample feature and the hash sample feature label of the second image sample; and performing weighted summation processing on the triplet loss function and the sign quantization loss function to obtain the feature loss function of the second image task model.
For example, the triplet loss function is constructed as follows: similarity processing is performed on the third and fourth hash sample features to obtain a first similarity between the second image sample and the similar image sample; similarity processing is performed on the third and fifth hash sample features to obtain a second similarity between the second image sample and the dissimilar image sample; and the triplet loss function of the second image task model is constructed based on the first similarity and the second similarity.
As an example, the formula of the triplet loss function is L_triplet = max(||x_a - x_p|| - ||x_a - x_n|| + α, 0), where α represents the margin, ||x_a - x_p|| represents the L2 distance between the hash feature of sample a and the hash feature of sample p (i.e., the first similarity), and ||x_a - x_n|| represents the L2 distance between the hash feature of sample a and the hash feature of sample n (i.e., the second similarity). Through the triplet loss function, the similarity between originally similar samples is increased (their distance decreased) while the similarity between originally dissimilar samples is decreased (their distance increased).
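A sketch of this triplet term on hash features (the margin value is an assumption):

```python
import torch

def triplet_loss(x_a: torch.Tensor, x_p: torch.Tensor, x_n: torch.Tensor,
                 alpha: float = 0.2) -> torch.Tensor:
    # L_triplet = max(||x_a - x_p|| - ||x_a - x_n|| + alpha, 0)
    d_pos = torch.norm(x_a - x_p, dim=-1)   # L2 distance to the similar sample
    d_neg = torch.norm(x_a - x_n, dim=-1)   # L2 distance to the dissimilar sample
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()
```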
For example, the purpose of the sign quantization loss (L_coding) is to make each output either very close to 1 or very close to -1. The quantization effect is computed on the vector u output by the quantization layer of the second image task model, whose target outputs are -1 and 1; sign quantization is then possible (an output smaller than 0 is encoded as 0 and an output larger than 0 is encoded as 1). The purpose of the sign quantization loss is thus to push the output of the quantized code close to -1 or 1 (an output near the critical value 0 easily causes similar features to be quantized into different codes). A sign function can therefore be employed to produce the target code for the quantization learning task, and a regression loss is then used to reduce the L2 distance between the output vector u of the quantized code and the target code b.
As an example, the sign quantization loss function is calculated as L_coding = ∑_i ||u_i - b_i||_2, where b_i = sign(u_i) represents the target code for each bit of the vector u and u_i represents an arbitrary bit of the vector u.
It should be noted that the sign quantization loss function of the embodiments of the present application is not limited to this form; other variant formulas are also possible.
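A sketch of the regression form reconstructed above, under the assumption that the target code is b = sign(u) with ties at 0 broken toward 1:

```python
import torch

def sign_quantization_loss(u: torch.Tensor) -> torch.Tensor:
    # Target code b_i = sign(u_i); regressing u onto b pushes every bit
    # toward -1 or 1 and away from the critical value 0.
    b = torch.where(u >= 0, torch.ones_like(u), -torch.ones_like(u)).detach()
    return torch.norm(u - b, dim=-1).mean()   # L2 distance between u and b
```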
In some embodiments, before the weighted summation processing is performed on the feature loss function and the distillation loss function to obtain the target loss function of the second image task model, feature extraction processing is performed on the second image sample of the second image domain through the second image task model to obtain the third embedded sample feature of the second image sample; a fourth embedded sample feature of an image sample similar to the second image sample and a fifth embedded sample feature of an image sample dissimilar to the second image sample are determined; and an embedding loss function of the second image task model is constructed based on the third, fourth, and fifth embedded sample features. Performing the weighted summation processing on the feature loss function and the distillation loss function to obtain the target loss function then includes: performing weighted summation processing on the embedding loss function, the feature loss function, and the distillation loss function to obtain the target loss function of the second image task model.
For example, since the application needs to rely on a more accurate floating point signature (the range of data that a floating point signature (i.e., an embedded signature) can characterize is larger than a hash signature) for returning results, it is necessary to maintain the embedding (embedding) metric effect in the model. Therefore, the embedding loss function, the characteristic loss function and the distillation loss function are combined for joint training, so that the training effect of the second image task model is improved.
It should be noted that the embedding loss function is constructed as follows: similarity processing is performed on the third and fourth embedded sample features to obtain a third similarity between the second image sample and its similar image sample; similarity processing is performed on the third and fifth embedded sample features to obtain a fourth similarity between the second image sample and its dissimilar image sample; and the embedding loss function of the second image task model is constructed based on the third similarity and the fourth similarity.
As an example, the embedding loss function may be calculated as Lem = max(||xa - xp|| - ||xa - xn|| + α, 0), where α represents the boundary (margin), ||xa - xp|| represents the L2 distance (i.e., the third similarity) between the embedded features of sample a and sample p, and ||xa - xn|| represents the L2 distance (i.e., the fourth similarity) between the embedded features of sample a and sample n. Through the embedding loss function, the similarity between originally similar samples is increased (distance decreased), while the similarity between originally dissimilar samples is decreased (distance increased).
In some embodiments, a hash activation bit of a first image task model is determined, and a hash inertia bit of a second image task model is determined; and updating the parameters corresponding to the Hash inertia bits into the parameters corresponding to the Hash activation bits of the second image task model.
It should be noted that the hash activation bit represents a hash bit with a higher activation degree among all hash bits in the first image task model, and the hash inertia bit represents a hash bit with a lower activation degree among all hash bits in the second image task model.
For example, because the directly available second image samples are limited, the second image task model cannot guarantee that all hash bits are well activated. Therefore, the embodiment of the application adopts hash weight grafting to ensure that all hash bits of the second image task model can be well activated, thereby improving the training effect of the second image task model.
In some embodiments, the first image task model comprises a plurality of first hash bits and the second image task model comprises a plurality of second hash bits; determining a hash activation bit for a first image task model, comprising: performing the following for any first hash bit of the plurality of first hash bits: performing statistical processing based on a first hash bit on the first hash sample characteristic of the first image sample to obtain an activation proportion of the first hash bit; determining activation metric information of the first hash bit based on the activation proportion of the first hash bit; screening the plurality of first hash bits based on the activation measurement information corresponding to the plurality of first hash bits respectively to obtain hash activation bits of the first image task model; determining a hash inertia bit for the second image task model, comprising: performing the following for any second hash bit of the plurality of second hash bits: performing statistical processing based on a second hash bit on the second hash sample characteristic of the first image sample to obtain an activation proportion of the second hash bit; determining activation metric information of the second hash bit based on the activation proportion of the second hash bit; and screening the plurality of second hash bits based on the activation measurement information corresponding to the plurality of second hash bits respectively to obtain hash inertia bits of the second image task model.
Following the above example, screening the plurality of first hash bits to obtain the hash activation bits of the first image task model may include: sorting the plurality of first hash bits in descending order, taking a front-ranked portion of the first hash bits in the descending result as candidate hash activation bits of the first image task model, and sampling the candidate hash activation bits to obtain the hash activation bits of the first image task model. Alternatively, it may include: sorting the plurality of first hash bits in descending order and directly taking the front-ranked first hash bits in the descending result as the hash activation bits of the first image task model. Screening the plurality of second hash bits to obtain the hash inertia bits of the second image task model includes: sorting the plurality of second hash bits in descending order and taking the second hash bits ranked at the end of the descending result as the hash inertia bits of the second image task model.
For example, determining activation metric information for the first hash bit based on the activation ratio of the first hash bit includes: determining an inactive proportion of the first hash bits based on the active proportion of the first hash bits; and carrying out measurement processing on the first hash bit based on the activation proportion of the first hash bit and the non-activation proportion of the first hash bit to obtain the activation measurement information of the first hash bit.
As an example, the first image samples are inferred through the second image task model, and the outputs of all quantization layers are collected and mapped to -1 or 1 through a sign function; assuming there are NA first image samples, NA 1x256 hash vectors Hb are obtained in total.
For the NA Hb vectors, the activation ratio and the metric effect of each of the 256 hash bits can be obtained. For example, for the 1st hash bit, the number M of vectors in which the first bit is activated is counted over all NA vectors; the activation ratio of the first hash bit is then pa = M/NA, and the activation metric information of the bit is calculated as Hhash = -pa*log(1 - pa).
The 256 Hhash values are sorted from large to small, and the 10 hash bits with the smallest values are taken as the hash inertia bits of the second image task model.
Similarly, the first image samples are inferred through the first image task model, and the outputs of all quantization layers are collected and mapped to -1 or 1 through a sign function; assuming there are NA first image samples, NA 1x256 hash vectors Hb are obtained in total. For the NA Hb vectors, the activation ratio and the metric effect of each of the 256 hash bits can be obtained; the first 50 hash bits with the best metric effect, called Ha_50, are found, and 10 weight parameters are randomly extracted from Ha_50 to serve as the hash activation bits of the first image task model.
It should be noted that the calculation formula of the activation metric information in the embodiment of the present application is not limited to Hhash = -pa*log(1 - pa); other variant formulas are also possible.
In some embodiments, after the second image task model is trained based on the third hash sample feature and the distillation feature, the trained second image task model is obtained, the hash feature of the to-be-processed image in the second image domain is extracted based on the trained second image task model, and the image task, such as image recognition, image classification, image retrieval and the like, is performed based on the hash feature of the to-be-processed image.
In summary, the artificial-intelligence-based image processing method provided by the embodiment of the application has the following beneficial effects: feature distillation is performed with the image samples of the first image field and the first image task model to obtain the distillation features of the second image task model, and the second image task model is trained based on the distillation features and the image samples of the second image field. By incorporating feature distillation, the transfer learning effect under limited image samples is improved, model training efficiency is increased, and related communication and computing resources are saved.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
The embodiment of the application can be applied to various image task scenarios, such as image recognition, image classification, and image retrieval. For image recognition, the image to be recognized is hashed through the trained hash model to obtain hash features, and image recognition is performed based on the hash features to obtain the category of the image; for image retrieval, the query image is hashed through the trained hash model to obtain hash features, and image retrieval is performed based on the hash features to recall images similar to it.
The following description will be given taking an image search as an example:
In large-scale re-ranking retrieval of images, retrieval is carried out with binarized embedding features (binarized features are more efficient than floating point embedding features). The binarized features are learned by means of a deep hash model: a Convolutional Neural Network (CNN) learns through a quantization loss and a metric loss, which requires triplet sample training. In a service migration scenario, a hash model trained on service A data already exists and needs to be migrated to the service B domain; service B is similar to service A in its data subject but contains a newly introduced data domain (relative to service A, service B is mixed-domain data with a new domain). Since the new service has limited triplet training samples, and the A and B service data cannot be shared due to data sensitivity, how to effectively learn the hash features of service B is the problem in this scenario.
In view of this, the embodiment of the present application provides an image processing method based on artificial intelligence. During learning, the hash result of the B model is distilled from the hash result of the A model while the hash loss of the B model based on quantization and metric learning is maintained. At the same time, hash bits with insufficient activation (i.e., hash inertia bits) in the B model are found through the difference in the information-representation capability of the hashes, and the network weights that produce effective hashes in the A model are grafted onto those bits of the B model, thereby improving the activation of the B model's hash bits and achieving a richer representation effect under limited B service data.
Therefore, the image processing method based on artificial intelligence provided by the embodiment of the application does not need to relearn the training data of the A model, avoiding both the overly long training time and slow model updates caused by the A model's excessive training data, and the inability to train the B model effectively when the A model's training data is unavailable. By combining feature distillation with hash learning, the hash learning effect under limited B model data is improved; and through effective hash comparison and weight grafting, the hash bit activation effect is improved.
As shown in fig. 8, the hash features learned by the B model are used in reverse image search as follows: 1) train the B model and obtain hash features (binarized features) through the trained B model; for example, extract features from a query image through the trained B model to obtain the binarized feature (1, 0, 0); 2) obtain the quantization centers of the features (i.e., the cluster centers generated by clustering the binarized features of all samples in the image library); 3) use the quantization centers as retrieval indexes (for bucket-based retrieval): according to the closest-distance principle, find the quantization center (i.e., the index to which each sample belongs) for the binarized features of all samples, such as index (1, 0, 0), index (0, 1, 0), index (0, 0, 1), and establish the association between the indexes and the image library; 4) during search, find the K1 quantization centers (i.e., indexes) closest to the hash of the query image, e.g., the indexes closest to the query image are (1, 0, 0), (0, 1, 0), (0, 0, 1), where K1 is a positive integer; 5) obtain the samples associated with those indexes as candidate images; 6) compute the L2 distance between the embedding features of the candidate images (e.g., (0.2, 0.7, 0.3, 0.3) and (0.1, 0.5, 0.2, 0.3) as shown in fig. 8) and the embedding feature of the query image (e.g., (0.2, 0.8, 0.3, 0.3) as shown in fig. 8), and sort from small to large; 7) take the top K2 samples in the ranking as the finally recalled images, where K2 is a positive integer.
The following describes an image processing method based on artificial intelligence provided in an embodiment of the present application:
As shown in fig. 9, the overall training process includes three parts: B-service triplet data preparation, hash model distillation learning, and the hash bit grafting activation module. First, similar sample pairs are generated by labeling (each group recorded as (image 1, image 2, similar or not)), and the target triplets to be learned (such as A-service triplet samples (i.e., triplets formed from A-service data) and B-service triplet samples (i.e., triplets formed from B-service data)) are generated through triplet mining in each batch. The triplet data is input into the A model and the B model respectively to obtain the corresponding hash features; the hash result of the A model is used to distill the hash result of the B model, forming the distillation loss Ldistill. After a certain amount of batch learning, the hash activation effects of the A and B models are tested, the hash inertia bits of the B model and the hash activation bits of the A model are found, and the network weights corresponding to the A model's hash activation bits are grafted onto the weights of those hash bits of the B model. Training continues until the total loss (comprising the distillation loss Ldistill, the quantization branch loss Lhash, and the metric learning loss Lem) no longer decreases.
The learning process of the model is explained in detail as follows:
1) sample preparation and triplet noise reduction
1.1) sample pair of similarity labels (image 1, image 2, similar or not)
Similar sample pairs are prepared through the similarity training of image deduplication (recognizing repeated images in an image library); since the quantization method of the embodiment of the application is based on similarity information, similar sample pairs also need to be prepared, and the similar sample pairs used for quantization are consistent with the sample pairs used for similarity training. In a deduplication application, similar samples are actually understood as identical samples: if two images are identical or differ only slightly, deduplication is required so that the final deduplicated image library contains no such sufficiently similar samples.
A set of samples satisfying the (image 1, image 2, similar) condition is collected as the labeling result of one similar sample pair.
1.2) Mining training triplets from similar sample pairs
The triplet samples are a group of samples (anchor sample (anchor), positive sample (positive), and negative sample (negative)), where anchor and positive constitute a positive sample pair, and anchor and negative constitute a negative sample pair, abbreviated as (a, p, n).
In the embodiment of the application, triplet learning is adopted to mine samples. With similar sample pairs as input, triplets are mined from the sample pairs of each batch (the number of sample pairs per batch is bs) as follows: for a certain sample x, compute the embedding (output of the current model's embedding module) of each image sample i in the remaining bs-1 sample pairs, compute the distance between each embedding and that of x, and sort the distances from small to large; remove the first K% of samples (e.g., (2bs-2)×3/100; for bs = 64, the first K% amounts to the first 4 samples), then take the next 10 samples as negative samples and form triplets with the positive sample pair of x. In this way each sample x generates 10 triplets, and the whole batch generates 10bs triplets.
The removal of the first K% is mainly a noise consideration: the model of the embodiment of the application should predict the same images (identical and extremely similar images) as the same (with as small a metric distance as possible), and should keep different images (including those only somewhat similar but not similar enough, and dissimilar ones) as far apart as possible (while also preserving the order that the more dissimilar two images are, the larger their distance). Apart from attack-style similar images such as adjacent frames under the same shot of a video or tone-shifted copies of an image, the probability that any two images in a batch are similar is very low. Therefore, for the positive sample pairs of each batch (sampled from the full sample set; it can be understood that most sample pairs in the batch differ, but a small number of identical subjects may remain), the most similar few samples are removed from the batch samples (the first K% may contain the same sample as x and should not be learned as negatives), the remaining samples are all treated as valid negatives, and the 10 with the smallest distances are selected as negative samples. K is a controllable value: the more noise in the training set, the larger K should be.
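To make the mining procedure concrete, the following is a minimal sketch, assuming the batch embeddings are laid out as alternating (anchor, positive) pairs; the helper name mine_triplets and this layout are assumptions for illustration, not part of the original description.

```python
# Sketch of the per-batch triplet mining described above (assumed layout:
# embeddings of bs similar pairs stacked as [a0, p0, a1, p1, ...]).
import torch

def mine_triplets(emb: torch.Tensor, k_percent: float = 3.0, n_neg: int = 10):
    """emb: (2*bs, d) embeddings. Returns (anchor, positive, negative) index triplets."""
    n = emb.size(0)
    dist = torch.cdist(emb, emb)                      # pairwise L2 distances
    triplets = []
    for a in range(0, n, 2):
        p = a + 1                                     # partner in the similar pair
        cand = [j for j in range(n) if j != a and j != p]
        cand.sort(key=lambda j: dist[a, j].item())    # most similar first
        drop = max(1, int(len(cand) * k_percent / 100))  # drop top K% (possible duplicates)
        negs = cand[drop:drop + n_neg]                # next 10 closest as hard negatives
        triplets += [(a, p, j) for j in negs]
    return triplets
```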
1.3) composition of data per batch
Since a large amount of A-service data and a limited amount of B-service data are involved, which differ in size, the batch generation differs slightly from the related-art method of directly mixing the A and B data. The batch generation method of the embodiment of the present application is as follows: 1) two data input paths are adopted: the first path extracts bs similar sample pairs from the A-service data without replacement each time and generates 10bs triplet groups; the second path extracts bs similar sample pairs from the B-service data without replacement each time and generates 10bs triplet groups; 2) when either data set has been fully traversed, e.g., the B-service data (which is traversed faster due to its smaller size) is finished while A-service data remains, the finished data set (here the B-service data) is simply traversed again; 3) each time the largest data set (in the embodiment of the present application, the A-service data) completes one traversal, one epoch is considered complete.
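A minimal sketch of this two-path batch composition follows, assuming each source yields similar sample pairs; the function name and the epoch bookkeeping are illustrative assumptions.

```python
# Two-path batch composition: bs pairs from A-service data and bs pairs
# from B-service data per step; the smaller B stream is restarted when
# exhausted, and one epoch ends when the larger A stream is exhausted.
def two_stream_batches(a_pairs, b_pairs, bs):
    it_a, it_b = iter(a_pairs), iter(b_pairs)
    while True:
        batch_a, batch_b = [], []
        for _ in range(bs):
            try:
                batch_a.append(next(it_a))
            except StopIteration:            # A data traversed once: epoch done
                return
            try:
                batch_b.append(next(it_b))
            except StopIteration:            # B data exhausted: traverse it again
                it_b = iter(b_pairs)
                batch_b.append(next(it_b))
        yield batch_a, batch_b
```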
2) Distillation migration learning for hash model
2.1) model Structure
The model structure (for both the A and B models) comprises 3 modules: a basic feature module, an embedding module, and a quantization module. The basic feature module adopts a feature extraction network (such as resnet101, whose parameters are shown in Table 1) to generate depth features; the depth features are input into the embedding module shown in Table 2 to generate vector features, which are input into the quantization module shown in Table 3 to generate quantization features. The three modules may also adopt other structures; for example, the basic feature module may adopt a resnet18 CNN module, and the embedding module may adopt multiple fully connected layers. Here, the quantization module of Table 3 generates a 1x256-dimensional vector and the embedding module of Table 2 generates a 1x64-dimensional vector: since each bit of the vector feature is a floating point number (32 bits) while each bit of the quantization vector is a 0/1 value (occupying 1 bit), to balance the storage occupied by the two, the vector feature dimension should be kept as low as possible relative to the quantization vector dimension; otherwise an overly long vector feature occupies too much storage space.
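A sketch of this three-module structure is shown below, assuming a torchvision resnet101 backbone. Table 3 allows the quantization module to take either the pooling output or the embedding output; this sketch feeds it the 64-dimensional embedding so that its weight matrix matches the 64x256 weight grafted in section 4).

```python
# Basic feature module (resnet101 up to global pooling) -> embedding module
# (1x64) -> quantization module (1x256, trained toward -1/1 outputs).
import torch
import torch.nn as nn
from torchvision.models import resnet101

class HashModel(nn.Module):
    def __init__(self, emb_dim: int = 64, hash_dim: int = 256):
        super().__init__()
        backbone = resnet101(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # conv stages + global pooling
        feat_dim = backbone.fc.in_features                              # 2048 for resnet101
        self.embedding = nn.Linear(feat_dim, emb_dim)   # Table 2: 1x64 embedding
        self.quantize = nn.Linear(emb_dim, hash_dim)    # Table 3: 1x256 quantization

    def forward(self, x):
        f = self.features(x).flatten(1)   # pooled depth features
        em = self.embedding(f)            # vector feature for metric learning
        q = self.quantize(em)             # quantization feature (pre-sign)
        return em, q
```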
Under the new-domain model migration problem, there exist a model well trained on A-service data (with the structures of Tables 1, 2, and 3), B-service data, and a B model to be trained (with the structures of Tables 1, 2, and 3). In the embodiment of the application, distillation and grafting for the B model are combined through the model well learned on the A-service data, so that effective hash representation can be achieved with an extremely small amount of training data in the B domain.
Table 1: structure of resnet101 (the table content is provided as an image in the original document).

Table 2: embedding module (the table content is provided as an image in the original document). It should be noted that the input of the embedding module is the pooling output, metric learning is adopted to learn the triplet distance information, and 64 is the embedding dimension.

Table 3: quantization module (the table content is provided as an image in the original document). It should be noted that the input of the quantization module is the pooling output or the embedding output, and it outputs a 1x256 binary prediction in (-1, 1).
2.2) training on distillation
2.2.1) initialization of B-model parameters
The basic feature module and the embedding module are initialized with the parameters trained by the A model, and the quantization layer is initialized with a random normal distribution with variance 0.01 and mean 0.
2.2.2) setting learning parameters
The underlying basic features are updated, and the learnable parameters are set as shown in Tables 1, 2, and 3.
2.2.3) learning Rate
The basic feature module adopts a learning rate lr = 0.0005, while the embedding module and the quantization module adopt a learning rate of 0.005. After every 10 iterations, lr becomes 0.1 times its previous value.
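These settings can be expressed, for example, as the following optimizer configuration; the SGD momentum value is an assumption, and the decay is realized here with a step scheduler.

```python
# Per-module learning rates (0.0005 backbone, 0.005 embedding/quantization),
# multiplied by 0.1 every 10 steps of the schedule.
import torch

model = HashModel()  # from the structure sketch above
optimizer = torch.optim.SGD([
    {"params": model.features.parameters(),  "lr": 0.0005},
    {"params": model.embedding.parameters(), "lr": 0.005},
    {"params": model.quantize.parameters(),  "lr": 0.005},
], momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
```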
2.2.4) learning Process
Training period (epoch) iterations are performed on the full data: the full set of samples is processed once per epoch, until the average loss within an epoch no longer decreases.
2.2.5) The specific operations in each epoch iteration are as follows: the total set of sample pairs is divided into Nb batches, triplets are acquired for each batch, and the following processing is performed:
(1) Model forward: all parameters of the B model are set to the learnable state; during training, the B model performs forward computation on the input image to obtain a prediction result, outputting the embedded feature em and the quantized feature q (in the first stage only em is learned; in the second stage both em and q are learned). The A-service data is input into the A model to obtain the quantized feature Aq1 and simultaneously into the B model to obtain the quantized feature Aq2; the B-service data is input into the B model to obtain the quantized feature Bq3. Subsequent loss computation is carried out on the basis of Aq1, Aq2, and Bq3.
(2) Loss calculation: the corresponding losses are calculated; the specific calculation process is described below.
(3) Model parameter update: a Stochastic Gradient Descent (SGD) method is adopted to compute gradients backward for the loss of the previous step, obtain the updated values of all model parameters, and update the model.
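One batch update can be sketched as follows; kl_distill, quantization_branch_loss, and metric_loss are placeholders for the loss functions defined in section 3), and the unweighted sum is a simplification of the weighted total loss.

```python
# Forward both models on A data, the B model on B data, combine the three
# losses (Ldistill + Lhash + Lem), and take one SGD step.
import torch

def train_step(model_a, model_b, a_images, b_images, optimizer):
    model_b.train()
    with torch.no_grad():
        _, aq1 = model_a(a_images)           # A model hash output on A data (teacher)
    _, aq2 = model_b(a_images)               # B model hash output on A data
    em_b, bq3 = model_b(b_images)            # B model outputs on B data

    loss = (kl_distill(aq1, aq2)             # Ldistill
            + quantization_branch_loss(bq3)  # Lhash
            + metric_loss(em_b))             # Lem
    optimizer.zero_grad()
    loss.backward()                          # gradient backward computation
    optimizer.step()                         # SGD parameter update
    return loss.item()
```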
2.2.6) After each epoch, hash grafting is triggered when the epoch number is an integral multiple of 5 (e.g., the 5th, 10th, or 15th epoch); otherwise it is skipped.
3) Loss function
The total loss of the embodiment of the application is the sum of three losses: the quantization branch loss (Lhash), the distillation loss (Ldistill), and the metric learning loss (Lem).
3.1) Triplet loss: for an image triplet (a, p, n), the representations xa, xp, xn of the triplet are obtained from the model (either the output of the quantization module or the output of the embedding module), and the triplet loss ltri under a preset α (i.e., the preset distance (margin) between positive and negative sample pairs) is calculated according to equation (1):

ltri = max(||xa - xp|| - ||xa - xn|| + α, 0)   (1)
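Equation (1) translates directly into code, for instance as the following sketch:

```python
# Margin-based triplet loss of equation (1), averaged over a batch.
import torch

def triplet_loss(xa, xp, xn, alpha):
    d_ap = torch.norm(xa - xp, dim=-1)    # L2 distance anchor-positive
    d_an = torch.norm(xa - xn, dim=-1)    # L2 distance anchor-negative
    return torch.clamp(d_ap - d_an + alpha, min=0).mean()
```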
3.2) Quantization branch loss (Lhash): composed of the symbol quantization loss (Lcoding) and the margin-constrained triplet loss as the feature loss function. The quantization branch loss Lhash is computed on the quantization module output as shown in equation (2):

Lhash = w21*Ltri + w22*Lcoding   (2)

where w21 and w22 are weights, w21 is 1, and w22 is 0.5. Since Lcoding converges faster than Ltri, w22 is set to 0.5 (or another value less than 1, as appropriate) to keep Ltri dominant in the total loss, thereby ensuring that the embedding always retains the ability of similarity measurement.
Regarding the quantization branch loss, the triplet samples generated by data preparation (B-service data) are input into the B model, and the prediction results of the hash layers (quantization layers) of a, p, and n (1x256 vectors each) are output. Since the goal of this task is to generate quantized image characterization vectors by making the quantization layer output as close to 1 or -1 as possible, the quantized features output by the quantization branch require, in addition to the metric capability, a regression loss (Lcoding) to bring the encoding (from floating point numbers to a finite set of integers) closer to the target values.
(1) Triplet loss: since the quantization vector is 256-dimensional and each bit needs to learn a value of -1 or 1, the distance between the a and n samples in a triplet must be large enough to keep triplets distinguishable in the quantization space, so the margin needs to be set larger; the preset reference initial distance is margin0 = 160. The triplet loss calculation follows equation (1).
(2) Symbol quantization loss (Lcoding): the quantization effect is calculated on the vector output by the quantization branch, whose target output lies in (-1, 1), so that symbol quantization can be performed (i.e., an output smaller than 0 is quantized to 0 and an output larger than 0 to 1). The purpose of the symbol quantization loss is thus to push the output of the quantized code close to -1 or 1 (if the output lies near the critical value, i.e., around 0, similar features are easily quantized into different codes). The target code of the quantization learning task can therefore be generated with a sign function (e.g., the sign function shown in equation (3): for each bit ui of the vector u, the target code bi is computed by the sign function, and finally the target code of u is b). A regression loss is then used to reduce the L2 distance between the quantized-code output vector u and the target code b. The purpose of quantization coding in training is to make each output either very close to 1 or very close to -1; at retrieval time each bit output by the quantization branch takes 0 or 1 according to its sign, forming the binary quantization vector.
bi = sign(ui) = 1 if ui ≥ 0, and -1 otherwise   (3)

Lcoding = ||u - b||2   (4)

where bi represents the target code for each bit of the vector u and ui represents an arbitrary bit of the vector u.
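Equations (3) and (4) can be realized, for example, as follows:

```python
# Sign-based target code (eq. (3)) and L2 regression toward it (eq. (4)).
import torch

def coding_loss(u):
    b = torch.where(u >= 0, torch.ones_like(u), -torch.ones_like(u))  # eq. (3)
    return torch.norm(u - b.detach(), dim=-1).mean()                  # eq. (4)
```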
3.3) distillation loss (Ldistill)
Since the B-service data is limited, training purely on it easily drives the B model to a poor local solution. The hope is therefore that, with the help of the A-service data, the B model can learn the B-service data while maintaining a hash metric capability on the A-service data equal to (or close to) that of the A model. Ldistill is introduced to maintain the distribution of the original output by distilling through the outputs on these data.
For a large amount of A-service data unrelated to the B-service data, the two hash-layer results p(xi) (output by the A model) and q(xi) (output by the B model) are obtained, and the q(xi) output by the B model is made to approach the p(xi) distribution output by the A model through the KL loss shown in equation (5).
Ldistill = Σi p(xi) * log(p(xi) / q(xi))   (5)
The intrinsic meaning of the KL loss is to calculate H(p, q) - H(p), which can be understood as subtracting the information of the p vector from the mutual information of the p and q vectors; when the two vectors are similar enough, the mutual information is very close to the information provided by either vector alone. Minimizing the KL loss in training therefore drives the two distributions close together.
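One way to realize equation (5) is sketched below; turning the 1x256 hash outputs into distributions with a softmax is an assumption of this sketch, not prescribed by the text.

```python
# KL(p || q) = H(p, q) - H(p): push the B model (student) distribution q
# toward the A model (teacher) distribution p.
import torch.nn.functional as F

def kl_distill(p_logits, q_logits):
    p = F.softmax(p_logits, dim=-1)           # teacher distribution
    log_q = F.log_softmax(q_logits, dim=-1)   # student log-distribution
    return F.kl_div(log_q, p, reduction="batchmean")
```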
3.4) metric learning loss (Lem)
In application, the returned results need to be measured with the more accurate floating point features (compared with hash features, floating point features can characterize a larger data range and are in fact more accurate as an image similarity measure), so the embedding metric effect in the model needs to be maintained. Here the triplet loss is used as Lem with α = 0.9, i.e., it is implemented by the embedding loss function.
4) Hash grafting
It is known that the A model performs better because each hash bit (e.g., the 256 bits in Table 3) is trained sufficiently, while the B model, with only limited B data directly available, cannot guarantee that all hash bits are well activated. Therefore, hash weight grafting is designed in the embodiment of the application so that all hash bits of the B model can be well activated.
The grafting process of the embodiment of the application is as follows: a metric index is designed over the 64x256 weight matrix of the B model's quantization layer to measure which hash bits perform poorly, and better-performing hash bits are found in the A model's corresponding weight matrix to directly replace the weights of the B model's bad hash bits.
4.1) Hash metric index
The effect of each hash bit is mainly measured by the proportion of the training data that the bit can distinguish; the closer this distinguishable proportion is to 0.5, the greater the influence of the hash bit. For one hash bit, when only 10% of the data in the whole training set is activated (i.e., after the quantization result passes through the sign function, the bit outputs -1 or 1, where 1 indicates activated and -1 not activated), the bit has less influence than one that is 50% activated. Following this measuring method, the hash bit activation entropy formula shown in equation (6) is designed as the hash bit metric index Hhash in the embodiment of the application, where pa represents the activation ratio of a given hash bit and lies in [0, 1]: 0 means no activation and 1 means full activation (both are the worst cases), while 0.5 indicates the best activation, since Hhash is then maximal.
Hhash=-pa*log(1-pa) (6)
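Equation (6) over a set of quantized vectors can be computed, for example, as follows; the eps term guarding log(0) is an implementation detail added here.

```python
# Per-bit activation ratio pa and activation entropy Hhash (eq. (6))
# over NA quantized vectors mapped to -1/1.
import torch

def hash_bit_entropy(hb: torch.Tensor, eps: float = 1e-8):
    """hb: (NA, 256) tensor of -1/1 codes; returns (pa, Hhash) per bit."""
    pa = (hb == 1).float().mean(dim=0)     # activation ratio of each bit
    h = -pa * torch.log(1 - pa + eps)      # eq. (6)
    return pa, h
```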
4.3) grafting Process
1) B model inference: inference is carried out on training set A (A-service data) with the B model, and the outputs of all quantization layers are collected and mapped to -1 or 1 through a sign function; assuming training set A has NA images, NA 1x256 hash vectors Hb are obtained in total.
2) B, model Hash evaluation: and finding a target hash bit needing grafting.
a. For the NA Hb vectors, the activation ratio and the metric effect of each of the 256 hash bits can be obtained. For example, for the 1st hash bit, the number M of vectors in which the first bit is activated is counted over all NA vectors; the activation ratio of the first hash bit is then M/NA, and the Hhash of the bit is calculated.
b. The 256 Hhash values are sorted from large to small; the 10 hash bits with the smallest values are taken and their sequence numbers in the original 1x256 hash vector are found. These 10 hash bits are the hash bits to be grafted (i.e., the hash inertia bits), called Hb_10, and the corresponding quantization layer weight parameters in Table 3 are found. The total quantization layer weight parameter w is a 64x256 matrix; for example, if the hash indexes corresponding to Hb_10 are 1, 3, 5, 7, 9, 11, 13, 17, 19, 21, the corresponding weight parameters are the columns of w at those indexes, i.e., columns 1, 3, 5, 7, 9, 11, 13, 17, 19, 21.
3) Model A reasoning and Hash evaluation
The hash metric effect of the A model is computed according to steps 1) and 2), the first 50 hash bits with the best metric effect, namely Ha_50, are found, and the corresponding quantization layer weight parameters in Table 3 are located.
4) Weight grafting: 10 weight parameters (i.e., hash activation bits) are randomly extracted from Ha_50 and replace the original weight parameters of Hb_10.
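The grafting step itself amounts to a column replacement in the weight matrix, sketched below with the 64x256 layout used in the text (note that a PyTorch nn.Linear(64, 256) stores the 256x64 transpose, so rows would be replaced there instead).

```python
# Overwrite the B model's inert-bit weight columns with columns randomly
# sampled from the A model's 50 best-activated bits.
import random
import torch

@torch.no_grad()
def graft(w_a: torch.Tensor, w_b: torch.Tensor, ha_50: list, hb_10: list):
    """w_a, w_b: (64, 256) quantization-layer weights; ha_50: A model's best
    bit indexes; hb_10: B model's inert bit indexes."""
    donors = random.sample(ha_50, len(hb_10))   # 10 random activation bits
    w_b[:, hb_10] = w_a[:, donors]              # replace inert-bit weights
```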
5) Quantization index construction and retrieval
5.1) Indexing the image database (generating the inverted table Linvert and the forward table T)
(1) Each of the N images i in the image library is input into the B model to obtain its embedding feature e and quantization feature q (note that e is the output of the embedding module, while q is the quantized 0/1 vector obtained by applying the sign function to the quantization module output); the mapping table T[i: e] of image i to its embedding feature e is recorded (where i denotes the image serial number and e its embedding feature).
(2) A q-based index system is then established: the serial numbers of images sharing a quantization code qj are recorded into the inverted list Linvert, e.g., [q1: [img1, img2, img5], q2: [img3], q3: [img4]], and the list of all index vectors Lindex is saved: [q1, q2, q3].
(3) For a newly added sample x, qx and ex can be calculated. When qx already exists in the Lindex list, x is directly added into the list corresponding to qx under Linvert, and the image serial number x with ex is added into the T mapping table (i.e., a serial number-feature record [x: ex] is appended).
5.2) search
(1) Acquiring codes: the query image q is input into the model to obtain eq and qq.
(2) Quantized retrieval: according to the qq of the query image, indexes whose Hamming distance from qq is less than Dq_thr (e.g., 64) are found from Lindex (assume q2 and q3 satisfy the condition).
(3) Obtaining features from the recalled indexes: the image ids under the recalled indexes are obtained from Linvert, e.g., img3 and img4 are recalled. The embedding features of the recalled images are then found from the T table, obtaining features e3 and e4 for images 3 and 4.
(4) Sorting: the L2 (Euclidean) distances between eq and the recalled features e3 and e4 are computed, giving dist3 and dist4, which are sorted from small to large.
(5) Returning: the first K samples under the above distance ranking are selected and returned (K is preset by the product).
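Steps (1) to (5) can be sketched as follows; the container shapes (Lindex as a list of binary arrays, Linvert as a dict keyed by code, T as a dict of embeddings) are assumptions for illustration.

```python
# Quantized bucket lookup by Hamming distance, then L2 re-ranking.
import numpy as np

def search(eq, qq, Lindex, Linvert, T, dq_thr=64, k=10):
    # (2) indexes within Hamming distance Dq_thr of the query code qq
    hits = [q for q in Lindex if np.sum(q != qq) < dq_thr]
    # (3) candidate image ids under the recalled indexes
    cand = [i for q in hits for i in Linvert[tuple(q)]]
    # (4) sort candidates by L2 distance between embedding features
    cand.sort(key=lambda i: np.linalg.norm(T[i] - eq))
    # (5) return the top-K samples
    return cand[:k]
```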
In summary, the embodiment of the present application has the following beneficial effects:
(1) Effective hash transfer learning is carried out under limited service data, improving the migration effect of hash features to a new service under limited data. The pre-training and fine-tuning scheme ensures that the two tasks with different convergence patterns (the e and q tasks) both converge effectively in the end; with the multi-task loss-weighting method of related-art multi-task learning, smooth convergence of both tasks is hard to guarantee, and the embedding recall easily degrades. In addition, compared with a structure in which the two tasks of the same model share everything (the embedding output serving as the input of the quantization module), the parallel two-branch structure reduces the influence of quantization learning on embedding learning as much as possible.
(2) By designing a method for measuring the hash bit effect and replacing the weights corresponding to bad hash bits in the B model, the number of effectively activated hash bits is increased, thereby improving the hash characterization effect.
(3) Under metric learning, the hash learning effect with limited data is improved by means of feature distillation, avoiding getting trapped in a poor local optimum under limited data.
The artificial-intelligence-based image processing method provided by the embodiment of the present application has been described in conjunction with exemplary applications and implementations of the electronic device provided by the embodiment of the present application. The embodiment of the present application further provides an image processing apparatus based on artificial intelligence. In practical applications, the functional modules in the apparatus may be implemented cooperatively by the hardware resources of an electronic device (such as a terminal or a server), including computing resources such as a processor, communication resources (e.g., supporting communication over optical cables, cellular networks, and the like), and memory. Fig. 2 shows an artificial-intelligence-based image processing apparatus 555 stored in a memory 550, which may be software in the form of programs and plug-ins, for example software modules written in a programming language such as C/C++ or Java, application software written in such languages, or dedicated software modules, application program interfaces, plug-ins, or cloud services within a large software system, as exemplified below.
The artificial intelligence based image processing apparatus 555 includes a series of modules, including a first hash module 5551, a second hash module 5552, a distillation module 5553, a third hash module 5554, and a training module 5555. The following continues to describe the scheme for implementing image processing by cooperation of the modules in the artificial intelligence based image processing apparatus 555 according to the embodiment of the present application.
The first hash module 5551 is configured to perform hash processing on a first image sample in a first image field through a first image task model in the first image field to obtain a first hash sample feature of the first image sample; the second hash module 5552 is configured to perform hash processing on a first image sample in the first image field through a second image task model in a second image field to obtain a second hash sample feature of the first image sample; a distillation module 5553, configured to perform feature distillation processing on the second hash sample feature based on the first hash sample feature, so as to obtain a distillation feature of the second image task model; a third hash module 5554, configured to perform hash processing on a second image sample in the second image domain through the second image task model to obtain a third hash sample feature of the second image sample; a training module 5555, configured to train the second image task model based on the third hash sample feature and the distillation feature, where the trained second image task model is used to extract a hash feature of the to-be-processed image in the second image field, and the hash feature of the to-be-processed image is used to execute an image task.
In some embodiments, the first hash module 5551 is further configured to perform the following processing by the first image task model of the first image domain: performing feature extraction processing on a first image sample in the first image field to obtain a first embedded sample feature of the first image sample; quantizing the first embedded sample characteristic of the first image sample to obtain a first Hash sample characteristic of the first image sample; the second hash module 5552 is further configured to perform the following processing by a second image task model of a second image domain: performing feature extraction processing on a first image sample in the first image field to obtain a second embedded sample feature of the first image sample; and quantizing the second embedded sample characteristic of the first image sample to obtain a second Hash sample characteristic of the first image sample.
In some embodiments, the first image task model comprises a first feature layer and a first embedding layer, and the second image task model comprises a second feature layer and a second embedding layer; the first hash module 5551 is further configured to perform, by using the first feature layer, basic feature extraction processing on a first image sample in the first image field, so as to obtain a first basic sample feature of the first image sample; performing embedding vector conversion processing on the first basic sample feature of the first image sample through the first embedding layer to obtain a first embedding sample feature of the first image sample; the second hash module 5552 is further configured to perform, through the second feature layer, basic feature extraction processing on a first image sample in the first image field, so as to obtain a second basic sample feature of the first image sample; and carrying out embedding vector conversion processing on the second basic sample characteristic of the first image sample through the second embedding layer to obtain a second embedded sample characteristic of the first image sample.
In some embodiments, the distillation module 5553 is further configured to determine mutual information between the first hashed sample feature and the second hashed sample feature; determining the information entropy of the first hash sample characteristic; and taking the difference value between the mutual information and the information entropy as the distillation characteristic of the second image task model.
In some embodiments, the training module 5555 is further configured to construct a feature loss function for the second image task model based on the third hashed sample features; constructing a distillation loss function of the second image task model based on the distillation features; carrying out weighted summation processing on the characteristic loss function and the distillation loss function to obtain a target loss function of the second image task model; and updating the parameters of the second image task model based on the target loss function, and taking the updated parameters of the second image task model as the parameters of the trained second image task model.
In some embodiments, before the weighted summation of the feature loss function and the distillation loss function to obtain the target loss function of the second image task model, the training module 5555 is further configured to perform a feature extraction process on a second image sample in the second image domain through the second image task model to obtain a third embedded sample feature of the second image sample; determining a fourth embedded sample feature of a similar image sample of the second image sample and a fifth embedded sample feature of a dissimilar image sample of the second image sample; constructing an embedding loss function of the second image task model based on the third embedded sample feature, the fourth embedded sample feature and the fifth embedded sample feature; and carrying out weighted summation processing on the embedding loss function, the characteristic loss function and the distillation loss function to obtain a target loss function of the second image task model.
In some embodiments, the training module 5555 is further configured to determine a fourth hashed sample feature of a similar image sample of the second image sample and a fifth hashed sample feature of a dissimilar image sample of the second image sample; constructing a triple loss function of the second image task model based on the third hash sample characteristic, the fourth hash sample characteristic and the fifth hash sample characteristic; constructing a symbol quantization loss function of the second image task model based on the third hash sample feature and the hash sample feature label of the second image sample; and carrying out weighted summation processing on the triple loss function and the symbol quantization loss function to obtain a characteristic loss function of the second image task model.
In some embodiments, the training module 5555 is further configured to perform similarity processing on the third hashed sample feature and the fourth hashed sample feature to obtain a first similarity between the second image sample and the similar image sample; carrying out similarity processing on the third Hash sample characteristic and the fifth Hash sample characteristic to obtain a second similarity between the second image sample and the dissimilar image sample; and constructing a triple loss function of the second image task model based on the first similarity and the second similarity.
In some embodiments, the training module 5555 is further configured to determine a hash activation bit for the first image task model and determine a hash inertia bit for the second image task model; and updating the parameters corresponding to the Hash inertia bits into the parameters corresponding to the Hash activation bits of the second image task model.
In some embodiments, the first image task model comprises a plurality of first hash bits, the second image task model comprises a plurality of second hash bits; the training module 5555 is further configured to perform the following for any of the first hash bits of the plurality of first hash bits: performing statistical processing based on the first hash bit on the first hash sample characteristic of the first image sample to obtain an activation ratio of the first hash bit; determining activation metric information of the first hash bit based on the activation proportion of the first hash bit; based on the activation measurement information corresponding to the first hash bits, the first hash bits are screened to obtain hash activation bits of the first image task model; performing the following for any of the plurality of second hash bits: performing statistical processing based on the second hash bit on the second hash sample characteristic of the first image sample to obtain an activation ratio of the second hash bit; determining activation metric information of the second hash bit based on the activation proportion of the second hash bit; and screening the plurality of second hash bits based on the activation measurement information corresponding to the plurality of second hash bits respectively to obtain hash inertia bits of the second image task model.
In some embodiments, the training module 5555 is further configured to sort the plurality of first hash bits in a descending order, and use a top-ranked portion of the first hash bits in a descending order result as candidate hash activation bits of the first image task model; sampling the candidate Hash activation bits of the first image task model to obtain the Hash activation bits of the second image task model; and performing descending sorting on the plurality of second hash bits, and taking the second hash bits sorted in the descending sorting result as hash inertia bits of the second image task model.
In some embodiments, the training module 5555 is further configured to determine an inactive proportion of the first hash bits based on the active proportion of the first hash bits; and carrying out measurement processing on the first hash bit based on the activation proportion of the first hash bit and the non-activation proportion of the first hash bit to obtain activation measurement information of the first hash bit.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device executes the artificial intelligence based image processing method described in the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium storing executable instructions, which when executed by a processor, cause the processor to perform an artificial intelligence based image processing method provided by embodiments of the present application, for example, the artificial intelligence based image processing method as shown in fig. 3-5.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, may be stored in a portion of a file that holds other programs or data, e.g., in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or distributed across multiple sites and interconnected by a communication network.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (16)

1. An artificial intelligence based image processing method, characterized in that the method comprises:
performing hash processing on a first image sample in a first image field through a first image task model in the first image field to obtain a first hash sample characteristic of the first image sample;
performing hash processing on a first image sample in the first image field through a second image task model in a second image field to obtain a second hash sample characteristic of the first image sample;
performing feature distillation processing on the second Hash sample feature based on the first Hash sample feature to obtain a distillation feature of the second image task model;
performing hash processing on a second image sample in the second image field through the second image task model to obtain a third hash sample characteristic of the second image sample;
and training the second image task model based on the third hash sample characteristic and the distillation characteristic, wherein the trained second image task model is used for extracting the hash characteristic of the image to be processed in the second image field, and the hash characteristic of the image to be processed is used for executing an image task.
2. The method of claim 1,
the performing hash processing on a first image sample in a first image field through a first image task model in the first image field to obtain a first hash sample characteristic of the first image sample includes:
executing the following processing by a first image task model of a first image domain:
performing feature extraction processing on a first image sample in the first image field to obtain a first embedded sample feature of the first image sample;
quantizing the first embedded sample characteristic of the first image sample to obtain a first Hash sample characteristic of the first image sample;
the performing hash processing on the first image sample in the first image field through the second image task model in the second image field to obtain a second hash sample characteristic of the first image sample, including:
performing the following processing by a second image task model of a second image domain:
performing feature extraction processing on a first image sample in the first image field to obtain a second embedded sample feature of the first image sample;
and quantizing the second embedded sample characteristic of the first image sample to obtain a second Hash sample characteristic of the first image sample.
3. The method of claim 2,
the first image task model comprises a first characteristic layer and a first embedded layer, and the second image task model comprises a second characteristic layer and a second embedded layer;
the performing feature extraction processing on the first image sample in the first image field to obtain a first embedded sample feature of the first image sample includes:
performing basic feature extraction processing on a first image sample in the first image field through the first feature layer to obtain a first basic sample feature of the first image sample;
performing embedding vector conversion processing on the first basic sample feature of the first image sample through the first embedding layer to obtain a first embedding sample feature of the first image sample;
the performing feature extraction processing on the first image sample in the first image field to obtain a second embedded sample feature of the first image sample includes:
performing basic feature extraction processing on a first image sample in the first image field through the second feature layer to obtain a second basic sample feature of the first image sample;
and carrying out embedding vector conversion processing on the second basic sample characteristic of the first image sample through the second embedding layer to obtain a second embedded sample characteristic of the first image sample.
4. The method of claim 1, wherein the performing feature distillation processing on the second hashed sample feature based on the first hashed sample feature to obtain a distillation feature of the second image task model comprises:
determining mutual information between the first hashed sample feature and the second hashed sample feature;
determining the information entropy of the first hash sample characteristic;
and taking the difference value between the mutual information and the information entropy as the distillation characteristic of the second image task model.
5. The method of claim 1, wherein training the second image task model based on the third hashed sample feature and the distillation feature comprises:
constructing a feature loss function of the second image task model based on the third Hash sample features;
constructing a distillation loss function of the second image task model based on the distillation features;
carrying out weighted summation processing on the characteristic loss function and the distillation loss function to obtain a target loss function of the second image task model;
and updating the parameters of the second image task model based on the target loss function, and taking the updated parameters of the second image task model as the parameters of the trained second image task model.
6. The method of claim 5, wherein
before performing weighted summation on the feature loss function and the distillation loss function to obtain the target loss function of the second image task model, the method further comprises:
performing feature extraction processing on the second image sample in the second image domain through the second image task model to obtain a third embedded sample feature of the second image sample;
determining a fourth embedded sample feature of a similar image sample of the second image sample and a fifth embedded sample feature of a dissimilar image sample of the second image sample; and
constructing an embedding loss function of the second image task model based on the third embedded sample feature, the fourth embedded sample feature and the fifth embedded sample feature;
wherein the performing weighted summation on the feature loss function and the distillation loss function to obtain the target loss function of the second image task model comprises:
performing weighted summation on the embedding loss function, the feature loss function and the distillation loss function to obtain the target loss function of the second image task model.
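Editor's note: claim 6 adds an embedding loss over the embedded sample features to the weighted sum. One plausible form is a triplet margin loss; the margin and weights below are assumptions, not values from the patent.

```python
# Hedged sketch of claim 6's embedding loss over embedded sample features.
import torch.nn.functional as F

def embedding_loss(anchor_emb, similar_emb, dissimilar_emb, margin: float = 0.2):
    d_pos = F.pairwise_distance(anchor_emb, similar_emb)      # to similar sample
    d_neg = F.pairwise_distance(anchor_emb, dissimilar_emb)   # to dissimilar sample
    # Pull the similar sample closer than the dissimilar one by a margin.
    return F.relu(d_pos - d_neg + margin).mean()

# Claim 6's target loss then becomes, with illustrative weights:
# target = w_emb * embedding_loss(...) + w_feat * feature_loss + w_distill * distill_loss
```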
7. The method of claim 5, wherein the constructing a feature loss function of the second image task model based on the third hash sample feature comprises:
determining a fourth hash sample feature of a similar image sample of the second image sample and a fifth hash sample feature of a dissimilar image sample of the second image sample;
constructing a triplet loss function of the second image task model based on the third hash sample feature, the fourth hash sample feature and the fifth hash sample feature;
constructing a sign quantization loss function of the second image task model based on the third hash sample feature and the hash sample feature label of the second image sample; and
performing weighted summation on the triplet loss function and the sign quantization loss function to obtain the feature loss function of the second image task model.
8. The method of claim 7, wherein the constructing a triplet loss function of the second image task model based on the third hash sample feature, the fourth hash sample feature and the fifth hash sample feature comprises:
computing a similarity between the third hash sample feature and the fourth hash sample feature to obtain a first similarity between the second image sample and the similar image sample;
computing a similarity between the third hash sample feature and the fifth hash sample feature to obtain a second similarity between the second image sample and the dissimilar image sample; and
constructing the triplet loss function of the second image task model based on the first similarity and the second similarity.
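Editor's note: combining claims 7 and 8, the feature loss can be sketched as a similarity-based triplet term plus a sign-quantization term. Reading the "hash sample feature label" as the sign of the relaxed code is an assumption on the editor's part.

```python
# Hedged sketch of claims 7-8: triplet loss from pairwise similarities,
# plus a sign-quantization regularizer, combined by weighted summation.
import torch
import torch.nn.functional as F

def hash_feature_loss(h_anchor, h_similar, h_dissimilar, margin: float = 0.3,
                      w_triplet: float = 1.0, w_quant: float = 0.1):
    sim_pos = F.cosine_similarity(h_anchor, h_similar)      # first similarity
    sim_neg = F.cosine_similarity(h_anchor, h_dissimilar)   # second similarity
    triplet = F.relu(sim_neg - sim_pos + margin).mean()
    # Push relaxed codes toward their binary labels sign(h); the exact
    # quantization loss in the patent may differ.
    quant = (h_anchor - torch.sign(h_anchor).detach()).abs().mean()
    return w_triplet * triplet + w_quant * quant
```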
9. The method of claim 1, further comprising:
determining hash activation bits of the first image task model, and determining hash inertia bits of the second image task model; and
updating the parameters of the second image task model corresponding to the hash inertia bits to the parameters corresponding to the hash activation bits.
10. The method of claim 9, wherein
the first image task model comprises a plurality of first hash bits, and the second image task model comprises a plurality of second hash bits;
the determining hash activation bits of the first image task model comprises:
performing the following for any first hash bit of the plurality of first hash bits:
performing statistics over the first hash sample features of the first image samples at the first hash bit to obtain an activation proportion of the first hash bit; and
determining activation metric information of the first hash bit based on the activation proportion of the first hash bit;
screening the plurality of first hash bits based on the activation metric information corresponding to each of the plurality of first hash bits to obtain the hash activation bits of the first image task model;
the determining hash inertia bits of the second image task model comprises:
performing the following for any second hash bit of the plurality of second hash bits:
performing statistics over the second hash sample features of the first image samples at the second hash bit to obtain an activation proportion of the second hash bit; and
determining activation metric information of the second hash bit based on the activation proportion of the second hash bit;
screening the plurality of second hash bits based on the activation metric information corresponding to each of the plurality of second hash bits to obtain the hash inertia bits of the second image task model.
11. The method of claim 10, wherein
the screening the plurality of first hash bits to obtain the hash activation bits of the first image task model comprises:
sorting the plurality of first hash bits in descending order, and taking the first hash bits ranked at the top of the descending-order result as candidate hash activation bits of the first image task model; and
sampling the candidate hash activation bits to obtain the hash activation bits of the first image task model;
the screening the plurality of second hash bits to obtain the hash inertia bits of the second image task model comprises:
sorting the plurality of second hash bits in descending order, and taking the second hash bits ranked at the bottom of the descending-order result as the hash inertia bits of the second image task model.
12. The method of claim 10, wherein the determining activation metric information of the first hash bit based on the activation proportion of the first hash bit comprises:
determining an inactivation proportion of the first hash bit based on the activation proportion of the first hash bit; and
performing metric processing on the first hash bit based on the activation proportion of the first hash bit and the inactivation proportion of the first hash bit to obtain the activation metric information of the first hash bit.
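Editor's note: claims 9-12 describe selecting teacher hash activation bits and student hash inertia bits from per-bit statistics, then refreshing the student's inertia-bit parameters. The sketch below uses the entropy of the (activation, inactivation) proportions as the activation metric, which is one plausible reading of claim 12, not a quotation of the patent.

```python
# Hedged sketch of claims 9-12: per-bit activation statistics, metric-based
# screening, and parameter refresh. The metric choice and k are assumptions.
import torch

def binary_entropy(p: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    p = p.clamp(eps, 1 - eps)
    return -(p * p.log() + (1 - p) * (1 - p).log())

def select_bits(teacher_hash, student_hash, k: int = 16):
    # Activation proportion per bit over the sample set (claim 10).
    p_teacher = (teacher_hash > 0).float().mean(0)
    p_student = (student_hash > 0).float().mean(0)
    # Claim 12: metric built from activation and inactivation proportions.
    m_teacher = binary_entropy(p_teacher)
    m_student = binary_entropy(p_student)
    # Claim 11: descending sort; top teacher bits are candidates, and a
    # random sample of them becomes the hash activation bits.
    candidates = torch.argsort(m_teacher, descending=True)[:k]
    activation_bits = candidates[torch.randperm(k)[: k // 2]]
    # Bottom-ranked student bits are taken as the hash inertia bits.
    inertia_bits = torch.argsort(m_student, descending=True)[-(k // 2):]
    return activation_bits, inertia_bits

def refresh_inertia_bits(student, teacher, activation_bits, inertia_bits):
    # Claim 9: overwrite the student's parameters at the inertia bits with
    # the teacher's parameters at the activation bits (here, hash-layer rows).
    with torch.no_grad():
        student.hash_layer.weight[inertia_bits] = teacher.hash_layer.weight[activation_bits]
        student.hash_layer.bias[inertia_bits] = teacher.hash_layer.bias[activation_bits]
```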
13. An artificial intelligence-based image processing apparatus, characterized in that the apparatus comprises:
a first hash module, configured to perform hash processing on a first image sample in a first image domain through a first image task model of the first image domain to obtain a first hash sample feature of the first image sample;
a second hash module, configured to perform hash processing on the first image sample in the first image domain through a second image task model of a second image domain to obtain a second hash sample feature of the first image sample;
a distillation module, configured to perform feature distillation processing on the second hash sample feature based on the first hash sample feature to obtain a distillation feature of the second image task model;
a third hash module, configured to perform hash processing on a second image sample in the second image domain through the second image task model to obtain a third hash sample feature of the second image sample; and
a training module, configured to train the second image task model based on the third hash sample feature and the distillation feature, wherein the trained second image task model is used for extracting a hash feature of an image to be processed in the second image domain, and the hash feature of the image to be processed is used for executing an image task.
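Editor's note: to show how the modules of claim 13 fit together, here is a hypothetical training loop reusing the sketches above; the data loaders, batch keys, and loss weight are assumptions, not the patent's specification.

```python
# Hedged end-to-end sketch of the claimed training flow. Assumes the
# HashModel, distillation_feature, and hash_feature_loss sketches above,
# plus hypothetical first_domain_loader / second_domain_loader iterables.
import torch

teacher = HashModel()   # first image task model (first image domain), frozen
student = HashModel()   # second image task model (second image domain)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

for b1, b2 in zip(first_domain_loader, second_domain_loader):
    with torch.no_grad():
        _, t_hash = teacher(b1["image"])          # first hash sample feature
    _, s_hash = student(b1["image"])              # second hash sample feature
    # Distillation feature; the hard-threshold estimate above carries no
    # gradient, so a differentiable surrogate would replace it in practice.
    distill = distillation_feature(t_hash, s_hash)
    _, h_a = student(b2["anchor"])                # third hash sample feature
    _, h_p = student(b2["positive"])
    _, h_n = student(b2["negative"])
    loss = hash_feature_loss(h_a, h_p, h_n) + 0.5 * distill
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```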
14. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence-based image processing method of any one of claims 1 to 12 when executing the executable instructions stored in the memory.
15. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial intelligence-based image processing method of any one of claims 1 to 12.
16. A computer program product comprising a computer program or instructions, characterized in that the computer program or instructions, when executed by a processor, implement the artificial intelligence-based image processing method of any one of claims 1 to 12.
CN202111386959.XA 2021-11-22 2021-11-22 Image processing method, apparatus, device, storage medium, and program product Active CN114359649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111386959.XA CN114359649B (en) 2021-11-22 2021-11-22 Image processing method, apparatus, device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN114359649A (en) 2022-04-15
CN114359649B (en) 2024-03-22

Family

ID=81096047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111386959.XA Active CN114359649B (en) 2021-11-22 2021-11-22 Image processing method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN114359649B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115050355A (en) * 2022-05-31 2022-09-13 北京小米移动软件有限公司 Training method and device of speech recognition model, electronic equipment and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110163344A (en) * 2019-04-26 2019-08-23 北京迈格威科技有限公司 Neural network training method, device, equipment and storage medium
CN110516095A (en) * 2019-08-12 2019-11-29 山东师范大学 Weakly supervised depth Hash social activity image search method and system based on semanteme migration
CN111582479A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Distillation method and device of neural network model
CN111597374A (en) * 2020-07-24 2020-08-28 腾讯科技(深圳)有限公司 Image classification method and device and electronic equipment
CN111738436A (en) * 2020-06-28 2020-10-02 电子科技大学中山学院 Model distillation method and device, electronic equipment and storage medium
CN111768438A (en) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 Image processing method, device, equipment and computer readable storage medium
CN111882031A (en) * 2020-06-30 2020-11-03 华为技术有限公司 Neural network distillation method and device
CN112329617A (en) * 2020-11-04 2021-02-05 中国科学院自动化研究所 New scene face recognition model construction method and system based on single source domain sample
CN112486686A (en) * 2020-11-30 2021-03-12 之江实验室 Customized deep neural network model compression method and system based on cloud edge cooperation
CN112860183A (en) * 2021-01-07 2021-05-28 西安交通大学 Multisource distillation-migration mechanical fault intelligent diagnosis method based on high-order moment matching
CN113011387A (en) * 2021-04-20 2021-06-22 上海商汤科技开发有限公司 Network training and human face living body detection method, device, equipment and storage medium
CN113255915A (en) * 2021-05-20 2021-08-13 深圳思谋信息科技有限公司 Knowledge distillation method, device, equipment and medium based on structured instance graph
CN113297906A (en) * 2021-04-20 2021-08-24 之江实验室 Knowledge distillation-based pedestrian re-recognition model compression method and evaluation method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
L.T. Nguyen-Meidine et al.: "Unsupervised Multi-Target Domain Adaptation Through Knowledge Distillation", arXiv, pages 1-20 *
Yang Li et al.: "Contrastive Self-Supervised Hashing With Dual Pseudo Agreement", IEEE Access, vol. 8, pages 165034-165043, XP011809845, DOI: 10.1109/ACCESS.2020.3022672 *
Liu Hongli: "Research on lightweight object detection algorithms based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 03, pages 138-649 *
Zhao Zhenbing et al.: "Image classification of transmission line bolt defects based on dynamically supervised knowledge distillation", High Voltage Engineering, vol. 47, no. 2, pages 406-414 *
Yan Bingqi et al.: "Distance-preserving hashing method for image retrieval based on normal distribution", Journal of University of Jinan (Natural Science Edition), vol. 36, no. 2, pages 1-9 *

Also Published As

Publication number Publication date
CN114359649B (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN106503106B (en) Image hash index construction method based on deep learning
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
CN111612134B (en) Neural network structure searching method and device, electronic equipment and storage medium
CN111339433A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111611488B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111563192B (en) Entity alignment method, device, electronic equipment and storage medium
CN113298197B (en) Data clustering method, device, equipment and readable storage medium
CN112990378B (en) Scene recognition method and device based on artificial intelligence and electronic equipment
CN113821657A (en) Artificial intelligence-based image processing model training method and image processing method
CN113705589A (en) Data processing method, device and equipment
CN115221396A (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN111639230A (en) Similar video screening method, device, equipment and storage medium
CN115858919A (en) Learning resource recommendation method and system based on project field knowledge and user comments
CN114676279A (en) Image retrieval method, device, equipment and computer readable storage medium
CN116738009B (en) Method for archiving and backtracking data
CN111984842B (en) Bank customer data processing method and device
CN113010705A (en) Label prediction method, device, equipment and storage medium
CN113392868A (en) Model training method, related device, equipment and storage medium
CN114548382B (en) Migration training method, device, equipment, storage medium and program product
CN114359649A (en) Image processing method, apparatus, device, storage medium, and program product
CN115982634A (en) Application program classification method and device, electronic equipment and computer program product
US11526727B1 (en) Machine learned chart recommendation system
CN109308565B (en) Crowd performance grade identification method and device, storage medium and computer equipment
CN111143641A (en) Deep learning model training method and device and electronic equipment
CN114077681B (en) Image data processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40071962)
GR01 Patent grant