CN108664999B - Training method and device of classification model and computer server - Google Patents

Info

Publication number
CN108664999B
CN108664999B (application CN201810412797.4A)
Authority
CN
China
Prior art keywords
classification model
modal
training
mode
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810412797.4A
Other languages
Chinese (zh)
Other versions
CN108664999A (en)
Inventor
王乃岩
樊峻崧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Technology Co Ltd
Priority to CN201810412797.4A
Publication of CN108664999A
Application granted
Publication of CN108664999B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 - Validation; Performance evaluation; Active pattern learning techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method and device of a classification model and a computer server, and aims to solve the technical problems of low computational efficiency and a narrow application range when the prior art trains a classification model with semi-supervised learning techniques. The method comprises the following steps: constructing an initial classification model, wherein the initial classification model comprises at least one single-modal classification model with the same classification task, and the modal data training set corresponding to each single-modal classification model comprises labeled training data and unlabeled training data; and training the initial classification model to obtain a target classification model based on a method of aligning the feature code distributions of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model. The scheme can improve the efficiency of classification model training and has a wider application range.

Description

Training method and device of classification model and computer server
Technical Field
The invention relates to the field of deep learning, in particular to a training method of a classification model, a training device of the classification model and a computer server.
Background
At present, training a neural network usually requires a large amount of labeled sample data: a large amount of sample data must first be collected, and the collected sample data must then be labeled manually to obtain labeled sample data for training the neural network. Both the collection and the labeling incur high labor and time costs.
To solve this technical problem, neural networks are currently trained with a training data set comprising both labeled training data and unlabeled training data. Since a large amount of labeled training data is no longer needed, the dependence on large amounts of labeled data can be relieved, which addresses the high labeling cost and high time cost of the prior art.
At present, existing semi-supervised learning techniques in deep learning mainly introduce random noise or various random transformations during input and feature construction, while constraining the output of the neural network to be robust and invariant, so as to use unlabeled training data to assist training; for example, Takeru Miyato et al. use adversarial examples, Mehdi Sajjadi et al. use random transformations, and Samuli Laine et al. use random noise to introduce perturbations.
However, existing semi-supervised learning techniques have the following technical defects: in order to obtain supervision information from the unlabeled training data, the same group of training samples needs to be forward-computed multiple times, so the efficiency is low; meanwhile, only single-modal data can be used for learning and training, so the application range is narrow.
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for training a classification model, and a computer server, so as to solve the technical problems of low computational efficiency and narrow application range of the prior art in training a classification model by a semi-supervised learning technique.
In a first aspect, an embodiment of the invention provides a training method of a classification model, which comprises the following steps:
constructing an initial classification model, wherein the initial classification model comprises at least one single-modal classification model with the same classification task, and the modal data training set corresponding to each single-modal classification model comprises labeled training data and unlabeled training data;
and training the initial classification model by adopting the modal training data set corresponding to each single-modal classification model to obtain a target classification model, based on a method of aligning the feature code distributions of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model.
In a second aspect, an embodiment of the present invention provides a training apparatus for a classification model, which includes:
the model building unit is used for building an initial classification model, wherein the initial classification model comprises at least one single-modal classification model with the same classification task, and the modal data training set corresponding to each single-modal classification model comprises labeled training data and unlabeled training data;
and the training unit is used for training the initial classification model by adopting the modal training data set corresponding to each single-modal classification model to obtain the target classification model, based on a method of aligning the feature code distributions of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model.
In a third aspect, an embodiment of the present invention provides a computer server, including a memory, and one or more processors communicatively connected to the memory;
the memory has stored therein instructions executable by the one or more processors to cause the one or more processors to implement the aforementioned training method of the classification model.
According to the technical scheme, the initial classification model is trained with the modal training data set corresponding to each single-modal classification model, based on a method of aligning the feature code distributions of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model, to obtain the target classification model. That is, with the technical scheme of the invention, adversarial constraint training is performed on the feature encoder by combining the labeled training data and the unlabeled training data, so that the encoder can learn a feature representation that is well consistent between the labeled training data and a large amount of unlabeled training data. This avoids the need in the prior art to perform forward computation on the same group of training samples multiple times, thereby improving the training efficiency of the classification model; in addition, training and learning can be performed on multi-modal data, so the application range is wider.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flowchart of a method for training a classification model according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an initial classification model according to an embodiment of the present invention;
FIG. 3 is a second exemplary diagram of the initial classification model according to the present invention;
FIG. 4 is a flowchart of training based on the initial classification model shown in FIG. 2/FIG. 3 according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a target classification model trained based on the initial classification model shown in FIG. 3 according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating that the same object respectively corresponds to multiple modal data representations according to an embodiment of the present invention;
FIG. 7 is a third exemplary diagram of an initial classification model according to an embodiment of the present invention;
FIG. 8 is a flowchart of training based on the initial classification model shown in FIG. 7 according to an embodiment of the present invention;
FIG. 9 is a fourth exemplary diagram illustrating an initial classification model according to the present invention;
FIG. 10 is a diagram illustrating a target classification model trained based on the initial classification model shown in FIG. 9 according to an embodiment of the present invention;
FIG. 11 is a fifth exemplary diagram illustrating the structure of the initial classification model according to the embodiment of the present invention;
FIG. 12 is a sixth exemplary diagram illustrating the structure of the initial classification model according to the embodiment of the present invention;
FIG. 13 is a seventh exemplary diagram illustrating an initial classification model according to an embodiment of the present invention;
FIG. 14 is an eighth schematic structural diagram of an initial classification model according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an apparatus for training a classification model according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of a computer server according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart of a training method of a classification model in an embodiment of the present invention is shown, where the method includes:
101, constructing an initial classification model, wherein the initial classification model comprises at least one single-modal classification model with the same classification task, and the modal data training set corresponding to each single-modal classification model comprises labeled training data and unlabeled training data;
102, training the initial classification model by using the modal training data set corresponding to each single-modal classification model based on a method for aligning the feature code distribution of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model to obtain a target classification model.
In the embodiment of the invention, each single-modal classification model in the initial classification model classifies modal data of a corresponding type; the types of modal data corresponding to different single-modal classification models are different, but the classification tasks corresponding to the plurality of single-modal classification models are the same. For example, a multi-modal classification model includes three single-modal classification models, denoted model A, model B and model C, where model A is used to classify image data, model B is used to classify text data, and model C is used to classify video data, but the classification tasks of model A, model B and model C are the same; for example, the classification task includes classes such as pedestrians, vehicles and traffic lights, that is, each model identifies pedestrians, vehicles, traffic lights and the like from modal data of its corresponding type.
Based on the method flow shown in fig. 1, the initial classification model in the embodiment of the present invention may have a plurality of structures. Several examples of training initial classification models with different structures to obtain the target classification model are described in detail below. Those skilled in the art may extend other alternatives based on the examples provided in the embodiment of the present invention, but any alternative falls within the scope to be protected by the present application as long as it is based on a method of aligning the feature code distributions of a plurality of single-modal classification models.
Example 1
In example 1, the initial classification model may include only one single-modal classification model, as shown in fig. 2, or may include two or more single-modal classification models, as shown in fig. 3. In either structure, each single-modal classification model includes a feature encoder, and a classifier and a discriminator respectively cascaded to the feature encoder. The discriminator is configured to determine whether a feature code output by the feature encoder is derived from labeled training data or unlabeled training data; the output end of the discriminator is provided with a first loss function for training the discriminator and a second loss function for training the feature encoder, and the first loss function and the second loss function are set adversarially against each other.
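For concreteness, the following is a minimal sketch of the single-modal structure just described, written in PyTorch style; the class name, layer sizes and activation choices are illustrative assumptions and are not taken from the patent:

```python
import torch.nn as nn

class SingleModalClassificationModel(nn.Module):
    """One single-modal classification model: a feature encoder with a
    classifier and a discriminator cascaded to its output (illustrative)."""
    def __init__(self, input_dim: int, feature_dim: int, num_classes: int):
        super().__init__()
        # Feature encoder f_e: maps raw modal data to a feature code.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, feature_dim),
        )
        # Classifier f_c: predicts the class from the feature code.
        self.classifier = nn.Linear(feature_dim, num_classes)
        # Discriminator d: predicts whether the feature code comes from
        # labeled or unlabeled training data.
        self.discriminator = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.classifier(code), self.discriminator(code), code
```

After training, the discriminator branch is removed, leaving only the encoder and the classifier, which matches the deletion step described below.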
In this example 1, the step 102 trains the initial classification model by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manner, which includes steps 102a to 102b, as shown in fig. 4:
102a, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
and 102b, deleting the discriminator in each single-modal classification model from the classification model obtained through training, to obtain the target classification model shown in fig. 5.
In example 1, step 102a may be implemented by, but is not limited to, the following:
performing the following iterative training on the initial classification model multiple times, wherein one iterative training specifically comprises the following steps A1 to A2, wherein:
step A1, aiming at each single-mode classification model, acquiring training data from a mode data training set of the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to a loss function value of the classifier of the single-mode classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
and step A2, carrying out next iterative training based on the initial classification model after parameter adjustment.
Preferably, in step A1, the parameters of the discriminator and the feature encoder of the single-modal classification model are adjusted based on the value of the first loss function and the value of the second loss function of the single-modal classification model, and the adjustment may be implemented by, but is not limited to, either of the following modes (mode B1 and mode B2; a code sketch of mode B1 is given after the two modes):
Mode B1: adjusting the parameters of the discriminator according to the value of the first loss function after the discriminator of the single-modal classification model discriminates the feature code output by the feature encoder; and adjusting the parameters of the feature encoder based on the value of the second loss function after the parameter-adjusted discriminator re-discriminates the feature code output by the feature encoder.
Mode B2: adjusting the parameters of the discriminator according to the value of the first loss function after the discriminator of the single-modal classification model discriminates the feature code output by the feature encoder, and adjusting the parameters of the feature encoder of the single-modal classification model according to the value of the second loss function.
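As a rough illustration of mode B1, one possible training iteration could look as follows (PyTorch-style; the optimizer setup, loss weighting and the SingleModalClassificationModel class from the earlier sketch are assumptions, not the patent's exact procedure):

```python
import torch
import torch.nn.functional as F

def train_step_mode_b1(model, opt_model, opt_disc, x_l, y_l, x_u):
    """One iteration for one single-modal model under mode B1 (illustrative).

    x_l, y_l: a labeled batch and its class labels; x_u: an unlabeled batch.
    opt_model optimizes the encoder and classifier; opt_disc the discriminator.
    """
    # 1) Classifier loss: adjust encoder + classifier parameters.
    logits, _, _ = model(x_l)
    cls_loss = F.cross_entropy(logits, y_l)
    opt_model.zero_grad()
    cls_loss.backward()
    opt_model.step()

    # 2) First loss (discriminator): labeled codes -> 1, unlabeled codes -> 0.
    with torch.no_grad():
        code_l, code_u = model.encoder(x_l), model.encoder(x_u)
    d_l, d_u = model.discriminator(code_l), model.discriminator(code_u)
    disc_loss = (F.binary_cross_entropy(d_l, torch.ones_like(d_l)) +
                 F.binary_cross_entropy(d_u, torch.zeros_like(d_u)))
    opt_disc.zero_grad()
    disc_loss.backward()
    opt_disc.step()

    # 3) Second loss (adversarial, on the encoder): re-discriminate with the
    #    updated discriminator and flip the targets so the encoder pushes the
    #    labeled and unlabeled feature-code distributions together.
    d_l = model.discriminator(model.encoder(x_l))
    d_u = model.discriminator(model.encoder(x_u))
    enc_loss = (F.binary_cross_entropy(d_l, torch.zeros_like(d_l)) +
                F.binary_cross_entropy(d_u, torch.ones_like(d_u)))
    opt_model.zero_grad()
    enc_loss.backward()
    opt_model.step()
    return cls_loss.item(), disc_loss.item(), enc_loss.item()
```

Mode B2 differs only in that step 3 reuses the discriminator outputs from step 2 instead of re-discriminating after the discriminator update.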
In practical applications, the same object may be represented by different modality data, such as images, videos, voices, and texts. As shown in fig. 6, the same room can be expressed by three kinds of modality data: an image, a hand drawing, and a text description. In the embodiment of the invention, when the classification model comprises more than two single-modal classification models, in order to improve the performance of each single-modal classification model, cross-modal training can be performed on the single-modal classification models during training based on a method of aligning the feature code distributions of the plurality of single-modal classification models, that is, adversarial constraint training can be performed on the feature code distributions of the plurality of single-modal classification models. Different single-modal classification models correspond to modal data training sets of different modalities; by aligning the feature code distributions of the different modal data of the plurality of single-modal classification models, the plurality of single-modal classification models can implicitly and jointly utilize the training data of different modalities and share the feature information of the training data of different modalities, so that the plurality of single-modal classification models can be trained cooperatively and the performance of each single-modal classification model is mutually improved by utilizing multi-modal data. This training method can improve the classification accuracy of each single-modal classification model and does not require each training sample to have multi-modal data representations at the same time (that is, the technical scheme of the invention does not require multi-modal data alignment of the training samples), so the training samples are easy to acquire and the application range is wider. For the cross-modal training scheme, the structure of the initial classification model may be set as shown in fig. 7, fig. 9, fig. 11, and fig. 12; the initial classification models shown in fig. 7, fig. 9, fig. 11, and fig. 12 are described in detail in example 2, example 3, example 4, and example 5, respectively.
Example 2
The structure of the initial classification model may be set as shown in fig. 7, where each single-modal classification model includes a feature encoder, and a classifier and a discriminator respectively cascaded with the feature encoder; the discriminator is configured to determine whether a feature code output by the feature encoder is derived from labeled training data or unlabeled training data, the output end of the discriminator is provided with a first loss function for training the discriminator and a second loss function for training the feature encoder, and the first loss function and the second loss function are set adversarially against each other. The feature encoders of the plurality of single-modal classification models are also respectively connected to the same cross-modal discriminator, which is used to discriminate the modality type corresponding to the feature code output by each feature encoder; the output end of the cross-modal discriminator is provided with a third loss function for training the cross-modal discriminator and a fourth loss function for training the feature encoders in the single-modal classification models, and the third loss function and the fourth loss function are set adversarially against each other.
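A minimal sketch of how such a shared cross-modal discriminator could sit on top of several single-modal models is shown below (PyTorch-style; the names and sizes are illustrative assumptions, and it reuses the SingleModalClassificationModel sketch above):

```python
import torch.nn as nn

class CrossModalDiscriminator(nn.Module):
    """Shared discriminator that predicts which modality a feature code came from."""
    def __init__(self, feature_dim: int, num_modalities: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, num_modalities),  # logits over modality indices
        )

    def forward(self, code):
        return self.net(code)

# Illustrative wiring: every single-modal encoder maps into the same feature
# dimension so that all feature codes can be fed to one shared discriminator.
# models = [SingleModalClassificationModel(d, feature_dim=128, num_classes=C)
#           for d in input_dims]
# cross_disc = CrossModalDiscriminator(feature_dim=128, num_modalities=len(models))
```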
In the aforementioned flow shown in fig. 1, the initial classification model is trained in step 102 by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manner, which includes steps 102c to 102d, as shown in fig. 8:
102c, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
and 102d, deleting the discriminators in the single-modal classification models and the cross-modal discriminator from the classification model obtained by training.
Preferably, the step 102c can be implemented by, but not limited to, the following ways:
performing the following iterative training on the initial classification model for multiple times, wherein one iterative training comprises the steps of C1-C3, wherein:
step C1, aiming at each single-mode classification model, acquiring training data from a mode data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to the value of a loss function of the classifier of the single-mode classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
step C2, adjusting parameters of the cross-modal discriminator and the feature encoder of each single-modal classification model based on the value of the third loss function and the value of the fourth loss function;
and step C3, performing next iterative training based on the initial classification model after parameter adjustment.
Preferably, in the step C1, parameters of a discriminator and a feature encoder of the single-mode classification model are adjusted based on a value of a first loss function and a value of a second loss function of the single-mode classification model, which may be specifically referred to as a mode B1 or a mode B2 in example 1, and are not described herein again.
Preferably, in step C2, the parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models are adjusted based on the value of the third loss function and the value of the fourth loss function, which may be specifically implemented by, but is not limited to, either of the following modes (mode D1 and mode D2; a code sketch of mode D1 is given after the two modes):
Mode D1: adjusting the parameters of the cross-modal discriminator according to the value of the third loss function after the cross-modal discriminator discriminates the feature codes output by the feature encoders of the single-modal classification models; and adjusting the parameters of the feature encoders of the single-modal classification models based on the value of the fourth loss function obtained after the parameter-adjusted cross-modal discriminator re-discriminates the feature codes output by the feature encoders of the single-modal classification models.
Mode D2: adjusting the parameters of the cross-modal discriminator according to the value of the third loss function after the cross-modal discriminator discriminates the feature codes output by the feature encoders of the single-modal classification models, and adjusting the parameters of the feature encoders of the single-modal classification models according to the value of the fourth loss function.
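As a rough illustration of mode D1, one possible cross-modal update is sketched below (PyTorch-style, reusing the model and discriminator sketches above; the uniform-prediction encoder objective is one common way to oppose a multi-class discriminator and may differ from the patent's exact fourth loss function):

```python
import torch
import torch.nn.functional as F

def cross_modal_step_mode_d1(models, cross_disc, opt_cross_disc, opt_encoders, batches):
    """One cross-modal adversarial step under mode D1 (illustrative).

    models[j] is the single-modal model for modality j and batches[j] a batch
    of modality-j training data; opt_encoders optimizes all feature encoders.
    """
    # 1) Third loss: train the cross-modal discriminator to recognise which
    #    modality each feature code came from (codes are detached here).
    codes = [m.encoder(x).detach() for m, x in zip(models, batches)]
    disc_loss = sum(
        F.cross_entropy(cross_disc(code),
                        torch.full((code.size(0),), j, dtype=torch.long,
                                   device=code.device))
        for j, code in enumerate(codes))
    opt_cross_disc.zero_grad()
    disc_loss.backward()
    opt_cross_disc.step()

    # 2) Fourth loss: re-discriminate with the updated discriminator and push
    #    the encoders so their codes become indistinguishable across modalities
    #    (a uniform-prediction target is used here as the adversarial objective).
    enc_loss = 0.0
    for m, x in zip(models, batches):
        log_probs = F.log_softmax(cross_disc(m.encoder(x)), dim=1)
        enc_loss = enc_loss - log_probs.mean()
    opt_encoders.zero_grad()
    enc_loss.backward()
    opt_encoders.step()
    return disc_loss.item(), enc_loss.item()
```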
Example 3
The structure of the initial classification model may be set as shown in fig. 9, where each single-modal classification model includes a feature encoder, and a classifier and a fifth loss function respectively cascaded with the feature encoder, and the value of the fifth loss function indicates the consistency of the feature code distributions of the labeled training data and the unlabeled training data in the modal data training set corresponding to the single-modal classification model. The feature encoders of the plurality of single-modal classification models are connected to the same cross-modal discriminator, which is used to discriminate the modality type corresponding to the feature code output by each feature encoder; the output end of the cross-modal discriminator is provided with a third loss function for training the cross-modal discriminator and a fourth loss function for training the feature encoders in the single-modal classification models, and the third loss function and the fourth loss function are set adversarially against each other.
In this example 3, the initial classification model is trained in the step 102 by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manners, including steps 102e to 102 f:
102e, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
102f, deleting the cross-mode discriminators in the classification model obtained by training to obtain a target classification model shown in FIG. 10; or deleting the cross-modal discriminator in the trained classification model and the fifth loss function in each single-modal classification model to obtain the target classification model shown in fig. 5.
Preferably, the step 102e can be implemented by, but not limited to, the following ways:
performing the following iterative training on the initial classification model for multiple times, wherein one iterative training comprises the steps of E1-E3:
step E1, for each single-modal classification model, acquiring training data from the modal data training set corresponding to the single-modal classification model, inputting the training data into the feature encoder of the single-modal classification model, and adjusting the parameters of the feature encoder and the classifier in the single-modal classification model according to the value of the loss function of the classifier of the single-modal classification model; and adjusting the parameters of the feature encoder of the single-modal classification model according to the value of the fifth loss function of the single-modal classification model;
step E2, adjusting parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on the value of the third loss function and the value of the fourth loss function;
and E3, carrying out next iterative training based on the initial classification model after parameter adjustment.
The specific implementation of the step E2 can be referred to as the mode D1 or the mode D2 in example 2, and details are not repeated here.
Example 4
In example 4, the initial classification model may have a structure as shown in fig. 11, where each single-mode classification model includes a feature encoder, and a classifier and a fifth loss function respectively cascaded with the feature encoder, and a value of the fifth loss function indicates consistency of feature encoding distributions of labeled training data and unlabeled training data in a modal data training set corresponding to the single-mode classification model; the feature encoders of the plurality of single-mode classification models are all connected to the same sixth loss function, and the value of the sixth loss function represents the consistency of the feature encoding distribution output by the feature encoders of the single-mode classification models.
In this example 4, the initial classification model is trained in step 102 by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manner, which includes steps 102g to 102h:
102g, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
and 102h, deleting a sixth loss function in the multi-modal classification model obtained through training and a fifth loss function in each single-modal classification model to obtain a target classification model.
Preferably, the step 102g can be realized by, but not limited to, the following ways: performing the following iterative training on the initial classification model for multiple times, wherein one iterative training comprises steps F1-F3, wherein:
step F1, for each single-mode classification model, acquiring training data from a mode data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to a loss function value of the classifier of the single-mode classification model; adjusting parameters of a feature encoder of the single-mode classification model according to the value of a fifth loss function of the single-mode classification model;
step F2, adjusting parameters of the feature encoders of the single-mode classification models according to the values of the sixth loss function;
and F3, performing next iterative training based on the initial classification model after parameter adjustment.
Example 5
In example 5, the initial classification model may have a structure as shown in fig. 12, where each single-modal classification model includes a feature encoder, and a classifier and a discriminator respectively cascaded to the feature encoder; the discriminator is configured to discriminate whether the feature code output by the feature encoder cascaded to it is derived from labeled training data or unlabeled training data, and the output end of the discriminator is provided with a first loss function for training the discriminator and a second loss function for training the feature encoder, the first loss function and the second loss function being set adversarially against each other. The feature encoders of the plurality of single-modal classification models are all connected to the same sixth loss function, and the value of the sixth loss function represents the consistency of the feature code distributions output by the feature encoders of the single-modal classification models.
In this example 5, the initial classification model is trained in step 102 by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manner, where the manner includes steps 102i to 102 j:
102i, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
102j, deleting discriminators in all single-mode classification models in the classification models obtained through training to obtain target classification models; or deleting the sixth loss function in the trained classification model and the discriminators in the single-mode classification models to obtain the target classification model.
Preferably, the step 102i can be implemented by, but not limited to, the following ways:
performing the following iterative training on the initial classification model for multiple times, wherein one iterative training comprises the steps G1-G3:
g1, aiming at each single-mode classification model, acquiring training data from a mode data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to the value of a loss function of the classifier of the single-mode classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
g2, adjusting parameters of the feature encoders of the single-mode classification models according to the value of the sixth loss function;
and G3, carrying out next iterative training based on the initial classification model after parameter adjustment.
In the embodiment of the present invention, in the step G1, parameters of a discriminator and a feature encoder of the single-mode classification model are adjusted based on a value of the first loss function and a value of the second loss function of the single-mode classification model, and specific implementation may refer to the mode B1 or the mode B2 in example 1, which is not described herein again.
Example 6
In example 6, the initial classification model may have a structure as shown in fig. 13 or fig. 14, and regardless of the initial classification model shown in fig. 13 or fig. 14, each of the single-mode classification models includes a feature encoder and a fifth loss function in cascade, and a value of the fifth loss function represents consistency of feature coding distributions of the labeled training data and the unlabeled training data in the modal data training set corresponding to the single-mode classification model.
In this example 6, in the step 102, the initial classification model is trained by using the modality training data set corresponding to each single-modality classification model to obtain the target classification model, which may be specifically implemented by, but is not limited to, the following manner, where the manner includes steps 102k to 102 l:
102k, performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model;
and 102l, deleting the fifth loss function in each single-mode classification model in the classification model obtained through training to obtain a target classification model.
In example 6, step 102k may be implemented by, but is not limited to, the following:
performing the following iterative training on the initial classification model for multiple times, wherein one iterative training specifically includes the following steps H1-H2, wherein:
step H1, for each single-modal classification model, acquiring training data from the modal data training set corresponding to the single-modal classification model, inputting the training data into the feature encoder of the single-modal classification model, and adjusting the parameters of the feature encoder and the classifier in the single-modal classification model according to the value of the loss function of the classifier of the single-modal classification model; and adjusting the parameters of the feature encoder of the single-modal classification model according to the value of the fifth loss function of the single-modal classification model;
and step H2, performing next iterative training based on the initial classification model after parameter adjustment.
In the embodiment of the present invention, the initial classification models constructed in the foregoing examples 1 and 6 have simpler structures and faster training speeds, but the single-modal classification models cannot make full use of the complementarity between multi-modal data for collaborative learning. The initial classification models constructed in the remaining examples have more complex structures, but different single-modal classification models correspond to modal data training sets of different modalities; by aligning the feature code distributions of the different modal data of the plurality of single-modal classification models, the plurality of single-modal classification models can implicitly and jointly utilize the training data of different modalities and share the feature information of the training data of different modalities, so that the plurality of single-modal classification models can be trained cooperatively and the performance of each single-modal classification model is mutually improved by utilizing multi-modal data. Each of the different examples can combine the labeled training data and the unlabeled training data to perform adversarial constraint training on the feature encoder, so that the encoder can learn a feature representation that is well consistent between the labeled training data and a large amount of unlabeled training data; this avoids the need in the prior art to perform forward computation on the same group of training samples multiple times, thereby improving the training efficiency of the classification model, and in addition, training and learning can be performed on multi-modal data, so the application range is wider. On this basis, those skilled in the art can select any one of the initial classification models in the foregoing examples according to actual needs.
In the first embodiment of the present invention, in the foregoing examples, the loss functions of the classifiers in the single-modal classification models may be set to be the same. In each single-modal classification model, the feature encoder and the classifier are denoted as f_e and f_c respectively, and the learning parameters of the feature encoder and the classifier are denoted as θ_e and θ_c respectively. In the embodiment of the invention, the encoder and the classifier in each single-modal classification model may adopt a cross-entropy loss function to optimize the parameters θ_e and θ_c on the labeled training data according to the ground-truth labels. Denoting the loss function of the classifier in a single-modal classification model by L_c(X; θ_e, θ_c), the loss function may be set as shown in equation (1):

L_c(X;\theta_e,\theta_c) = -\frac{1}{N_l}\sum_{i=1}^{N_l}\sum_{k=1}^{C} y_i^{(k)} \log\big[f_c(f_e(x_i;\theta_e);\theta_c)\big]^{(k)}    (1)

In equation (1), N_l represents the total number of labeled training data in the modal data training set corresponding to the single-modal classification model, C is the number of classes of the classification task, and y_i^{(k)} represents the class label of training sample x_i: y_i^{(k)} takes the value 1 if x_i belongs to the k-th class, and the value 0 if x_i does not belong to the k-th class.
Preferably, in example 1, example 2 and example 5, with the feature encoder and the classifier in each single-modal classification model denoted as f_e and f_c, and the learning parameters of the feature encoder, the classifier and the discriminator denoted as θ_e, θ_c and φ respectively, the first loss function, denoted L_d(X; φ), may be set as shown in equation (2):

L_d(X;\phi) = -\frac{1}{N_l+N_u}\sum_{i=1}^{N_l+N_u}\Big[z_i\log d\big(f_e(x_i;\theta_e);\phi\big) + (1-z_i)\log\big(1-d(f_e(x_i;\theta_e);\phi)\big)\Big]    (2)

In equation (2), N_l is the total number of labeled data in the modal data training set corresponding to the single-modal classification model, N_u is the total number of unlabeled data in that training set, and z_i is a scalar: z_i takes the value 1 if x_i is labeled data and the value 0 if x_i is unlabeled data.
The second loss function may be denoted by L_e(X; θ_e). Since the second loss function is set adversarially against the first loss function, it may be set as shown in equation (3):

L_e(X;\theta_e) = -\frac{1}{N_l+N_u}\sum_{i=1}^{N_l+N_u}\Big[(1-z_i)\log d\big(f_e(x_i;\theta_e);\phi\big) + z_i\log\big(1-d(f_e(x_i;\theta_e);\phi)\big)\Big]    (3)

In equation (3), N_l is the total number of labeled data in the modal data training set corresponding to the single-modal classification model, N_u is the total number of unlabeled data in that training set, and z_i is a scalar: z_i takes the value 1 if x_i is labeled data and the value 0 if x_i is unlabeled data.
Preferably, in examples 2 and 3, with the feature encoder and the classifier in each single-modal classification model denoted as f_e and f_c, and the learning parameters of the feature encoder, the classifier and the cross-modal discriminator denoted as θ_e, θ_c and φ' respectively, the third loss function, denoted L_{d'}(X; φ'), may be set as shown in equation (4):

L_{d'}(X;\phi') = -\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_l^j+N_u^j}\log\Big[d'\big(f_e^j(x_i^j;\theta_e^j);\phi'\big)\Big]^{(j)}    (4)

In equation (4), N is the total number of training samples contained in the modal data training sets of all single-modal classification models, J is the total number of single-modal classification models, N_l^j is the total number of labeled training data contained in the modal data training set corresponding to the j-th single-modal classification model, N_u^j is the total number of unlabeled training data contained in that training set, and f_e^j and θ_e^j denote the feature encoder of the j-th single-modal classification model and its learning parameters, respectively.
The fourth loss function in examples 2 and 3 is set adversarially against the third loss function and may be denoted by L_m(X); it can be set as shown in equation (5):

L_m(X) = -\frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{N_l^j+N_u^j}\frac{1}{J}\sum_{k=1}^{J}\log\Big[d'\big(f_e^j(x_i^j;\theta_e^j);\phi'\big)\Big]^{(k)}    (5)

In equation (5), d'^{(k)} denotes the k-th element of the output vector of the cross-modal discriminator, J is the total number of single-modal classification models, N_l^j is the total number of labeled training data contained in the modal data training set corresponding to the j-th single-modal classification model, N_u^j is the total number of unlabeled training data contained in that training set, f_e^j and θ_e^j denote the feature encoder of the j-th single-modal classification model and its learning parameters respectively, and φ' is the learning parameter of the cross-modal discriminator.
Preferably, in examples 3, 4 and 6, with the feature encoder and the classifier in each single-modal classification model denoted as f_e and f_c, and their learning parameters denoted as θ_e and θ_c respectively, the fifth loss function in each single-modal classification model may be denoted by L_{mmd}(X; θ_e) and may be set as shown in equation (6):

L_{mmd}(X;\theta_e) = \frac{1}{N_l^2}\sum_{i=1}^{N_l}\sum_{j=1}^{N_l} k\big(f_e(x_i),f_e(x_j)\big) + \frac{1}{N_u^2}\sum_{i=1}^{N_u}\sum_{j=1}^{N_u} k\big(f_e(y_i),f_e(y_j)\big) - \frac{2}{N_l N_u}\sum_{i=1}^{N_l}\sum_{j=1}^{N_u} k\big(f_e(x_i),f_e(y_j)\big)    (6)

In equation (6), k(·,·) is a kernel function, x denotes labeled training data, y denotes unlabeled training data, N_l is the total number of labeled data in the modal data training set corresponding to the single-modal classification model, and N_u is the total number of unlabeled data in that training set.
Preferably, in examples 4 and 5, the sixth loss function may be denoted by L_{mmd}'(X) and may be set as shown in equation (7):

L_{mmd}'(X) = \sum_{(a,b)}\left[\frac{1}{N_a^2}\sum_{i=1}^{N_a}\sum_{j=1}^{N_a} k\big(f_e^a(x_i),f_e^a(x_j)\big) + \frac{1}{N_b^2}\sum_{i=1}^{N_b}\sum_{j=1}^{N_b} k\big(f_e^b(y_i),f_e^b(y_j)\big) - \frac{2}{N_a N_b}\sum_{i=1}^{N_a}\sum_{j=1}^{N_b} k\big(f_e^a(x_i),f_e^b(y_j)\big)\right]    (7)

In equation (7), the modal data training sets corresponding to the plurality of single-modal classification models are grouped pairwise; for each group (a, b), N_a and N_b denote the numbers of training samples contained in the two modal training data sets in the group, x and y denote training samples belonging to the two different modal training data sets respectively, and k(·,·) is a kernel function.
The foregoing formulas (1) to (7) are only examples; those skilled in the art may also use other formulas to implement the same functions, and the present application is not limited thereto.
Example two
Based on the same concept of the training method of the classification model provided in the first embodiment, a second embodiment of the present invention provides a training apparatus of a classification model, the structure of the apparatus may be as shown in fig. 15, and the apparatus includes a model building unit 1 and a training unit 2, where:
the model building unit 1 is used for building an initial classification model, wherein the initial classification model comprises at least one single-modal classification model with the same classification task, and the modal data training set corresponding to each single-modal classification model comprises labeled training data and unlabeled training data;
and the training unit 2 is used for training the initial classification model by adopting the modal training data set corresponding to each single-modal classification model based on a method for aligning the feature code distribution of the labeled training data and the unlabeled training data in the modal data training set of each single-modal classification model to obtain the target classification model.
In the embodiment of the invention, each single-modal classification model in the initial classification model classifies modal data of a corresponding type; the types of modal data corresponding to different single-modal classification models are different, but the classification tasks corresponding to the plurality of single-modal classification models are the same. For example, a multi-modal classification model includes three single-modal classification models, denoted model A, model B and model C, where model A is used to classify image data, model B is used to classify text data, and model C is used to classify video data, but the classification tasks of model A, model B and model C are the same; for example, the classification task includes classes such as pedestrians, vehicles and traffic lights, that is, each model identifies pedestrians, vehicles, traffic lights and the like from modal data of its corresponding type.
Based on the training apparatus for classification models shown in fig. 15, there may be a plurality of structures of initial classification models in the embodiment of the present invention, and a plurality of examples are described below in detail for training initial classification models with different structures respectively to obtain a target classification model, and those skilled in the art may extend other alternatives based on the examples provided in the embodiment of the present invention, but the alternatives are all within the scope to be protected by the present application as long as the alternatives are based on a method of aligning feature coding distributions of a plurality of single-mode classification models.
Example 1A
Example 1A corresponds to example 1 in the first embodiment, and the structure of the initial modality classification model may be as shown in fig. 2 or fig. 3, for details, refer to example 1 in the first embodiment, and will not be described herein again.
In this example 1A, the training unit 2 shown in fig. 15 specifically includes:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
and the deleting subunit is used for deleting the discriminators in the single-mode classification models in the classification models obtained by training of the training subunit.
In this example 1A, the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
acquiring training data from a modal data training set of each single-modal classification model aiming at each single-modal classification model, inputting the training data into a feature encoder of the single-modal classification model, and adjusting parameters of the feature encoder and a classifier in the single-modal classification model according to a loss function value of a classifier of the single-modal classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
The training subunit adjusts parameters of a discriminator and a feature encoder of the single-mode classification model based on a value of the first loss function and a value of the second loss function of the single-mode classification model, and specific implementation may refer to the mode B1 or the mode B2 in example 1, which is not described herein again.
Example 2A
Example 2A corresponds to example 2 in the first embodiment, and the structure of the initial modality classification model may be as shown in fig. 7, for details, refer to example 2 in the first embodiment, and details are not repeated here.
In this example 2A, the training unit 2 shown in fig. 15 specifically includes:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
and the deleting subunit is used for deleting the discriminators and the cross-modal discriminators in the single-modal classification models in the classification models obtained by training the training subunit.
In example 2A, the training subunit is specifically configured to:
for each single-mode classification model, acquiring training data from a modal data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to the value of a loss function of the classifier of the single-mode classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
adjusting parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on the value of the third loss function and the value of the fourth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
In example 2A, the training subunit adjusts parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on a value of the third loss function and a value of the fourth loss function, and specific implementation may refer to a mode D1 or a mode D2 in example 2, which is not described herein again.
In example 2A, the training subunit adjusts parameters of a discriminator and a feature encoder of the single-mode classification model based on a value of a first loss function and a value of a second loss function of the single-mode classification model, and specific implementation may refer to a mode B1 or a mode B2 in example 2, which is not described herein again.
Example 3A
Example 3A corresponds to example 3 in the first embodiment, and the structure of the initial modality classification model may be as shown in fig. 9, for details, refer to example 3 in the first embodiment, and details are not repeated here.
In example 3A, the training unit shown in fig. 15 may specifically include:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
the deleting subunit is used for deleting the cross-mode discriminator in the classification model obtained by the training of the training subunit to obtain a target classification model; or deleting the cross-modal discriminator in the classification model obtained by training the training subunit and the fifth loss function in each single-modal classification model to obtain the target classification model.
In example 3A, the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
for each single-modal classification model, acquiring training data from the modal data training set corresponding to the single-modal classification model, inputting the training data into the feature encoder of the single-modal classification model, and adjusting the parameters of the feature encoder and the classifier in the single-modal classification model according to the value of the loss function of the classifier of the single-modal classification model; adjusting the parameters of the feature encoder of the single-modal classification model according to the value of the fifth loss function of the single-modal classification model;
adjusting parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on the value of the third loss function and the value of the fourth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
In example 3A, the training subunit adjusts parameters of the cross-modal discriminator and the feature encoder of each single-modal classification model based on the value of the third loss function and the value of the fourth loss function; for specific implementation, refer to mode D1 or mode D2 in example 2, which is not described herein again.
Example 4A
Example 4A corresponds to example 4 in the first embodiment, and the structure of the initial classification model may be as shown in fig. 11; for details, refer to example 4 in the first embodiment, which are not repeated here.
In example 4A, the training unit shown in fig. 15 may specifically include:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
and the deleting subunit is used for deleting the sixth loss function in the multi-modal classification model obtained by training of the training subunit and the fifth loss function in each single-modal classification model to obtain the target classification model.
In example 4A, the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
for each single-mode classification model, acquiring training data from a modal data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to the value of a loss function of the classifier of the single-mode classification model; adjusting parameters of a feature encoder of the single-mode classification model according to the value of a fifth loss function of the single-mode classification model;
adjusting parameters of a feature encoder of each single-mode classification model according to the value of the sixth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
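As with the fifth loss, the concrete form of the sixth loss function is defined in example 4 of the first embodiment and is not restated here. The sketch below is only an assumed illustration of a cross-modal alignment term that the feature encoders of all single-mode classification models can minimise jointly, in place of a cross-modal discriminator.

```python
# Assumed stand-in for the "sixth loss": penalise the distance between the mean feature
# codes of every pair of modalities. Any distribution-alignment term (for instance the
# MMD sketched above for the fifth loss) could be substituted; the form is not fixed here.
import torch
from itertools import combinations

def cross_modal_alignment(codes):
    """codes: dict mapping modality name -> feature-code tensor of shape (batch, feat)."""
    return sum((codes[a].mean(dim=0) - codes[b].mean(dim=0)).pow(2).sum()
               for a, b in combinations(codes, 2))

# usage with the (assumed) names from the first sketch, once per iteration:
#   codes = {m: encoders[m](x_m) for m, x_m in modality_batches.items()}
#   opt_enc_cls.zero_grad(); cross_modal_alignment(codes).backward(); opt_enc_cls.step()
```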
Example 5A
Example 5A corresponds to example 5 in the first embodiment, and the structure of the initial classification model may be as shown in fig. 12; for details, refer to example 5 in the first embodiment, which are not repeated here.
In example 5A, the training unit shown in fig. 15 may specifically include:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
the deleting subunit is used for deleting the discriminators in the single-mode classification models in the classification models obtained by the training of the training subunit to obtain target classification models; or deleting the sixth loss function in the classification model obtained by training the training subunit and the discriminators in the single-mode classification models to obtain the target classification model.
In example 5A, the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
for each single-mode classification model, acquiring training data from a modal data training set corresponding to the single-mode classification model, inputting the training data into a feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and a classifier in the single-mode classification model according to the value of a loss function of a classifier of the single-mode classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
adjusting parameters of a feature encoder of each single-mode classification model according to the value of the sixth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
In example 5A, for the specific implementation in which the training subunit adjusts parameters of the discriminator and the feature encoder of a single-mode classification model based on the value of the first loss function and the value of the second loss function of that single-mode classification model, refer to mode B1 or mode B2 in example 2, which is not repeated here.
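In terms of the earlier sketches (whose names and losses are assumptions, not the patent's text), an Example 5A iteration can be assembled from pieces already shown: run steps (a) through (c) of the first sketch for each modality, then replace steps (d) and (e) with a single encoder update on cross_modal_alignment(codes) as the assumed sixth loss.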
Example 6A
Example 6A corresponds to example 6 in the first embodiment, and the structure of the initial classification model may be as shown in fig. 13 or fig. 14; for details, refer to example 6 in the first embodiment, which are not repeated here.
In example 6A, the training unit shown in fig. 15 may specifically include:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model and triggering the deletion subunit when the training is finished;
and the deleting subunit is used for deleting the fifth loss function in each single-mode classification model in the classification model obtained by training of the training subunit to obtain the target classification model.
In example 6A, the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
for each single-mode classification model, acquiring training data from the modal data training set corresponding to the single-mode classification model, inputting the training data into the feature encoder of the single-mode classification model, and adjusting parameters of the feature encoder and the classifier in the single-mode classification model according to the value of the loss function of the classifier of the single-mode classification model; adjusting parameters of the feature encoder of the single-mode classification model according to the value of the fifth loss function of the single-mode classification model;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
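In code terms, and under the assumptions of the earlier sketches, the work of the deleting subunit described in the examples above amounts to discarding the components used only during training (the per-modality discriminators, the cross-modal discriminator, and the fifth/sixth loss terms) and keeping the feature encoder and classifier of each single-mode classification model as the target classification model. A minimal sketch:

```python
# Minimal sketch of the target classification model left after the deleting subunit has
# removed the training-only components; the names (encoders, classifiers) follow the
# assumptions of the first sketch and are not taken from the patent text.
import torch

def build_target_model(encoders, classifiers):
    """Keep only encoder + classifier per modality; discriminators and auxiliary losses are dropped."""
    def classify(modality, x):
        with torch.no_grad():                              # inference only, no parameter adjustment
            logits = classifiers[modality](encoders[modality](x))
        return logits.argmax(dim=-1)
    return classify

# usage: target = build_target_model(encoders, classifiers)
#        predicted_labels = target("image", torch.randn(4, 512))   # 512 = assumed image feature size
```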
In the embodiment of the present invention, the initial classification models constructed in the foregoing examples 1A and 6A have a simpler structure and a faster training speed, but the single-mode classification models cannot fully exploit the complementarity between the multi-modal data for collaborative learning. The initial classification models constructed in examples 2A to 5A are relatively more complex; however, because different single-mode classification models correspond to modal data training sets of different modes, aligning the feature coding distributions of the different-mode data of the plurality of single-mode classification models allows the single-mode classification models to implicitly share the training data of the different modes and the feature information extracted from them, so that the plurality of single-mode classification models can be trained cooperatively and the multi-modal data can be used to improve the performance of each single-mode classification model.

In each of the examples, the feature encoder can be trained under an adversarial constraint that combines the labeled training data and the unlabeled training data, so that the encoder learns a feature representation that is consistent across the labeled training data and a large amount of unlabeled training data. This avoids the need in the prior art to perform forward calculation on the same group of training samples multiple times, and therefore improves the training efficiency of the classification model; in addition, training and learning can be performed on multi-modal data, so the application range is wider. On this basis, a person skilled in the art can select any one of the initial classification models in the foregoing examples according to actual needs.
EXAMPLE III
A third embodiment of the present invention further provides a computer server, as shown in fig. 16, where the computer server includes a memory and one or more processors communicatively connected to the memory;
the memory stores instructions executable by the one or more processors to cause the one or more processors to implement a method for training a multi-modal classification model according to any one of the preceding embodiments.
In the third embodiment of the present invention, the computer server may be a hardware device such as a PC, a notebook, a tablet computer, an FPGA (Field-Programmable Gate Array), an industrial computer, or a smart phone.
While the principles of the invention have been described in connection with specific embodiments thereof, it should be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the invention may be implemented in any computing device (including processors, storage media, and the like) or network of computing devices, in hardware, firmware, software, or any combination thereof. This can be accomplished by those skilled in the art using their basic programming skills after reading the description of the invention.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when executed, the program includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the above embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the above-described embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (15)

1. An object classification method using a classification model, comprising:
constructing an initial classification model, wherein the initial classification model comprises at least two single-mode classification models with the same object classification task, different single-mode classification models correspond to different modal data types, the modal data comprise at least one of images, videos, voices and characters and are respectively used for representing different characteristics of the same object, and a modal training data set corresponding to each single-mode classification model comprises labeled training data and unlabeled training data;
based on a method for aligning the feature code distribution of labeled training data and unlabeled training data in the modal training data set of each single-modal classification model, carrying out iterative training on the initial classification model by adopting the modal training data set corresponding to each single-modal classification model to obtain a target classification model, and carrying out object classification according to the target classification model;
each single-mode classification model comprises a feature encoder, a classifier and a discriminator, wherein the classifier and the discriminator are respectively cascaded with the feature encoder, and the discriminator is used for judging whether the feature encoding output by the feature encoder comes from labeled training data or unlabeled training data.
2. The method according to claim 1, wherein the output of the discriminator is provided with a first loss function for training the discriminator and a second loss function for training the feature encoder, the first and second loss functions being arranged in opposition;
training the initial classification model by using a modal training data set corresponding to each single-modal classification model to obtain a target classification model, which specifically comprises the following steps:
performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single modal classification model;
and deleting the discriminators in the single-mode classification models in the classification models obtained by training.
3. The method according to claim 2, wherein iteratively training the initial classification model using the modal training dataset corresponding to each single-modal classification model specifically comprises:
performing the following iterative training on the initial classification model for multiple times:
acquiring training data from a modal training data set of the single-modal classification model for each single-modal classification model, inputting the training data into a feature encoder of the single-modal classification model, and adjusting parameters of the feature encoder and a classifier in the single-modal classification model according to a loss function value of a classifier of the single-modal classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
4. The method according to claim 2, wherein the feature encoders of the at least two single-mode classification models are further respectively connected to a same cross-mode discriminator, the cross-mode discriminator is used for discriminating a mode type corresponding to the feature encoding output by the feature encoder of each single-mode classification model, a third loss function for training the cross-mode discriminator and a fourth loss function for training the feature encoder in each single-mode classification model are arranged at an output end of the cross-mode discriminator, and the third loss function and the fourth loss function are arranged in opposition;
the method further comprises the following steps: deleting the cross-mode discriminator in the classification model obtained by training.
5. The method according to claim 4, wherein iteratively training the initial classification model using the modal training dataset corresponding to each single-modal classification model specifically comprises:
performing the following iterative training on the initial classification model for multiple times:
acquiring training data from a modal training data set corresponding to a single-modal classification model for each single-modal classification model, inputting the training data into a feature encoder of the single-modal classification model, and adjusting parameters of the feature encoder and a classifier in the single-modal classification model according to a loss function value of the classifier of the single-modal classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
adjusting parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on the value of the third loss function and the value of the fourth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
6. The method according to claim 5, wherein adjusting parameters of the cross-modal discriminator and the feature encoder of each single-modal classification model based on a value of the third loss function and a value of the fourth loss function comprises:
adjusting parameters of a cross-modal discriminator according to the value of a third loss function after the cross-modal discriminator discriminates the feature codes output by the feature encoders of the single-modal classification models;
and adjusting the parameters of the feature encoders of the single-mode classification models based on the value of a fourth loss function obtained after the cross-mode discriminator subjected to parameter adjustment performs re-discrimination on the feature codes output by the feature encoders of the single-mode classification models.
7. The method according to claim 3 or 5, wherein adjusting parameters of a discriminator and a feature encoder of the monomodal classification model based on a value of a first loss function and a value of a second loss function of the monomodal classification model comprises:
adjusting parameters of a discriminator according to the value of a first loss function after the discriminator of the single-mode classification model discriminates the feature codes output by the feature encoder;
and adjusting the parameters of the feature encoder based on the value of the second loss function after the discriminator performs re-discrimination on the feature code output by the feature encoder after the parameters are adjusted.
8. An object classification apparatus using a classification model, comprising:
the model building unit is used for building an initial classification model, the initial classification model comprises at least two single-mode classification models with the same object classification task, different single-mode classification models correspond to different modal data types, the modal data comprise at least one of images, videos, voices and characters and are respectively used for representing different characteristics of the same object, and a modal training data set corresponding to each single-mode classification model comprises labeled training data and unlabeled training data;
the training unit is used for carrying out iterative training on the initial classification model by adopting the modal training data set corresponding to each single-modal classification model based on a method for aligning the characteristic coding distribution of the labeled training data and the unlabeled training data in the modal training data set of each single-modal classification model to obtain a target classification model so as to carry out object classification according to the target classification model;
each single-mode classification model comprises a feature encoder, a classifier and a discriminator, wherein the classifier and the discriminator are respectively cascaded with the feature encoder, and the discriminator is used for judging whether the feature encoding output by the feature encoder comes from labeled training data or unlabeled training data.
9. The apparatus of claim 8, wherein the output of the discriminator is provided with a first loss function for training the discriminator and a second loss function for training the feature encoder, the first and second loss functions being oppositional;
the training unit specifically comprises:
the training subunit is used for performing iterative training on the initial classification model by adopting a modal training data set corresponding to each single-modal classification model, and triggering the deletion subunit after the training is finished;
and the deleting subunit is used for deleting the discriminators in the single-mode classification models in the classification model obtained by the training of the training subunit.
10. The apparatus according to claim 9, wherein the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
acquiring training data from a modal training data set of the single-modal classification model for each single-modal classification model, inputting the training data into a feature encoder of the single-modal classification model, and adjusting parameters of the feature encoder and a classifier in the single-modal classification model according to a loss function value of a classifier of the single-modal classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
11. The apparatus according to claim 9, wherein the feature encoders of the at least two single-mode classification models are further respectively connected to a same cross-mode discriminator, the cross-mode discriminator is configured to discriminate a mode type corresponding to the feature encoding output by the feature encoder of each single-mode classification model, a third loss function for training the cross-mode discriminator and a fourth loss function for training the feature encoder in each single-mode classification model are provided at an output end of the cross-mode discriminator, and the third loss function and the fourth loss function are arranged in opposition;
the delete subunit is further to: deleting the cross-mode discriminator in the classification model obtained by training.
12. The apparatus according to claim 11, wherein the training subunit is specifically configured to:
performing the following iterative training on the initial classification model for multiple times:
acquiring training data from a modal training data set corresponding to a single-modal classification model for each single-modal classification model, inputting the training data into a feature encoder of the single-modal classification model, and adjusting parameters of the feature encoder and a classifier in the single-modal classification model according to a loss function value of the classifier of the single-modal classification model; adjusting parameters of a discriminator and a feature encoder of the single-mode classification model based on the value of the first loss function and the value of the second loss function of the single-mode classification model;
adjusting parameters of the cross-modal discriminator and the feature encoders of the single-modal classification models based on the value of the third loss function and the value of the fourth loss function;
and performing the next iterative training based on the initial classification model after the parameters are adjusted.
13. The apparatus according to claim 12, wherein the training subunit adjusts parameters of the cross-modal discriminator and the feature encoder of each single-modal classification model based on a value of the third loss function and a value of the fourth loss function, which specifically includes:
adjusting parameters of a cross-modal discriminator according to the value of a third loss function after the cross-modal discriminator discriminates the feature codes output by the feature encoders of the single-modal classification models;
and adjusting the parameters of the feature encoders of the single-mode classification models based on the value of a fourth loss function obtained after the cross-mode discriminator subjected to parameter adjustment performs re-discrimination on the feature codes output by the feature encoders of the single-mode classification models.
14. The apparatus according to claim 10 or 12, wherein the training subunit adjusts parameters of a discriminator and a feature encoder of the single-mode classification model based on values of a first loss function and values of a second loss function of the single-mode classification model, and specifically includes:
adjusting parameters of a discriminator according to the value of a first loss function after the discriminator of the single-mode classification model discriminates the feature codes output by the feature encoder;
and adjusting the parameters of the feature encoder based on the value of the second loss function after the discriminator performs re-discrimination on the feature code output by the feature encoder after the parameters are adjusted.
15. A computer server comprising a memory and one or more processors communicatively coupled to the memory;
the memory has stored therein instructions executable by the one or more processors to cause the one or more processors to implement an object classification method applying a classification model as claimed in any one of claims 1 to 7.
CN201810412797.4A 2018-05-03 2018-05-03 Training method and device of classification model and computer server Active CN108664999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810412797.4A CN108664999B (en) 2018-05-03 2018-05-03 Training method and device of classification model and computer server

Publications (2)

Publication Number Publication Date
CN108664999A CN108664999A (en) 2018-10-16
CN108664999B true CN108664999B (en) 2021-02-12

Family

ID=63780584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810412797.4A Active CN108664999B (en) 2018-05-03 2018-05-03 Training method and device of classification model and computer server

Country Status (1)

Country Link
CN (1) CN108664999B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090753B (en) * 2018-10-24 2020-11-20 马上消费金融股份有限公司 Training method of classification model, classification method, device and computer storage medium
CN109753966A (en) * 2018-12-16 2019-05-14 初速度(苏州)科技有限公司 A kind of Text region training system and method
CN109376556B (en) * 2018-12-17 2020-12-18 华中科技大学 Attack method for EEG brain-computer interface based on convolutional neural network
CN109886342A (en) * 2019-02-26 2019-06-14 视睿(杭州)信息科技有限公司 Model training method and device based on machine learning
CN111930476B (en) * 2019-05-13 2024-02-27 百度(中国)有限公司 Task scheduling method and device and electronic equipment
CN110263865B (en) * 2019-06-24 2021-11-02 北方民族大学 Semi-supervised multi-mode multi-class image translation method
CN110741388B (en) * 2019-08-14 2023-04-14 东莞理工学院 Confrontation sample detection method and device, computing equipment and computer storage medium
CN110472737B (en) * 2019-08-15 2023-11-17 腾讯医疗健康(深圳)有限公司 Training method and device for neural network model and medical image processing system
CN112307860A (en) * 2019-10-10 2021-02-02 北京沃东天骏信息技术有限公司 Image recognition model training method and device and image recognition method and device
CN111667027B (en) * 2020-07-03 2022-11-11 腾讯科技(深圳)有限公司 Multi-modal image segmentation model training method, image processing method and device
CN112115781B (en) * 2020-08-11 2022-08-16 西安交通大学 Unsupervised pedestrian re-identification method based on anti-attack sample and multi-view clustering
CN112016523B (en) * 2020-09-25 2023-08-29 北京百度网讯科技有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN112668671B (en) * 2021-03-15 2021-12-24 北京百度网讯科技有限公司 Method and device for acquiring pre-training model
CN113178189B (en) * 2021-04-27 2023-10-27 科大讯飞股份有限公司 Information classification method and device and information classification model training method and device
CN113343936B (en) * 2021-07-15 2024-07-12 北京达佳互联信息技术有限公司 Training method and training device for video characterization model
CN113449700B (en) * 2021-08-30 2021-11-23 腾讯科技(深圳)有限公司 Training of video classification model, video classification method, device, equipment and medium
CN115600091B (en) * 2022-12-16 2023-03-10 珠海圣美生物诊断技术有限公司 Classification model recommendation method and device based on multi-modal feature fusion

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999938A (en) * 2011-03-09 2013-03-27 西门子公司 Method and system for model-based fusion of multi-modal volumetric images
CN103166830A (en) * 2011-12-14 2013-06-19 中国电信股份有限公司 Spam email filtering system and method capable of intelligently selecting training samples
CN106951919A (en) * 2017-03-02 2017-07-14 浙江工业大学 A kind of flow monitoring implementation method based on confrontation generation network
CN107392312A (en) * 2017-06-01 2017-11-24 华南理工大学 A kind of dynamic adjustment algorithm based on DCGAN performances
CN107392125A (en) * 2017-07-11 2017-11-24 中国科学院上海高等研究院 Training method/system, computer-readable recording medium and the terminal of model of mind

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030197A (en) * 2006-02-28 2007-09-05 株式会社东芝 Method and apparatus for bilingual word alignment, method and apparatus for training bilingual word alignment model
CN107729513B (en) * 2017-10-25 2020-12-01 鲁东大学 Discrete supervision cross-modal Hash retrieval method based on semantic alignment
CN107958216A (en) * 2017-11-27 2018-04-24 沈阳航空航天大学 Based on semi-supervised multi-modal deep learning sorting technique

Also Published As

Publication number Publication date
CN108664999A (en) 2018-10-16

Similar Documents

Publication Publication Date Title
CN108664999B (en) Training method and device of classification model and computer server
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2022017245A1 (en) Text recognition network, neural network training method, and related device
CN108875522B (en) Face clustering method, device and system and storage medium
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN111461174B (en) Multi-mode label recommendation model construction method and device based on multi-level attention mechanism
CN112559784A (en) Image classification method and system based on incremental learning
CN105069424A (en) Quick recognition system and method for face
Gu et al. A novel lightweight real-time traffic sign detection integration framework based on YOLOv4
Li et al. FRD-CNN: Object detection based on small-scale convolutional neural networks and feature reuse
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112069884A (en) Violent video classification method, system and storage medium
Sun et al. Adaptive multi-lane detection based on robust instance segmentation for intelligent vehicles
CN112529068B (en) Multi-view image classification method, system, computer equipment and storage medium
Heo et al. Estimation of pedestrian pose orientation using soft target training based on teacher–student framework
Bezak Building recognition system based on deep learning
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
CN114330588A (en) Picture classification method, picture classification model training method and related device
CN116434033A (en) Cross-modal contrast learning method and system for RGB-D image dense prediction task
CN110175588B (en) Meta learning-based few-sample facial expression recognition method and system
CN112131506B (en) Webpage classification method, terminal equipment and storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
Liu et al. A framework for short video recognition based on motion estimation and feature curves on SPD manifolds
Ullah et al. A review of multi-modal learning from the text-guided visual processing viewpoint
CN111143544B (en) Method and device for extracting bar graph information based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
  Effective date of registration: 20200324
  Address after: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District
  Applicant after: BEIJING TUSENZHITU TECHNOLOGY Co.,Ltd.
  Address before: 101300, No. two, 1 road, Shunyi Park, Zhongguancun science and Technology Park, Beijing, Shunyi District
  Applicant before: TuSimple
GR01 Patent grant