CN115099293B - Model training method and device, electronic equipment and storage medium


Info

Publication number
CN115099293B
CN115099293B (application CN202210226361.2A)
Authority
CN
China
Prior art keywords
data
domain data
target
target domain
network
Prior art date
Legal status
Active
Application number
CN202210226361.2A
Other languages
Chinese (zh)
Other versions
CN115099293A (en)
Inventor
陈亦新
王焱
陈晓天
任艺柯
张培芳
吴振洲
Current Assignee
Beijing Ande Yizhi Technology Co ltd
Original Assignee
Beijing Ande Yizhi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Ande Yizhi Technology Co ltd
Priority to CN202210226361.2A
Publication of CN115099293A
Application granted
Publication of CN115099293B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10088 Magnetic resonance imaging [MRI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a model training method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring target domain data and source domain data, wherein the target domain data and the source domain data contain the same target object, the target object contained in the source domain data is labeled, and the target domain data and the source domain data comprise image data; obtaining fitting data according to the target domain data and the source domain data, wherein the fitting data combine the style features of the target domain data with the structural features of the source domain data, the structural features being information on the spatial position and layout of the target object; training an initial model according to the fitting data and the labels to obtain an intermediate model; and performing self-supervised training on the intermediate model according to the target domain data to obtain a target model, the target model being used to identify the target object contained in the target domain data. Through this process, the model's ability to recognize cross-domain unlabeled data is effectively improved, thereby improving the model's generalization across different data sets.

Description

Model training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a model training method and apparatus, an electronic device, and a storage medium.
Background
The performance of deep learning models may degrade severely across domains. For example, a segmentation model trained on Magnetic Resonance Imaging (MRI) data performs well only on MRI data and performs poorly on Computed Tomography (CT) data. When only the MRI data is labeled, conventional training methods cannot make the model cope well with CT data in the absence of CT annotation information.
Disclosure of Invention
In view of this, the present disclosure provides a model training technical solution.
According to an aspect of the present disclosure, there is provided a model training method, including: acquiring target domain data and source domain data, wherein the target domain data and the source domain data contain the same target object, the target object contained in the source domain data has a label, and the target domain data and the source domain data comprise image data; obtaining fitting data according to the target domain data and the source domain data, wherein the fitting data comprise style characteristics of the target domain data and structural characteristics of the source domain data, and the structural characteristics are information on the spatial position and layout of the target object; training an initial model according to the fitting data and the label to obtain an intermediate model; and performing self-supervised training on the intermediate model according to the target domain data to obtain a target model, wherein the target model is used for identifying the target object contained in the target domain data.
In a possible implementation manner, the obtaining fitting data according to the target domain data and the source domain data includes: inputting the target domain data and the source domain data into an adversarial generation network, the adversarial generation network comprising an extraction network and a generation network; extracting style features of the target domain data and structural features of the source domain data according to the extraction network; and generating, according to the generation network, fitting data containing the style features of the target domain data and the structural features of the source domain data; wherein the adversarial generation network is obtained by performing adversarial training with the target domain data and the source domain data.
In one possible implementation, the adversarial generation network being obtained by performing adversarial training with the target domain data and the source domain data includes: extracting a first style feature and a first structural feature of the target domain data and a second style feature and a second structural feature of the source domain data according to the extraction network; generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises the first style feature and the first structural feature, the first conversion data comprises the first style feature and the second structural feature, the second conversion data comprises the first structural feature and the second style feature, and the second reconstruction data comprises the second structural feature and the second style feature; discriminating whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are real or fake according to a discrimination network to obtain a discrimination result; and adjusting parameters of the adversarial generation network according to the discrimination result.
In a possible implementation manner, adjusting the parameters of the adversarial generation network according to the discrimination result includes: generating a first reconstruction loss from the difference between the target domain data and the first reconstruction data; generating a second reconstruction loss from the difference between the source domain data and the second reconstruction data; generating a first discrimination loss according to the discrimination result of the first reconstruction data output by the discrimination network; generating a second discrimination loss according to the discrimination result of the first conversion data output by the discrimination network; generating a third discrimination loss according to the discrimination result of the second conversion data output by the discrimination network; generating a fourth discrimination loss according to the discrimination result of the second reconstruction data output by the discrimination network; and performing adversarial training on the generation network and the discrimination network according to the first reconstruction loss, the second reconstruction loss, and the first to fourth discrimination losses to obtain a trained neural network.
In a possible implementation manner, performing self-supervised training on the intermediate model according to the target domain data to obtain a target model includes: inputting the target domain data into the intermediate model to obtain an output result serving as a pseudo label of the target domain data; training the intermediate model according to the target domain data and the pseudo label to obtain an updated intermediate model; and iteratively executing the above steps on the intermediate model until a target model meeting a preset condition is obtained.
In a possible implementation manner, performing self-supervised training on the intermediate model according to the target domain data to obtain a target model further includes: in each iteration, after the target domain data is input into the intermediate model to obtain the output result serving as the pseudo label of the target domain data, and before the intermediate model is trained according to the target domain data and the pseudo label to obtain an updated intermediate model, initializing the parameters of the intermediate model.
In a possible implementation manner, iteratively executing the above steps on the intermediate model until a target model meeting a preset condition is obtained includes: extracting, with probability P_{N-i}, the output result of the intermediate model obtained at the (N-i)-th iteration as the pseudo label of the target domain data used in the N-th iteration, wherein N is an integer greater than 1, i is a positive integer smaller than N, and the value of P_{N-i} is positively correlated with the value of N-i.
According to another aspect of the present disclosure, there is provided a model training apparatus including: a data obtaining module, configured to obtain target domain data and source domain data, where the target domain data and the source domain data contain the same target object, the target object contained in the source domain data has a label, and the target domain data and the source domain data comprise image data; a fitting data generation module, configured to obtain fitting data according to the target domain data and the source domain data, where the fitting data includes style characteristics of the target domain data and structural characteristics of the source domain data, and the structural characteristics are information on the spatial position and layout of the target object; a fully supervised training module, configured to train an initial model according to the fitting data and the labels to obtain an intermediate model; and a self-supervised training module, configured to perform self-supervised training on the intermediate model according to the target domain data to obtain a target model, the target model being used for identifying the target object contained in the target domain data.
In one possible implementation, the fitting data generation module includes: a data input submodule for inputting the target domain data and the source domain data into an adversarial generation network, the adversarial generation network including an extraction network and a generation network; a feature extraction submodule for extracting the style features of the target domain data and the structural features of the source domain data according to the extraction network; and a fitting data generation submodule for generating, according to the generation network, fitting data containing the style features of the target domain data and the structural features of the source domain data; wherein the adversarial generation network is obtained by performing adversarial training with the target domain data and the source domain data.
In one possible implementation, the adversarial generation network being obtained by performing adversarial training with the target domain data and the source domain data includes: extracting a first style feature and a first structural feature of the target domain data and a second style feature and a second structural feature of the source domain data according to the extraction network; generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises the first style feature and the first structural feature, the first conversion data comprises the first style feature and the second structural feature, the second conversion data comprises the first structural feature and the second style feature, and the second reconstruction data comprises the second structural feature and the second style feature; discriminating whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are real or fake according to a discrimination network to obtain a discrimination result; and adjusting parameters of the adversarial generation network according to the discrimination result.
In a possible implementation manner, adjusting the parameters of the adversarial generation network according to the discrimination result includes: generating a first reconstruction loss from the difference between the target domain data and the first reconstruction data; generating a second reconstruction loss from the difference between the source domain data and the second reconstruction data; generating a first discrimination loss according to the discrimination result of the first reconstruction data output by the discrimination network; generating a second discrimination loss according to the discrimination result of the first conversion data output by the discrimination network; generating a third discrimination loss according to the discrimination result of the second conversion data output by the discrimination network; generating a fourth discrimination loss according to the discrimination result of the second reconstruction data output by the discrimination network; and performing adversarial training on the generation network and the discrimination network according to the first reconstruction loss, the second reconstruction loss, and the first to fourth discrimination losses to obtain a trained neural network.
In one possible implementation, the self-supervised training module includes: a pseudo label obtaining submodule for inputting the target domain data into the intermediate model to obtain an output result serving as a pseudo label of the target domain data; and an intermediate model training submodule for training the intermediate model according to the target domain data and the pseudo label to obtain an updated intermediate model, and for iteratively executing the above steps on the intermediate model until a target model meeting a preset condition is obtained.
In one possible implementation manner, the pseudo label obtaining submodule includes an initialization submodule for initializing the parameters of the intermediate model.
In one possible implementation, the self-supervised training module includes: a pseudo label update submodule for extracting, with probability P_{N-i}, the output result of the intermediate model obtained at the (N-i)-th iteration as the pseudo label of the target domain data used in the N-th iteration, wherein N is an integer greater than 1, i is a positive integer smaller than N, and the value of P_{N-i} is positively correlated with the value of N-i.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the memory-stored instructions.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, there is provided a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, wherein when the code runs in a processor of an electronic device, the processor performs the above method.
In the embodiment of the disclosure, an intermediate model is generated by training the initial model on fitting data, generated from the structural features of the labeled source domain data and the style features of the unlabeled target domain data, together with the labels of the source domain data. Because the fitting data carry the style features of the target domain data, the intermediate model's ability to recognize the target domain data is improved over the initial model during training. The target model is then obtained through self-supervised training of the intermediate model on the target domain data; because the structural features of the target domain data are further learned, the target model's recognition of the target domain data is further improved over the intermediate model, finally yielding a model with good recognition of the unlabeled target domain data. Through this process, the model's ability to recognize cross-domain unlabeled data is effectively improved, thereby improving the model's generalization across different data sets.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a model training method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of abdominal CT and MR views according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of an adversarial generation network training method according to an embodiment of the present disclosure.
Fig. 4 shows a comparison of model prediction results according to an application example of the present disclosure.
FIG. 5 shows a block diagram of a model training apparatus according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.
FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of a model training method according to an embodiment of the present disclosure, which may be applied to a model training apparatus, which may be a terminal device, a server, or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the model training method may be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 1, the model training method may include:
step S11, target domain data and source domain data are obtained, wherein the target domain data and the target object contained in the source domain data are the same, the target object contained in the source domain data has a label, and the target domain data and the source domain data comprise image data.
And S12, obtaining fitting data according to the target domain data and the source domain data, wherein the fitting data comprises style characteristics of the target domain data and structural characteristics of the source domain data, and the structural characteristics are information of the spatial position and layout of the target object.
And S13, training an initial model according to the fitting data and the label to obtain an intermediate model.
And S14, performing self-supervision training on the intermediate model according to the target domain data to obtain a target model, wherein the target model is used for identifying a target object contained in the target domain data.
Wherein the source domain data is a labeled data set that has been labeled. In one possible implementation, the source domain data may be a data set of image data or a data set of video data. Further, in one possible implementation, the source domain data may also be other types of data sets that have been labeled. In one possible implementation, the label of the source domain data may be a label of a category of the target object. The labeling information is manually labeled or labeled in other manners, which is not limited in the embodiment of the present disclosure. In the following disclosed embodiments, source domain data is taken as image data for example and explanation, and in the case that the source domain data is data in other forms, the processing method can be flexibly extended according to the method provided by the disclosed embodiments, which is not illustrated.
The target domain data may be a data set of the same data type as the source domain data that is not labeled, and the target object contained by the target domain data and the source domain data may be the same. In a possible implementation manner, the number of the target objects may be one or more, and the number of the target objects is not specifically limited in the embodiment of the present disclosure. In one example, the target object may be an organ or a part in a medical image, such as a heart, an abdomen, and the like, and the specific type of the target object is not specifically limited in the embodiments of the present disclosure, and may be flexibly selected according to actual situations. When the target object is an organ in a medical image, the target domain data and the source domain data may be a data set of various medical images such as a Magnetic Resonance Imaging (MRI) data set, a Computed Tomography (CT) data set, and the like. In one possible implementation, the target domain data and the source domain data may be datasets of two different medical images, e.g., the target domain data may be an MRI dataset and the source domain data may be a CT dataset; alternatively, the target domain data may be a CT data set and the source domain data may be an MRI data set. In one possible implementation, the target domain data and the source domain data may also be data sets of the same medical image, for example, the target domain data and the source domain data may both be MRI data sets or may both be CT data sets, but the sources are different so that the target domain data and the source domain data have different styles and/or structures, and in particular, the target domain data and the source domain data may be data of two different models or two different hospitals respectively. In the following disclosure embodiments, the target domain data is an MRI data set, the source domain data is a CT data set, or the target domain data is a CT data set, and the source domain data is an MRI data set, and in the case that the target domain data and the source domain data are in other forms, the processing method thereof may be flexibly extended according to the method provided in the embodiments of the present disclosure, which is not illustrated.
As described in the foregoing disclosure embodiments, the implementation forms of the source domain data and the target domain data can be flexibly determined according to actual situations. Therefore, the method for acquiring the source domain data and the target domain data can be flexibly determined according to actual conditions. In a possible implementation manner, the source domain data and the target domain data may be obtained by directly performing data transmission through the medical imaging device. The remaining possible implementation manners of step S11 may be flexibly selected according to actual situations, and are not illustrated one by one.
After the source domain data and the target domain data containing the target object are obtained, the source domain data and the target domain data may be fitted in step S12 to obtain fitting data. In one possible implementation, the fitting data has the structural features of the source domain data and the style features of the target domain data. Specifically, the style features are information distinguishing the styles of the target domain data and the source domain data, such as the color and texture that characterize an image or a video; by the style features, the target domain data and the source domain data can be distinguished as an MRI data set or a CT data set. For example, it is obvious from fig. 2 that the styles of the raw abdominal CT and MR images differ, where fig. 2 (1) is a raw abdominal MR image and fig. 2 (3) is a raw abdominal CT image. The structural features refer to information on the spatial position and layout of the target object, such as the outline of the target object. When the target domain data is an MRI data set and the source domain data is a CT data set, the fitting data obtained in step S12 is an MRI-style CT data set. When the target domain data is a CT data set and the source domain data is an MRI data set, the fitting data obtained in step S12 is CT-style MRI data. The specific generation method of the fitting data can be flexibly determined according to actual conditions; see the following disclosed embodiments. Fig. 2 shows the fitting effect between CT and MRI images: fig. 2 (2) is a CT-style MR image, and fig. 2 (4) is an MR-style CT image.
Since the source domain data is labeled, the fitting data generated by the source domain data and the target domain data may also carry the same label as the source domain data, and therefore, in step S13, the initial model may be trained according to the fitting data with the label to obtain an intermediate model. Compared with the initial model, the intermediate model has better capability of identifying the target domain data due to the fact that the style characteristics of the target domain data are learned.
Self-supervised training uses unlabeled data to train the deep neural network and fine-tune its parameters so that the network performs better. Specifically, the self-supervised training method trains on the data together with its labels, which are pseudo labels generated automatically from image attributes or by conventional feature-design methods. Since self-supervised training requires the model to already have a certain capability, and the intermediate model has acquired partial segmentation capability on the target domain data through steps S12 and S13, the model's ability to recognize the target object in the target domain data can be further improved by self-supervised training once the intermediate model is obtained.
After the intermediate model is obtained, the target domain data can be input into the intermediate model, and the intermediate model is subjected to self-supervision training so as to further learn the structural characteristics of the target domain data, obtain the target model and improve the recognition capability of the target model on the target domain object. In one possible implementation, the intermediate model may be iteratively trained based on the target domain data and its pseudo-labels. Specifically, after the target domain data is input into the intermediate model, a pseudo label of the target domain data can be obtained, where the pseudo label is a prediction result of the intermediate model on the target domain data. And then, the pseudo label and the target domain data can be input into the intermediate model together for one-time training to obtain a more accurate pseudo label and an updated intermediate model. The training process may be iterated multiple times. The present disclosure is not limited to a particular number of iterations. The target model obtained through the self-supervision learning of the target domain data has higher recognition capability on the target object in the target domain data compared with the intermediate model because other characteristics except the style characteristics of the target domain data are learned. In a possible implementation manner, the target model can be used for tasks such as detection and segmentation of the target object besides the identification of the target object. The present disclosure does not specifically limit this.
In the embodiment of the disclosure, an intermediate model is generated by training the initial model on fitting data, generated from the structural features of the labeled source domain data and the style features of the unlabeled target domain data, together with the labels of the source domain data. Because the fitting data carry the style features of the target domain data, the intermediate model's ability to recognize the target domain data is improved over the initial model during training. The target model is then obtained through self-supervised training of the intermediate model on the target domain data; because the structural features of the target domain data are further learned, the target model's recognition of the target domain data is further improved over the intermediate model, finally yielding a model with good recognition of the unlabeled target domain data. Through this process, the model's ability to recognize cross-domain unlabeled data is effectively improved, thereby improving the model's generalization across different data sets.
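To make the four stages concrete, the sketch below strings them together in Python. This is a minimal orchestration sketch, not the patent's implementation: the helpers fit_gan, train_supervised and self_train, and the number of self-training rounds, are assumptions standing in for the networks and procedures detailed below.

```python
# Hedged sketch of steps S11-S14. fit_gan, train_supervised and self_train
# are hypothetical stand-ins for the networks described in this disclosure.

def train_target_model(source_data, source_labels, target_data,
                       fit_gan, train_supervised, self_train, rounds=3):
    # S12: fuse the target domain's style with the source domain's structure.
    fitting_data = fit_gan(target_data, source_data)
    # S13: fully supervised training on the fitting data and source labels
    # produces the intermediate model.
    model = train_supervised(fitting_data, source_labels)
    # S14: self-supervised (pseudo-label) refinement on unlabeled target data.
    for _ in range(rounds):
        model = self_train(model, target_data)
    return model  # target model, used to identify objects in the target domain
```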
The adversarial generation network addresses the problem of how to learn to produce new samples from existing training samples. Such a model typically includes at least two modules: a generative model and a discriminative model. The generative model is used to generate new data that is as realistic as possible, while the discriminative model judges whether data comes from the real sample set or the fake (generated) sample set. Through the mutual game learning of the generative and discriminative models, quite good output can be produced. The adversarial generation network can generate a target data set to compensate for insufficient training data, which makes it significant for deep learning.
In one possible implementation, the fitting data may be generated by the adversarial generation network from the existing target domain data and source domain data. Specifically, the target domain data and the source domain data may be input into a trained adversarial generation network to generate fitting data having the style features of the target domain data and the structural features of the source domain data. The embodiment of the present disclosure does not limit the specific implementation form of the adversarial generation network; any adversarial generation network that can generate fitting data from the style features of the target domain data and the structural features of the source domain data may be used. In one possible implementation, the adversarial generation network may perform adversarial training with the target domain data and the source domain data so as to generate more realistic fitting data. The method for training the adversarial generation network is not limited and can be flexibly selected according to actual conditions, as can the loss function adopted for training.
In one possible implementation, the adversarial generation network may include an extraction network and a generation network. Specifically, the style features of the target domain data and the structural features of the source domain data may be extracted by the extraction network, and fitting data including the style features of the target domain data and the structural features of the source domain data may be generated by the generation network. In one possible implementation, the extraction network and the generation network may be deep convolutional networks.
In the embodiment of the disclosure, the structural features of the source domain data and the style features of the target domain data are extracted and fused by the adversarial generation network to produce the fitting data. The realism of the fitting data directly determines the training quality of the subsequent initial model. Because new data generated by an adversarial generation network has high realism, the realism of the generated fitting data can be improved through the adversarial generation network, which in turn improves the subsequent training quality of the initial model.
In some possible implementations, the adversarial generation network being obtained by performing adversarial training with the target domain data and the source domain data includes:
extracting a first style feature and a first structural feature of the target domain data and a second style feature and a second structural feature of the source domain data according to the extraction network;
generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises the first style feature and the first structural feature, the first conversion data comprises the first style feature and the second structural feature, the second conversion data comprises the first structural feature and the second style feature, and the second reconstruction data comprises the second structural feature and the second style feature;
discriminating whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are real or fake according to the discrimination network to obtain a discrimination result;
and adjusting the parameters of the adversarial generation network according to the discrimination result.
In one possible implementation, as shown in fig. 3, the adversarial generation network may include an extraction network, a generation network, and a discrimination network. During network training, the goal of the extraction network is to extract the features of the target domain data and the source domain data; the goal of the generation network is to generate, from the features extracted by the extraction network, data that deceives the discrimination network as much as possible, that is, data that looks like real data with the first or the second style feature; and the goal of the discrimination network is to distinguish, as well as possible, generated data having the first style feature from real target domain data, and generated data having the second style feature from real source domain data. The present disclosure does not limit the network structures of the extraction network, the generation network, and the discrimination network.
In one possible implementation, as shown in fig. 3, there are 2 extraction networks, which extract the style and structural features from the target domain data and the source domain data, respectively. Specifically, the first style feature and the first structural feature are extracted from the target domain data, and the second style feature and the second structural feature are extracted from the source domain data. There are 4 generation networks, which generate new data from the extracted style and structural features. Specifically, the 4 generation networks may generate the first reconstruction data from the first style feature and the first structural feature, the first conversion data from the first style feature and the second structural feature, the second conversion data from the first structural feature and the second style feature, and the second reconstruction data from the second style feature and the second structural feature. The first conversion data and the second conversion data are thus style-converted data, while the first reconstruction data and the second reconstruction data are reconstructions of the target domain data and the source domain data. There are 2 discrimination networks, which judge whether the data generated by the generation networks is real or fake. Specifically, the two discrimination networks may judge the authenticity of the first reconstruction data and the first conversion data, and of the second conversion data and the second reconstruction data, respectively, pushing the generation networks to produce images realistic enough to pass as genuine. During training, the quality of the first reconstruction data, first conversion data, second conversion data and second reconstruction data generated by the generation networks and the discrimination capability of the discrimination networks improve through their interplay.
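One forward pass through this 2-extractor / 4-generator / 2-discriminator layout might look as follows in PyTorch. The toy module architectures and the channel split into style and structure codes are assumptions; the patent does not fix any of them.

```python
import torch
import torch.nn as nn

# Toy placeholder modules (assumptions): the patent specifies no architectures.
class Extractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, 3, padding=1)
    def forward(self, x):                 # returns (style, structure) codes
        h = self.conv(x)
        return h[:, :8], h[:, 8:]

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 1, 3, padding=1)
    def forward(self, style, struct):     # fuses a style code and a structure code
        return self.conv(torch.cat([style, struct], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                                 nn.AdaptiveAvgPool2d(1),
                                 nn.Flatten(), nn.Linear(8, 1))
    def forward(self, x):                 # real/fake logit
        return self.net(x)

ext_t, ext_s = Extractor(), Extractor()               # 2 extraction networks
gens = nn.ModuleList(Generator() for _ in range(4))   # 4 generation networks
disc_t, disc_s = Discriminator(), Discriminator()     # 2 discrimination networks

x_t = torch.randn(2, 1, 256, 256)   # target domain batch (e.g. MRI)
x_s = torch.randn(2, 1, 256, 256)   # source domain batch (e.g. CT)

style1, struct1 = ext_t(x_t)        # first style / first structural feature
style2, struct2 = ext_s(x_s)        # second style / second structural feature

recon_t = gens[0](style1, struct1)  # first reconstruction data
conv1   = gens[1](style1, struct2)  # first conversion data (the fitting data)
conv2   = gens[2](style2, struct1)  # second conversion data
recon_s = gens[3](style2, struct2)  # second reconstruction data

# Discriminator 1 judges target-style outputs, discriminator 2 source-style.
d_recon_t, d_conv1 = disc_t(recon_t), disc_t(conv1)
d_conv2, d_recon_s = disc_s(conv2), disc_s(recon_s)
```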
In the disclosed embodiments, training of the adversarial generation network is carried out with the target domain data and the source domain data. The training process specifically comprises training the generation networks that generate images and training the discrimination networks. Through the adversarial interplay of the generation and discrimination networks, the adversarial generation network can produce realistic fitting data, improving the accuracy of the fitting data; this in turn lets the initial model subsequently trained on the fitting data learn the style features of the target domain data, improving the model's recognition accuracy.
In a possible implementation manner, adjusting the parameters of the adversarial generation network according to the discrimination result includes:
generating a first reconstruction loss from a difference between the target domain data and the first reconstruction data;
generating a second reconstruction loss from a difference between the source domain data and the second reconstruction data;
generating a first discrimination loss according to a discrimination result of the first reconstruction data output by the discrimination network;
generating a second discrimination loss according to a discrimination result of the first conversion data output by the discrimination network;
generating a third discrimination loss according to a discrimination result of the second conversion data output by the discrimination network;
generating a fourth discrimination loss according to the discrimination result of the second reconstruction data output by the discrimination network;
and performing adversarial training on the generation network and the discrimination network according to the first reconstruction loss, the second reconstruction loss, the first discrimination loss, the second discrimination loss, the third discrimination loss and the fourth discrimination loss to obtain a trained neural network.
Since the training process of the adversarial generation network mainly includes training the generation networks that generate images and training the discrimination networks, in one possible implementation the loss function of the adversarial generation network may include two parts: the loss of the generation networks and the loss of the discrimination networks. In one possible implementation, the loss of the generation networks may include a reconstruction consistency loss between the newly generated first reconstruction data and the original target domain data, and a reconstruction consistency loss between the newly generated second reconstruction data and the original source domain data. In one possible implementation, the reconstruction consistency loss may be calculated by the mean absolute error (MAE). The present disclosure does not particularly limit how the reconstruction consistency loss is calculated. In one possible implementation, the loss of the discrimination networks may include the discrimination losses over all the reconstruction and conversion data, judged real or fake. Since discrimination is a binary classification task, in one possible implementation the loss function used may be binary cross entropy. The present disclosure does not particularly limit how the discrimination loss is calculated.
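Under those choices (MAE for reconstruction consistency, binary cross entropy for discrimination), the losses could be assembled as sketched below. The equal weighting of the terms is an assumption, and the d_* arguments are discriminator logits such as those produced in the forward-pass sketch above.

```python
import torch
import torch.nn.functional as F

# Sketch only: MAE reconstruction-consistency losses plus BCE discrimination
# losses. Term weights (all 1.0 here) are an assumption.

def generator_loss(x_t, recon_t, x_s, recon_s,
                   d_recon_t, d_conv1, d_conv2, d_recon_s):
    recon1 = F.l1_loss(recon_t, x_t)   # first reconstruction loss (MAE)
    recon2 = F.l1_loss(recon_s, x_s)   # second reconstruction loss (MAE)
    def fool(d):                       # generators want "real" verdicts
        return F.binary_cross_entropy_with_logits(d, torch.ones_like(d))
    adv = fool(d_recon_t) + fool(d_conv1) + fool(d_conv2) + fool(d_recon_s)
    return recon1 + recon2 + adv

def discriminator_loss(d_real, d_fakes):
    # Real data should score 1, every generated sample should score 0.
    loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    for d in d_fakes:
        loss = loss + F.binary_cross_entropy_with_logits(d, torch.zeros_like(d))
    return loss
```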
In the embodiment of the disclosure, the parameters of the generation networks and the discrimination networks are iteratively optimized according to the loss function until the loss converges, so that the images produced by the generation networks become more realistic while the discrimination capability of the discrimination networks improves, reaching a balance between the two.
After the initial model is trained on the fitting data and the labels of the source domain data, an intermediate model is obtained. Since the fitting data only contain the style features of the target domain data, the intermediate model has not learned the other features of the target domain data; to further improve the intermediate model's recognition of the target object in the target domain data, the unlabeled target domain data needs to be learned further.
In recent years, self-supervised learning has achieved significant success in visual perception tasks such as image recognition, detection, and segmentation. By means of the learning mode, sample labels can be automatically constructed from large-scale label-free data, network self-supervision training is achieved, and feature representations which are valuable to original tasks are learned.
In one possible implementation, the intermediate model may be self-supervised trained to learn features other than the style feature in the target domain data. The self-supervised learning approach is trained using data and its labels, which are pseudo-labels generated automatically using image attributes or by traditional feature design methods. In a possible implementation manner, the target domain data may be input into the intermediate model, and an output result is obtained as a pseudo tag of the target domain data. And after the pseudo label of the target domain data is obtained, training the intermediate model by using the target domain data and the pseudo label thereof to obtain an updated intermediate model. It is apparent that the updated intermediate model has learned other features of the target domain data than the style feature.
After the learning of the target domain data by the intermediate model is completed, the identification capability of the intermediate model for the target object in the target domain data is further enhanced, the target domain data can be input into the intermediate model again to obtain a more accurate pseudo label, and then the intermediate model is trained again by using the more accurate pseudo label and the target domain data, so that the identification capability of the intermediate model for the target domain data is further improved. In a possible implementation manner, the above steps may be iteratively performed on the intermediate model until a target model satisfying a preset condition is obtained. The number of iterations is not specifically limited in the present disclosure, and may be selected according to actual circumstances.
In the process of iteratively training the intermediate model, the noise produced by inaccurate pseudo labels is propagated continuously, and when the noise grows too large, the model collapses. Therefore, how the pseudo-label noise is handled has a crucial influence on the final performance of the model.
In order to block the transfer of pseudo-label noise, in one possible implementation the parameters of the intermediate model may be initialized during each iteration, after the pseudo labels are generated and before the intermediate model is trained. In this embodiment, because the parameters of the intermediate model are re-initialized in each iteration, that is, the intermediate model is retrained from scratch, the intermediate model of the previous iteration is not fully inherited, and model collapse caused by the transfer of pseudo-label noise can be avoided.
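A minimal sketch of this re-initialized pseudo-label loop follows, assuming segmentation-style outputs (argmax pseudo labels) and hypothetical helpers make_model and train_one_round.

```python
import torch

# Sketch of the self-training loop with per-round parameter re-initialization.
# make_model() and train_one_round() are hypothetical stand-ins.

@torch.no_grad()
def predict_pseudo_labels(model, target_loader):
    model.eval()
    return [model(x).argmax(dim=1) for x in target_loader]

def self_train(intermediate_model, make_model, train_one_round,
               target_loader, rounds=5):
    model = intermediate_model
    for _ in range(rounds):
        pseudo = predict_pseudo_labels(model, target_loader)
        model = make_model()              # re-initialize parameters to block
        train_one_round(model, target_loader, pseudo)  # pseudo-label noise
    return model                          # target model after the last round
```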
In one possible implementation, with probability P_{N-i}, the output result of the intermediate model obtained at the (N-i)-th iteration is extracted as the pseudo label of the target domain data used in the N-th iteration, where N is an integer greater than 1, i is a positive integer smaller than N, and the value of P_{N-i} is positively correlated with the value of N-i. Specifically, the value of P_{N-i} can be calculated by formula (1).
P_{N-i} = S^i / ∑_{j=1}^{n} S^j    (1)
Here S denotes an attenuation coefficient used to control the values of P_{N-i}, so that the pseudo labels output by intermediate models from iterations closer to the N-th iteration have greater influence on the pseudo labels used to train the model at the N-th iteration; N denotes the iteration number; n denotes the number of intermediate models used to update the pseudo labels for the N-th iteration; and i ∈ {1, 2, ..., n}.
In a possible implementation manner, the values of S and n may be set according to the actual situation, which is not limited by this disclosure. When S = 0.6 and n = 3: S^1 = 0.6, S^2 = 0.36, S^3 = 0.216, which gives P_{N-1} = 0.51, P_{N-2} = 0.31 and P_{N-3} = 0.18. The pseudo label of each sample in the target domain data at the N-th iteration is therefore drawn from the output of the intermediate model of the (N-1)-th iteration with probability 0.51, of the (N-2)-th iteration with probability 0.31, and of the (N-3)-th iteration with probability 0.18.
In this embodiment, because the pseudo label used at each iteration is drawn, with certain probabilities, from the pseudo labels output by the intermediate models of several preceding iterations rather than only from the model trained in the immediately preceding iteration, the noise produced by the pseudo labels can be reduced and model collapse caused by pseudo-label noise avoided. And because the value of P_{N-i} is positively correlated with the value of N-i, pseudo labels output by intermediate models from later iterations have greater influence on the model parameters, so the intermediate model tends to converge and the resulting target model is more accurate.
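Formula (1) and the sampling it drives can be checked in a few lines; the layout of the histories argument in sample_pseudo_label is hypothetical.

```python
import random

# Sketch of formula (1): P_{N-i} = S**i / sum(S**j for j = 1..n).

def pseudo_label_probs(s: float, n: int) -> list:
    weights = [s ** i for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]    # index 0 corresponds to P_{N-1}

def sample_pseudo_label(histories, s=0.6):
    # histories[i] holds the pseudo labels from iteration N-1-i (assumed layout)
    probs = pseudo_label_probs(s, len(histories))
    return random.choices(histories, weights=probs, k=1)[0]

print(pseudo_label_probs(0.6, 3))   # ~[0.51, 0.31, 0.18], matching the text
```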
In one possible implementation, the method further includes: performing data preprocessing on the target domain data and the source domain data so that their data quality is consistent. In one possible implementation, the preprocessing may include size normalization and intensity normalization. Specifically, size normalization may include resizing the images to a resolution of 256x256; the present disclosure does not particularly limit this size. Normalization may include scaling the target domain data and the source domain data to the range 0 to 1.
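A minimal preprocessing sketch under those choices; OpenCV resizing and min-max scaling are assumptions, as the patent fixes neither.

```python
import cv2
import numpy as np

# Sketch: resize to 256x256 and min-max normalize to [0, 1].

def preprocess(img: np.ndarray) -> np.ndarray:
    img = cv2.resize(img, (256, 256)).astype(np.float32)
    lo, hi = float(img.min()), float(img.max())
    return (img - lo) / (hi - lo + 1e-8)   # epsilon guards constant images
```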
Application scenario examples
A segmentation model trained on Computed Tomography (CT) images can only be applied to CT images and performs poorly on Magnetic Resonance Imaging (MRI) images. When only the CT images are labeled, conventional training methods cannot make the model cope well with MRI images in the absence of MRI annotation information. Therefore, how to train the initial model with labeled CT images so that the trained target model recognizes MRI images well is a problem to be solved.
The embodiment of the disclosure provides a model training method that achieves good recognition of MRI images through the combination of style transfer and self-supervised training. The model training process is roughly divided into four steps.
In a first step, a CT image and an MRI image (wherein the target domain is the MRI image and the source domain is the CT image) are acquired, respectively, both containing the same body organ. Preprocessing a CT image and an MRI image, specifically: modifying the image to 256 × 256 resolution, and then normalizing the image data to be distributed between 0 and 1.
In the second step, the style features of the MRI images and the structural features of the CT images are fused by a trained adversarial generation network to generate fitting images. The adversarial generation network includes an extraction network, a generation network, and a discrimination network, and is trained with the CT and MRI images.
In the third step, an initial model is trained according to the labels of the CT images and the fitting images obtained in the second step to obtain an intermediate model.
In the fourth step, self-supervised training is performed on the intermediate model obtained in the third step according to the MRI images to obtain a target model. Through the above processing, a final target model is obtained that recognizes human organs in MRI images well.
Specifically, the training process of the adversarial generation network may include:
extracting a first style characteristic and a first structural characteristic of the MRI image and a second structural characteristic and a second style characteristic of the CT image according to the extraction network;
generating a first reconstructed image, a first converted image, a second converted image and a second reconstructed image according to a generation network, the first reconstructed image comprising a first style feature and a first structural feature, the first converted image comprising the first style feature and a second structural feature, the second converted image comprising the first structural feature and a second style feature, the second reconstructed image comprising the second structural feature and a second style feature;
and discriminating whether the first reconstructed image, the first converted image, the second converted image and the second reconstructed image are real or fake according to the discrimination network to obtain a discrimination result.
There are two loss functions: an MAE loss used to calculate the image reconstruction consistency loss, and a binary cross-entropy loss used to calculate the discrimination loss. The parameters of the adversarial generation network are adjusted according to these loss functions.
Specifically, the self-supervision training process may include:
inputting the MRI image into the intermediate model to obtain an output result as a pseudo label of the MRI image;
training the intermediate model according to the MRI image and the pseudo label thereof to obtain an updated intermediate model;
and iteratively executing the above steps on the intermediate model until a target model meeting preset conditions is obtained. When the intermediate model is trained for the N-th time, the pseudo label input into the intermediate model is selected, according to probability, from the pseudo labels output by the intermediate models obtained in the (N-3)-th to (N-1)-th iterations; specifically, the probability of selecting the pseudo label obtained at the (N-1)-th iteration may be 51%, the probability of selecting the pseudo label obtained at the (N-2)-th iteration may be 31%, and the probability of selecting the pseudo label obtained at the (N-3)-th iteration may be 18%.
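Purely as an illustration of this probabilistic pseudo-label selection, a sketch might be (the function name and the history structure are assumptions of this sketch):

```python
import random

def pick_pseudo_label(pseudo_history):
    """Select the pseudo label for iteration N from the outputs of the
    intermediate models at iterations N-3 .. N-1 (most recent last)."""
    candidates = pseudo_history[-3:]
    # 18% for N-3, 31% for N-2, 51% for N-1; truncated in early iterations.
    weights = [0.18, 0.31, 0.51][-len(candidates):]
    return random.choices(candidates, weights=weights, k=1)[0]
```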
Fig. 4 shows a comparison between manual labeling of the abdomen and the prediction results of the trained models. The first row of Fig. 4 is the nuclear magnetic (MRI) image, the second row is the labeled or model-predicted region, and different gray scales in the figure represent different regions. Fig. 4 (1) shows the manually labeled region; Fig. 4 (2) shows the region predicted by a model trained directly on CT images and CT labeled data, without domain migration or self-supervision; Fig. 4 (3) shows the region predicted by a model trained on MRI-style CT labeled data using domain migration; and Fig. 4 (4) shows the region predicted by a model trained with both domain migration and self-supervision. As can be seen by comparing Fig. 4 (2), (3) and (4): Fig. 4 (2) has the worst effect, Fig. 4 (3) improves on Fig. 4 (2), and Fig. 4 (4) has the best effect, its predicted region being the closest to the manually labeled region of Fig. 4 (1).
The method can also be applied to the interconversion of CT and MR medical image data of other anatomical regions (such as the heart).
Through the above process, the initial model is trained with the labels of the source domain data and with fitting data generated from the structural features of the labeled source domain data and the style features of the unlabeled target domain data, producing an intermediate model; because the fitting data carry the style features of the target domain data, the intermediate model recognizes the target domain data better than the initial model does. The target model is then obtained by self-supervised training of the intermediate model on the target domain data; because the structural features of the target domain data are further learned during this training, the target model recognizes the target domain data better still than the intermediate model, and a model with good recognition capability on unlabeled target domain data is finally obtained. In this way, the model's ability to recognize cross-domain unlabeled data is effectively improved, and thus its generalization across different data sets.
It should be noted that the model training method according to the embodiments of the present disclosure is not limited to the processing of medical images and may be applied to any data from different domains; the present disclosure does not limit this.
It can be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from the principle and logic; due to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a model training apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the model training methods provided by the present disclosure; for the corresponding technical solutions, refer to the descriptions in the method section, which are not repeated here for brevity.
FIG. 5 shows a block diagram of a model training apparatus according to an embodiment of the present disclosure. The model training device can be a terminal device, a server or other processing equipment and the like. The terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
In some possible implementations, the model training apparatus may be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 5, the model training apparatus 50 may include:
a data obtaining module 51, configured to obtain target domain data and source domain data, where the target domain data contains the same target object as the source domain data, the target object contained in the source domain data has a label, and the target domain data and the source domain data include image data;
a fitting data generating module 52, configured to obtain fitting data according to the target domain data and the source domain data, where the fitting data includes a style characteristic of the target domain data and a structural characteristic of the source domain data, and the structural characteristic is information of a spatial position and a layout of the target object;
a full-supervision training module 53, configured to train an initial model according to the fitting data and the label, to obtain an intermediate model;
and a self-supervision training module 54, configured to perform self-supervision training on the intermediate model according to the target domain data to obtain a target model, where the target model is used for identifying a target object contained in the target domain data.
In one possible implementation, the fitting data generating module includes: a data input sub-module, configured to input the target domain data and the source domain data into a countermeasure generation network, where the countermeasure generation network includes an extraction network and a generation network; a feature extraction submodule, configured to extract the style features of the target domain data and the structural features of the source domain data according to the extraction network; and a fitting data generation submodule, configured to generate, according to the generation network, fitting data containing the style features of the target domain data and the structural features of the source domain data; wherein the countermeasure generation network is obtained by performing countermeasure training on the target domain data and the source domain data.
In one possible implementation, the countermeasure generation network is obtained by performing countermeasure training on the target domain data and the source domain data, and includes: extracting a first style characteristic and a first structural characteristic of the target domain data and a second structural characteristic and a second style characteristic of the source domain data according to the extraction network; generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises a first style feature and a first structural feature, the first conversion data comprises a first style feature and a second structural feature, the second conversion data comprises the first structural feature and a second style feature, and the second reconstruction data comprises a second structural feature and a second style feature; judging whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are true or false according to the judging network to obtain a judging result; and adjusting parameters of the countermeasure generating network according to the judgment result.
In a possible implementation manner, the adjusting, according to the determination result, a parameter of the countermeasure generating network includes: generating a first reconstruction loss from a difference between the target domain data and the first reconstruction data; generating a second reconstruction loss from a difference between the source domain data and the second reconstruction data; generating a first discrimination loss according to a discrimination result of the first reconstruction data output by the discrimination network; generating a second discrimination loss according to a discrimination result of the first conversion data output by the discrimination network; generating a third discrimination loss according to a discrimination result of the second conversion data output by the discrimination network; generating a fourth discrimination loss according to the discrimination result of the second reconstruction data output by the discrimination network; and according to the first reconstruction loss, the second reconstruction loss, the first discrimination loss, the second discrimination loss, the third discrimination loss and the fourth discrimination loss, performing countermeasure training on the generation network and the discrimination network to obtain a trained neural network.
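As one illustrative way of aggregating these six terms (the weighting coefficient and the split of the objective between the generation and discrimination networks are assumptions of this sketch, not fixed by the disclosure):

```python
def combine_losses(rec_loss_1, rec_loss_2, disc_loss_1, disc_loss_2,
                   disc_loss_3, disc_loss_4, lam=10.0):
    # Weighted sum of the two reconstruction losses and the four
    # discrimination losses; in countermeasure training, the generation
    # and discrimination networks optimize the adversarial part of this
    # objective in opposite directions.
    reconstruction = rec_loss_1 + rec_loss_2
    discrimination = disc_loss_1 + disc_loss_2 + disc_loss_3 + disc_loss_4
    return lam * reconstruction + discrimination
```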
In one possible implementation, the self-supervised training module includes: a pseudo label obtaining submodule, configured to input the target domain data into the intermediate model to obtain an output result as a pseudo label of the target domain data; and an intermediate model training submodule, configured to train the intermediate model according to the target domain data and the pseudo label to obtain an updated intermediate model; the above steps are executed iteratively on the intermediate model until a target model meeting preset conditions is obtained.
In one possible implementation, the pseudo label obtaining sub-module includes: an initialization submodule, configured to initialize the parameters of the intermediate model.
In one possible implementation, the self-supervised training module includes: a pseudo label update submodule, configured to extract, according to a probability P_{N-i}, the output result of the intermediate model obtained at the (N-i)-th iteration as the pseudo label of the target domain data used in the N-th iteration, where N is an integer greater than 1, i is a positive integer smaller than N, and the value of P_{N-i} is positively correlated with the value of N-i.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, which includes computer readable code, and when the computer readable code runs on a device, a processor in the device executes instructions for implementing the model training method provided in any one of the above embodiments.
Embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the model training method provided in any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to fig. 6, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 7 illustrates a block diagram of an electronic device 1900 in accordance with an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as a memory 1932, is also provided that includes computer program instructions executable by a processing component 1922 of an electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the disclosure are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK) or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (8)

1. A method of model training, comprising:
acquiring target domain data and source domain data, wherein the target domain data contains the same target object as the source domain data, the target object contained in the source domain data is provided with a label, the target domain data and the source domain data comprise image data, the target domain data is an MRI image, the source domain data is a CT image, and the target object is a human body organ;
obtaining fitting data according to the target domain data and the source domain data, wherein the fitting data comprise style characteristics of the target domain data and structural characteristics of the source domain data, the structural characteristics are information of the spatial position and layout of the target object, and the fitting data are fitting images;
training an initial model according to the fitting data and the label to obtain an intermediate model;
performing self-supervision training on the intermediate model according to the target domain data to obtain a target model, wherein the target model is used for identifying a target object contained in the target domain data;
obtaining fitting data according to the target domain data and the source domain data, including:
inputting the target domain data and the source domain data into a countermeasure generation network, the countermeasure generation network including an extraction network and a generation network;
extracting style characteristics of the target domain data and structural characteristics of the source domain data according to the extraction network;
generating fitting data containing the style characteristics of the target domain data and the structural characteristics of the source domain data according to the generation network;
wherein the countermeasure generating network is obtained by performing countermeasure training on the target domain data and the source domain data;
the countermeasure generating network is obtained by performing countermeasure training on the target domain data and the source domain data, and comprises:
extracting a first style characteristic and a first structural characteristic of the target domain data and a second structural characteristic and a second style characteristic of the source domain data according to the extraction network;
generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises a first style characteristic and a first structural characteristic, the first conversion data comprises a first style characteristic and a second structural characteristic, the second conversion data comprises the first structural characteristic and a second style characteristic, and the second reconstruction data comprises a second structural characteristic and a second style characteristic;
judging whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are true or false according to a judging network to obtain a judging result;
and adjusting parameters of the countermeasure generation network according to the judgment result.
2. The method of claim 1, wherein said adjusting parameters of said countermeasure generation network based on said determination comprises:
generating a first reconstruction loss from a difference between the target domain data and the first reconstruction data;
generating a second reconstruction loss from a difference between the source domain data and the second reconstruction data;
generating a first discrimination loss according to a discrimination result of the first reconstruction data output by the discrimination network;
generating a second discrimination loss according to a discrimination result of the first conversion data output by the discrimination network;
generating a third discrimination loss according to a discrimination result of the second conversion data output by the discrimination network;
generating a fourth discrimination loss according to the discrimination result of the second reconstruction data output by the discrimination network;
and according to the first reconstruction loss, the second reconstruction loss, the first discrimination loss, the second discrimination loss, the third discrimination loss and the fourth discrimination loss, performing countermeasure training on the generation network and the discrimination network to obtain a trained neural network.
3. The method of claim 1, wherein the performing an auto-supervised training of the intermediate model based on the target domain data to obtain a target model comprises:
inputting the target domain data into the intermediate model to obtain an output result as a pseudo label of the target domain data;
training the intermediate model according to the target domain data and the pseudo label to obtain an updated intermediate model;
and iteratively executing the steps on the intermediate model until a target model meeting preset conditions is obtained.
4. The method of claim 3, wherein after the target domain data is input into the intermediate model to obtain an output result as a pseudo label of the target domain data, the intermediate model is trained according to the target domain data and the pseudo label to obtain an updated intermediate model,
the self-supervision training of the intermediate model is carried out according to the target domain data to obtain a target model, and the method comprises the following steps:
initializing parameters of the intermediate model.
5. The method of claim 3, wherein the iteratively performing the above steps on the intermediate model until a target model satisfying a preset condition is obtained comprises:
according to a probability P_{N-i}, extracting the output result of the intermediate model obtained at the (N-i)-th iteration as the pseudo label of the target domain data used in the N-th iteration, where N is an integer greater than 1, i is a positive integer smaller than N, and the value of P_{N-i} is positively correlated with the value of N-i.
6. A model training apparatus, comprising:
a data acquisition module, configured to acquire target domain data and source domain data, where the target domain data contains the same target object as the source domain data, the target object contained in the source domain data has a label, the target domain data and the source domain data include image data, the target domain data is an MRI image, the source domain data is a CT image, and the target object is a human organ;
a fitting data generation module, configured to obtain fitting data according to the target domain data and the source domain data, where the fitting data includes a style feature of the target domain data and a structural feature of the source domain data, the structural feature is information of a spatial position and a layout of the target object, and the fitting data is a fitting image;
the full-supervision training module is used for training an initial model according to the fitting data and the labels to obtain an intermediate model;
the self-supervision training module is used for carrying out self-supervision training on the intermediate model according to the target domain data to obtain a target model, and the target model is used for identifying a target object contained in the target domain data;
the fitting data generation module comprises: the data input sub-module is used for inputting the target domain data and the source domain data into a countermeasure generation network, and the countermeasure generation network comprises an extraction network and a generation network; the feature extraction submodule is used for extracting the style features of the target domain data and the structural features of the source domain data according to the extraction network; the fitting data generation submodule is used for generating, according to the generation network, fitting data containing the style features of the target domain data and the structural features of the source domain data; wherein the countermeasure generation network is obtained by performing countermeasure training on the target domain data and the source domain data;
the countermeasure generating network is obtained by performing countermeasure training on the target domain data and the source domain data, and comprises: extracting a first style characteristic and a first structural characteristic of the target domain data and a second structural characteristic and a second style characteristic of the source domain data according to the extraction network; generating first reconstruction data, first conversion data, second conversion data and second reconstruction data according to the generation network, wherein the first reconstruction data comprises a first style characteristic and a first structural characteristic, the first conversion data comprises a first style characteristic and a second structural characteristic, the second conversion data comprises the first structural characteristic and a second style characteristic, and the second reconstruction data comprises a second structural characteristic and a second style characteristic; judging whether the first reconstruction data, the first conversion data, the second conversion data and the second reconstruction data are true or false according to a judging network to obtain a judging result; and adjusting parameters of the countermeasure generating network according to the judgment result.
7. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the method of any one of claims 1 to 5 when executing the memory-stored instructions.
8. A non-transitory computer readable storage medium having stored thereon computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of claims 1 to 5.
CN202210226361.2A 2022-03-09 2022-03-09 Model training method and device, electronic equipment and storage medium Active CN115099293B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210226361.2A CN115099293B (en) 2022-03-09 2022-03-09 Model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210226361.2A CN115099293B (en) 2022-03-09 2022-03-09 Model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115099293A CN115099293A (en) 2022-09-23
CN115099293B true CN115099293B (en) 2023-04-18

Family

ID=83287350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210226361.2A Active CN115099293B (en) 2022-03-09 2022-03-09 Model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115099293B (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10719742B2 (en) * 2018-02-15 2020-07-21 Adobe Inc. Image composites using a generative adversarial neural network
CN110197229B (en) * 2019-05-31 2022-06-07 腾讯医疗健康(深圳)有限公司 Training method and device of image processing model and storage medium
US20200394459A1 (en) * 2019-06-17 2020-12-17 Nvidia Corporation Cell image synthesis using one or more neural networks
US11625576B2 (en) * 2019-11-15 2023-04-11 Shanghai United Imaging Intelligence Co., Ltd. Systems and methods for image style transformation
CN111199550B (en) * 2020-04-09 2020-08-11 腾讯科技(深圳)有限公司 Training method, segmentation method, device and storage medium of image segmentation network
CN113706547B (en) * 2021-08-27 2023-07-18 北京航空航天大学 Unsupervised domain adaptive semantic segmentation method based on category dissimilarity guidance
CN113902913A (en) * 2021-08-31 2022-01-07 际络科技(上海)有限公司 Image semantic segmentation method and device
CN113822798B (en) * 2021-11-25 2022-02-18 北京市商汤科技开发有限公司 Method and device for training generation countermeasure network, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115099293A (en) 2022-09-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant