CN108597602B

CN108597602B - Label error correction method for skin medical data

Info

Publication number: CN108597602B
Application number: CN201810398681.XA
Authority: CN
Inventors: 曹瑞; 郭克华
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2018-04-28
Filing date: 2018-04-28
Publication date: 2021-11-05
Anticipated expiration: 2038-04-28
Also published as: CN108597602A

Abstract

The invention discloses a label error correction method for skin medical data. It uses the TensorFlow deep learning framework and the GoogLeNet Inception V3 convolutional neural network together with transfer learning. A small skin disease image data set with reliable labels for each disease serves as the training samples for the Inception V3 model, yielding a recognition model that can distinguish the various skin diseases. A skin disease image data set containing a large number of noisy labels is then used as the test set: the model is run on it and corrects the images that carry noisy labels. The results show that the model corrects most erroneous labels, and that a model trained on the corrected data set recognizes the diseases at a substantially higher rate than one trained on the data before correction.

Description

Label error correction method for skin medical data

Technical Field

The invention relates to the fields of computing and medicine, and in particular to a label error correction method for skin medical data.

Background

In recent years, artificial intelligence has brought convenience to people and influenced daily life in many ways, for example through smart homes, autonomous driving, and face recognition. Within artificial intelligence, deep learning has become a representative technique with a growing number of applications. In deep learning, the machine learns from an existing labeled data set by computing the features of the training samples, and produces a discriminative model for recognition and classification that drives a range of intelligent behaviors. Even the simplest applications, such as digital portrait recognition and picture classification, work this way, so acquiring enough labeled data is essential to accuracy and efficiency. In the medical field, deep learning has already achieved good results. Examples proposed in recent years include a deep-learning-based multi-hospital collaborative management platform for congenital cataract, and skin cancer classification at dermatologist level using deep neural networks.

Clinically, the pathological features of some skin diseases are similar and their images are hard to tell apart, yet the treatment plans for different skin diseases differ greatly, so correct diagnosis is very important. Applying deep learning to dermatology requires high-quality labels for the dermatological data. However, processing skin medical data raises a number of difficulties. On the one hand, manually processing a large data set is time-consuming and laborious, and in many systems the labels contain noise. On the other hand, labeling disease data in hospitals requires many specialist doctors, and different doctors may interpret the same disease data differently, which also produces noisy labels. Label error correction has therefore become a very important problem in deep-learning-based processing of skin medical data.

At present, data processing is still done mainly by hand. However, researchers have proposed some approaches to label error correction. For example, two algorithms for correcting label noise have been proposed in recent years: self-training correction and cluster-based correction.

Manual processing of medical data takes a great deal of time and effort, and can itself introduce noisy labels. The existing methods mentioned above require a large amount of labeled data as training samples, and they have not been applied to the field of dermatology.

Disclosure of Invention

The technical problem to be solved by the invention is to address the shortcomings of the prior art by providing a label error correction method for skin medical data that corrects most erroneous labels.

To solve the above technical problem, the invention adopts the following technical solution: a label error correction method for skin medical data, comprising the following steps:

1) loading the seed data set into a GoogLeNet (Inception V3) model for training by means of transfer learning, so that the model learns and computes the features in the seed data set, to obtain a diagnosis model for the corresponding skin diseases;

2) testing all images in the data set containing noisy labels, N images in total, with the model obtained in step 1), to obtain a confidence for each image;

3) sorting all images by confidence in descending order and selecting the K images with the highest confidence; among these, identifying the images whose original label matches the model's diagnosis label as correctly labeled, and identifying the images whose original label differs from the model's diagnosis label as incorrectly labeled and correcting their labels; adding the corrected images and the correctly labeled images to an incremental data set, and removing both groups of images from the original data set containing noisy labels;

4) merging the seed data set and the incremental data set into a new seed data set, and determining whether any images remain in the data set containing noisy labels; if so, returning to step 1); if not, ending the process.
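The four steps above can be sketched in Python as follows. This is only an illustrative outline, not the patented implementation: the names `correct_labels`, `Trainer`, and `Predictor` are hypothetical, images are represented by string IDs, and the `train` callable stands in for the Inception V3 transfer-learning step, which is assumed to return a function mapping an image to a (diagnosis label, confidence) pair.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types used only for this sketch.
Labeled = Dict[str, str]                         # image ID -> label
Predictor = Callable[[str], Tuple[str, float]]   # image ID -> (label, confidence)
Trainer = Callable[[Labeled], Predictor]         # labeled set -> trained model


def correct_labels(seed: Labeled, noisy: Labeled, k: int,
                   train: Trainer) -> Tuple[Labeled, List[str]]:
    """Move the K most confident images per round from `noisy` into the seed set.

    Returns the final seed set and the IDs of images whose label was changed.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seed = dict(seed)            # work on copies, leave the inputs untouched
    remaining = dict(noisy)
    corrected_ids: List[str] = []

    while remaining:                                   # step 4: repeat until empty
        model = train(seed)                            # step 1: (re)train on seed data
        scored = [(img, *model(img)) for img in remaining]   # step 2: score all images
        scored.sort(key=lambda item: item[2], reverse=True)  # step 3: sort by confidence

        for img, predicted, _confidence in scored[:k]:
            if remaining[img] != predicted:
                corrected_ids.append(img)              # original label was judged wrong
            seed[img] = predicted                      # add with the model's label
            del remaining[img]                         # remove from the noisy set

    return seed, corrected_ids
```

Passing the training step in as a parameter keeps the sketch independent of any particular framework; in practice `train` would wrap the fine-tuning of the pretrained network on the current seed set.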

In the label error correction method for skin medical data, the following operation is performed before step 1): the GoogLeNet (Inception V3) model is loaded on the TensorFlow platform.

The number of times the steps are executed is N/K, and the value of K can be set reasonably according to the size of N. For example, if N is 800 and K is 200, the steps are executed 4 times; if K is 400, they are executed 2 times, and so on.
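The round count can be checked with a small helper. The function name `rounds_needed` is hypothetical, and rounding up when K does not divide N evenly is an assumption of this sketch; the text only gives cases where N/K is a whole number.

```python
def rounds_needed(n: int, k: int) -> int:
    """Number of correction rounds for n noisy images taken k at a time."""
    if n < 0 or k <= 0:
        raise ValueError("n must be non-negative and k must be positive")
    # Integer ceiling division: a final partial batch still needs one round.
    return (n + k - 1) // k


print(rounds_needed(800, 200))  # 4
print(rounds_needed(800, 400))  # 2
```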

An Inception V3 model is adopted together with transfer learning. A small skin disease image data set with reliable labels for each disease is used as the training samples for the Inception V3 model, yielding a recognition model that can distinguish the various skin diseases. A skin disease image data set containing a large number of noisy labels is then used as the test set: the model is run on it and corrects the images that carry noisy labels. The results show that the model corrects most erroneous labels, and that a model trained on the corrected data set recognizes the diseases at a substantially higher rate than one trained on the data before correction.

Compared with the prior art, the invention has the following beneficial effects. The invention corrects noisy labels in skin medical image data using deep learning. Without requiring a large amount of labeled data, a small skin disease image data set is used as training samples to train a diagnosis model for the corresponding skin diseases. The model is run on the data set containing noisy labels, images are selected by confidence and added to the training samples, the model is retrained, and the remaining noisy data are examined again. Repeating this iteration progressively corrects the labels of the data set containing noisy labels. Erroneous labels can thus be corrected with high precision without a large number of specialist doctors having to label images and fix mislabeled ones, which saves doctors the time and effort of labeling and correcting disease images. In addition, the model obtained after label error correction can assist doctors in judging diseases, which improves diagnostic efficiency, reduces the influence of erroneous labels on the model, and helps doctors diagnose diseases efficiently and accurately.

Drawings

FIG. 1 is a schematic diagram of the method of the present invention.

Detailed Description

In the label error correction process, the data set contains a certain number of images with noisy labels, and these labels are to be corrected. The data required by the process are divided into two parts. The first part is a data set of characteristic images of several diseases; it is labeled by dermatologists and is called the seed data. The second part is a large labeled data set that also contains characteristic images of the various diseases, but whose labels contain noise. To correct these noisy labels, the steps are as follows:

In step 3), the correctly labeled images and the incorrectly labeled images are selected from the K images with the highest confidence. Assuming the second part of the data contains N images, the whole experiment runs for N/K iterations, that is, the steps are repeated N/K times until no images remain in that part of the data. If N is 800 and K is 200, the steps are executed 4 times; if K is 400, they are executed 2 times. In this way, all noisy labels are examined and corrected by the model.

The label error correction flow is shown in FIG. 1. The method is applicable to label error correction among multiple skin diseases; for convenience of description, the figure assumes that error correction is performed between two skin diseases, disease A and disease B.

Claims

1. A label error correction method for skin medical data, characterized by comprising the following steps:

2. The label error correction method for skin medical data according to claim 1, characterized in that the following operation is performed before step 1): the GoogLeNet (Inception V3) model is loaded on the TensorFlow platform.