CN108597602B - Label error correction method for skin medical data - Google Patents

Publication number
CN108597602B
Authority
CN
China
Prior art keywords
label
model
image
images
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810398681.XA
Other languages
Chinese (zh)
Other versions
CN108597602A (en)
Inventor
曹瑞
郭克华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-04-28
Filing date
2018-04-28
Publication date
2021-11-05
Application filed by Central South University
Priority to CN201810398681.XA
Publication of CN108597602A
Application granted
Publication of CN108597602B

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a label error correction method for skin medical data. The method uses the TensorFlow deep learning framework and the GoogLeNet (Inception V3) convolutional neural network together with transfer learning. A small skin disease image data set with labels for each disease is used as the training sample and fed into the Inception V3 model for training, yielding a recognition model that can distinguish the skin diseases. A skin disease image data set containing a large number of noisy labels is then used as the test set; the model is run on it and corrects the images whose labels are noisy. The results show that the model corrects most erroneous labels, and that a model trained on the corrected data set recognizes the diseases at a substantially higher rate than a model trained on the data set before correction.

Description

Label error correction method for skin medical data
Technical Field
The invention relates to the field of computers and medicine, in particular to a label error correction method for skin medical data.
Background
In recent years, artificial intelligence has brought convenience to people and has influenced daily life in many ways, for example through smart homes, autonomous driving and face recognition. Within artificial intelligence, deep learning has become a representative technique, and many applications of it are emerging. In deep learning, the machine computes the features of the training samples from an existing labeled data set, learns from them, and produces a discriminative model for recognition and classification. This holds even for the simplest applications, such as digit and portrait recognition and image classification, so obtaining enough labeled data is essential for accuracy and efficiency. In the medical field, deep learning has already achieved good results, for example a deep-learning-based multi-hospital collaborative management platform for congenital cataract, and deep neural networks that classify skin cancer at the level of a dermatologist.
Clinically, the pathological features of some skin diseases are similar and their images are hard to tell apart, yet the treatment plans for different skin diseases differ greatly, so a correct diagnosis is very important. Applying deep learning to dermatology requires high-quality labels for the dermatological data. Processing dermatological medical data, however, raises several difficulties. On the one hand, manually processing a large data set is time-consuming and laborious, and in many systems the labels are noisy. On the other hand, labeling disease data in hospitals requires many specialist doctors, and different doctors may interpret the same data differently, which also produces noisy labels. Label error correction has therefore become an important problem in deep-learning-based processing of skin medical data.
Data processing is currently still done mainly by hand. Researchers have, however, proposed some approaches to label error correction. For example, two algorithms for correcting label noise have been proposed in recent years: self-training correction and cluster-based correction.
Manual processing of medical data takes a great deal of time and effort and can itself introduce noisy labels. The existing methods mentioned above need a large amount of labeled data as training samples, and they have not been applied to dermatology.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the shortcomings of the prior art by providing a label error correction method for skin medical data that corrects most erroneous labels.
To solve this technical problem, the invention adopts the following technical solution: a label error correction method for skin medical data, comprising the following steps:
1) using transfer learning, loading the seed data set into a GoogLeNet (Inception V3) model for training, so that the model learns the features in the seed data set and a diagnosis model for the corresponding skin diseases is obtained;
2) testing all N images in the data set containing noisy labels with the model from step 1), and obtaining the confidence of each image;
3) sorting all images by confidence in descending order and selecting the K images with the highest confidence; among these, the images whose original label agrees with the label diagnosed by the model are identified as correctly labeled, and the images whose original label disagrees with the label diagnosed by the model are identified as wrongly labeled and their labels are corrected; the corrected images and the correctly labeled images are added to an incremental data set, and both groups are removed from the original data set containing noisy labels;
4) merging the seed data set and the incremental data set into a new seed data set, and checking whether any image remains in the data set containing noisy labels; if so, returning to step 1); if not, ending the process.
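As an illustration only, the selection in step 3) can be sketched in Python as follows. The names `Prediction` and `select_top_k` are hypothetical and do not appear in the patent, and the sketch assumes that "correcting the label" means replacing the original label with the label diagnosed by the model.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    image_id: str
    original_label: str   # label stored in the noisy data set
    predicted_label: str  # label diagnosed by the current model
    confidence: float     # model confidence for predicted_label


def select_top_k(predictions, k):
    """Return (accepted, corrected_ids, remaining) for one pass of step 3).

    accepted maps image_id -> label for the K most confident images.
    Assumption: a wrongly labeled image receives the model's label.
    """
    ranked = sorted(predictions, key=lambda p: p.confidence, reverse=True)
    top, remaining = ranked[:k], ranked[k:]
    # When the labels agree, predicted_label equals original_label,
    # so storing predicted_label covers both cases.
    accepted = {p.image_id: p.predicted_label for p in top}
    corrected_ids = [p.image_id for p in top
                     if p.original_label != p.predicted_label]
    return accepted, corrected_ids, remaining


example = [
    Prediction("img1", "A", "A", 0.98),
    Prediction("img2", "A", "B", 0.95),
    Prediction("img3", "B", "B", 0.60),
]
accepted, corrected_ids, remaining = select_top_k(example, k=2)
```

In this example, `img1` is kept as correctly labeled, `img2` has its label changed from A to B, and `img3` stays in the noisy data set for a later pass.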
Before step 1), the following operation is performed: the GoogLeNet (Inception V3) model is loaded on the TensorFlow platform.
The steps are executed N/K times, and the value of K can be chosen sensibly according to the size of N. For example, with N = 800, K = 200 gives 4 executions and K = 400 gives 2 executions.
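The arithmetic above can be checked with a short Python sketch. The function name `correction_rounds` is hypothetical, and rounding up when N is not a multiple of K is an assumption; the patent only gives examples in which K divides N.

```python
def correction_rounds(n, k):
    """Number of passes needed to process n noisy images, k per pass.

    Assumption: if k does not divide n, a final smaller pass handles
    the leftover images, so the result is rounded up.
    """
    if n < 0 or k <= 0:
        raise ValueError("n must be non-negative and k must be positive")
    return (n + k - 1) // k  # integer ceiling of n / k


print(correction_rounds(800, 200))  # 4, as in the example above
print(correction_rounds(800, 400))  # 2, as in the example above
```

A smaller K means more retraining passes, each adding only the most confident images; a larger K means fewer passes but admits less confident predictions in each one.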
An Inception V3 model is used with transfer learning. A small skin disease image data set with labels for each disease is used as the training sample and fed into the Inception V3 model for training, yielding a recognition model that can distinguish the skin diseases. A skin disease image data set containing a large number of noisy labels is used as the test set; the model is run on it and corrects the images whose labels are noisy. The results show that the model corrects most erroneous labels, and that a model trained on the corrected data set recognizes the diseases at a substantially higher rate than a model trained on the data set before correction.
Compared with the prior art, the invention has the following beneficial effects. The invention corrects noisy labels in skin medical image data using deep learning. Without a large amount of labeled data, a small skin disease image data set is used as the training sample to train a diagnosis model for the corresponding skin diseases. The model is run on the data set containing noisy labels, images are selected by confidence and added to the training samples, the model is retrained, and the remaining noisy data is tested again; iterating in this way progressively corrects the labels of the noisy data set. Wrongly labeled images can be corrected with high precision without a large number of specialist doctors labeling images and fixing erroneous labels, which saves the time and effort doctors would spend on labeling and correcting disease images. The model obtained after label error correction can also assist doctors in judging diseases, which improves their diagnostic efficiency, reduces the influence of erroneous labels on the model, and helps doctors diagnose diseases efficiently and accurately.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention.
Detailed Description
During label error correction, the data set contains a certain number of images with noisy labels that need to be corrected. The data required by the process is divided into two parts. The first part is a data set of characteristic images of several diseases, labeled by dermatologists; it is called the seed data. The second part is a large labeled data set that also contains characteristic images of the various diseases but whose labels are noisy. The noisy labels are corrected by the following steps:
1) using transfer learning, loading the seed data set into a GoogLeNet (Inception V3) model for training, so that the model learns the features in the seed data set and a diagnosis model for the corresponding skin diseases is obtained;
2) testing all N images in the data set containing noisy labels with the model from step 1), and obtaining the confidence of each image;
3) sorting all images by confidence in descending order and selecting the K images with the highest confidence; among these, the images whose original label agrees with the label diagnosed by the model are identified as correctly labeled, and the images whose original label disagrees with the label diagnosed by the model are identified as wrongly labeled and their labels are corrected; the corrected images and the correctly labeled images are added to an incremental data set, and both groups are removed from the original data set containing noisy labels;
4) merging the seed data set and the incremental data set into a new seed data set, and checking whether any image remains in the data set containing noisy labels; if so, returning to step 1); if not, ending the process.
Before step 1), the following operation is performed: the GoogLeNet (Inception V3) model is loaded on the TensorFlow platform.
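The patent does not state how the per-image confidence in step 2) is computed. One common choice, assumed here purely for illustration, is the largest softmax probability over the model's output scores. The function names `softmax` and `diagnose` and the two-class example are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(logits, class_names):
    """Return (label, confidence) for one image.

    Assumption: confidence is the highest softmax probability.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


label, confidence = diagnose([2.0, 0.0], ["disease A", "disease B"])
```

With the scores above, the model diagnoses disease A with a confidence of about 0.88; images whose confidence falls below that of the top K wait for a later pass.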
In step 3), the correctly labeled and wrongly labeled images are selected from the K images with the highest confidence. If the second data set contains N images, the whole procedure runs for N/K iterations, that is, the steps are repeated N/K times until no image remains in that data set. For example, with N = 800, K = 200 gives 4 iterations and K = 400 gives 2 iterations. By then the model has examined and corrected all noisy labels.
The label error correction flow is shown in FIG. 1. The method applies to label error correction among multiple skin diseases; for ease of presentation, FIG. 1 assumes that error correction is performed between two skin diseases, disease A and disease B.
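The iteration described in steps 1) to 4) can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: `train` and `predict` are placeholders for Inception V3 transfer learning and inference, and the toy nearest-centroid classifier on one-dimensional "images" exists only to make the loop runnable. All function names are hypothetical.

```python
def correct_labels(seed, noisy, k, train, predict):
    """Iteratively correct noisy labels.

    seed, noisy: dicts mapping image -> label.
    train(labeled) -> model; predict(model, image) -> (label, confidence).
    Returns (final labeled set, corrected images, number of passes).
    """
    seed, noisy = dict(seed), dict(noisy)  # work on copies
    corrected, passes = {}, 0
    while noisy:
        model = train(seed)                                   # step 1)
        scored = {img: predict(model, img) for img in noisy}  # step 2)
        ranked = sorted(scored, key=lambda i: scored[i][1], reverse=True)
        for img in ranked[:k]:                                # step 3)
            label = scored[img][0]
            if label != noisy[img]:
                corrected[img] = label  # assumed: model label replaces it
            seed[img] = label                                 # step 4)
            del noisy[img]
        passes += 1
    return seed, corrected, passes


def train_centroids(labeled):
    """Toy stand-in for training: the mean feature value per label."""
    sums = {}
    for x, label in labeled.items():
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: t / c for label, (t, c) in sums.items()}


def predict_nearest(model, x):
    """Toy stand-in for inference: nearest centroid, distance-based score."""
    label = min(model, key=lambda c: abs(x - model[c]))
    return label, 1.0 / (1.0 + abs(x - model[label]))


seed = {0.0: "A", 10.0: "B"}
noisy = {1.0: "A", 9.0: "B", 2.0: "B", 8.0: "A"}  # last two are mislabeled
final, corrected, passes = correct_labels(
    seed, noisy, k=2, train=train_centroids, predict=predict_nearest)
```

Here N = 4 and K = 2, so the loop runs twice: the first pass accepts the two confidently and correctly labeled images, and the second pass, retrained on the enlarged seed set, relabels the remaining two.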

Claims (2)

1. A label error correction method for skin medical data, characterized by comprising the following steps:
1) using transfer learning, loading the seed data set into a GoogLeNet (Inception V3) model for training, so that the model learns the features in the seed data set and a diagnosis model for the corresponding skin diseases is obtained;
2) testing all N images in the data set containing noisy labels with the model from step 1), and obtaining the confidence of each image;
3) sorting all images by confidence in descending order and selecting the K images with the highest confidence; among these, the images whose original label agrees with the label diagnosed by the model are identified as correctly labeled, and the images whose original label disagrees with the label diagnosed by the model are identified as wrongly labeled and their labels are corrected; the corrected images and the correctly labeled images are added to an incremental data set, and both groups are removed from the original data set containing noisy labels;
4) merging the seed data set and the incremental data set into a new seed data set, and checking whether any image remains in the data set containing noisy labels; if so, returning to step 1); if not, ending the process.
2. The label error correction method for skin medical data according to claim 1, characterized in that before step 1) the following operation is performed: the GoogLeNet (Inception V3) model is loaded on the TensorFlow platform.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810398681.XA CN108597602B (en) 2018-04-28 2018-04-28 Label error correction method for skin medical data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810398681.XA CN108597602B (en) 2018-04-28 2018-04-28 Label error correction method for skin medical data

Publications (2)

Publication Number Publication Date
CN108597602A 2018-09-28
CN108597602B 2021-11-05

Family

ID=63610663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810398681.XA Active CN108597602B (en) 2018-04-28 2018-04-28 Label error correction method for skin medical data

Country Status (1)

Country Link
CN (1) CN108597602B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101328A (en) * 2020-11-19 2020-12-18 四川新网银行股份有限公司 Method for identifying and processing label noise in deep learning
CN112884135B (en) * 2021-04-29 2021-07-30 聚时科技(江苏)有限公司 Data annotation correction method based on frame regression

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7313328B2 (en) * 2002-02-28 2007-12-25 Nippon Telegraph And Telephone Corporation Node used in photonic network, and photonic network
CN106055091A (en) * 2016-05-16 2016-10-26 电子科技大学 Hand posture estimation method based on depth information and calibration method
CN107016387A (en) * 2016-01-28 2017-08-04 苏宁云商集团股份有限公司 A kind of method and device for recognizing label
CN107847156A (en) * 2015-06-15 2018-03-27 维塔尔实验室公司 The method and system assessed and managed for angiocardiopathy
CN107862694A (en) * 2017-12-19 2018-03-30 济南大象信息技术有限公司 A kind of hand-foot-and-mouth disease detecting system based on deep learning
CN107909566A (en) * 2017-10-28 2018-04-13 杭州电子科技大学 A kind of image-recognizing method of the cutaneum carcinoma melanoma based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463208A (en) * 2014-12-09 2015-03-25 北京工商大学 Multi-view semi-supervised collaboration classification algorithm with combination of agreement and disagreement label rules
CN106096557B (en) * 2016-06-15 2019-01-18 浙江大学 A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample

Also Published As

Publication number Publication date
CN108597602A (en) 2018-09-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant