CN115147873A - Method, equipment and medium for automatically classifying dental images based on dual-label cascade - Google Patents

Method, equipment and medium for automatically classifying dental images based on dual-label cascade Download PDF

Info

Publication number
CN115147873A
Authority
CN
China
Prior art keywords
photograph
dental
label
dual
cascade
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211063304.3A
Other languages
Chinese (zh)
Inventor
王都洋
王艳福
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hansf Hangzhou Medical Technology Co ltd
Original Assignee
Hansf Hangzhou Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hansf Hangzhou Medical Technology Co ltd filed Critical Hansf Hangzhou Medical Technology Co ltd
Priority to CN202211063304.3A priority Critical patent/CN115147873A/en
Publication of CN115147873A publication Critical patent/CN115147873A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a method, equipment and medium for automatically classifying dental images based on dual-label cascade, comprising the following steps: creating a dental image dataset; performing data enhancement on the dental image dataset; designing a lightweight network to detect specific targets in the dental image dataset and classifying the image data into ten categories, respectively: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film; performing secondary classification of the intraoral left and right photographs with a dual-label classification network; and performing secondary classification of the frontal photograph and intraoral upper and lower jaw photograph categories with target detection, so that the frontal photographs are separated into frontal and smile photographs and the intraoral jaw photographs into intraoral upper jaw and intraoral lower jaw photographs. The method effectively addresses the poor effect and low accuracy of automatic dental image classification, improves the automatic classification accuracy of dental images, and stabilizes the performance of the triplet loss function.

Description

Dental image automatic classification method, equipment and medium based on dual-label cascade
Technical Field
The invention relates to the technical field of image processing, in particular to a method, equipment and medium for automatically classifying dental images based on dual-label cascade.
Background
With the rapid development of society and the improvement of people's quality of life, oral health is receiving more and more attention. The dentist takes various radiographic images of a patient, such as the lateral film, panoramic film, frontal film and small dental film, with a CT machine, and acquires image data such as the frontal photograph, smile photograph, side-face photograph and intraoral photographs of various angles with a single-lens reflex camera. From this collection of related image data, the dentist can make a corresponding diagnosis.
This large amount of image data must be identified and classified manually, which imposes a huge workload on doctors, and a doctor's ability to distinguish images declines as workload and working hours increase. How to distinguish different images efficiently and accurately is therefore the focus of research.
In recent years, with the rapid development of deep learning, many industries have applied it maturely, greatly improving work and production efficiency. In dentistry, automatically classifying dental patient images with deep learning methods can greatly improve the dentist's working efficiency. However, some dental images are highly similar to one another, which makes automatic classification very difficult: the intraoral left and right photographs, for example, differ visually only by a mirror flip. Compared with the large-scale data common in general classification tasks, dental data are especially marked by diversity and class imbalance. Automatically and accurately classifying such similar dental images is therefore a crucial problem.
Therefore, there is a need for further improvement of the method, apparatus and medium for automatic classification of dental images based on dual-label cascade to solve the above problems.
Disclosure of Invention
The purpose of this application is to provide a method, equipment and medium for automatically classifying dental images based on dual-label cascade that effectively address the poor effect and low accuracy of automatic dental image classification, improve the automatic classification accuracy of dental images, and stabilize the performance of the triplet loss function.
The purpose of the application is achieved through the following technical scheme. A dental image automatic classification method based on dual-label cascade comprises the following steps:
s1: creating a dental image dataset;
s2: performing data enhancement on the dental image dataset;
s3: designing a lightweight network to detect specific targets in the dental image dataset of step S2, and dividing the image data into ten categories, respectively: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film;
s4: for the intraoral left and right photographs output in step S3, performing secondary classification using a dual-label classification network to obtain the intraoral left photograph and the intraoral right photograph;
s5: for the frontal photographs and the intraoral upper and lower jaw photographs output in step S3, performing secondary classification using target detection, classifying the frontal photographs into frontal photographs and smile photographs, and classifying the intraoral upper and lower jaw photographs into intraoral upper jaw photographs and intraoral lower jaw photographs;
wherein, the step S4 specifically includes:
s41, constructing a dual-label classification network;
s42, selecting positive and negative samples; for each anchor point, positive and negative samples that are highly informative and representative are selected;
s43, adaptive triplet loss; features of the same label are kept close together in space and features of different labels are kept far apart.
Preferably, the step S1 specifically includes:
s11, dental image dataset composition: dividing the actually photographed dental images into a plurality of categories, and dividing these categories into single-label and dual-label ones, wherein a dual label is an additional annotation category for particular image data;
s12, dental image dataset annotation: annotating the dental images for target detection, using the labelImg annotation tool;
s13: dental image dataset partitioning: the dental image dataset is divided into a training set, a validation set and a test set in the ratio of 8:1:1.
Preferably, the data enhancement in step S2 specifically includes: performing left-right mirroring or illumination and contrast adjustment on the dental image dataset.
Preferably, the S3 specifically includes:
s31, designing a lightweight network: the model consists of 10 convolutional layers, using conventional convolution and depthwise separable convolution;
s32, class-balanced loss function: the class imbalance problem is handled using CB-Focal Loss, which is given by the following formula:

$$\mathrm{CB}_{\mathrm{focal}}(\mathbf{p}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} \left(1 - p_i\right)^{\gamma} \log\left(p_i\right)$$

wherein $n_y$ is the number of true samples of category $y$, $\beta$ and $\gamma$ are weighting parameters, and $p_i$ is the probability value predicted by the network.
S33, model initialization: initialization is done using VGG 16 weights trained on ImageNet data sets.
Preferably, the S42 specifically includes:
s421, evaluating the information content of the dental images;
s422, selecting representative samples from the positive and negative samples with the largest information content to construct triplets;
s423, for each image $x_i$, computing an information score relative to the anchor point $a$: $S_i^{+}$ indicates whether $x_i$ is a candidate positive sample and $S_i^{-}$ indicates whether $x_i$ is a candidate negative sample; therein, the class-label similarity $r(x_i, a)$ between image $x_i$ and anchor $a$ is calculated using the pairwise similarity distance, and the difficulty $d(x_i, a)$ between the two is calculated as the Euclidean distance between their embeddings.
Preferably, the step S43 specifically includes:
s431, making the features of the same label close to each other in spatial position and the features of different labels far from each other in spatial position; for two positive examples and one negative example of the same class, the negative example is required to be at least a margin term away from the positive example; the formula is expressed as follows:

$$\mathcal{L}_{\mathrm{tri}} = \left[\, d(a, p) - d(a, n) + m \,\right]_{+}$$

wherein $[\cdot]_{+}$ denotes taking the maximum of the argument and 0; $a$, $p$ and $n$ respectively represent the anchor point, the positive sample and the negative sample; $d(\cdot, \cdot)$ is the Euclidean distance in the embedding space; and $m$ represents the margin term;
s432, setting a threshold between the positive and negative sample pairs, i.e. $\cos\theta_{an} \le \varepsilon$, wherein $\theta_{an}$ represents the angle between the L2-normalized anchor and negative embeddings, $\cos\theta_{an} = f(a)^{\top} f(n)$; the adaptive triplet loss function is:

$$\mathcal{L}_{\mathrm{ada}} = \left[\, \cos\theta_{an} - \cos\theta_{ap} + m \,\right]_{+} + \beta \left[\, \cos\theta_{an} - \varepsilon \,\right]_{+}$$

wherein $\beta$ is a hyper-parameter, $m$ is the strict margin term, and $\varepsilon$ is the slack margin term.
Preferably, the step S5 specifically includes:
s51, a target detection network: NanoDet is taken as the detection network, ShuffleNet V2 is selected as the backbone with its last convolutional layer removed, the 8×, 16× and 32× down-sampled features are extracted, and the features are input into the PAN for multi-scale feature fusion;
s52, a target detection loss function: Generalized Focal Loss is selected to train the network;
s53, target detection training: dual labels are used for training.
Preferably, the formula adopted in step S52 is:

$$\mathrm{QFL}(\sigma) = -\left|\, y - \sigma \,\right|^{\beta} \left[\, (1 - y)\log(1 - \sigma) + y\log(\sigma) \,\right]$$

wherein $y \in [0, 1]$ is the real label, $\sigma$ is the predicted value, and $|y - \sigma|^{\beta}$ is the modulating factor with scaling parameter $\beta$.
The present invention also provides an electronic device, comprising: one or more processors; and storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the dual-label cascade-based dental image automatic classification method provided by the present invention.
The present invention also provides a computer-readable storage medium storing a computer program executable by a computer processor to implement any one of the above-mentioned methods for automatic classification of dental images based on dual-label cascade.
Compared with the prior art, the application has the following obvious advantages and effects:
1. In the invention, tooth classification is performed in a dual-label and cascade manner, which resolves the similarity problem of dental images and improves the accuracy of dental image classification.
2. In the invention, the performance of the triplet loss function is stabilized through the adaptive triplet loss function.
3. In the invention, a new positive and negative sample selection strategy is designed: samples are selected according to the information content of the images, namely their relevance, difficulty and representativeness.
Drawings
Fig. 1 is a flow chart of a dual-label cascade network of the present application.
Fig. 2 is a schematic diagram of a dual-tag cascade network structure according to the present application.
Fig. 3 is a lightweight classification network in the present application.
Fig. 4 is a diagram of the composition of a dual-label classification network in the present application.
Fig. 5 is a diagram of a triplet loss function in the present application.
FIG. 6 is an intraoral maxillomandibular view under dual tag target detection in the present application.
Fig. 7 is a schematic structural diagram of an electronic device in the present application.
Reference numbers in this application:
processor 101, storage device 102, input device 103, output device 104, bus 105.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in greater detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The method, apparatus and medium for automatic classification of dental images based on dual tag cascade provided in the present application are described in detail by the following embodiments and alternatives thereof.
Fig. 1 is a flowchart of the method for automatically classifying dental images based on dual-label cascade in an embodiment of the present invention. Fig. 2 is a schematic diagram of the dual-label cascade network structure of the present application. The embodiment of the invention is applicable to automatic classification of dental images with a dual-label cascade. The method can be executed by a dual-label cascade dental image automatic classification apparatus, which can be implemented in software and/or hardware and integrated on any electronic device with a network communication function. As shown in fig. 1, the method for automatically classifying dental images based on dual-label cascade provided in the embodiment of the present application may include the following steps:
s1: creating a dental image dataset; the step S1 specifically includes:
s11, dental image dataset composition: dividing the actually photographed dental images into a plurality of categories, and dividing these categories into single-label and dual-label ones, wherein a dual label is an additional annotation category for particular image data;
in the embodiment of the present application, the data are derived from images actually taken by doctors. The data are divided into 10 categories: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film. Among these, the frontal photograph, the intraoral left and right photographs and the intraoral upper and lower jaw photographs carry dual labels. A dual label refers to an additional label attached to particular data. Specifically, the additional labels under the frontal photograph are non-smiling and smiling; the additional labels under the intraoral left and right photographs are intraoral left photograph and intraoral right photograph; and the additional labels under the intraoral upper and lower jaw photographs are intraoral upper jaw photograph and intraoral lower jaw photograph.
s12, annotation of dental image datasets: the dental image dataset is annotated for target detection by doctors with relevant professional experience, using the labelImg annotation tool;
s13: dental image dataset partitioning: the dental image dataset is divided into a training set, a validation set and a test set in the ratio of 8:1:1.
S2: performing data enhancement on the dental image dataset;
in the embodiment of the present application, the data enhancement is specifically: performing left-right mirroring or illumination and contrast adjustment on the dental image dataset. Specifically, among the 10 label categories, only illumination and contrast are adjusted for the intraoral left and right photographs, since mirroring would swap their left/right labels, while all other images are additionally augmented by left-right mirroring.
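As a minimal illustrative sketch (not part of the patent text), the enhancement rule described above could be implemented as follows; the category names and jitter ranges are assumptions:

```python
import random

from torchvision import transforms
from torchvision.transforms import functional as TF

# photometric enhancement applied to every category
photometric = transforms.ColorJitter(brightness=0.3, contrast=0.3)

# categories whose label would be swapped by a left-right flip
MIRROR_SENSITIVE = {"intraoral_left", "intraoral_right"}

def enhance(img, category):
    """Illumination/contrast jitter for all images; additional left-right
    mirroring for every category except the intraoral left/right photographs."""
    img = photometric(img)
    if category not in MIRROR_SENSITIVE and random.random() < 0.5:
        img = TF.hflip(img)
    return img
```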
S3: designing a lightweight network to detect specific targets in the dental image dataset of step S2, and dividing the image data into ten categories, respectively: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film. Step S3 is the first stage: single-label classification of the dental images is carried out by a lightweight network, laying the foundation for the subsequent steps. S3 specifically comprises the following steps:
s31, designing a lightweight network: the model consists of 10 convolutional layers, and conventional convolution and depth separable convolution are utilized;
as shown in fig. 3, the structure diagram of the lightweight classification network in the present application: a lightweight deep CNN model that makes effective use of embedded-device resources is designed in the embodiment of the present application. The model consists of 10 convolutional layers, using conventional convolution and depthwise separable convolution. The first three layers are conventional convolutional layers and the remaining layers are depthwise separable convolutional layers; using depthwise separable convolutions reduces the network computation time and makes the model smaller. Layer 9 is the bottleneck layer, and the output of the final convolutional layer is flattened and fed into a sigmoid classifier. Since most of the calculations and parameters of a typical CNN occur in the fully connected layers, no fully connected layer is employed, which reduces model size and processing time. Max pooling layers with kernel size 2 and stride 2 are adopted after layers 3, 5, 6, 7 and 8; max pooling reduces the computational cost. The ReLU activation function is used in all layers, as ReLU generally converges faster than other activation functions.
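The following PyTorch sketch shows one way the described 10-layer model could be assembled; the channel widths and the global pooling before the sigmoid are assumptions, since the patent specifies only the layer types, pooling positions and activations:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A depthwise 3x3 convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

class LightweightClassifier(nn.Module):
    """10 conv layers: 3 conventional + 7 depthwise separable; max pooling
    (kernel 2, stride 2) after layers 3, 5, 6, 7, 8; layer 9 is the
    bottleneck; no fully connected layers."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),   # layer 1
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),  # layer 2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),  # layer 3
            nn.MaxPool2d(2, 2),
            DepthwiseSeparableConv(64, 64),                          # layer 4
            DepthwiseSeparableConv(64, 128),                         # layer 5
            nn.MaxPool2d(2, 2),
            DepthwiseSeparableConv(128, 128),                        # layer 6
            nn.MaxPool2d(2, 2),
            DepthwiseSeparableConv(128, 256),                        # layer 7
            nn.MaxPool2d(2, 2),
            DepthwiseSeparableConv(256, 256),                        # layer 8
            nn.MaxPool2d(2, 2),
            DepthwiseSeparableConv(256, 64),                         # layer 9: bottleneck
            DepthwiseSeparableConv(64, num_classes),                 # layer 10
        )

    def forward(self, x):
        z = self.features(x).mean(dim=(2, 3))  # global average over space
        return torch.sigmoid(z)                # sigmoid classifier
```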
S32, class-balanced loss function: CB-Focal Loss is chosen to deal with the class imbalance problem. CB is a re-weighting scheme that re-balances the loss by the effective number of samples of each class. The formula of CB-Focal Loss is as follows:

$$\mathrm{CB}_{\mathrm{focal}}(\mathbf{p}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} \left(1 - p_i\right)^{\gamma} \log\left(p_i\right)$$

wherein $n_y$ is the number of true samples of category $y$, $\beta$ and $\gamma$ are weighting parameters, and $p_i$ is the probability value predicted by the network.
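A hedged PyTorch sketch of this loss follows; the sigmoid/one-hot form and the default hyper-parameter values are assumptions rather than values taken from the patent:

```python
import torch
import torch.nn.functional as F

def cb_focal_loss(logits, targets, samples_per_class, beta=0.9999, gamma=2.0):
    """Class-balanced focal loss: focal loss re-weighted by
    (1 - beta) / (1 - beta^{n_c}), the inverse effective number of
    samples of each class c."""
    effective_num = 1.0 - torch.pow(beta, samples_per_class.float())
    weights = (1.0 - beta) / effective_num
    weights = weights / weights.sum() * len(samples_per_class)  # normalize

    one_hot = F.one_hot(targets, num_classes=len(samples_per_class)).float()
    prob = torch.sigmoid(logits)
    pt = prob * one_hot + (1 - prob) * (1 - one_hot)  # p_t of focal loss
    focal = -torch.pow(1.0 - pt, gamma) * torch.log(pt.clamp(min=1e-8))
    per_sample = (focal * weights[targets].unsqueeze(1)).sum(dim=1)
    return per_sample.mean()
```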
S33, model initialization: initialization is done using VGG 16 weights trained on ImageNet data sets.
In the embodiment of the present application, weight initialization is an important step in training the model. The designed lightweight classification model was initialized with VGG 16 weights trained on the ImageNet dataset.
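As an illustration, the sketch below copies ImageNet-pretrained VGG16 convolution weights into shape-compatible convolution layers of the lightweight model; the mapping rule is an assumption, since the patent does not state which layers receive the pretrained weights:

```python
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

def init_from_vgg16(model: nn.Module) -> None:
    """Transfer pretrained VGG16 conv weights where the shapes match."""
    vgg = vgg16(weights=VGG16_Weights.IMAGENET1K_V1)
    vgg_convs = [m for m in vgg.features if isinstance(m, nn.Conv2d)]
    own_convs = [m for m in model.modules()
                 if isinstance(m, nn.Conv2d) and m.groups == 1]  # skip depthwise
    for src, dst in zip(vgg_convs, own_convs):
        if src.weight.shape == dst.weight.shape:
            dst.weight.data.copy_(src.weight.data)
            if src.bias is not None and dst.bias is not None:
                dst.bias.data.copy_(src.bias.data)
```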
S4: for the intraoral left and right photographs output in step S3, the second stage performs secondary classification using the dual-label classification network to obtain the intraoral left photograph and the intraoral right photograph; step S4 specifically includes:
s41, constructing a dual-label classification network; as shown in fig. 4, a composition diagram of a dual-label classification network in the present application is shown, in the embodiment of the present application, the dual-label classification network adopts a lightweight classification network designed in a first stage, and performs secondary classification on intra-oral left and right side-view category images acquired in the first stage. The categories of this classification are intraoral left and intraoral right photographs. Unlike the network of the first stage, the second stage replaces the first three convolutional layers with one convolutional layer with a core size of 7 x 7. The 4 th and 5 th layers were replaced with one convolution layer with a core size of 5 x 5, the rest of the model remaining unchanged. The number of layers of the model is reduced from 10 to 7, and compared with the network of the first stage, the convolution kernel size of the first two layers of the model is larger, but the acceptance domain of the network is the same. The average processing time is reduced to 1 second and the number of parameters is also reduced. The design makes the model faster and smaller, and the precision is slightly reduced.
S42, selecting positive and negative samples; for each anchor point, positive and negative samples that are highly informative and representative are selected. In the embodiment of the present application, the selection strategy aims to select, for each anchor point, positive and negative samples with a large amount of information (i.e., relevance and difficulty) and representativeness (i.e., mutually different in the feature space). The relevance of a sample to the anchor point is defined by its label similarity with respect to the anchor: if the class-label similarity of a positive sample is high, its relevance to the anchor point is high, and vice versa; if the class-label similarity of a negative sample is small, its relevance to the anchor point is high, and vice versa. The difficulty of a sample is related to its distance to the anchor point in the feature space: in the embedding space, a positive sample is hard if its distance to the anchor point is large, while a negative sample is hard if its distance to the anchor point is small.
The positive and negative sample selection strategy: the information content (i.e., relevance and difficulty) of each image is first evaluated; then, among the positive and negative samples with the largest information content, representative (diverse) samples are selected to construct the triplets. For each image $x_i$, an information score is computed relative to the anchor point $a$: $S_i^{+}$ indicates whether $x_i$ is a candidate positive sample and $S_i^{-}$ indicates whether $x_i$ is a candidate negative sample. Therein, the class-label similarity $r(x_i, a)$ between image $x_i$ and anchor $a$ is calculated using the pairwise similarity distance, and the difficulty $d(x_i, a)$ between the two is calculated as the Euclidean distance between their embeddings. A new positive and negative sample selection strategy is thereby designed in which samples are selected according to the information content of the images, namely their relevance, difficulty and representativeness. Selecting samples by relevance and difficulty lets the network pick samples that are more helpful for training, improving both the accuracy and the generalization of the model.
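A sketch of such a selection step is given below. The exact scoring formulas are not reproduced in the text record of the patent, so the way relevance and difficulty are combined here, and the value of k, are assumptions:

```python
import torch

def select_triplet_candidates(anchor_emb, anchor_label, embs, labels, k=5):
    """Rank images by informativeness relative to one anchor.
    r: label similarity r(x_i, a); d: embedding-space difficulty d(x_i, a)."""
    r = torch.cosine_similarity(labels.float(),
                                anchor_label.float().unsqueeze(0), dim=1)
    d = torch.cdist(embs, anchor_emb.unsqueeze(0)).squeeze(1)
    s_pos = r * d                  # relevant but distant positives are informative
    s_neg = (1 - r) / (d + 1e-8)   # irrelevant but close negatives are informative
    pos_idx = torch.topk(s_pos, k).indices
    neg_idx = torch.topk(s_neg, k).indices
    return pos_idx, neg_idx
```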
S43, adaptive triplet loss; features of the same label are kept close together in space and features of different labels are kept far apart.
Fig. 5 is a schematic diagram of the triplet loss function in the present application. In the embodiment of the present application, the objective of the triplet loss function is to make features of the same label as close as possible in spatial position and features of different labels as far apart as possible. To prevent the sample features from collapsing into a very small region of the space, it is required that for two positive examples and one negative example of the same class, the negative example is at least a margin term away from the positive example. The mathematical expression is as follows:

$$\mathcal{L}_{\mathrm{tri}} = \left[\, d(a, p) - d(a, n) + m \,\right]_{+}$$

wherein $[\cdot]_{+}$ denotes taking the maximum of the argument and 0; $a$, $p$ and $n$ represent the anchor point (candidate sample), the positive sample and the negative sample; $d(\cdot, \cdot)$ is the Euclidean distance in the embedding space; and $m$ represents the margin term. However, this expression only ensures that the distance between the anchor point and the positive sample is strictly smaller than the distance between the anchor point and the negative sample; it allows the special case in which both distances become arbitrarily small. In that case, although increasing the margin term $m$ can enlarge the distance to the negative sample, an $m$ greater than 0.5 results in a significant degradation of network performance. Therefore, the present embodiment proposes setting a threshold between the positive and negative sample pairs, i.e. $\cos\theta_{an} \le \varepsilon$, wherein $\theta_{an}$ represents the angle between the L2-normalized anchor and negative embeddings, $\cos\theta_{an} = f(a)^{\top} f(n)$. The adaptive triplet loss function proposed in this embodiment is:

$$\mathcal{L}_{\mathrm{ada}} = \left[\, \cos\theta_{an} - \cos\theta_{ap} + m \,\right]_{+} + \beta \left[\, \cos\theta_{an} - \varepsilon \,\right]_{+}$$

wherein $\beta$ is a hyper-parameter, $m$ is the strict margin term, and $\varepsilon$ is the slack margin term. Through the adaptive triplet loss function, the performance of the triplet loss function is stabilized.
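A minimal PyTorch sketch of the adaptive loss as reconstructed above; the default margin, threshold and weight values are assumptions:

```python
import torch
import torch.nn.functional as F

def adaptive_triplet_loss(a, p, n, m=0.25, eps=0.4, beta=1.0):
    """Embeddings are L2-normalized so that dot products equal cosines of
    the angles; m is the strict margin, eps the slack threshold on
    cos(theta_an), beta a weighting hyper-parameter."""
    a, p, n = (F.normalize(t, dim=1) for t in (a, p, n))
    cos_ap = (a * p).sum(dim=1)
    cos_an = (a * n).sum(dim=1)
    strict = F.relu(cos_an - cos_ap + m)  # [cos_an - cos_ap + m]_+
    slack = F.relu(cos_an - eps)          # [cos_an - eps]_+
    return (strict + beta * slack).mean()
```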
And S44, the intraoral left and right photographs output by S3 are sent to the trained dual-label classification network. The network further classifies each intraoral photograph according to the output category scores, obtaining its specific category, namely intraoral left photograph or intraoral right photograph.
S5: in the third stage, secondary classification by target detection is performed on the frontal photograph and intraoral upper and lower jaw photograph categories output in step S3: the frontal photographs are classified into frontal photographs and smile photographs, and the intraoral upper and lower jaw photographs are classified into intraoral upper jaw photographs and intraoral lower jaw photographs. S51, the target detection network: NanoDet is taken as the detection network, ShuffleNet V2 is selected as the backbone with its last convolutional layer removed, the 8×, 16× and 32× down-sampled features are extracted, and the features are input into the PAN for multi-scale feature fusion. In the embodiment of the application, the structure of this PAN differs from the conventional PAN: all convolutions in the PAN are removed, and only the 1 × 1 convolution applied after extracting each backbone feature is kept to align the feature channel dimensions; both up-sampling and down-sampling are done by interpolation. Unlike the concatenate operation used by YOLO, feature maps at multiple scales are added directly, so the overall feature fusion module requires less computation.
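A sketch of such a convolution-free fusion module is shown below; the input channel numbers (the stage outputs of ShuffleNetV2 1.0x) and the 96-channel output are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class SimplifiedPAN(nn.Module):
    """PAN variant with all inner convolutions removed: one 1x1 conv per
    backbone level aligns channels, resampling is pure interpolation,
    and fusion is element-wise addition rather than concatenation."""
    def __init__(self, in_channels=(116, 232, 464), out_channels=96):
        super().__init__()
        self.reduce = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):  # feats: [stride-8, stride-16, stride-32] maps
        p = [r(f) for r, f in zip(self.reduce, feats)]
        for i in range(len(p) - 2, -1, -1):       # top-down: upsample and add
            p[i] = p[i] + F.interpolate(p[i + 1], size=p[i].shape[-2:],
                                        mode="bilinear", align_corners=False)
        for i in range(1, len(p)):                # bottom-up: downsample and add
            p[i] = p[i] + F.interpolate(p[i - 1], size=p[i].shape[-2:],
                                        mode="bilinear", align_corners=False)
        return p
```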
S52, the target detection loss function: in the present embodiment, considering the dispersion of dental data and to ensure consistency between training and inference, Generalized Focal Loss is selected to train the network; the formula used is:

$$\mathrm{QFL}(\sigma) = -\left|\, y - \sigma \,\right|^{\beta} \left[\, (1 - y)\log(1 - \sigma) + y\log(\sigma) \,\right]$$

wherein $y \in [0, 1]$ is the real label, $\sigma$ is the predicted value, and $|y - \sigma|^{\beta}$ is the modulating factor with scaling parameter $\beta$.
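Assuming the quality-focal form given above is the one intended, a compact sketch is:

```python
import torch

def quality_focal_loss(sigma, y, beta=2.0):
    """sigma: predicted score in (0, 1); y: soft label in [0, 1];
    |y - sigma|^beta modulates the binary cross-entropy."""
    sigma = sigma.clamp(1e-6, 1 - 1e-6)
    ce = -(y * torch.log(sigma) + (1 - y) * torch.log(1 - sigma))
    return ((y - sigma).abs().pow(beta) * ce).mean()
```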
s53, target detection training; as shown in fig. 6, it is an intraoral upper and lower jaw chart under the dual-label target detection in the present application, and in the present embodiment, in order to better distinguish the frontal photograph from the smile photograph and the intraoral upper and lower jaw photograph, the dual-label training is performed. The front lighting and the smiling lighting are additionally provided with mouth detection on the basis of the original front lighting, and the detected mouths are subjected to secondary classification and are divided into two categories of smiling and non-smiling. Detection of intraoral mandible photographs after detection of the oral cavity, a secondary classification was performed into mandible photographs with tongue and maxilla photographs without tongue.
And S54, the frontal photographs and intraoral jaw photographs output by S3 are sent to the trained target detection network. The network distinguishes whether a frontal photograph is smiling according to whether it contains a mouth and the characteristics of that mouth, and classifies the intraoral photographs into maxillary and mandibular photographs according to whether they contain the tongue.
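Putting the three stages together, the inference flow reads roughly as follows; every name in this sketch is an illustrative assumption rather than an identifier from the patent:

```python
def classify_cascade(image, stage1, dual_label_net, detector):
    """Three-stage cascade: lightweight single-label classifier, then the
    dual-label classifier or dual-label target detection as needed."""
    category = stage1(image)                      # first stage: 10 classes
    if category == "intraoral_side":
        return dual_label_net(image)              # -> intraoral left / right
    if category == "frontal":
        mouth = detector.detect_mouth(image)
        return "smile" if mouth is not None and mouth.smiling else "frontal"
    if category == "intraoral_jaw":
        oral = detector.detect_oral_cavity(image)
        return "mandible" if oral is not None and oral.has_tongue else "maxilla"
    return category
```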
By adopting the dual-label training method, the approach overcomes the inability of traditional target detectors and classifiers to handle the similarity of dental data effectively; the cascade framework addresses the diversity and class imbalance of dental data, improves the automatic classification accuracy of dental images, and stabilizes the performance of the triplet loss function.
The present invention further provides an electronic device, as shown in fig. 7, which is a schematic structural diagram of an electronic device in the present application, and includes one or more processors 101 and a storage device 102; the processor 101 in the electronic device may be one or more, and fig. 7 illustrates one processor 101 as an example; storage 102 is used to store one or more programs; the one or more programs are executable by the one or more processors 101 to cause the one or more processors 101 to implement a method for automatic classification of dental images based on dual-label cascading as described in any of the embodiments of the present invention.
The electronic device may further include an input device 103 and an output device 104. The processor 101, the storage device 102, the input device 103 and the output device 104 in the electronic apparatus may be connected by a bus 105 or other means; fig. 7 takes connection by the bus 105 as an example.
The storage device 102 in the electronic apparatus is used as a computer-readable storage medium for storing one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the method for automatic classification of dental images based on dual-label cascading provided in the embodiments of the present invention. The processor 101 executes various functional applications and data processing of the electronic device by running software programs, instructions and modules stored in the storage device 102, namely, the method for automatically classifying dental images based on dual-label cascade in the above method embodiment is realized.
The storage device 102 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. In addition, the storage device 102 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the storage 102 may further include memory located remotely from the processor 101, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 103 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus. The output device 104 may include a display device such as a display screen.
And, when one or more programs included in the above-mentioned electronic device are executed by the one or more processors 101, the programs perform the following operations:
creating a dental image dataset;
performing data enhancement on the dental image dataset;
designing a lightweight network to detect specific targets in the dental image dataset, and classifying the image data into ten categories, respectively: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film;
for the intraoral left and right photographs, performing secondary classification using a dual-label classification network to obtain the intraoral left photograph and the intraoral right photograph;
for the frontal photographs and the intraoral upper and lower jaw photographs, performing secondary classification using target detection, classifying the frontal photographs into frontal photographs and smile photographs, and classifying the intraoral upper and lower jaw photographs into intraoral upper jaw photographs and intraoral lower jaw photographs.
Of course, it will be understood by those skilled in the art that when one or more programs included in the electronic device are executed by the one or more processors 101, the programs may also perform operations related to the method for automatically classifying dental images by two-tag cascade provided in any embodiment of the present invention.
It should be further noted that the present invention also provides a computer-readable storage medium, which stores a computer program, where the computer program can be executed by a computer processor, to implement the above-mentioned embodiment of the method for automatic classification of dental images based on dual-label cascade. The computer program may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Any modifications, equivalents and improvements made within the spirit and principles of the application that may readily occur to those skilled in the art are intended to be included within the scope of the claims of this application.

Claims (10)

1. A dental image automatic classification method based on dual-label cascade, characterized by comprising the following steps:
s1: creating a dental image dataset;
s2: performing data enhancement on the dental image dataset;
s3: designing a lightweight network to detect specific targets in the dental image dataset of step S2, and dividing the image data into ten categories, respectively: frontal photograph, side-face photograph, intraoral left and right photographs, intraoral frontal photograph, intraoral upper and lower jaw photographs, lateral film, panoramic film, positioning film, joint film and small dental film;
s4: for the intraoral left and right photographs output in step S3, performing secondary classification using a dual-label classification network to obtain the intraoral left photograph and the intraoral right photograph;
s5: for the frontal photographs and the intraoral upper and lower jaw photographs output in step S3, performing secondary classification using target detection, classifying the frontal photographs into frontal photographs and smile photographs, and classifying the intraoral upper and lower jaw photographs into intraoral upper jaw photographs and intraoral lower jaw photographs;
wherein, the step S4 specifically includes:
s41, constructing a dual-label classification network;
s42, selecting positive and negative samples; for each anchor point, positive and negative samples that are highly informative and representative are selected;
s43, adaptive triplet loss; features of the same label are kept close together in space and features of different labels are kept far apart.
2. The method for automatic classification of dental images based on dual-label cascade as claimed in claim 1, wherein the step S1 specifically comprises:
s11, dental image dataset composition: dividing the actually photographed dental images into a plurality of categories, and dividing these categories into single-label and dual-label ones, wherein a dual label is an additional annotation category for particular image data;
s12, dental image dataset annotation: annotating the dental images for target detection, using the labelImg annotation tool;
s13: dental image dataset partitioning: the dental image dataset is divided into a training set, a validation set and a test set in the ratio of 8:1:1.
3. The method for automatically classifying dental images based on dual-label cascade as claimed in claim 1, wherein the data enhancement in step S2 is specifically: performing left-right mirroring or illumination and contrast adjustment on the dental image dataset.
4. The automatic classification method for dental images based on dual-label cascade as claimed in claim 1, characterized in that: the S3 specifically includes:
s31, designing a lightweight network: the model consists of 10 convolutional layers, using conventional convolution and depthwise separable convolution;
s32, class-balanced loss function: the class imbalance problem is handled using CB-Focal Loss, which is given by the following formula:

$$\mathrm{CB}_{\mathrm{focal}}(\mathbf{p}, y) = -\frac{1-\beta}{1-\beta^{n_y}} \sum_{i=1}^{C} \left(1 - p_i\right)^{\gamma} \log\left(p_i\right)$$

wherein $n_y$ is the number of true samples of category $y$, $\beta$ and $\gamma$ are weighting parameters, and $p_i$ is the probability value predicted by the network;
s33, model initialization: initialization is done using VGG16 weights trained on the ImageNet dataset.
s33, model initialization: initialization is done using VGG 16 weights trained on ImageNet data sets.
5. The automatic classification method for dental images based on dual-label cascade as claimed in claim 1, characterized in that: the S42 specifically includes:
s421, evaluating the information content of the dental images;
s422, selecting representative samples from the positive and negative samples with the largest information content to construct triplets;
s423, for each image $x_i$, computing an information score relative to the anchor point $a$: $S_i^{+}$ indicates whether $x_i$ is a candidate positive sample and $S_i^{-}$ indicates whether $x_i$ is a candidate negative sample; therein, the class-label similarity $r(x_i, a)$ between image $x_i$ and anchor $a$ is calculated using the pairwise similarity distance, and the difficulty $d(x_i, a)$ between the two is calculated as the Euclidean distance between their embeddings.
6. The automatic classification method for dental images based on dual-label cascade as claimed in claim 1, characterized in that: the step S43 specifically includes:
s431, making the features of the same label close to each other in spatial position and the features of different labels far from each other in spatial position; for two positive examples and one negative example of the same class, the negative example is required to be at least a margin term away from the positive example; the formula is expressed as follows:

$$\mathcal{L}_{\mathrm{tri}} = \left[\, d(a, p) - d(a, n) + m \,\right]_{+}$$

wherein $[\cdot]_{+}$ denotes taking the maximum of the argument and 0; $a$, $p$ and $n$ respectively represent the anchor point, the positive sample and the negative sample; $d(\cdot, \cdot)$ is the Euclidean distance in the embedding space; and $m$ represents the margin term;
s432, setting a threshold between the positive and negative sample pairs, i.e. $\cos\theta_{an} \le \varepsilon$, wherein $\theta_{an}$ represents the angle between the L2-normalized anchor and negative embeddings, $\cos\theta_{an} = f(a)^{\top} f(n)$; the adaptive triplet loss function is:

$$\mathcal{L}_{\mathrm{ada}} = \left[\, \cos\theta_{an} - \cos\theta_{ap} + m \,\right]_{+} + \beta \left[\, \cos\theta_{an} - \varepsilon \,\right]_{+}$$

wherein $\beta$ is a hyper-parameter, $m$ is the strict margin term, and $\varepsilon$ is the slack margin term.
7. The automatic classification method for dental images based on dual-label cascade as claimed in claim 1, characterized in that: the step S5 specifically includes:
s51, a target detection network: NanoDet is taken as the detection network, ShuffleNet V2 is selected as the backbone with its last convolutional layer removed, the 8×, 16× and 32× down-sampled features are extracted, and the features are input into the PAN for multi-scale feature fusion;
s52, a target detection loss function: Generalized Focal Loss is selected to train the network;
s53, target detection training: dual labels are used for training.
8. The automatic classification method for dental images based on dual-label cascade as claimed in claim 1, characterized in that: the formula adopted in step S52 is:

$$\mathrm{QFL}(\sigma) = -\left|\, y - \sigma \,\right|^{\beta} \left[\, (1 - y)\log(1 - \sigma) + y\log(\sigma) \,\right]$$

wherein $y \in [0, 1]$ is the real label, $\sigma$ is the predicted value, and $|y - \sigma|^{\beta}$ is the modulating factor with scaling parameter $\beta$.
9. an electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the dual-label cascade-based dental image automatic classification method of any one of claims 1 to 8.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a computer processor, carries out the method according to any one of claims 1 to 8.
CN202211063304.3A 2022-09-01 2022-09-01 Method, equipment and medium for automatically classifying dental images based on dual-label cascade Pending CN115147873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211063304.3A CN115147873A (en) 2022-09-01 2022-09-01 Method, equipment and medium for automatically classifying dental images based on dual-label cascade

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211063304.3A CN115147873A (en) 2022-09-01 2022-09-01 Method, equipment and medium for automatically classifying dental images based on dual-label cascade

Publications (1)

Publication Number Publication Date
CN115147873A true CN115147873A (en) 2022-10-04

Family

ID=83416227

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211063304.3A Pending CN115147873A (en) 2022-09-01 2022-09-01 Method, equipment and medium for automatically classifying dental images based on dual-label cascade

Country Status (1)

Country Link
CN (1) CN115147873A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109875513A (en) * 2019-03-19 2019-06-14 厦门瑞谱拓医疗科技有限公司 A kind of tooth detection device and automatic classification method based on couple photographing mode
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
CN112734031A (en) * 2020-12-31 2021-04-30 珠海格力电器股份有限公司 Neural network model training method, neural network model recognition method, storage medium, and apparatus
CN113033555A (en) * 2021-03-25 2021-06-25 天津大学 Visual SLAM closed loop detection method based on metric learning
CN113723236A (en) * 2021-08-17 2021-11-30 广东工业大学 Cross-mode pedestrian re-identification method combined with local threshold value binary image
CN114022904A (en) * 2021-11-05 2022-02-08 湖南大学 Noise robust pedestrian re-identification method based on two stages
CN114283316A (en) * 2021-09-16 2022-04-05 腾讯科技(深圳)有限公司 Image identification method and device, electronic equipment and storage medium
CN114419672A (en) * 2022-01-19 2022-04-29 中山大学 Cross-scene continuous learning pedestrian re-identification method and device based on consistency learning
US20220138252A1 (en) * 2019-09-03 2022-05-05 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image searches based on word vectors and image vectors
CN114550729A (en) * 2022-01-22 2022-05-27 珠海亿智电子科技有限公司 Cry detection model training method and device, electronic equipment and storage medium
CN114862771A (en) * 2022-04-18 2022-08-05 四川大学 Smart tooth identification and classification method based on deep learning network
CN114898407A (en) * 2022-06-15 2022-08-12 汉斯夫(杭州)医学科技有限公司 Tooth target instance segmentation and intelligent preview method based on deep learning

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109875513A (en) * 2019-03-19 2019-06-14 厦门瑞谱拓医疗科技有限公司 A kind of tooth detection device and automatic classification method based on couple photographing mode
US20220138252A1 (en) * 2019-09-03 2022-05-05 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image searches based on word vectors and image vectors
CN112418134A (en) * 2020-12-01 2021-02-26 厦门大学 Multi-stream multi-label pedestrian re-identification method based on pedestrian analysis
CN112734031A (en) * 2020-12-31 2021-04-30 珠海格力电器股份有限公司 Neural network model training method, neural network model recognition method, storage medium, and apparatus
CN113033555A (en) * 2021-03-25 2021-06-25 天津大学 Visual SLAM closed loop detection method based on metric learning
CN113723236A (en) * 2021-08-17 2021-11-30 广东工业大学 Cross-mode pedestrian re-identification method combined with local threshold value binary image
CN114283316A (en) * 2021-09-16 2022-04-05 腾讯科技(深圳)有限公司 Image identification method and device, electronic equipment and storage medium
CN114022904A (en) * 2021-11-05 2022-02-08 湖南大学 Noise robust pedestrian re-identification method based on two stages
CN114419672A (en) * 2022-01-19 2022-04-29 中山大学 Cross-scene continuous learning pedestrian re-identification method and device based on consistency learning
CN114550729A (en) * 2022-01-22 2022-05-27 珠海亿智电子科技有限公司 Cry detection model training method and device, electronic equipment and storage medium
CN114862771A (en) * 2022-04-18 2022-08-05 四川大学 Smart tooth identification and classification method based on deep learning network
CN114898407A (en) * 2022-06-15 2022-08-12 汉斯夫(杭州)医学科技有限公司 Tooth target instance segmentation and intelligent preview method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KHANH NGUYEN: "AdaTriplet: Adaptive Gradient Triplet Loss with Automatic Margin Learning for Forensic Medical Image Matching", 《CS.CV》 *
XIAONAN ZHAO: "A Weakly Supervised Adaptive Triplet Loss for Deep Metric Learning", 《CS.CV》 *
潘丽丽: "Fine-grained image retrieval algorithm based on an adaptive triplet network", 《Journal of Zhengzhou University》 *

Similar Documents

Publication Publication Date Title
CN111062871B (en) Image processing method and device, computer equipment and readable storage medium
EP4009231A1 (en) Video frame information labeling method, device and apparatus, and storage medium
WO2021068323A1 (en) Multitask facial action recognition model training method, multitask facial action recognition method and apparatus, computer device, and storage medium
WO2020199931A1 (en) Face key point detection method and apparatus, and storage medium and electronic device
WO2020182121A1 (en) Expression recognition method and related device
WO2020024484A1 (en) Method and device for outputting data
JP2019032773A (en) Image processing apparatus, and image processing method
JP6800351B2 (en) Methods and devices for detecting burr on electrode sheets
WO2021233017A1 (en) Image processing method and apparatus, and device and computer-readable storage medium
CN111680550B (en) Emotion information identification method and device, storage medium and computer equipment
CN113888541B (en) Image identification method, device and storage medium for laparoscopic surgery stage
CN113570689B (en) Portrait cartoon method, device, medium and computing equipment
WO2023207778A1 (en) Data recovery method and device, computer, and storage medium
CN111476878A (en) 3D face generation control method and device, computer equipment and storage medium
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
CN114565602A (en) Image identification method and device based on multi-channel fusion and storage medium
WO2024131407A1 (en) Facial expression simulation method and apparatus, device, and storage medium
CN113643297A (en) Computer-aided age analysis method based on neural network
CN115147873A (en) Method, equipment and medium for automatically classifying dental images based on dual-label cascade
WO2020215682A1 (en) Fundus image sample expansion method and apparatus, electronic device, and computer non-volatile readable storage medium
CN111967289A (en) Uncooperative human face in-vivo detection method and computer storage medium
CN115761226A (en) Oral cavity image segmentation identification method and device, electronic equipment and storage medium
WO2022226744A1 (en) Texture completion
CN113450381A (en) System and method for evaluating accuracy of image segmentation model
Nallapati et al. Identification of Deepfakes using Strategic Models and Architectures

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned
AD01 Patent right deemed abandoned

Effective date of abandoning: 20230811