CN110909780B - Image recognition model training and image recognition method, device and system - Google Patents

Image recognition model training and image recognition method, device and system

Info

Publication number
CN110909780B
CN110909780B (application CN201911113506.2A)
Authority
CN
China
Prior art keywords
image
training
initial
lesion
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911113506.2A
Other languages
Chinese (zh)
Other versions
CN110909780A (en)
Inventor
郑瀚
尚鸿
孙钟前
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911113506.2A
Publication of CN110909780A
Application granted
Publication of CN110909780B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Endoscopes (AREA)

Abstract

The application relates to the field of computer technology, and in particular to an image recognition model training and image recognition method, device and system. Each image is recognized by an initial image recognition model to obtain its predicted lesion category; the predicted lesion category is checked against the image report associated with the image, and the image's lesion category is labeled according to the result of that check. The image recognition model is then obtained by iterative training on the labeled images together with an initial training image sample set, and lesion category recognition can be performed on an image to be recognized based on the trained model to determine its lesion category recognition result. Because the image reports drive the iterative training, no additional labeling cost is required, the iteration rate is improved, iterative updating can continue indefinitely, and recognition accuracy is improved.

Description

Image recognition model training and image recognition method, device and system
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, and a system for training an image recognition model and recognizing an image.
Background
At present, the training method adopted in endoscopic image diagnosis systems uses pre-labeled data in the iterative process. It requires a large amount of labeled data, and that data must be labeled by doctors or experts, so the cost is high and the process is time-consuming; no relevant solution to this problem has been proposed.
Disclosure of Invention
The embodiments of the application provide an image recognition model training and image recognition method, device and system, so as to improve the iteration rate of training a lesion recognition model and to reduce cost.
The embodiment of the application provides the following specific technical scheme:
an embodiment of the present application provides an image recognition method, including:
acquiring an image to be recognized;
extracting image feature information of the image to be recognized;
based on a pre-trained image recognition model, obtaining a lesion category recognition result of the image to be recognized by taking the image feature information of the image to be recognized as an input parameter, wherein the image recognition model is obtained by iterative training on labeled images together with an initial training image sample set so as to determine the lesion category recognition result; each labeled image is obtained by labeling the lesion category of an image based on the initial image recognition model and the associated image report; the initial image recognition model is trained on the initial training image sample set; and each initial training image sample is an image sample with a lesion category label.
Another embodiment of the present application provides an image recognition model training method, including:
acquiring each image, recognizing each image according to an initial image recognition model, and respectively obtaining the predicted lesion category of each image, wherein the initial image recognition model is trained on an initial training image sample set, and each initial training image sample is an image sample with a lesion category label;
acquiring the image report associated with each image, checking the predicted lesion category of each image against the image report, and labeling the lesion category of each image according to the result of that check, wherein the image report includes description information of the lesion diagnosis result corresponding to the image;
and performing iterative training on the labeled images together with the initial training image sample set to obtain an image recognition model.
Another embodiment of the present application provides an image recognition system, including at least: image acquisition device, image processing device and output device, specifically:
the image acquisition equipment is used for acquiring an image to be identified;
the processing device is used for extracting image characteristic information of the image to be recognized, and acquiring a lesion category recognition result of the image to be recognized by taking the image characteristic information of the image to be recognized as an input parameter based on a pre-trained image recognition model, wherein the image recognition model is obtained by performing iterative training according to each image after being labeled and an initial training image sample set so as to determine a lesion category recognition result, each image after being labeled is obtained by performing lesion category labeling on each image based on the initial image recognition model and an associated image report, the initial image recognition model is obtained by training according to the initial training image sample set, and the initial training image sample is an image sample with lesion category labeling;
and the output equipment is used for outputting the lesion type identification result of the image to be identified.
Another embodiment of the present application provides an image recognition apparatus, including:
the acquisition module is used for acquiring an image to be recognized;
the extraction module is used for extracting image feature information of the image to be recognized;
the recognition module is used for obtaining a lesion category recognition result of the image to be recognized by taking the image feature information of the image to be recognized as an input parameter, based on a pre-trained image recognition model, wherein the image recognition model is obtained by iterative training on labeled images together with an initial training image sample set so as to determine the lesion category recognition result; each labeled image is obtained by labeling the lesion category of an image based on the initial image recognition model and the associated image report; the initial image recognition model is trained on the initial training image sample set; and each initial training image sample is an image sample with a lesion category label.
Another embodiment of the present application provides an image recognition model training apparatus, including:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring each image, identifying each image according to an initial image identification model and respectively acquiring the predicted lesion type of each image, the initial image identification model is acquired by training according to an initial training image sample set, and the initial training image sample is an image sample with lesion type marks;
the processing module is used for acquiring the image report related to each image, judging the predicted lesion type of each image according to the image report, and marking the lesion type of each image according to the judgment result, wherein the image report comprises the description information of the lesion diagnosis result corresponding to the image;
and the iterative training module is used for performing iterative training to obtain an image recognition model according to the marked image images and the initial training image sample set.
Another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements any one of the image recognition model training methods or the steps of the image recognition method when executing the program.
Another embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements any one of the image recognition model training methods or the steps of the image recognition method.
In the embodiments of the present application, each image is recognized by the initial image recognition model to obtain its predicted lesion category; the predicted category is checked against the image report associated with the image, and the image's lesion category is labeled according to the result of that check; the image recognition model is then obtained by iterative training on the labeled images together with the initial training image sample set. In this way, the image recognition model can be iteratively trained using image reports, without additional labeling cost, which reduces cost; the model can be iterated quickly, which improves efficiency and speeds up product upgrades; and as the model is continuously updated, its accuracy improves, so that performing lesion category recognition on an image to be recognized with the trained model determines its lesion category recognition result with improved accuracy.
Drawings
FIG. 1 is a schematic diagram of an application architecture of an image recognition model training and image recognition method in an embodiment of the present application;
FIG. 2 is a flowchart of an image recognition method in an embodiment of the present application;
FIG. 3 is a flowchart of an image recognition model training method in an embodiment of the present application;
FIG. 4 is a schematic diagram of a splicing structure of a mini-batch during training in the embodiment of the present application;
FIG. 5 is a schematic diagram of an image recognition model training method in an embodiment of the present application;
FIG. 6 is a schematic diagram of a network structure of a classification deep network at an initialization stage of training in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an image recognition system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an image recognition model training apparatus in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
For the purpose of facilitating an understanding of the embodiments of the present application, a brief introduction of several concepts is provided below:
medical image video: the image video representing the scanning at the time of medical diagnosis, for example, an endoscope image video, including various endoscope images of the digestive tract, ear, nose, and throat, and the like.
Image: the embodiment of the present application shows an image of one frame extracted from a medical image video, for example, an image of an endoscope in the digestive tract, including an image captured by an endoscope in a gastroscope or an enteroscope.
And (3) image reporting: the endoscope examination report is an examination report which can be output by a doctor after the endoscope examination is carried out, and the image report comprises description information of a lesion diagnosis result.
Weak supervision learning: a training method is shown that utilizes annotation information that is weaker than the task requirements.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields, involving both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and performing further image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, Optical Character Recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, three-dimensional (3D) technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
For example, in the embodiment of the present application, artificial intelligence technology may be applied to the medical field. The embodiment mainly involves computer vision: image feature extraction may be implemented through the image semantic understanding technology in computer vision, for example extracting the image feature information of the image samples in the initial training image sample set or of an image to be recognized; and inference may be performed based on the image classification technology within image semantic understanding, so that the lesion category recognition result of an image to be recognized can be determined according to the preset image feature information of each lesion category.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, for example smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart healthcare, smart customer service, and the like.
The scheme provided by the embodiment of the application mainly relates to the technologies of artificial intelligence, such as computer vision and the like, and is specifically explained by the following embodiment:
at present, the endoscope image diagnosis system mainly adopts a full-supervision training method to obtain an endoscope image and manually label the endoscope image, a large amount of labeled data is needed in an iteration process, and the labeled data needs a doctor or an expert to label, so that the cost is high, and the time consumption is long.
In view of the above, an embodiment of the present application provides an image recognition model training method. An initial image recognition model is first trained on an initial training image sample set, giving it a certain capability to recognize lesions in images. The model can then be iteratively updated using image reports and their associated images, without additional labeling cost, which improves the iteration rate of the image recognition model. Lesion recognition can then be performed on an image to be recognized with the iteratively updated model to obtain its lesion category recognition result.
Fig. 1 is a schematic diagram of an application architecture of an image recognition model training and image recognition method in the embodiment of the present application, including a server 100 and a terminal device 200.
The terminal device 200 may be a medical device, for example, a user may view an image lesion recognition result based on the terminal device 200, and may also capture an endoscopic image through the terminal device 200.
The terminal device 200 and the server 100 can be connected via the Internet to communicate with each other. Optionally, the network uses standard communication techniques and/or protocols. The network is typically the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks. In some embodiments, data exchanged over the network is represented using techniques and/or formats such as HyperText Markup Language (HTML) and Extensible Markup Language (XML). All or some of the links may also be encrypted using conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may also be used in place of, or in addition to, the techniques described above.
The server 100 may provide various network services for the terminal device 200, wherein the server 100 may be a server, a server cluster composed of several servers, or a cloud computing center.
Specifically, the server 100 may include a processor 110 (CPU), a memory 120, an input device 130, an output device 140, and the like, the input device 130 may include a keyboard, a mouse, a touch screen, and the like, and the output device 140 may include a Display device, such as a Liquid Crystal Display (LCD), a Cathode Ray Tube (CRT), and the like.
Memory 120 may include Read Only Memory (ROM) and Random Access Memory (RAM), and provides processor 110 with program instructions and data stored in memory 120. In the embodiment of the present application, the memory 120 may be used to store a program of an image recognition model training method or an image recognition method in the embodiment of the present application.
The processor 110 is configured to execute the steps of any one of the image recognition model training methods or the image recognition methods in the embodiments of the present application according to the obtained program instructions by calling the program instructions stored in the memory 120.
It should be noted that, in the embodiments of the present application, the image recognition model training method or the image recognition method is mainly executed by the server 100. For the image recognition method, for example, the terminal device 200 may send the acquired endoscopic image to the server 100, and the server 100 performs lesion recognition on the image and may return the lesion recognition result to the terminal device 200. As another example, the terminal device 200 may send a medical image video and the associated image report to the server 100, and the server 100 may iteratively update the image recognition model accordingly to improve its accuracy. As shown in fig. 1, the application architecture is described for the server 100 side; of course, the image recognition method in the embodiments of the present application may also be executed by the terminal device 200. For example, the terminal device 200 may obtain the trained image recognition model from the server 100 and perform lesion recognition on images based on that model, which is not limited in the embodiments of the present application. In general, due to the performance limitations of the terminal device 200, the image recognition model training method is executed on the server 100 side.
The application architecture diagram in the embodiments of the present application is intended to illustrate the technical solution more clearly and does not limit it; nor is the solution limited to endoscopic image applications. The technical solution provided in the embodiments of the present application is also applicable to similar problems under other application architectures and service applications.
The various embodiments of the present application are described below as applied to the application architecture shown in fig. 1.
Based on the foregoing embodiment, referring to fig. 2, a flowchart of an image recognition method in the embodiment of the present application is shown, where the method includes:
step 200: and acquiring an image to be identified.
For example, digestive tract images acquired by a digestive tract endoscope, which may be a video or still images, can be sent to the server. If a video is received, the server performs framing on the image video to obtain individual images; each such image is an image to be recognized, and may also be called a digestive tract endoscope image to be recognized.
Step 210: and extracting image characteristic information of the image to be identified.
Specifically, a neural network may be used to extract the image feature information; other image feature extraction methods may also be used, which is not limited in this embodiment.
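Since the embodiment leaves the extraction method open, the following minimal sketch uses a normalized gray-level histogram as a stand-in feature extractor; `extract_features` is a hypothetical helper name, not from the patent, and a real system would typically use a neural network instead:

```python
import numpy as np

def extract_features(image, bins=8):
    """A deliberately simple stand-in for neural-network feature
    extraction: a normalized gray-level histogram of the image.
    `image` is any array of pixel intensities in [0, 255]."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    total = hist.sum()
    # Normalize so the features are comparable across image sizes.
    return hist / total if total else hist.astype(float)
```

The resulting fixed-length vector plays the role of the "image feature information" fed to the recognition model.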
Step 220: and based on a pre-trained image recognition model, obtaining a lesion type recognition result of the image to be recognized by taking the image characteristic information of the image to be recognized as an input parameter.
The image recognition model is obtained by iterative training on labeled images together with an initial training image sample set, so as to determine the lesion category recognition result. Each labeled image is obtained by labeling the lesion category of an image based on the initial image recognition model and the associated image report. The initial image recognition model is trained on the initial training image sample set, and each initial training image sample is an image sample with a lesion category label.
That is to say, in the embodiments of the present application, the image recognition model may be applied, for example, in an endoscope-assisted diagnosis system to recognize lesion categories. The model can be continuously and iteratively updated using image reports. Since an image report is a document that relevant personnel normally produce during an endoscopic examination, no additional labor cost is required; the report contains the lesion diagnosis result and can serve as supervision information for iteratively training the image recognition model, which improves training efficiency and reduces cost.
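The recognition step, which takes image feature information as the input parameter and returns a lesion category result, can be sketched as follows. This is a minimal illustration under assumptions: `LESION_CLASSES`, `recognize_lesion`, and the linear classification layer are hypothetical stand-ins, since the patent does not fix the model architecture:

```python
import numpy as np

LESION_CLASSES = ["negative", "positive"]  # illustrative binary task

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def recognize_lesion(feature_vec, weights, bias):
    """Map extracted image features to a lesion category recognition
    result: (category name, confidence). `weights`/`bias` stand in for
    a trained model's final classification layer."""
    probs = softmax(feature_vec @ weights + bias)
    idx = int(np.argmax(probs))
    return LESION_CLASSES[idx], float(probs[idx])
```

In a real system the weights would come from the iteratively trained model; here they only illustrate the input/output contract.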
Based on the above embodiment, the following describes an image recognition model training method in the embodiment of the present application, and referring to fig. 3, which is a flowchart of the image recognition model training method in the embodiment of the present application, and the method includes:
step 300: acquiring each image, identifying each image according to an initial image identification model, and respectively acquiring the predicted lesion type of each image, wherein the initial image identification model is acquired by training according to an initial training image sample set, and the initial training image sample is an image sample with lesion type labels.
In the embodiments of the present application, the initial training image samples can be obtained by manually labeling the lesion categories of images, and the initial image recognition model is trained on the initial training image sample set, so that it has a basic capability to recognize lesion categories in images.
When step 300 is executed, the method specifically includes:
and S1, acquiring each image.
The method includes the following steps: 1) acquiring a medical image video.
2) Sampling the medical image video at a preset frame period to obtain each extracted frame as an image.
For example, uniform sampling may be adopted, extracting one frame every n frames; other sampling methods may also be used, which is not limited in the embodiments of the present application.
Further, in order to improve efficiency and accuracy, the extracted frames may be filtered. In one possible implementation provided by this embodiment, frames whose resolution is smaller than a preset size are filtered out, and each remaining image is then input into the initial image recognition model for recognition. This removes low-quality images, such as completely black or overly blurred frames, avoids running recognition on them, and reduces the time cost of the recognition model's operation.
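The sampling and filtering steps above can be sketched as follows; `sample_frames` and `filter_small` are hypothetical helper names, and frames are represented as `(frame_id, width, height)` tuples purely for illustration:

```python
def sample_frames(num_frames, period):
    """Uniform sampling: extract one frame every `period` frames
    from a video of `num_frames` frames."""
    return list(range(0, num_frames, period))

def filter_small(frames, min_w, min_h):
    """Filter out frames whose resolution is below a preset size so
    that low-quality images never reach the recognition model.
    `frames` holds (frame_id, width, height) tuples."""
    return [f for f in frames if f[1] >= min_w and f[2] >= min_h]
```

Other filters (e.g. blur or darkness checks mentioned in the text) could be applied in the same pass.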
And S2, recognizing each image according to the initial image recognition model, and respectively obtaining the predicted lesion type of each image.
In the embodiments of the present application, the initial image recognition model has a certain lesion category recognition capability: it can recognize each image to obtain a prediction result, and the associated image report can then be used as correct supervision information to correct that prediction.
Step 310: and acquiring an image report related to each image, judging the predicted lesion type of each image according to the image report, and marking the lesion type of each image according to a judgment result, wherein the image report comprises description information of a lesion diagnosis result corresponding to the image.
When step 310 is executed, the method specifically includes:
and S1, acquiring video reports related to the video images.
In practice, when an endoscopic diagnosis is performed, an image report is output along with the images and describes the lesion diagnosis result of the corresponding images. For cancer diagnosis, for example, the lesion diagnosis result may include positive, negative, and so on, and the report may also describe other information, such as the size of the lesion area. The associated image report can therefore be obtained directly, without manually labeling additional samples.
And S2, judging the predicted lesion type of each image according to the image report, and marking the lesion type of each image according to the judgment result.
The method specifically comprises the following steps:
and S2.1, extracting keywords related to the lesion type from the image report.
In the embodiment of the application, the lesion diagnosis result described in the image report may be obtained by keyword extraction, where the required keywords are those related to the lesion recognition task. For example, if the recognition target of the auxiliary diagnosis system is to distinguish whether an image contains cancer, only the keywords related to cancer need to be screened from the image report, and the lesion diagnosis result is determined from them.
And S2.2, determining the pathological change diagnosis result of each image according to the extracted keywords.
For example, for the identification of cancer, the lesion diagnosis result may be positive or negative, where negative indicates normal (no cancer) and positive indicates abnormal (cancer present).
And S2.3, comparing the lesion diagnosis result of each image with the predicted lesion type, marking the lesion type of each image as the predicted lesion type if the lesion diagnosis result is consistent with the predicted lesion type, and marking the lesion type of each image as the type indicated by the lesion diagnosis result if the lesion diagnosis result is inconsistent with the predicted lesion type.
The lesion diagnosis result in the image report is taken as a correct result, and the predicted lesion type obtained based on the initial image recognition model is corrected by taking the lesion diagnosis result in the image report as supervision information.
For example, if an image report gives a positive lesion diagnosis but the predicted lesion type of an associated image is not positive, the prediction is incorrect, and the lesion type of that image is marked as positive; further, for a binary classification task, the decision threshold of the initial image recognition model can be lowered to ensure that such image samples are recalled. Conversely, if an image report gives a negative lesion diagnosis but the predicted lesion type of an associated image is positive, the prediction is corrected, and the lesion type of that image is marked as negative.
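Steps S2.1 to S2.3 can be sketched as the following correction routine for the binary positive/negative case described above. This is a minimal illustration under assumptions: the keyword list is hypothetical, standing in for an institution's actual report vocabulary.

```python
# Hypothetical keywords; a real system would use the institution's report vocabulary.
POSITIVE_KEYWORDS = ("carcinoma", "malignant", "cancer")

def diagnosis_from_report(report_text: str) -> str:
    """S2.1/S2.2: derive the lesion diagnosis ('positive'/'negative') by keyword matching."""
    text = report_text.lower()
    if any(kw in text for kw in POSITIVE_KEYWORDS):
        return "positive"
    return "negative"

def corrected_label(report_text: str, predicted: str) -> str:
    """S2.3: use the report diagnosis as supervision. Keep the prediction if it
    agrees with the report, otherwise overwrite it with the report's result."""
    diagnosis = diagnosis_from_report(report_text)
    return predicted if predicted == diagnosis else diagnosis
```

Note that real reports contain negated phrases ("no evidence of carcinoma"), so a production matcher would need negation handling; the sketch omits it.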
It should be noted that, in practice, different hospitals or medical institutions may use different image report formats, so the contents of the image reports need to be organized and output according to different strategies. When the image reports are used to judge and correct the predicted lesion categories, different strategies may likewise be adopted for different lesion category recognition tasks, and the approach is not limited to matching extracted keywords; for example, given a preset template of the image report, the information representing the required lesion category may be extracted directly from the corresponding positions in that template.
In this way, the predicted lesion types are corrected using the image reports, and the resulting labeled images can be used as training image samples with lesion type labels that are known to be correct, thereby improving the accuracy of the trained image recognition model.
Step 320: and performing iterative training to obtain an image recognition model according to the labeled images and the initial training image sample set.
In step 320, the present application provides two possible implementations:
The first embodiment: retraining to obtain an image recognition model according to the labeled images and the initial training image sample set.
That is to say, in the embodiment of the present application, the image recognition model may be retrained from scratch on the combination of the entire previous initial training image sample set and the labeled images. This requires a higher time cost, since training restarts from the beginning, but it ensures that the training process fits the full amount of data.
The second embodiment: updating and training the initial image recognition model according to the labeled images and the initial training image sample set.
That is, in the embodiment of the present application, iterative updating may be performed on the basis of the initial image recognition model. Before updating, the initial image recognition model may also be fine-tuned; for example, for a binary classification task, the prediction threshold may be adjusted, and iterative updating then proceeds from the fine-tuned model.
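The threshold adjustment mentioned above can be illustrated for the binary case. The default of `0.5` and the adjusted value are assumptions for illustration, since the embodiment only states that the prediction threshold may be adjusted.

```python
def classify(prob_positive: float, threshold: float = 0.5) -> str:
    """Binary decision from the model's positive-class probability.

    Lowering `threshold` recalls borderline samples whose report says
    positive but whose predicted probability falls just below the default."""
    return "positive" if prob_positive >= threshold else "negative"
```

For instance, a sample scored 0.45 is negative at the default threshold but is recalled as positive once the threshold is lowered to 0.4.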
In this case, the labeled images and the initial training image sample set must still be used together for training: if training were performed only on the labeled images, the model would "forget" the initial training image sample set and its effect would degrade. Therefore, during update training, the initial image recognition model is updated using both the initial training image sample set and the labeled images, so that iterative updating can proceed from the preceding model and is faster.
For example, to increase training speed, a mini-batch gradient descent method may be adopted; mini-batch training divides the whole large training set into many small batches. In the second embodiment, each mini-batch used in training can therefore be split into two parts: one part is sampled only from the previous initial training image sample set, and the other part is sampled only from the labeled images. Fig. 4 shows a schematic diagram of this spliced mini-batch structure; constructing mini-batches in this way can increase the iteration speed.
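The two-part mini-batch structure of Fig. 4 can be sketched as follows; the batch size and the 50/50 split between the two sources are illustrative assumptions.

```python
import random

def spliced_minibatch(initial_set, labeled_images, batch_size=32, initial_frac=0.5):
    """Build one mini-batch whose first part is sampled only from the initial
    training sample set and whose second part only from the newly labeled
    images, mirroring the two-part splicing structure of Fig. 4."""
    n_initial = int(batch_size * initial_frac)
    n_labeled = batch_size - n_initial
    return (random.sample(initial_set, n_initial)      # part 1: old data
            + random.sample(labeled_images, n_labeled))  # part 2: new labeled data
```

Each training step would draw one such batch, so every gradient update sees both the old sample set and the newly labeled images.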
In the embodiment of the present application, an incremental learning method may also be used for the strategy of fine-tuning the initial image recognition model in combination with the corrected and labeled image, and is not limited.
Further, after step 320 is executed, the image recognition model obtained by iterative training may be evaluated, so as to determine whether to update the initial image recognition model and the initial image sample training set, which is provided in the embodiment of the present application, and the possible evaluation method specifically includes:
1) a sample set of test images is acquired.
In the embodiment of the application, the test image sample set is mainly used for evaluating the effects of the models before and after the iterative training, and in order to ensure the correctness of the comparison, a fixed test image sample set can be adopted, and each evaluation is performed on the basis of the fixed test image sample set.
2) And according to the image recognition model after iterative training, carrying out lesion type recognition on each test image sample in the test image sample set, and according to the recognition result, determining the accuracy of the image recognition model after iterative training.
Further, when the initial image recognition model is evaluated, lesion type recognition is carried out on all the test image samples in the test image sample set according to the initial image recognition model, and the accuracy of the initial image recognition model is determined according to the recognition result.
3) Further, the following two processing methods can be classified according to the comparison result:
The first processing mode is as follows: if the accuracy of the image recognition model after iterative training is determined to be greater than the accuracy of the initial image recognition model, merging the labeled images and the initial training image sample set, using the merged set as the initial training image sample set of the next iterative training, and using the image recognition model after iterative training as the initial image recognition model of the next iterative training.
If the accuracy of the image recognition model after the iterative training is higher, its effect has improved and its recognition is better, so the iteration can be considered successful. The labeled images can then be merged with the previous initial training image sample set and the result used as the next initial training image sample set, expanding the number of training image samples; the image recognition model after the iterative training is taken as the output of this round of weakly supervised training and as the initial image recognition model for the next iterative training.
The second processing mode is as follows: and if the accuracy of the image recognition model after the iterative training is determined to be not more than the accuracy of the initial image recognition model, taking the initial training image sample set as the initial training image sample set of the next iterative training, and taking the initial image recognition model as the initial image recognition model of the next iterative training.
That is, if the effect of the image recognition model is determined to have decreased after the iterative training, the initial training image sample set does not need to be updated and remains unchanged, and the initial image recognition model from before the iterative training is taken as the output of the current weakly supervised training.
Of course, when evaluating the effect of the image recognition model after the iterative training, the embodiment of the present application may not be limited to the parameter index of the accuracy, and may also use other parameter indexes or a combination of multiple parameter indexes for evaluation, without limitation.
Therefore, more accurate automatic updating of the image recognition model is realized through model evaluation, the iterative training effect of the image recognition model is ensured, and the accuracy of the image recognition model is improved.
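The evaluate-then-accept-or-roll-back logic of the two processing modes can be sketched as follows, with models represented as prediction callables. This is an illustrative sketch using accuracy as the single evaluation index, as in the embodiment; other indices or combinations could be substituted.

```python
def evaluate_accuracy(model_predict, test_set):
    """Fraction of test samples whose predicted category matches the label."""
    correct = sum(1 for image, label in test_set if model_predict(image) == label)
    return correct / len(test_set)

def accept_or_rollback(new_model, old_model, test_set, initial_set, labeled_images):
    """First mode: keep the iterated model and merge the data if accuracy improved
    on the fixed test set. Second mode: otherwise roll back to the previous model
    and leave the training sample set unchanged."""
    if evaluate_accuracy(new_model, test_set) > evaluate_accuracy(old_model, test_set):
        return new_model, initial_set + labeled_images
    return old_model, initial_set
```

The fixed `test_set` plays the role of the unified test image sample set used for every evaluation.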
In the embodiment of the application, each image is recognized according to the initial image recognition model to obtain its predicted lesion category; the predicted lesion category is judged and corrected according to the associated image report, and the lesion category of each image is labeled according to the judgment result; the image recognition model is then obtained through iterative training on the labeled images and the initial training image sample set. This provides a weakly supervised training method that uses the image report as supervision information, so that an image recognition model with a certain recognition capability can be iterated quickly without additional labeling cost. On the one hand, a large amount of online pipeline data and the corresponding image report information is used effectively; on the other hand, the cost of model iteration is reduced, the iteration speed is increased, and product upgrades are accelerated. For example, the image recognition model training method can be applied to model iteration in an auxiliary diagnosis system for gastrointestinal endoscopy: as the project advances, the model is iterated using examination videos that come with image reports, increasing the iterative update rate and reducing cost.
Based on the above embodiments, a brief description is provided below of a training principle process of an image recognition model in the embodiments of the present application, and reference is made to fig. 5, which is a schematic diagram of a training method of an image recognition model in the embodiments of the present application.
As shown in fig. 5, the overall process of the image recognition model training method may be divided into several parts, namely initialization, model inference, inference correction, model iteration, and iterative evaluation, which are briefly described below.
1) And (5) initializing.
In the embodiment of the application, in the initialization stage, training the image recognition model depends on manually labeled endoscope images; that is, image samples with lesion type labels are obtained first and used as the initial training image sample set. At the beginning these samples are obtained by manual labeling, but subsequently the set can be updated continuously without relying on manual work.
Specifically, a classification deep network structure may be designed initially, as shown in fig. 6, which is a schematic network structure diagram of a classification deep network in an initialization training stage in the embodiment of the present application, and a classification deep network related to an image generally includes a feature extraction portion formed by convolutional layers and a classifier formed by fully-connected layers.
2) And (6) model inference.
In the model estimation stage, each image is mainly identified according to the initial image identification model, and the predicted lesion type of each image is obtained respectively.
Furthermore, the medical image video can be preprocessed: the preprocessing comprises frame sampling to obtain the frame images, and filtering to screen out low-quality images. The resulting images are input into the initial image recognition model, which infers a prediction for each image, and the predicted lesion type recognized for each image is recorded.
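The frame-sampling part of the preprocessing can be sketched as a generator over a decoded frame stream; the period value is an assumption, since the embodiment only refers to a "preset frame period".

```python
def sample_by_period(frames, frame_period: int):
    """Yield every `frame_period`-th frame (frames 0, frame_period, 2*frame_period, ...)
    from an iterable of decoded video frames."""
    for index, frame in enumerate(frames):
        if index % frame_period == 0:
            yield frame
```

In practice the `frames` iterable would come from a video decoder (e.g. reading the endoscopy video frame by frame), and the sampled frames would then pass through the quality filter before inference.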
3) And (6) deducing and correcting.
In the inference and correction stage, mainly combining with an image report to correct the recognition result of the initial image recognition model, specifically comprising: and judging the predicted lesion type of each image according to the image report, and marking the lesion type of each image according to the judgment result.
It should be noted that, in the inference and correction stage in the embodiment of the present application, the adopted strategy may differ for different lesion type recognition target tasks. For example, if the system using the image recognition model is a disease classification system, that is, the target task is to distinguish whether a lesion is included, then the lesion diagnosis result derived from the keywords associated with the lesion type in the image report may be compared with the predicted lesion type, so as to correct the predicted lesion type and label the lesion type of each image.
4) And (6) model iteration.
In the model iteration stage, the image recognition model is obtained through iterative training, based on a certain strategy, from the labeled images and the initial training image sample set; the purpose is to ensure that the image recognition model learns the newly labeled images without degrading its recognition effect on the original initial training image sample set.
Specifically, the different strategies may include: retraining the image recognition model from the labeled images and the initial training image sample set; or update-training the initial image recognition model with the labeled images and the initial training image sample set.
5) And (6) performing iterative evaluation.
In the iterative evaluation stage, the image recognition model after the iterative training is evaluated on a unified test set to judge whether the iteration succeeded, and the image recognition model output by this iteration is obtained accordingly. That is, it is evaluated whether the image recognition model after the iterative training performs better than the initial image recognition model. If so, the version of the initial image recognition model can be updated: the image recognition model after the iterative training is output by the current iteration, and the labeled images can be merged with the initial training image sample set to update that sample set. Otherwise, the system rolls back to the initial image recognition model version: the iteration outputs the initial image recognition model, which remains the initial image recognition model for the next iterative training, and the initial training image sample set likewise remains the initial training image sample set for the next iterative training.
Therefore, the whole process of the training method in the embodiment of the application can serve as an automatic updating scheme: the image reports output by doctors or related personnel along with endoscopy are used as supervision information for weakly supervised training, and no additional manually labeled samples are needed, so the image recognition model can be updated automatically from a large amount of endoscopy pipeline data, accelerating iteration of the image recognition model and reducing cost.
Based on the above embodiments, referring to fig. 7, a schematic structural diagram of an image recognition system in an embodiment of the present application is shown.
The image recognition system comprises at least an image acquisition device 70, a processing device 71 and an output device 72. In the embodiment of the present application, the image capturing device 70, the processing device 71, and the output device 72 are related medical devices, and may be integrated in the same medical device, or may be divided into a plurality of devices, which are connected to each other for communication to form a medical system for use, for example, for diagnosing digestive tract diseases, the image capturing device 70 may be an endoscope, and the processing device 71 and the output device 72 may be computer devices communicating with the endoscope, and the like.
Specifically, the image capturing device 70 is used to acquire an image to be recognized.
And the processing device 71 is used for extracting the image characteristic information of the image to be recognized, and obtaining a lesion type recognition result of the image to be recognized by taking the image characteristic information of the image to be recognized as an input parameter based on a pre-trained image recognition model.
The image recognition model is obtained by iterative training on the labeled images and an initial training image sample set to determine the lesion type recognition result; the labeled images are obtained after lesion type labeling is performed on the images based on the initial image recognition model and the associated image reports; the initial image recognition model is obtained by training on the initial training image sample set, and the initial training image samples are image samples with lesion type labels.
And the output device 72 is used for outputting the lesion type identification result of the image to be identified.
Based on the above embodiments, referring to fig. 8, an image recognition apparatus in an embodiment of the present application specifically includes:
an obtaining module 80, configured to obtain an image to be identified;
an extraction module 81, configured to extract image feature information of the image to be identified;
the identification module 82 is configured to obtain a lesion category identification result of the image to be identified based on a pre-trained image identification model, where the image characteristic information of the image to be identified is used as an input parameter, the image identification model is obtained by performing iterative training according to each image after being labeled and an initial training image sample set to determine the lesion category identification result, each image after being labeled is obtained after performing lesion category labeling on each image based on the initial image identification model and an associated image report, the initial image identification model is obtained by training according to the initial training image sample set, and the initial training image sample is an image sample with lesion category labeling.
Optionally, the training mode of the image recognition model includes:
the prediction module 83 is configured to obtain each image, identify each image according to an initial image identification model, and obtain a predicted lesion type of each image;
a processing module 84, configured to obtain an image report associated with each image, determine a predicted lesion type of each image according to the image report, and label the lesion type of each image according to a determination result, where the image report includes description information of a lesion diagnosis result corresponding to the image;
and the iterative training module 85 is configured to perform iterative training to obtain an image recognition model according to the labeled images and the initial training image sample set.
Optionally, when the predicted lesion category of each image is determined according to the image report and the lesion category of each image is labeled according to the determination result, the processing module 84 is specifically configured to:
extracting keywords associated with the lesion category from the image report;
determining the pathological change diagnosis result of each image according to the extracted keywords;
and comparing the lesion diagnosis result of each image with a predicted lesion category, if the lesion diagnosis result is consistent with the predicted lesion category, marking the lesion category of each image as the predicted lesion category, and if the lesion diagnosis result is inconsistent with the predicted lesion category, marking the lesion category of each image as the category represented by the lesion diagnosis result.
Optionally, when the image recognition model is obtained through iterative training according to the labeled image samples and the initial training image sample set, the iterative training module 85 is specifically configured to:
and retraining to obtain an image recognition model according to the marked image images and the initial training image sample set.
Optionally, when the image recognition model is obtained by training according to the labeled image samples and the initial training image sample set, the iterative training module 85 is specifically configured to:
and updating and training the initial image recognition model according to the marked image images and the initial training image sample set.
Optionally, after the iterative training obtains the image recognition model, the evaluating module 86 is further configured to:
acquiring a test image sample set;
according to the image recognition model after iterative training, carrying out lesion category recognition on each test image sample in the test image sample set, and according to a recognition result, determining the accuracy of the image recognition model after iterative training;
if the accuracy of the image recognition model after the iterative training is determined to be greater than the accuracy of the initial image recognition model, merging the labeled images and the initial training image sample set as the initial training image sample set of the next iterative training, and taking the image recognition model after the iterative training as the initial image recognition model of the next iterative training;
and if the accuracy of the image recognition model after the iterative training is determined to be not more than the accuracy of the initial image recognition model, taking the initial training image sample set as the initial training image sample set of the next iterative training, and taking the initial image recognition model as the initial image recognition model of the next iterative training.
Based on the above embodiment, referring to fig. 9, an image recognition model training apparatus in an embodiment of the present application specifically includes:
the prediction module 90 is configured to obtain each image, identify each image according to an initial image identification model, and obtain a predicted lesion type of each image, respectively, where the initial image identification model is obtained by training according to an initial training image sample set, and the initial training image sample is an image sample with lesion type labels;
the processing module 91 is configured to obtain an image report associated with each image, determine a predicted lesion type of each image according to the image report, and label the lesion type of each image according to a determination result, where the image report includes description information of a lesion diagnosis result corresponding to the image;
and the iterative training module 92 is configured to perform iterative training to obtain an image recognition model according to the labeled images and the initial training image sample set.
Optionally, when obtaining each image, the prediction module 90 is specifically configured to:
acquiring a medical image video;
and sampling the medical image video according to a preset frame period to obtain the extracted image of each frame.
Optionally, the prediction module 90 is further configured to: screen the extracted frame images and filter out images whose resolution is smaller than the preset size.
Based on the foregoing embodiments, an electronic device of another exemplary embodiment is provided in this application embodiment, and in some possible embodiments, the electronic device in this application embodiment may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor may implement the steps of the image recognition model training method or the image recognition method in the foregoing embodiments when executing the program.
For example, taking an electronic device as the server 100 in fig. 1 of the present application for illustration, a processor in the electronic device is the processor 110 in the server 100, and a memory in the electronic device is the memory 120 in the server 100.
Based on the above embodiments, in the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the image recognition model training method or the image recognition method in any of the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the embodiments of the present application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (13)

1. An image recognition method, comprising:
acquiring an image to be identified;
extracting image characteristic information of the image to be identified;
obtaining a lesion type identification result of the image to be identified by taking image characteristic information of the image to be identified as an input parameter based on a pre-trained image identification model, wherein the image identification model is used for performing iterative training according to each image after being labeled and an initial training image sample set so as to determine the lesion type identification result, each image after being labeled is obtained after labeling the lesion type of each image based on the initial image identification model and an associated image report, the initial image identification model is obtained by training according to the initial training image sample set, and the initial training image sample is an image sample with lesion type labels;
the training mode of the image recognition model comprises the following steps:
acquiring each image, identifying each image according to an initial image identification model, and respectively acquiring the predicted lesion category of each image;
acquiring an image report associated with each image, judging the predicted lesion category of each image according to the image report, and labeling the lesion category of each image according to a judgment result, wherein the image report comprises description information of a lesion diagnosis result corresponding to the image;
and carrying out iterative training according to the labeled images and the initial training image sample set to obtain an image recognition model.
2. The method of claim 1, wherein verifying the predicted lesion category of each image according to the image report and labeling the lesion category of each image according to the verification result specifically comprises:
extracting keywords associated with lesion categories from the image report;
determining the lesion diagnosis result of each image according to the extracted keywords;
and comparing the lesion diagnosis result of each image with the predicted lesion category: if they are consistent, labeling the lesion category of the image as the predicted lesion category; if they are inconsistent, labeling the lesion category of the image as the category indicated by the lesion diagnosis result.
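A minimal sketch of the keyword-based verification in claim 2; the keyword table, category names, and report format are illustrative assumptions rather than part of the claim:

```python
# Hypothetical keyword table mapping lesion categories to report phrases.
LESION_KEYWORDS = {
    "polyp": ["polyp", "polypoid"],
    "ulcer": ["ulcer", "ulceration"],
    "normal": ["no abnormality", "normal mucosa"],
}

def diagnose_from_report(report_text):
    """Return the lesion category whose keywords appear in the report,
    or None if no keyword matches."""
    text = report_text.lower()
    for category, keywords in LESION_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return None

def label_image(predicted_category, report_text):
    """Keep the model's prediction when it agrees with the report
    diagnosis; otherwise label with the report's category."""
    diagnosed = diagnose_from_report(report_text)
    if diagnosed is None:
        return None  # report inconclusive; leave the image unlabeled
    return predicted_category if diagnosed == predicted_category else diagnosed
```

For example, a prediction of "normal" against a report mentioning "ulceration" is relabeled as "ulcer", matching the inconsistent-case rule of claim 2.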
3. The method of claim 1, wherein performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model specifically comprises:
retraining according to each labeled image and the initial training image sample set to obtain the image recognition model.
4. The method of claim 1, wherein performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model specifically comprises:
performing update training on the initial image recognition model according to each labeled image and the initial training image sample set.
5. The method of claim 3 or 4, wherein after the iterative training to obtain the image recognition model, the method further comprises:
acquiring a test image sample set;
performing lesion category recognition on each test image sample in the test image sample set according to the iteratively trained image recognition model, and determining the accuracy of the iteratively trained image recognition model according to the recognition result;
if the accuracy of the iteratively trained image recognition model is determined to be greater than the accuracy of the initial image recognition model, merging each labeled image with the initial training image sample set to serve as the initial training image sample set for the next round of iterative training, and taking the iteratively trained image recognition model as the initial image recognition model for the next round of iterative training;
and if the accuracy of the iteratively trained image recognition model is determined to be not greater than the accuracy of the initial image recognition model, taking the initial training image sample set as the initial training image sample set for the next round of iterative training, and taking the initial image recognition model as the initial image recognition model for the next round of iterative training.
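The accept/reject rule of claim 5 can be sketched as one iteration step; `train` and `evaluate` are hypothetical stand-ins for retraining and test-set accuracy measurement:

```python
# Sketch of claim 5's accuracy gate: the merged data and new model seed
# the next iteration only if test accuracy improves; otherwise the next
# iteration restarts from the previous state.
def iterate_once(initial_model, initial_set, labeled_images,
                 train, evaluate, test_set):
    """Return (model, training set) to use as the next iteration's start."""
    candidate_model = train(labeled_images + initial_set)
    if evaluate(candidate_model, test_set) > evaluate(initial_model, test_set):
        # Improvement: accept the candidate model and the merged set.
        return candidate_model, labeled_images + initial_set
    # No improvement: keep the previous model and sample set.
    return initial_model, initial_set
```

This gate keeps noisy report-derived labels from degrading the model: a batch of pseudo-labels only enters the training set permanently if it measurably helps on the held-out test set.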
6. An image recognition model training method, characterized by comprising the following steps:
acquiring each image, recognizing each image according to an initial image recognition model, and respectively obtaining the predicted lesion category of each image, wherein the initial image recognition model is obtained by training according to an initial training image sample set, and each initial training image sample is an image sample with a lesion category label;
acquiring an image report associated with each image, verifying the predicted lesion category of each image according to the image report, and labeling the lesion category of each image according to the verification result, wherein the image report comprises description information of the lesion diagnosis result corresponding to the image;
and performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model.
7. The method of claim 6, wherein acquiring each image comprises:
acquiring a medical image video;
and sampling the medical image video at a preset frame period to obtain each extracted image frame.
8. The method of claim 7, further comprising:
screening each extracted image frame, and filtering out image frames whose resolution is smaller than a preset size.
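Claims 7 and 8 together describe periodic frame extraction followed by a minimum-resolution filter. A minimal sketch, assuming decoded frames are available as `(width, height, data)` tuples; a real system would decode the video with a library such as OpenCV:

```python
# Sketch of claims 7-8: sample the decoded video at a preset frame
# period, then drop frames below a preset size. The tuple frame
# representation is an assumption for illustration only.
def sample_frames(frames, frame_period):
    """Keep every `frame_period`-th frame of the decoded video."""
    return frames[::frame_period]

def filter_by_resolution(frames, min_width, min_height):
    """Discard frames smaller than the preset size; here "smaller" is
    taken to mean either dimension below its threshold (an assumption)."""
    return [f for f in frames
            if f[0] >= min_width and f[1] >= min_height]
```

Sampling keeps the per-video labeling cost bounded, while the resolution filter removes frames too small to carry usable lesion detail.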
9. An image recognition system, characterized by at least comprising an image acquisition device, an image processing device and an output device, wherein:
the image acquisition device is used for acquiring an image to be recognized;
the image processing device is used for extracting image feature information of the image to be recognized, and obtaining a lesion category recognition result of the image to be recognized by taking the image feature information of the image to be recognized as an input parameter, based on a pre-trained image recognition model, wherein the image recognition model is obtained by performing iterative training according to each labeled image and an initial training image sample set so as to determine the lesion category recognition result, each labeled image is obtained by labeling the lesion category of each image based on an initial image recognition model and an associated image report, the initial image recognition model is obtained by training according to the initial training image sample set, and each initial training image sample is an image sample with a lesion category label;
the output device is used for outputting the lesion category recognition result of the image to be recognized;
the training mode of the image recognition model comprises the following steps:
acquiring each image, recognizing each image according to the initial image recognition model, and respectively obtaining the predicted lesion category of each image;
acquiring an image report associated with each image, verifying the predicted lesion category of each image according to the image report, and labeling the lesion category of each image according to the verification result, wherein the image report comprises description information of the lesion diagnosis result corresponding to the image;
and performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model.
10. An image recognition apparatus, comprising:
the acquisition module is used for acquiring an image to be recognized;
the extraction module is used for extracting image feature information of the image to be recognized;
the recognition module is used for obtaining a lesion category recognition result of the image to be recognized by taking the image feature information of the image to be recognized as an input parameter, based on a pre-trained image recognition model, wherein the image recognition model is obtained by performing iterative training according to each labeled image and an initial training image sample set so as to determine the lesion category recognition result, each labeled image is obtained by labeling the lesion category of each image based on an initial image recognition model and an associated image report, the initial image recognition model is obtained by training according to the initial training image sample set, and each initial training image sample is an image sample with a lesion category label;
the training mode of the image recognition model comprises the following steps:
acquiring each image, recognizing each image according to the initial image recognition model, and respectively obtaining the predicted lesion category of each image;
acquiring an image report associated with each image, verifying the predicted lesion category of each image according to the image report, and labeling the lesion category of each image according to the verification result, wherein the image report comprises description information of the lesion diagnosis result corresponding to the image;
and performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model.
11. An image recognition model training apparatus, comprising:
the prediction module is used for acquiring each image, recognizing each image according to an initial image recognition model, and respectively obtaining the predicted lesion category of each image, wherein the initial image recognition model is obtained by training according to an initial training image sample set, and each initial training image sample is an image sample with a lesion category label;
the processing module is used for acquiring the image report associated with each image, verifying the predicted lesion category of each image according to the image report, and labeling the lesion category of each image according to the verification result, wherein the image report comprises description information of the lesion diagnosis result corresponding to the image;
and the iterative training module is used for performing iterative training according to each labeled image and the initial training image sample set to obtain the image recognition model.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, performs the steps of the method of any one of claims 1-5 or 6-8.
13. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-5 or 6-8.
CN201911113506.2A 2019-11-14 2019-11-14 Image recognition model training and image recognition method, device and system Active CN110909780B (en)
Publications (2)

Publication Number Publication Date
CN110909780A CN110909780A (en) 2020-03-24
CN110909780B true CN110909780B (en) 2020-11-03

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738263B (en) 2019-10-17 2020-12-29 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and image recognition device
CN111652837B (en) * 2020-04-16 2022-09-09 上海长征医院 AI-based thyroid nodule left and right lobe positioning and ultrasonic report error correction method
CN111627531A (en) * 2020-06-02 2020-09-04 中国医学科学院阜外医院深圳医院(深圳市孙逸仙心血管医院) Medical image classification processing system based on artificial intelligence
CN112101156A (en) * 2020-09-02 2020-12-18 杭州海康威视数字技术股份有限公司 Target identification method and device and electronic equipment
CN112109096B (en) * 2020-09-21 2022-04-19 安徽省幸福工场医疗设备有限公司 High-precision medical image robot and identification method thereof
CN111931762B (en) * 2020-09-25 2021-07-30 广州佰锐网络科技有限公司 AI-based image recognition solution method, device and readable storage medium
CN112329810B (en) * 2020-09-28 2023-07-11 北京师范大学 Image recognition model training method and device based on significance detection
CN112488166A (en) * 2020-11-19 2021-03-12 电子科技大学成都学院 Training method and system of image recognition model
CN112488073A (en) * 2020-12-21 2021-03-12 苏州科达特种视讯有限公司 Target detection method, system, device and storage medium
CN114171166B (en) * 2021-01-20 2022-10-18 赛维森(广州)医疗科技服务有限公司 Management system of model of visual digital pathological artificial intelligence
CN113053194B (en) * 2021-02-28 2023-02-28 华中科技大学同济医学院附属协和医院 Physician training system and method based on artificial intelligence and VR technology
CN113159212A (en) * 2021-04-30 2021-07-23 上海云从企业发展有限公司 OCR recognition model training method, device and computer readable storage medium
CN113344079B (en) * 2021-06-11 2024-05-14 中科海微(北京)科技有限公司 Image tag semi-automatic labeling method, system, terminal and medium
CN113380380A (en) * 2021-06-23 2021-09-10 上海电子信息职业技术学院 Intelligent reading device for medical reports
CN113505859B (en) * 2021-09-06 2021-12-28 浙江太美医疗科技股份有限公司 Model training method and device, and image recognition method and device
CN113781459A (en) * 2021-09-16 2021-12-10 人工智能与数字经济广东省实验室(广州) Auxiliary report generation method and device for vascular diseases
CN114782805B (en) * 2022-03-29 2023-05-30 中国电子科技集团公司第五十四研究所 Unmanned plane patrol oriented human in-loop hybrid enhanced target recognition method
CN114782960B (en) * 2022-06-22 2022-09-02 深圳思谋信息科技有限公司 Model training method and device, computer equipment and computer readable storage medium
CN116246756B (en) * 2023-01-06 2023-12-22 浙江医准智能科技有限公司 Model updating method, device, electronic equipment and medium
CN117132790B (en) * 2023-10-23 2024-02-02 南方医科大学南方医院 Digestive tract tumor diagnosis auxiliary system based on artificial intelligence

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150113B2 (en) * 2008-01-23 2012-04-03 Carestream Health, Inc. Method for lung lesion location identification
CN106803247B (en) * 2016-12-13 2021-01-22 上海交通大学 Microangioma image identification method based on multistage screening convolutional neural network
US11664114B2 (en) * 2017-05-25 2023-05-30 Enlitic, Inc. Medical scan assisted review system
US10878954B2 (en) * 2018-03-26 2020-12-29 Digibrain4, Inc. Dento-craniofacial clinical cognitive diagnosis and treatment system and method
CN108573490B (en) * 2018-04-25 2020-06-05 王成彦 Intelligent film reading system for tumor image data
CN109523520B (en) * 2018-10-25 2020-12-18 北京大学第三医院 Chromosome automatic counting method based on deep learning
CN109460792A (en) * 2018-11-14 2019-03-12 深圳市威富视界有限公司 A kind of artificial intelligence model training method and device based on image recognition
CN110473192B (en) * 2019-04-10 2021-05-14 腾讯医疗健康(深圳)有限公司 Digestive tract endoscope image recognition model training and recognition method, device and system
CN110288049B (en) * 2019-07-02 2022-05-24 北京字节跳动网络技术有限公司 Method and apparatus for generating image recognition model

