CN111524570B - Ultrasonic follow-up patient screening method based on machine learning - Google Patents

Info

Publication number
CN111524570B
Authority
CN
China
Legal status: Active
Application number
CN202010371381.XA
Other languages
Chinese (zh)
Other versions
CN111524570A (en)
Inventor
张敬谊
李静
潘怀燕
郑文婕
李学源
李光亚
肖筱华
Current Assignee
SHANGHAI PUBLIC HEALTH CLINICAL CENTER
WONDERS INFORMATION CO Ltd
Original Assignee
SHANGHAI PUBLIC HEALTH CLINICAL CENTER
WONDERS INFORMATION CO Ltd
Priority date
Filing date
Publication date
Application filed by SHANGHAI PUBLIC HEALTH CLINICAL CENTER, WONDERS INFORMATION CO Ltd filed Critical SHANGHAI PUBLIC HEALTH CLINICAL CENTER
Priority to CN202010371381.XA priority Critical patent/CN111524570B/en
Publication of CN111524570A publication Critical patent/CN111524570A/en
Application granted granted Critical
Publication of CN111524570B publication Critical patent/CN111524570B/en

Classifications

    • G16H 10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06N 20/00: Machine learning
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G16H 30/20: ICT specially adapted for handling medical images, e.g. DICOM, HL7 or PACS
    • G16H 50/70: ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients


Abstract

The invention provides a machine learning based method for screening ultrasound follow-up patients. With the rapid development of deep learning, natural language processing and deep learning techniques have become important means of analysing medical texts and an effective substitute for manual text screening. In the invention, the text content is segmented with the JIEBA word segmentation tool, word vectors are constructed with the TF-IDF method and the Word2Vec algorithm respectively, and feature vectors are then selected by the chi-square test. For classification, XGBoost, LightGBM and CNN models are trained on the feature data, realising automatic screening of the ultrasound examination follow-up list.

Description

Ultrasonic follow-up patient screening method based on machine learning
Technical Field
The invention relates to an ultrasonic follow-up patient screening method based on an electronic health record, and belongs to the field of ultrasonic follow-up knowledge discovery.
Background
With the rapid development of ultrasound technology in recent years, its application in clinical diagnosis has become increasingly widespread, and ultrasound equipment is now standard in most hospitals; ultrasound examination does not expose patients to ionizing radiation and thus carries no risk of radiation-induced cancer. The accuracy of ultrasound diagnosis depends on two aspects: first, whether the doctor can acquire, through the ultrasound probe, images clear enough to support clinical diagnosis; and second, whether the sonographer gives a correct diagnostic description. Because the accuracy of diagnostic results affects the diagnosis and treatment of disease, medical institutions in many countries strive to improve the level of ultrasound diagnosis.
In China, to ensure the quality of ultrasound diagnosis, ultrasound departments often carry out retrospective investigation of examination results through follow-up. The tertiary-hospital ultrasound quality control guidelines issued by the society of sonographers of the Chinese Medical Association indicate that every tertiary hospital should carry out selective, periodic or unscheduled follow-up of patients and their data after ultrasound examination. Ultrasound follow-up is based on diagnostic data and periodic review, and the following diagnostic and prognostic data should be considered: (1) conclusions of pathological examination; (2) findings of surgical treatment; (3) important laboratory results; (4) results of other medical imaging examinations (e.g., CT, magnetic resonance, nuclear medicine or cardiovascular imaging); (5) data related to scientific research projects; (6) other data requiring follow-up collection. Ultrasound follow-up data should be analysed statistically on a regular basis, using pathology or surgery as the reference to calculate the lesion localisation coincidence rate and the physical-property (benign/malignant) coincidence rate of ultrasound examination; in a conforming ultrasound department, both rates should reach at least 95 percent. For cases of misdiagnosis or missed diagnosis, the cause should be analysed promptly.
The primary step of ultrasound follow-up is to find the key patients who need follow-up, i.e. those who, after an ultrasound examination, underwent surgery, imaging examination or pathological examination on the same body part. In the past, obtaining a follow-up patient list required assigning dedicated staff to review a large number of archived medical records, find all image reports and pathology reports the ultrasound patient subsequently received, and exclude a large amount of irrelevant report content. The traditional screening approach is therefore labour-intensive and inefficient, and its accuracy often depends on staff ability and the medical-record archiving cycle.
With the advancement of IT technology, hospital electronic medical record systems contain increasingly rich patient information; electronic medical records include medical orders, test reports, image reports, ultrasound reports, pathology reports, operation records and various disease-course records. However, most reports are still unstructured text, and descriptions of the ultrasound examination site are buried in long passages of free text.
Disclosure of Invention
The invention aims to solve the technical problems that: traditional follow-up patient screening methods are heavy in workload and inefficient, and accuracy often depends on the level of staff's ability and the period of medical history archiving.
In order to solve the technical problems, the technical scheme of the invention is to provide an ultrasonic follow-up patient screening method based on machine learning, which is characterized by comprising the following steps:
step 1, collecting patient treatment record data, wherein the patient treatment record data comprises pathology report data, image report data, ultrasonic report data and a patient unique identifier corresponding to a patient, and constructing a basic information data warehouse according to the collected patient treatment record data of different patients.
Step 2, according to the unique patient identification, associating all patient treatment record data in the basic information data warehouse with the patient, and constructing an ultrasonic patient information table associated with each patient;
step 3, addressing sample imbalance in the ultrasound patient information table obtained in step 2 by oversampling, so as to balance the positive and negative sample sizes; then dividing the samples into follow-up samples and non-follow-up samples according to the follow-up information and marking them with different values; finally, sampling from the population and merging the results to obtain training samples;
step 4, performing word segmentation processing on the pathology report and the ultrasonic report in the training sample, and screening out some irrelevant word segmentation results;
step 5, converting text Word segmentation results into feature vector matrixes by using a TF-IDF method and a Word2Vec method respectively to describe the document;
step 6, selecting the feature vector constructed in the step 5 through chi-square test, and selecting useful information to perform machine learning modeling;
step 7, selecting three models (XGBoost, LightGBM and CNN) for binary classification modelling, predicting the probability that a sample is a follow-up patient, choosing the training feature matrix from TF-IDF and Word2Vec by comparing model performance, and selecting one of XGBoost, LightGBM and CNN as the final prediction model;
and step 8, setting a threshold: samples with predicted probability greater than or equal to the threshold are added to the follow-up patient list, while samples below it are treated as non-follow-up patients; model evaluation indexes are calculated from the classification results obtained in step 7, and the optimal model is selected according to these indexes.
Preferably, the step 2 includes the steps of:
step 201, invalid data in patient treatment record data in a basic information data warehouse are removed;
step 202, merging the ultrasound text fields (e.g. findings and impression) that belong to the same examination in the patient visit record data into one ultrasound report, and likewise merging the pathology text fields into one pathology report;
step 203, performing many-to-many matching on the ultrasonic report and the pathology report in each patient treatment record data, so as to split the patient treatment record data into a plurality of new data records, and after performing many-to-many matching, constructing a new data set by each patient treatment record comprising one ultrasonic report and one pathology report of the same patient;
step 204, extracting patient characteristic information in text data from the data in the new data set obtained in step 203 through a regular expression, and converting the patient characteristic information into numerical data; then filling the missing value and processing the abnormal value; and finally, eliminating irrelevant indexes, deleting indexes with the missing value proportion being larger than a certain value, and normalizing the data to obtain an ultrasonic patient information table.
Preferably, in step 201, the invalid data is patient visit record data with ultrasound report but without pathology report.
Preferably, in step 4, the word segmentation tool employs JIEBA word segmentation.
Preferably, in step 5, a TF-IDF algorithm is used to construct a word feature vector matrix, and the TF-IDF matrix is trained on the segmentation results of the pathology report and the ultrasound report in the labeled sample, including the following contents:
For each word t in each document of the training set, the weight K(t, D_i) of t in document D_i (i = 1, 2, …, M, with M the total number of training documents) is calculated with the TF-IDF algorithm. TF-IDF combines the frequency tf of word t within a single document with the weight idf of t over the whole document set. The idf of word t is calculated as idf(t) = log(M / n_t + 0.01), where n_t is the number of training documents in which t appears. The TF-IDF weight is computed as

K(t, D_i) = tf(t, D_i) · idf(t) / sqrt( Σ_{t' ∈ D_i} [ tf(t', D_i) · idf(t') ]² )

where tf(t, D_i) is the frequency of word t in document D_i and the denominator is a normalization factor.
Preferably, in step 5, word vector training is performed by using a CBOW neural network framework in a Word2Vec deep learning model, and a feature vector matrix is constructed, wherein:
the CBOW neural network is a three-layer neural network, which is obtained by inputting the current word w t C words of the context to output a word w for the current word t Is expressed in mathematical terms:
wherein w is t For a word in the dictionary D, i.e. w is predicted by a window T adjacent to the word t Probability of occurrence; p (w) i |Context i ) Representing the current word w i Probability of c words before and after occurrence;
the output layer of the CBOW neural network takes N words appearing in the dictionary D as leaf nodes, takes the number of times of word appearing in the corpus as weight value to construct a binary tree, and uses a random gradient rising algorithm to project a layer vector X w Is predicted so thatMaximizing, and finally obtaining N-dimensional word vectors w corresponding to each word segmentation through model training.
Preferably, in step 6, feature screening is performed by the chi-square test: the deviation between the observed values and the theoretical values computed under the independence assumption is analysed; if the deviation is smaller than a preset threshold the two variables are judged independent, and if it is larger they are considered correlated. On this basis the chi-square value χ²(t, c) is calculated for each feature, the values are sorted in descending order, and features above the threshold are selected. The chi-square statistic is

χ²(t, c) = Σ_{e_t ∈ {0,1}} Σ_{e_c ∈ {0,1}} ( A_{e_t e_c} − T_{e_t e_c} )² / T_{e_t e_c}

where t is a feature, c is a class, T_{e_t e_c} denotes the computed theoretical count for feature value e_t and class value e_c, and A_{e_t e_c} denotes the corresponding observed count.
Preferably, in step 8, the model evaluation indexes include precision P, recall R, F1 measure, area under the ROC curve AUC, accuracy ACC, specificity TNR and sensitivity TPR, calculated as follows:

P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = 2 · P · R / (P + R)
ACC = (TP + TN) / (TP + FP + TN + FN)
TNR = TN / (TN + FP)
TPR = TP / (TP + FN)

Here TP (true positives) is the number of positive samples predicted positive by the model; FP (false positives) is the number of negative samples predicted positive; TN (true negatives) is the number of negative samples predicted negative; FN (false negatives) is the number of positive samples predicted negative.
With the rapid development of deep learning, natural language processing and deep learning techniques have become important means of analysing medical texts and an effective substitute for manual text screening. In the invention, the text content is segmented with the JIEBA word segmentation tool, word vectors are constructed with the TF-IDF method and the Word2Vec algorithm respectively, and feature vectors are then selected by the chi-square test. For classification, XGBoost, LightGBM and CNN models are trained on the feature data, realising automatic screening of the ultrasound examination follow-up list.
The invention has the following advantages: first, automatic mining of follow-up patient information based on the mass data of medical institutions; second, quantitative analysis of high-dimensional indexes by machine learning, so that ultrasound follow-up patients can be identified quickly and accurately; finally, using this method, a follow-up screening system for ultrasound data can be established that is easy to popularise across different medical institutions.
Drawings
FIG. 1 is a flow chart of a machine learning based ultrasound follow-up patient screening method provided by the invention;
fig. 2 is a method for matching an ultrasound report with a pathology report in a visit record, where a and b are natural numbers.
Detailed Description
The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.
The invention provides a machine learning based ultrasound follow-up patient screening method, which aims to solve the problems in the prior art of collecting large amounts of data and screening ultrasound follow-up patients manually, and to assist clinical scientific research. The method specifically comprises the following steps:
and step 1, collecting patient treatment record data including pathology report data, image report data and ultrasonic report data. In this embodiment, the patient's visit record data may be collected from the electronic health record, and may include basic information data, physical examination data, admission record data, discharge record data, medical records diagnosis data, operation information data, medical history and genetic history data in addition to pathology report data, image report data and ultrasound report data. And constructing a basic information data warehouse according to the acquired patient treatment record data of different patients.
Step 2, according to patient unique identifiers corresponding to different patients, such as patient case numbers, index serial numbers, and the like, associating all patient treatment record data in a basic information data warehouse with the patients, and constructing an ultrasonic patient information table associated with each patient, comprising the following steps:
step 201, invalid data in patient treatment record data in a basic information data warehouse is removed. In this embodiment, the invalid data is patient visit record data with ultrasound reports but without pathology reports.
The field names and meanings of the patient visit record data are shown in table 1 below:
table 1 field names and meanings of patient visit records (raw data set)
The same patient may undergo multiple ultrasound and pathology examinations, so each visit record may contain multiple examination reports of various kinds for one patient, resulting in multiple visit records for the same patient in the ultrasound patient information table. To facilitate subsequent text processing, the method further comprises the following steps:
step 202, merging the ultrasonic field and the ultrasonic field which belong to the same examination in the patient treatment record data into an ultrasonic report, and merging the pathological field and the pathological field into a pathological report.
In step 203, since the original patient visit record data may contain multiple pathology examinations and multiple ultrasound reports for one patient, the ultrasound reports and pathology reports in each original record are matched many-to-many, splitting the record into several new data records; the specific matching method is shown in fig. 2.
Assuming that after step 202 a patient visit record includes a ultrasound reports and b pathology reports (a and b natural numbers), many-to-many matching splits the current visit record into a×b new data records, from which a new data set is constructed.
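The many-to-many matching above amounts to a cross join of the two report lists. A minimal sketch (the field names and report identifiers here are illustrative, not taken from the patent):

```python
from itertools import product

def split_visit(patient_id, ultrasound_reports, pathology_reports):
    """Cross-match every ultrasound report with every pathology report of one
    visit, yielding a x b new records, each with one ultrasound report and
    one pathology report for the same patient."""
    return [
        {"patient_id": patient_id, "ultrasound": us, "pathology": path}
        for us, path in product(ultrasound_reports, pathology_reports)
    ]

rows = split_visit("P001", ["US-1", "US-2", "US-3"], ["PA-1", "PA-2"])
# 3 ultrasound reports x 2 pathology reports -> 6 new records
```

Each resulting row is one candidate ultrasound-pathology pair for the later text-similarity and classification steps.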
After many-to-many matching, each patient visit record contains one ultrasound report and one pathology report for the same patient. The data forms of the original data set and the new data set formed by steps 202 and 203 are shown in tables 2 and 3 below:
table 2 original patient visit record form
Table 3 new data set format
Step 204, extracting patient characteristic information (such as the affected body part) in the text data from the new data set obtained in step 203 through regular expressions, and converting it into numerical data. Missing values are then filled with "-1", and outliers are handled by deleting the affected samples. Finally, irrelevant indexes (such as patient source) are eliminated, indexes whose missing-value proportion exceeds 0.5 are deleted, and the data are normalized to obtain the ultrasound patient information table.
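The extraction and normalisation in step 204 can be sketched as follows; the body-site vocabulary, the example report text and the min-max normalisation choice are assumptions for illustration, not details given in the patent:

```python
import re

def extract_site(report_text, site_vocab):
    """Extract the first body-site keyword found in free text
    (hypothetical site vocabulary)."""
    for site in site_vocab:
        if re.search(site, report_text):
            return site
    return None

def normalize(values):
    """Min-max normalise a numeric column after filling missing values
    (None) with -1, as described in step 204."""
    filled = [-1.0 if v is None else float(v) for v in values]
    lo, hi = min(filled), max(filled)
    if hi == lo:
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]

sites = ["liver", "thyroid", "kidney"]
record = "Ultrasound examination of the liver: echotexture coarse."
site = extract_site(record, sites)   # "liver"
norm = normalize([3.0, None, 7.0])   # missing value filled with -1 first
```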
Step 3 addresses the sample imbalance in the ultrasound patient information table obtained in step 2 by oversampling, so as to balance the positive and negative sample sizes. In this embodiment, records are randomly drawn with replacement from the minority class to increase its size until the positive and negative proportions are balanced.
The samples in the ultrasound patient information table are then divided into follow-up and non-follow-up samples according to the follow-up information and labelled 1 and 0 respectively. The number of follow-up and of non-follow-up samples is set to 2000 each; they are drawn from the population by systematic sampling and then merged.
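The labelling and random oversampling with replacement described above can be sketched like this; the class sizes, target of 2000 per class and record identifiers are illustrative:

```python
import random

def balance_by_oversampling(follow_up, non_follow_up, target=2000, seed=42):
    """Label follow-up records 1 and non-follow-up records 0, then randomly
    oversample (with replacement) each class up to `target` records."""
    rng = random.Random(seed)

    def upsample(records, label):
        sampled = records if len(records) >= target else \
            records + [rng.choice(records) for _ in range(target - len(records))]
        return [(r, label) for r in sampled[:target]]

    return upsample(follow_up, 1) + upsample(non_follow_up, 0)

train = balance_by_oversampling([f"f{i}" for i in range(500)],
                                [f"n{i}" for i in range(1500)], target=2000)
# 2000 positive + 2000 negative training samples
```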
Step 4: because the body parts mentioned in the ultrasound report and the pathology report are important characteristic information, the traditional method of selecting follow-up patients computes the text similarity between the ultrasound report and the pathology report from body-part keywords and admits patients within a certain threshold range to the follow-up list. Specifically, all body-part keywords are extracted from the training samples, word vectors are trained with the TF-IDF method, and the cosine similarity of the two text passages is computed. At each similarity level a follow-up patient list can be selected and the model indexes calculated; the performance indexes were optimal at a text similarity of 0.1.
To improve the model, the invention first uses the visit records of the 17,765 patients obtained in step 3 as training samples for word-vector training. JIEBA word segmentation is applied to the 30,586 pathology reports and 50,069 ultrasound reports of these patients, yielding a vocabulary of 4,030 distinct words and 2,001,603 tokens in total. A Word2Vec model is then trained on the segmented text to obtain a 200-dimensional word-vector model. JIEBA segmentation is then applied to the pathology and ultrasound reports of the labelled samples. Since business experience indicates that body parts are the model's important features, the JIEBA segmentation results are adjusted by word length, part of speech, word frequency and so on to filter out irrelevant information, deleting prepositions, adjectives, numbers, letters, punctuation marks and the like, leaving a total of 1,984 words.
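The token-filtering step (dropping stopwords, numbers, punctuation and very short tokens after segmentation) can be sketched as follows. The real pipeline uses JIEBA on Chinese text; the English tokens and the stopword list here are stand-ins for illustration:

```python
import string

# Hypothetical stop list standing in for the preposition/adjective filters.
STOPWORDS = {"of", "in", "the", "slightly"}

def filter_tokens(tokens, min_len=2):
    """Keep tokens that are not stopwords, not too short, not pure digits
    and not punctuation, mirroring the word-length/part-of-speech filtering."""
    kept = []
    for tok in tokens:
        if tok in STOPWORDS or len(tok) < min_len:
            continue
        if tok.isdigit() or all(ch in string.punctuation for ch in tok):
            continue
        kept.append(tok)
    return kept

tokens = ["thyroid", "of", "3", "nodule", ",", "left", "lobe", "slightly"]
print(filter_tokens(tokens))  # ['thyroid', 'nodule', 'left', 'lobe']
```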
And 5, after the word segmentation result is obtained in the step 4, converting the text word segmentation result into a feature vector matrix for subsequent modeling analysis. The invention adopts two methods to train word vectors and compare model effects.
The first method: a word feature vector matrix is constructed with the TF-IDF algorithm, training the segmentation results of the pathology and ultrasound reports in the labelled samples into a TF-IDF matrix of size 62604 × 1984.
For each word t in each document of the training set, the weight K(t, D_i) of t in document D_i (i = 1, 2, …, M, with M the total number of training documents) is calculated with the TF-IDF algorithm. TF-IDF combines the frequency tf of word t within a single document with the weight idf of t over the whole document set. The idf of word t is calculated as idf(t) = log(M / n_t + 0.01), where n_t is the number of training documents in which t appears. The TF-IDF weight is computed as

K(t, D_i) = tf(t, D_i) · idf(t) / sqrt( Σ_{t' ∈ D_i} [ tf(t', D_i) · idf(t') ]² )

where tf(t, D_i) is the frequency of word t in document D_i and the denominator is a normalization factor.
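A minimal pure-Python sketch of this TF-IDF weighting (the toy documents are illustrative; the real pipeline operates on the 62604 × 1984 matrix of segmented report text):

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Compute TF-IDF weights K(t, D_i) with idf(t) = log(M / n_t + 0.01),
    each document row normalised by its Euclidean norm."""
    M = len(docs)
    vocab = sorted({t for d in docs for t in d})
    n_t = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(M / n_t[t] + 0.01) for t in vocab}
    rows = []
    for d in docs:
        counts = Counter(d)
        raw = [counts[t] / len(d) * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows

docs = [["liver", "cyst", "liver"], ["thyroid", "nodule"], ["liver", "nodule"]]
vocab, K = tfidf_matrix(docs)  # 3 documents x len(vocab) weights
```

In practice a library implementation (e.g. a TfidfVectorizer-style tool) would be used; this sketch only makes the formula above concrete.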
Besides constructing a word feature vector matrix with the TF-IDF algorithm, Word2Vec word-vector training with the CBOW neural network framework can be used to build the feature matrix. The second method is therefore: the segmentation results of the pathology and ultrasound reports in the labelled samples are represented as vectors; for each record, the 200-dimensional word vector of every token is extracted, and the vectors are summed and averaged to obtain a 200-dimensional representation of the record, giving a final feature matrix of size 62604 × 200.
CBOW is a three-layer neural network that takes as input the c context words around the current word w_t and outputs a prediction of w_t; mathematically, training maximises

L = Σ_{w_t ∈ C} log p(w_t | Context(w_t))

where w_t is a word in the dictionary D, i.e. the probability of w_t occurring is predicted from the window of adjacent words, and p(w_t | Context(w_t)) denotes the probability of the current word given the c words before and after it.

The output layer of CBOW builds a binary tree whose leaf nodes are the N words appearing in dictionary D, weighted by their frequency in the corpus. A stochastic gradient ascent algorithm adjusts the projection-layer vector X_w so that L is maximised. Finally, model training yields an N-dimensional word vector w = (v_1, v_2, …, v_N) for each segmented word.
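The per-record feature construction of the second method (averaging the word vectors of all tokens in a record) can be sketched as follows; the toy 2-dimensional vectors stand in for the 200-dimensional Word2Vec vectors of the embodiment:

```python
def record_vector(tokens, word_vectors, dim=200):
    """Average the word vectors of all tokens in a record to obtain one
    fixed-length feature vector; tokens without a vector are skipped."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

wv = {"liver": [1.0, 3.0], "cyst": [3.0, 5.0]}  # toy 2-d "Word2Vec" vectors
vec = record_vector(["liver", "cyst", "unknown"], wv, dim=2)  # [2.0, 4.0]
```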
According to the invention, the characteristic vector matrix is respectively constructed by using the TF-IDF method and the Word2Vec method to describe the document.
Step 6: the feature vectors constructed in step 5 are selected by the chi-square test with a threshold of 0.1; a feature is retained as important when the p-value of its chi-square test is below 0.1, and rejected otherwise, realising feature selection. Class labels are encoded with '1' for 'yes' and '0' for 'no'. The useful information is thus selected for machine learning modelling.
And (3) carrying out feature screening by using chi-square test, analyzing the deviation result of the actual value and the calculated theoretical value, judging that the two variables are independent if the obtained deviation result is smaller than a preset threshold value, and considering that the two variables are related if the obtained deviation result is larger than the preset threshold value. On the basis of the above-mentioned, the chi-square value of each characteristic is calculated 2 And (t, c), sorting the chi-square values from large to small, and selecting the characteristics larger than the threshold value. The formula of the chi-square test method is as follows:
where t is a feature and c is a category; E_{e_t,e_c} denotes the calculated theoretical value for feature value e_t and class e_c, and A_{e_t,e_c} denotes the actual value for feature value e_t and class e_c.
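The chi-square screening above can be sketched for a single binary feature against a binary class label; the 2x2 contingency table supplies the actual counts A and the independence assumption supplies the theoretical counts E (the feature and label arrays here are made up).

```python
def chi_square(feature, labels):
    """Chi-square value χ²(t, c) for a binary feature: sum of (A - E)² / E
    over the 2x2 contingency table of feature presence versus class label."""
    n = len(labels)
    actual = {(f, c): 0 for f in (0, 1) for c in (0, 1)}
    for f, c in zip(feature, labels):
        actual[(f, c)] += 1
    chi2 = 0.0
    for f in (0, 1):
        row = actual[(f, 0)] + actual[(f, 1)]
        for c in (0, 1):
            col = actual[(0, c)] + actual[(1, c)]
            expected = row * col / n            # theoretical value under independence
            if expected:
                chi2 += (actual[(f, c)] - expected) ** 2 / expected
    return chi2

# A feature that tracks the label perfectly scores high; an unrelated one scores 0.
print(chi_square([1, 1, 0, 0], [1, 1, 0, 0]))  # 4.0
print(chi_square([1, 0, 1, 0], [1, 1, 0, 0]))  # 0.0
```

Features would then be ranked by this value (or by the corresponding p-value, as with the 0.1 threshold above) and only the top-ranked ones retained.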
Step 7: three models, XGBoost, Lightgbm and CNN, are selected for binary-classification modeling, and the probability that each sample is a follow-up patient is predicted.
Model training results were obtained for the different feature combinations; the evaluation indexes are compared in Table 4 below.
Table 4. Model effect comparison of different feature engineering schemes
As the comparison in the table shows, combining the TF-IDF feature matrix with a machine learning algorithm achieves better results, so the TF-IDF features are selected as the model training features.
From the performance metrics in the table, XGBoost is slightly more accurate than Lightgbm. Both XGBoost and Lightgbm are fast, effective Tree Boosting tools suited to large-scale data computation. Because XGBoost has been available longer, its tuning practice is more mature, and it outperforms Lightgbm in accuracy; Lightgbm is fast and efficient, but its tuning practice is still maturing owing to its shorter release history. Considering both accuracy and speed, XGBoost is therefore selected as the final model for prediction.
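The two-class probability modeling of step 7 follows the usual fit/predict-probability pattern. A minimal sketch on synthetic data, using scikit-learn's GradientBoostingClassifier as a stand-in tree-boosting model (the feature matrix, labels, and split sizes are all made up for illustration, not drawn from the patent's data):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # stand-in for the TF-IDF feature matrix
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic follow-up labels

# Train a tree-boosting classifier and predict the probability that each
# held-out sample is a follow-up patient (class '1').
clf = GradientBoostingClassifier(random_state=0).fit(X[:150], y[:150])
proba = clf.predict_proba(X[150:])[:, 1]
follow_up = proba >= 0.5                        # threshold screening as in step 8
```

XGBoost and Lightgbm expose sklearn-style estimators with the same fit/predict_proba calls, so swapping the classifier in this sketch is a one-line change.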
Step 8: follow-up patient screening. A threshold is set; samples whose predicted probability is greater than or equal to the threshold are added to the follow-up patient list, and samples whose predicted probability is below the threshold are treated as non-follow-up patients. Model evaluation indexes are calculated from the classification results, and the optimal model is selected according to these indexes.
In step 8, the classification results are comprehensively evaluated with several indexes, and the optimal classification model is selected. The evaluation indexes comprise precision P, recall R, F1 measure, area under the curve AUC, accuracy ACC, specificity TNR and sensitivity TPR, calculated as follows:

P = TP / (TP + FP)
R = TPR = TP / (TP + FN)
F1 = 2PR / (P + R)
ACC = (TP + TN) / (TP + FP + TN + FN)
TNR = TN / (TN + FP)
TP (true positives) denotes the number of positive samples predicted positive by the model; FP (false positives) denotes the number of negative samples predicted positive; TN (true negatives) denotes the number of negative samples predicted negative; FN (false negatives) denotes the number of positive samples predicted negative.
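From these four counts the evaluation indexes of step 8 follow directly; a small helper (the function name and the sample counts are illustrative):

```python
def evaluation_indexes(tp, fp, tn, fn):
    """Precision P, recall R (= sensitivity TPR), F1 measure,
    accuracy ACC and specificity TNR from the confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    acc = (tp + tn) / (tp + fp + tn + fn)
    tnr = tn / (tn + fp)
    return {"P": p, "R": r, "F1": f1, "ACC": acc, "TNR": tnr}

m = evaluation_indexes(tp=8, fp=2, tn=85, fn=5)   # made-up counts
```

AUC, by contrast, is computed from the ranked predicted probabilities rather than from a single threshold, so it is not derivable from these four counts alone.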
In this example, screening predictions were made on patient data from February 2018. After preprocessing, 1808 records from 310 patients were obtained, comprising 840 ultrasound reports and 468 pathology reports. Manual annotation by 3 experts yielded 351 records labeled '1' (meeting the follow-up requirement) and 1457 records labeled '0' (not meeting the follow-up requirement). By threshold screening, samples with a predicted probability greater than or equal to the set threshold were added to the follow-up patient list, and the remaining samples were treated as non-follow-up patients. According to the classification results, the model predicted 143 patients requiring follow-up, with 318 records predicted as '1' and 1490 records predicted as '0'. The overall labeling accuracy reached 93%, and the results were endorsed by the experts, demonstrating the feasibility and effectiveness of the model.

Claims (5)

1. The ultrasonic follow-up patient screening method based on machine learning is characterized by comprising the following steps of:
step 1, collecting patient treatment record data, wherein the patient treatment record data comprises pathology report data, image report data, ultrasonic report data and a patient unique identifier corresponding to a patient, and constructing a basic information data warehouse according to the collected patient treatment record data of different patients;
step 2, according to the unique patient identification, associating all patient treatment record data in the basic information data warehouse with the patient, and constructing an ultrasonic patient information table associated with each patient;
step 3, to address sample imbalance in the ultrasound patient information table obtained in step 2, the data are processed by oversampling so as to balance the positive and negative sample sizes; the samples in the ultrasound patient information table are then divided into follow-up samples and non-follow-up samples according to the follow-up information and marked with different values respectively; finally, samples are drawn from the population and merged to obtain the training samples;
step 4, word segmentation is performed on the pathology reports and ultrasound reports in the training samples, and irrelevant segmentation results are screened out;
step 5, the text word-segmentation results are converted into feature vector matrices describing a document by using the TF-IDF method and the Word2Vec method, wherein the TF-IDF algorithm is used to construct a word feature vector matrix, and the word-segmentation results of the pathology reports and ultrasound reports in the labeled samples are used to train the TF-IDF matrix, comprising the following steps:
for each word in the document set, the weight value K(t, D_i) of the word in each document is calculated with the TF-IDF algorithm, where K(t, D_i) denotes the weight of word t in document D_i (i = 1, 2, ..., M) and M is the total number of training documents; the TF-IDF algorithm jointly considers the frequency tf of word t in a single document and the weight idf of word t in the whole document set; the weight idf of word t is calculated as idf(t) = log(M/n_t + 0.01), where n_t is the number of documents in the training document set in which word t appears; the calculation formula of the TF-IDF algorithm is:

K(t, D_i) = tf(t, D_i) · idf(t) / sqrt( Σ_{t' ∈ D_i} [tf(t', D_i) · idf(t')]² )

where tf(t, D_i) is the frequency of word t in document D_i, and the denominator is a normalization factor;
word vector training is carried out by using a CBOW neural network framework in a Word2Vec deep learning model, and a feature vector matrix is constructed, wherein:
the CBOW neural network is a three-layer neural network that takes as input the c context words surrounding the current word w_t and outputs a prediction of the current word w_t; expressed mathematically, the training objective is to maximize

Σ_{w_t ∈ D} log p(w_t | Context_t)

where w_t is a word in dictionary D, i.e. the probability of w_t occurring is predicted from a window of T words adjacent to it, and p(w_i | Context_i) denotes the probability of the current word w_i occurring given the c words before and after it;
the output layer of the CBOW neural network takes the N words appearing in dictionary D as leaf nodes and constructs a binary tree using each word's frequency in the corpus as its weight; a stochastic gradient ascent algorithm makes the prediction from the projection-layer vector X_w so that the above log-likelihood is maximized, and finally model training yields the N-dimensional word vector w corresponding to each segmented word;
step 6, the feature vectors constructed in step 5 are filtered by chi-square test, and useful information is selected for machine learning modeling, wherein the chi-square test is used for feature screening by analyzing the deviation between the actual values and the calculated theoretical values; if the deviation is smaller than a preset threshold, the two variables are judged to be independent, and if it is larger than the preset threshold, the two variables are considered related; on this basis, the chi-square value χ²(t, c) of each feature is calculated, the chi-square values are sorted from large to small, and the features above the threshold are selected; the formula of the chi-square test is:

χ²(t, c) = Σ_{e_t ∈ {0,1}} Σ_{e_c ∈ {0,1}} (A_{e_t,e_c} - E_{e_t,e_c})² / E_{e_t,e_c}

where t is a feature and c is a category, E_{e_t,e_c} denotes the calculated theoretical value for feature value e_t and class e_c, and A_{e_t,e_c} denotes the actual value for feature value e_t and class e_c;
step 7, the three models XGBoost, Lightgbm and CNN are selected for binary-classification modeling to predict the probability that a sample is a follow-up patient; the training feature matrix is chosen between TF-IDF and Word2Vec according to a comparison of model effects, and one of XGBoost, Lightgbm and CNN is selected as the model finally used for prediction;
step 8, a threshold is set, samples whose predicted probability value is greater than or equal to the set threshold are added to the follow-up patient list, and samples whose predicted probability value is smaller than the set threshold are treated as non-follow-up patients; model evaluation indexes are calculated from the model classification results obtained in step 7, and the optimal model is selected according to these indexes.
2. The machine learning based ultrasound follow-up patient screening method of claim 1, wherein step 2 comprises the steps of:
step 201, invalid data in patient treatment record data in a basic information data warehouse are removed;
step 202, the ultrasound fields in the patient treatment record data that belong to the same examination are combined into one ultrasound report, and likewise the pathology fields are combined into one pathology report;
step 203, many-to-many matching is performed between the ultrasound reports and pathology reports in each patient treatment record, so that the record is split into a plurality of new data records; after the matching, each record comprises one ultrasound report and one pathology report of the same patient, and these records constitute a new data set;
step 204, the patient characteristic information in the text data is extracted from the new data set obtained in step 203 through regular expressions and converted into numerical data; missing values are then filled and outliers processed; finally, irrelevant indexes are eliminated, indexes whose missing-value proportion exceeds a certain value are deleted, and the data are normalized to obtain the ultrasound patient information table.
3. The machine learning based ultrasound follow-up patient screening method of claim 2, wherein in step 201, the invalid data is patient visit record data that has an ultrasound report but no pathology report.
4. The machine learning based ultrasound follow-up patient screening method of claim 1, wherein in step 4, the segmentation tool employs JIEBA segmentation.
5. The machine learning based ultrasound follow-up patient screening method of claim 1, wherein in step 8, the model evaluation indexes comprise precision P, recall R, F1 measure, area under the curve AUC, accuracy ACC, specificity TNR and sensitivity TPR, calculated as follows:

P = TP / (TP + FP)
R = TPR = TP / (TP + FN)
F1 = 2PR / (P + R)
ACC = (TP + TN) / (TP + FP + TN + FN)
TNR = TN / (TN + FP)

TP (true positives) denotes the number of positive samples predicted positive by the model; FP (false positives) denotes the number of negative samples predicted positive; TN (true negatives) denotes the number of negative samples predicted negative; FN (false negatives) denotes the number of positive samples predicted negative.
CN202010371381.XA 2020-05-06 2020-05-06 Ultrasonic follow-up patient screening method based on machine learning Active CN111524570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010371381.XA CN111524570B (en) 2020-05-06 2020-05-06 Ultrasonic follow-up patient screening method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010371381.XA CN111524570B (en) 2020-05-06 2020-05-06 Ultrasonic follow-up patient screening method based on machine learning

Publications (2)

Publication Number Publication Date
CN111524570A CN111524570A (en) 2020-08-11
CN111524570B true CN111524570B (en) 2024-01-16

Family

ID=71907066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010371381.XA Active CN111524570B (en) 2020-05-06 2020-05-06 Ultrasonic follow-up patient screening method based on machine learning

Country Status (1)

Country Link
CN (1) CN111524570B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112201360B (en) * 2020-10-09 2023-06-20 平安科技(深圳)有限公司 Method, device, equipment and storage medium for collecting chronic disease follow-up record
CN114334169B (en) * 2022-03-07 2022-06-10 四川大学 Medical object category decision method and device, electronic equipment and storage medium
CN115458162A (en) * 2022-11-10 2022-12-09 四川京炜数字科技有限公司 Bone-related disease treatment plan prediction system and method based on machine learning

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3056328A1 (en) * 2016-09-16 2018-03-23 L'air Liquide, Societe Anonyme Pour L'etude Et L'exploitation Des Procedes Georges Claude DATA PROCESSING SYSTEM FOR PREDICTING HOSPITALIZATION OR RE-HOSPITALIZATION OF A PATIENT WITH CHRONIC RESPIRATORY DISEASE
CN107833629A (en) * 2017-10-25 2018-03-23 厦门大学 Aided diagnosis method and system based on deep learning
CN109741806A (en) * 2019-01-07 2019-05-10 北京推想科技有限公司 A kind of Medical imaging diagnostic reports auxiliary generating method and its device
CN110265153A (en) * 2019-05-16 2019-09-20 平安科技(深圳)有限公司 Chronic disease follow-up method and electronic device
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform
EP3573068A1 (en) * 2018-05-24 2019-11-27 Siemens Healthcare GmbH System and method for an automated clinical decision support system
WO2020006495A1 (en) * 2018-06-29 2020-01-02 Ai Technologies Inc. Deep learning-based diagnosis and referral of diseases and disorders using natural language processing
CN110781333A (en) * 2019-06-26 2020-02-11 杭州鲁尔物联科技有限公司 Method for processing unstructured monitoring data of cable-stayed bridge based on machine learning
CN110795564A (en) * 2019-11-01 2020-02-14 南京稷图数据科技有限公司 Text classification method lacking negative cases
CN113689927A (en) * 2021-10-26 2021-11-23 湖北经济学院 Ultrasonic image processing method and device based on deep learning model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7392185B2 (en) * 1999-11-12 2008-06-24 Phoenix Solutions, Inc. Speech based learning/training system using semantic decoding
US10169863B2 (en) * 2015-06-12 2019-01-01 International Business Machines Corporation Methods and systems for automatically determining a clinical image or portion thereof for display to a diagnosing physician
CN106021364B (en) * 2016-05-10 2017-12-12 百度在线网络技术(北京)有限公司 Foundation, image searching method and the device of picture searching dependency prediction model
JP2022505676A (en) * 2018-10-23 2022-01-14 ブラックソーン セラピューティクス インコーポレイテッド Systems and methods for patient screening, diagnosis, and stratification

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3056328A1 (en) * 2016-09-16 2018-03-23 L'air Liquide, Societe Anonyme Pour L'etude Et L'exploitation Des Procedes Georges Claude DATA PROCESSING SYSTEM FOR PREDICTING HOSPITALIZATION OR RE-HOSPITALIZATION OF A PATIENT WITH CHRONIC RESPIRATORY DISEASE
CN107833629A (en) * 2017-10-25 2018-03-23 厦门大学 Aided diagnosis method and system based on deep learning
EP3573068A1 (en) * 2018-05-24 2019-11-27 Siemens Healthcare GmbH System and method for an automated clinical decision support system
WO2020006495A1 (en) * 2018-06-29 2020-01-02 Ai Technologies Inc. Deep learning-based diagnosis and referral of diseases and disorders using natural language processing
CN109741806A (en) * 2019-01-07 2019-05-10 北京推想科技有限公司 A kind of Medical imaging diagnostic reports auxiliary generating method and its device
CN110265153A (en) * 2019-05-16 2019-09-20 平安科技(深圳)有限公司 Chronic disease follow-up method and electronic device
CN110781333A (en) * 2019-06-26 2020-02-11 杭州鲁尔物联科技有限公司 Method for processing unstructured monitoring data of cable-stayed bridge based on machine learning
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform
CN110795564A (en) * 2019-11-01 2020-02-14 南京稷图数据科技有限公司 Text classification method lacking negative cases
CN113689927A (en) * 2021-10-26 2021-11-23 湖北经济学院 Ultrasonic image processing method and device based on deep learning model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
丁尚伟; 谢玉环; 陈俊君; 陈沛芬; 何志忠; 罗海波. Application of a digital case follow-up system in the standardized training of ultrasound physicians. Southern Medical Education. 2018, (01), full text. *
刘再毅. Clinical value and challenges of radiomics. Medical Journal of Peking Union Medical College Hospital. 2018, (04), full text. *
常炳国; 刘清星. Similarity analysis of chronic liver disease CT reports based on deep learning. Computer Applications and Software. 2018, (08), full text. *
王根生; 黄学坚. A convolutional neural network text classification model based on Word2vec and improved TF-IDF. Journal of Chinese Computer Systems. 2019, (05), full text. *
胡婧; 刘伟; 马凯. Text classification of hypertension medical records based on machine learning. Science Technology and Engineering. 2019, (33), full text. *

Also Published As

Publication number Publication date
CN111524570A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111524570B (en) Ultrasonic follow-up patient screening method based on machine learning
CN109036577B (en) Diabetes complication analysis method and device
CN108511056A (en) Therapeutic scheme based on patients with cerebral apoplexy similarity analysis recommends method and system
CN112541066B (en) Text-structured-based medical and technical report detection method and related equipment
CN109065174B (en) Medical record theme acquisition method and device considering similarity constraint
CN113555077B (en) Suspected infectious disease prediction method and device
CN113808738B (en) Disease identification system based on self-identification image
CN110739076A (en) medical artificial intelligence public training platform
CN110556173A (en) intelligent classification management system and method for inspection report
CN112489740A (en) Medical record detection method, training method of related model, related equipment and device
Livieris et al. Identification of blood cell subtypes from images using an improved SSL algorithm
Hasan et al. Understanding current states of machine learning approaches in medical informatics: a systematic literature review
CN116189866A (en) Remote medical care analysis system based on data analysis
CN109192312B (en) Intelligent management system and method for adverse events of heart failure patients
CN116844733B (en) Medical data integrity analysis method based on artificial intelligence
CN113435200A (en) Entity recognition model training and electronic medical record processing method, system and equipment
CN113360643A (en) Electronic medical record data quality evaluation method based on short text classification
JP2017167738A (en) Diagnostic processing device, diagnostic processing system, server, diagnostic processing method, and program
CN116775897A (en) Knowledge graph construction and query method and device, electronic equipment and storage medium
Norman Systematic review automation methods
CN110610766A (en) Apparatus and storage medium for deriving probability of disease based on symptom feature weight
CN115862897A (en) Syndrome monitoring method and system based on clinical data
RU2723674C1 (en) Method for prediction of diagnosis based on data processing containing medical knowledge
CN109840275B (en) Method, device and equipment for processing medical search statement
Kaur et al. An Accurate Integrated System to detect Pulmonary and Extra Pulmonary Tuberculosis using Machine Learning Algorithms

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20210608

Address after: 200233 5th floor, building 20, 481 Guiping Road, Xuhui District, Shanghai

Applicant after: WONDERS INFORMATION Co.,Ltd.

Applicant after: SHANGHAI PUBLIC HEALTH CLINICAL CENTER

Address before: 200233 5th floor, building 20, 481 Guiping Road, Xuhui District, Shanghai

Applicant before: WONDERS INFORMATION Co.,Ltd.

GR01 Patent grant
GR01 Patent grant