CN116779172A - Lung cancer disease burden risk early warning method based on ensemble learning - Google Patents
- Publication number: CN116779172A
- Application number: CN202310786560.3A
- Authority: CN (China)
- Prior art keywords: model, data, prediction, lung cancer, disease burden
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a lung cancer disease burden risk early warning method based on ensemble learning, belonging to the technical field of big data. The method comprises integrating and cleaning data, screening prediction indexes and reducing their dimensionality, measuring lag (hysteresis) effects, building a pool of prediction models, validating and optimizing the models, evaluating their prediction performance, and combining several models through Stacking ensemble learning. It addresses the technical problem of providing more accurate reference data for predicting the lung cancer disease burden.
Description
Technical Field
The invention belongs to the technical field of big data, and particularly relates to a lung cancer disease burden risk early warning method based on ensemble learning.
Background
Studies suggest that most cancers are attributable to environmental rather than genetic factors, and arise from prolonged exposure to low doses of environmental carcinogens.
Numerous studies have demonstrated a significant relationship between air pollution and tumors, but the pollutants studied are mostly limited to PM2.5, PM10, SO2 and the like, while NH3, OC, BC, CO, NOx, NMVOC and similar pollutants have received less attention. A prediction model that integrates multidimensional features such as environment, air pollution, economy and weather is also lacking.
Because environmental, economic and other factors act with different lag (hysteresis) effects, lag analysis of the prediction indexes can greatly extend the out-of-sample prediction window of a model, yet current models do not take lag effects into account.
The prior art also lacks research analyzing the relationship between air pollutants and the lung cancer disease burden over longer time series.
ARIMA is a traditional time series model. It places relatively high demands on the data: it needs a long continuous time series, the model is unreliable if the series is too short, and model identification and calculation are relatively complex. Current common methods cannot meet the growing demands of medical big data. Because different methods suit different data, a disease burden prediction method is proposed that suits various data distributions and integrates deep learning, machine learning, statistical regression and other models, so that high-dimensional time series data of differing temporal granularity can be processed and prediction accuracy improved.
Disclosure of Invention
The invention aims to provide a lung cancer disease burden risk early warning method based on ensemble learning, which solves the technical problem of providing more accurate reference data for predicting lung cancer disease burden.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A lung cancer disease burden risk early warning method based on ensemble learning comprises the following steps:
Step 1: establish a database server. The database server acquires disease burden data, meteorological data, air pollution data, regional economic data and time feature data through the Internet, integrates and cleans the data to construct a lung cancer disease burden feature database, and displays the database data visually through charts to show the time-series characteristics of the disease burden and of the features;
Step 2: establish a model server. The model server acquires the integrated and cleaned data from the database server, reduces and screens the prediction indexes through information entropy and principal component analysis, and measures the lag (hysteresis) effect of each prediction index on the lung cancer disease burden through grey relational analysis;
A prediction model pool is constructed on the training sequence, comprising a GAM model, an LSTM model, a GM(1, N) model, an ARIMA model, an XGBoost algorithm model, an RFR (random forest regression) algorithm model, a BP neural network model and an AdaBoost algorithm model. Each model in the pool is validated, its parameters are optimized and the model is updated iteratively; the prediction performance of each model is then evaluated on a test set, and the models are ranked by prediction performance;
Step 3: establish an ensemble model server. The ensemble model server selects the 4 models with the best prediction performance from the prediction model pool as the first-layer base learners of Stacking ensemble learning. Each base learner predicts on the validation set and the test set respectively, and the predictions form a new training set and a new test set that serve as input to the second-layer meta-learner of Stacking. A linear regression model and a ridge regression model are taken as candidate meta-learners, and the final ensemble model is chosen by evaluating prediction performance. Based on the lag-effect indexes, relevant reference data is provided for prediction s steps into the future;
Step 4: the ensemble model server displays the results obtained in step 3 visually.
Preferably, in step 1, integrating and cleaning the data specifically means cleaning abnormal data, missing data, duplicate data and inconsistent data.
Preferably, in step 1, missing data is filled in using a mathematical-statistical method such as mean imputation, regression imputation or multiple imputation, and variables whose missing proportion exceeds 10% are removed. Standard data is obtained after the data is integrated and cleaned through the steps of data analysis, definition of the cleaning strategy, data inspection, execution of data cleaning, data quality evaluation and return of the clean data.
Preferably, in step 1, displaying the database data visually through charts specifically comprises: collecting as much data as possible; after data mining and cleaning, arranging the data from different sources into first-level indexes such as disease burden, weather, air pollution, economy and other environmental data; constructing a first-level lung cancer disease burden risk early warning database; carrying out descriptive statistical analysis of the regional distribution of environmental pollution, weather features and economic features through the mean, standard deviation, extrema and quartiles; and calculating the compound annual growth rate of the disease burden.
Preferably, in step 2, the screening of the prediction indexes specifically comprises the following steps:
Step 2-1: obtain initial indexes screened by importance, through subjective expert interviews and a review of the literature and theory;
Step 2-2: screen the initial indexes based on information entropy: calculate the information entropy ratio between each initial index and the lung cancer disease burden, eliminate indexes with low relevance to the disease burden, and eliminate redundant indexes that are highly correlated with one another;
Step 2-3: screen important indexes, or extract principal components as new indexes, based on principal component analysis, specifically comprising the following steps:
Step 2-3-1: construct the index matrix $X = (x_{ij})_{n \times p}$, where $x_{np}$ denotes the value of the p-th index for the n-th sample, and n and p are the numbers of rows and columns of the matrix;
Step 2-3-1: apply a standardizing transformation to the matrix X to obtain Z;
Step 2-3-2: calculate the correlation coefficient matrix of the standardized matrix Z, $R = \frac{Z^{T} Z}{n - 1}$, where n is the number of samples and T denotes matrix transposition;
Step 2-3-3: calculate the eigenvalues $\lambda_j$ of the correlation coefficient matrix R and the corresponding orthonormal unit eigenvectors $a_j$;
Obtain the principal component scores $F_i = a_{1i} x_1 + a_{2i} x_2 + \dots + a_{pi} x_p$, where i is the number of the principal component and p is the total number of indexes;
Step 2-3-4: calculate the factor loadings. The loading of index $x_j$ on principal component $F_i$ is $l(F_i, x_j) = \sqrt{\lambda_i}\, a_{ji}$. It reflects the degree of correlation between $F_i$ and $x_j$ and indicates the importance of each variable in the principal component and its contribution to the result. Important indexes are screened out by $|l(F_i, x_j)|$, where j is the index number and i is the principal component number;
Step 2-3-5: when there are too many indexes, select k principal components as new indexes, where k is determined by the cumulative information contribution rate of the principal components reaching 80%, that is, $\sum_{i=1}^{k} \lambda_i \big/ \sum_{j=1}^{p} \lambda_j \ge 80\%$.
Preferably, in step 2, the lag (hysteresis) effect of each prediction index on the lung cancer disease burden is measured through grey relational analysis. Specifically, grey relational analysis quantitatively compares the geometric shapes of the study variable sequence and the related factor sequences to judge how closely each related factor is associated with the study variable, and Deng's grey relational degree is used to analyze the degree of influence and the lag effect of each prediction index on lung cancer incidence and mortality.
Preferably, step 3 specifically comprises the following steps:
Step 3-1: for the first layer of the Stacking ensemble model, the GAM model, LSTM model, GM(1, N) model, ARIMA model, XGBoost algorithm model, RFR algorithm model, BP neural network model and AdaBoost algorithm model form the prediction model pool, and the 4 regression models with the best prediction performance are selected from the pool as the Stacking first layer;
Step 3-2: each parameter-optimized base learner predicts on the validation set and the test set respectively. The validation-set predictions are combined to form a new training set, and the test-set predictions are combined by weighted averaging to form a new test set; these serve as input to the Stacking second layer;
Step 3-3: a meta-learner is introduced in the second layer of the Stacking ensemble model and is trained by regression on the previous layer's predictions, which serve as its training set and test set. A linear regression model and a ridge regression model are taken as candidate meta-learners, and the final meta-learner is chosen by evaluating prediction performance;
Step 3-4: based on the lag-effect indexes, relevant reference data is provided for prediction s steps into the future.
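The two-layer procedure of steps 3-2 and 3-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two base learners (`fit_mean`, `fit_line`) are hypothetical stand-ins for the four selected pool models, the folds are expanding rolling windows, the test-set predictions are averaged with equal weights, and the meta-learner is a ridge regression without intercept.

```python
def fit_mean(xs, ys):
    """Stand-in base learner (assumption): always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Stand-in base learner (assumption): one-feature least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def rolling_splits(n, n_folds, min_train):
    """Expanding-window folds: train on [0, end), validate on the next block."""
    fold = (n - min_train) // n_folds
    for i in range(n_folds):
        end = min_train + i * fold
        yield range(0, end), range(end, end + fold)

def stack_features(base_fitters, xs, ys, x_test, n_folds=3, min_train=4):
    """Build the second-layer training set and test set from base predictions."""
    meta_x, meta_y = [], []
    test_rows = [[0.0] * len(base_fitters) for _ in x_test]
    for train_idx, val_idx in rolling_splits(len(xs), n_folds, min_train):
        models = [fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
                  for fit in base_fitters]
        for i in val_idx:  # validation predictions become new training rows
            meta_x.append([m(xs[i]) for m in models])
            meta_y.append(ys[i])
        for row, x in zip(test_rows, x_test):  # equal-weight average over folds
            for j, m in enumerate(models):
                row[j] += m(x) / n_folds
    return meta_x, meta_y, test_rows

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w

def fit_ridge(rows, ys, alpha=1.0):
    """Ridge meta-learner without intercept: solves (A^T A + alpha I) w = A^T y."""
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    w = solve(gram, rhs)
    return lambda row: sum(wi * v for wi, v in zip(w, row))
```

In this sketch, a plain linear-regression candidate corresponds to `alpha=0` whenever the Gram matrix is invertible; the two candidates would then be compared on held-out data with whatever performance metric is in use.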
The lung cancer disease burden risk early warning method based on ensemble learning addresses the technical problem of providing more accurate reference data for predicting the lung cancer disease burden. The method fuses multi-source data to provide more comprehensive information and makes full use of the information in several prediction models: combining the results of multiple models yields a strong predictor that exploits the strengths of the different models, reduces the uncertainty and bias of any single model, improves prediction accuracy and stability, and can therefore provide more accurate prediction reference data than a single prediction model. The models used in the invention have different characteristics and capabilities in processing time series data; by combining them, data with varied characteristics can be processed and different data features and trends are considered more comprehensively, which improves prediction accuracy. Analyzing the lag (hysteresis) effects of the different prediction indexes captures the temporal relationship between each index and the disease burden, so the prediction model can be built more accurately and prediction accuracy improves. Using lag effects, the method can provide prediction reference data over a longer time horizon. A rolling-window technique is adopted to cross-validate the time series data when training the individual prediction models and the meta-learning model, which helps with model parameter estimation.
Drawings
FIG. 1 is a diagram of a data architecture of the present invention;
FIG. 2 is a schematic diagram of a data cleansing flow according to the present invention;
FIG. 3 is a flow chart of index screening of the present invention;
FIG. 4 is a schematic diagram of an LSTM network architecture of the present invention;
FIG. 5 is a schematic diagram of the Stacking structure of the present invention.
Detailed Description
The lung cancer disease burden risk early warning method based on ensemble learning, shown in FIGS. 1-5, comprises the following steps:
Step 1: establish a database server. The database server acquires disease burden data, meteorological data, air pollution data, regional economic data and time feature data through the Internet, integrates and cleans the data to construct a lung cancer disease burden feature database, and displays the database data visually through charts to show the time-series characteristics of the disease burden and of the features;
Air pollution: primary particles (particulate matter PM10 and PM2.5; carbonaceous species black carbon BC and organic carbon OC), acidifying gases (nitrogen oxides NOx, sulfur dioxide SO2), ozone precursor gases (carbon monoxide CO, nitrogen oxides NOx, non-methane volatile organic compounds NMVOC), ammonia NH3, and the like.
Weather factors: average relative humidity, average air temperature, average rainfall, average barometric pressure.
Regional economic level: GDP, per capita income.
Time feature data: season, holiday and week data.
Other environmental pollution: water pollution data such as wastewater discharge, chemical oxygen demand and total ammonia nitrogen discharge, and pollution data on the amount of general industrial solid waste generated.
Disease burden: sex, number of lung cancer cases, incidence, mortality, DALYs and DALY rate.
In step 1, integrating and cleaning the data specifically means cleaning abnormal data, missing data, duplicate data and inconsistent data. Missing data is filled in using a mathematical-statistical method such as mean imputation, regression imputation or multiple imputation, and variables whose missing proportion exceeds 10% are removed. Standard data is obtained after the data is integrated and cleaned through the steps of data analysis, definition of the cleaning strategy, data inspection, execution of data cleaning, data quality evaluation and return of the clean data.
Displaying the data visually through charts specifically comprises: collecting as much data as possible; after data mining and cleaning, arranging the data from different sources into first-level indexes such as disease burden, weather, air pollution, economy and other environmental data; constructing a first-level lung cancer disease burden risk early warning database; and carrying out descriptive statistical analysis of the regional distribution of environmental pollution, weather features and economic features through the mean, standard deviation, extrema and quartiles.
The compound annual growth rate (CAGR) of the disease burden is also calculated, where y denotes the disease burden value and n the number of years in the disease burden sequence.
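As a rough illustration of the cleaning rule and the growth-rate calculation, the sketch below drops variables with more than 10% missing values, mean-fills the rest, and computes a compound annual growth rate. The function names and the table layout are hypothetical, mean imputation is only one of the filling methods the text allows, and the growth-rate form (a sequence of n yearly values spanning n - 1 intervals) is an assumption rather than the formula of the original filing.

```python
from statistics import mean

def clean_columns(table, max_missing=0.10):
    """Drop variables whose missing share exceeds max_missing; mean-fill the rest.

    `table` maps a variable name to a list of values, with None marking a gap.
    """
    cleaned = {}
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share > max_missing:
            continue  # variable removed: too many gaps
        fill = mean(v for v in values if v is not None)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

def cagr(series):
    """Compound annual growth rate of yearly values y_1..y_n.

    Assumption: n yearly values span n - 1 growth intervals.
    """
    n = len(series)
    return (series[-1] / series[0]) ** (1 / (n - 1)) - 1
```

For example, a burden series of 100, 110 and 121 over three consecutive years grows by about 10% per year under this convention.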
Step 2: establish a model server. The model server acquires the integrated and cleaned data from the database server, reduces and screens the prediction indexes through information entropy and principal component analysis, and measures the lag (hysteresis) effect of each prediction index on the lung cancer disease burden through grey relational analysis;
A prediction model pool is constructed on the training sequence, comprising a GAM model, an LSTM model, a GM(1, N) model, an ARIMA model, an XGBoost algorithm model, an RFR (random forest regression) algorithm model, a BP neural network model and an AdaBoost algorithm model. Each model in the pool is validated, its parameters are optimized and the model is updated iteratively; the prediction performance of each model is then evaluated on a test set, and the models are ranked by prediction performance;
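Ranking the pool by test-set performance can be sketched as below. The text does not name the evaluation metric at this point, so RMSE (with MAPE as an alternative) is an assumption, and `rank_models` is a hypothetical helper that takes each model's test-set predictions as plain lists.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error (assumed metric; lower is better)."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; requires non-zero actual values."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rank_models(actual, predictions, metric=rmse, top_k=4):
    """Score each model's test-set predictions and return the best top_k names.

    `predictions` maps a model name to its list of test-set predictions.
    """
    scores = {name: metric(actual, pred) for name, pred in predictions.items()}
    ranking = sorted(scores, key=scores.get)  # ascending error
    return ranking[:top_k], scores
```

The names returned by `rank_models` would then identify the base learners passed on to the Stacking first layer.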
A dataset typically contains some indexes that are unimportant or redundant, which severely degrade the model's prediction performance. Redundant indexes also tend to be highly correlated with one another, which causes multicollinearity in regression models. It is therefore desirable to select indexes that are highly correlated with the lung cancer disease burden but not with each other. Indexes that are irrelevant to lung cancer disease burden prediction, or redundant, are removed; removing them causes no loss of information and shortens model training time, reduces overfitting, and so on, so that a sound and effective prediction index system is established and model accuracy improves.
An initial index set is formed on the basis of disease burden risk factor analysis, and the final prediction index system is then formed by combining subjective and objective analysis. The screening of the prediction indexes specifically comprises the following steps:
Step 2-1: obtain initial indexes screened by importance, through subjective expert interviews and a review of the literature and theory;
Step 2-2: screen the initial indexes based on information entropy: calculate the information entropy ratio between each initial index and the lung cancer disease burden, eliminate indexes with low relevance to the disease burden, and eliminate redundant indexes that are highly correlated with one another;
The information gain is calculated as $g(x, y) = H(x) - H(x \mid y)$, where $H(x)$ is the information entropy of index $x$ and $H(x \mid y)$ is the conditional entropy of $x$ given $y$.
The comparison information entropy $IR(x, y)$ is then calculated; it reflects the degree of correlation between two indexes, or between an index and the lung cancer disease burden.
Using this quantity, the degree of correlation between each index and the lung cancer disease burden is calculated. If $IR(x_i, y) \le \eta_1$, the index is considered to have low correlation with the lung cancer disease burden and is eliminated, where $\eta_1$ is the set information entropy threshold.
The degree of correlation between the remaining indexes is then calculated in the same way. If $IR(x_i, x_j) \ge \eta_2$, the two indexes are considered redundant, and the one with the lower degree of correlation with the lung cancer disease burden is eliminated, where $\eta_2$ is the set information entropy threshold.
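The entropy-based screening of step 2-2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the indexes are assumed to be already discretized, the function names are hypothetical, and because the formula for $IR(x, y)$ is not given in the text, `relevance` uses the information gain divided by $H(x)$ as a stand-in.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(x), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(x, y):
    """H(x|y): entropy of x inside each group of equal y, weighted by group size."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(yi, []).append(xi)
    return sum(len(g) / len(x) * entropy(g) for g in groups.values())

def information_gain(x, y):
    """g(x, y) = H(x) - H(x|y)."""
    return entropy(x) - conditional_entropy(x, y)

def relevance(x, y):
    """ASSUMPTION: stand-in for IR(x, y), taken here as g(x, y) / H(x)."""
    hx = entropy(x)
    return information_gain(x, y) / hx if hx > 0 else 0.0

def screen(indexes, burden, eta1, eta2):
    """Drop indexes with relevance <= eta1, then drop the weaker index of any redundant pair."""
    scores = {name: relevance(col, burden) for name, col in indexes.items()}
    kept = sorted((n for n in indexes if scores[n] > eta1),
                  key=lambda n: scores[n], reverse=True)
    selected = []
    for name in kept:
        # Keep an index only if it is not redundant with a stronger one already kept.
        if all(relevance(indexes[name], indexes[s]) < eta2 for s in selected):
            selected.append(name)
    return selected

# Made-up toy data: pm10 duplicates pm25, and noise is unrelated to the burden.
burden = ["low", "low", "high", "high", "low", "high"]
indexes = {
    "pm25": ["l", "l", "h", "h", "l", "h"],
    "pm10": ["l", "l", "h", "h", "l", "h"],
    "noise": ["a", "b", "a", "b", "a", "a"],
}
selected = screen(indexes, burden, eta1=0.1, eta2=0.9)
```

On this toy data the unrelated index falls below $\eta_1$ and the duplicated index is removed as redundant, so only one pollution index survives. The thresholds 0.1 and 0.9 are illustrative values, not values from the text.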
Step 2-3: screening important indexes, or extracting principal components as new indexes, based on principal component analysis, which specifically comprises the following steps:
step 2-3-1: constructing the index matrix $X = (x_{ij})_{n \times p}$, where $x_{np}$ represents the p-th index value of the n-th sample, and n and p are the numbers of rows (samples) and columns (indexes) of the matrix, respectively;
step 2-3-2: standardizing the matrix X to obtain Z;
step 2-3-3: calculating the correlation coefficient matrix of the standardized matrix Z, $R = \frac{Z^{T} Z}{n - 1}$, where n represents the number of samples and T denotes matrix transposition;
step 2-3-4: calculating the eigenvalues $\lambda_j$ of the correlation coefficient matrix R and the corresponding orthonormal eigenvectors $a_j$, and obtaining the principal component scores $F_i = a_{1i} x_1 + a_{2i} x_2 + \dots + a_{pi} x_p$, where i is the principal component number and p is the total number of indexes;
step 2-3-5: calculating the factor loads: the load of index $x_j$ on principal component $F_i$ is $l(F_i, x_j) = \sqrt{\lambda_i}\, a_{ji}$, which reflects the degree of correlation between $F_i$ and $x_j$ and indicates the importance of each variable within the principal component and its contribution to the result; important indexes are screened out by $|l(F_i, x_j)|$, where j is the index number and i is the principal component number;
step 2-3-6: when there are too many indexes, selecting k principal components as new indexes, where k is the smallest number for which the cumulative information contribution rate of the principal components reaches 80%, that is, $\sum_{j=1}^{k} \lambda_j \big/ \sum_{j=1}^{p} \lambda_j \ge 80\%$.
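The principal component steps above can be sketched with NumPy. This is a minimal sketch under assumptions: the function name `pca_screen` and the toy data are hypothetical, the scores are computed from the standardized matrix Z, and the correlation matrix is taken as $Z^{T} Z / (n - 1)$ with sample standard deviations.

```python
import numpy as np

def pca_screen(X, threshold=0.80):
    """Return eigenvalues, loadings, scores and k for a samples-by-indexes matrix X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Standardize each column (sample standard deviation, ddof=1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation coefficient matrix of the standardized data.
    R = Z.T @ Z / (n - 1)
    # eigh handles symmetric matrices and returns ascending eigenvalues; sort descending.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # loadings[j, i] = sqrt(lambda_i) * a_ji: load of index j on component i.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # Smallest k whose cumulative contribution rate reaches the threshold.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    scores = Z @ eigvecs[:, :k]
    return eigvals, loadings, scores, k

# Made-up data: the second index is almost a multiple of the first.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
X = np.column_stack([base,
                     2 * base + rng.normal(scale=0.1, size=50),
                     rng.normal(size=50)])
eigvals, loadings, scores, k = pca_screen(X)
```

Because R is a correlation matrix, the eigenvalues sum to the number of indexes and the squared loadings of each index sum to 1, which gives a quick sanity check on the decomposition.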
The hysteresis (time-lag) effect of each prediction index on the lung cancer disease burden is measured through gray correlation analysis. Specifically, gray correlation analysis quantitatively compares the geometric similarity or dissimilarity of the research variable sequence and the related factor sequences in order to judge the degree of correlation between each related factor and the research variable, and Deng's correlation degree is used to analyze the degree of influence and the hysteresis effect of each prediction index on lung cancer morbidity and mortality.
In this embodiment, the influence of environmental pollution (such as air pollution), meteorological factors, and economic indexes on disease exhibits a hysteresis effect, which is calculated through the following steps:
step S1: taking the disease burden as the reference sequence $X_0 = (x_0(1), \dots, x_0(k), \dots, x_0(n))$;
step S2: taking the environmental pollution, meteorological, and other indexes at different lag periods as the comparison sequences $X_i = (x_i(1), \dots, x_i(k), \dots, x_i(n))$;
step S3: calculating the correlation coefficient and correlation degree between each current-period index and the lung cancer disease burden. The correlation coefficient of the i-th comparison sequence $X_i$ with respect to the disease burden reference sequence $X_0$ at point k is
$$\xi_i(k) = \frac{\min_i \min_k |x_0(k) - x_i(k)| + \varphi \max_i \max_k |x_0(k) - x_i(k)|}{|x_0(k) - x_i(k)| + \varphi \max_i \max_k |x_0(k) - x_i(k)|},$$
where the resolution coefficient $\varphi$ is taken as 0.5. Deng's gray correlation degree of the i-th comparison sequence $X_i$ with respect to the disease burden reference sequence $X_0$ is
$$\gamma_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k);$$
step S4: calculating the correlation degree $\gamma_i(-t)$ between the sequence lagged by t periods and the lung cancer disease burden, for each lag t;
step S5: if $\gamma_i(-t)$ is largest at a lag of T years, the hysteresis effect of index $X_i$ is T;
step S6: repeating the above until the hysteresis effect of every index is obtained.
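Steps S1 to S5 can be sketched for a single index as follows. This is an illustrative sketch under assumptions: the function name `lag_effect` is hypothetical, the series are assumed positive and are mean-normalized before comparison (a common preprocessing step that the text does not specify), and every lag is evaluated on the same number of points so that the degrees are comparable.

```python
def lag_effect(burden, index, max_lag, rho=0.5):
    """Return (best_lag, {lag: Deng's gray correlation degree}) for one index."""
    n = len(burden)

    def normalise(seq):
        # ASSUMPTION: mean normalization of positive series, to remove units.
        mean = sum(seq) / len(seq)
        return [v / mean for v in seq]

    # Reference sequence: burden values that have a partner at every lag.
    ref = normalise(burden[max_lag:])
    diffs = {}
    for t in range(max_lag + 1):
        # Comparison sequence: the index observed t periods earlier.
        lagged = normalise(index[max_lag - t: n - t])
        diffs[t] = [abs(r - c) for r, c in zip(ref, lagged)]

    # Two-level minimum and maximum, taken over all lags and all points.
    d_min = min(min(d) for d in diffs.values())
    d_max = max(max(d) for d in diffs.values())
    if d_max == 0:
        return 0, {t: 1.0 for t in diffs}  # every lagged sequence matches exactly

    degrees = {}
    for t, d in diffs.items():
        coeffs = [(d_min + rho * d_max) / (x + rho * d_max) for x in d]
        degrees[t] = sum(coeffs) / len(coeffs)
    best = max(degrees, key=degrees.get)
    return best, degrees

# Made-up series: the burden repeats the index two periods later.
index = [1, 2, 3, 5, 8, 6, 4, 3, 2, 2]
burden = [1, 1, 1, 2, 3, 5, 8, 6, 4, 3]
best_lag, degrees = lag_effect(burden, index, max_lag=3)
```

Step S6 then amounts to calling `lag_effect` once per index and recording the best lag of each.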
Step 3: an integrated model server is established. The integrated model server selects the 4 models with the best predictive performance from the prediction model pool as the first-layer base learners of Stacking ensemble learning; each predictor is fitted on the validation set and the prediction set respectively to form a new training set and a new test set, which serve as input to the meta-learner of the Stacking second layer; a linear regression model and a ridge regression model are taken as candidate meta-learners, and the better one is selected through predictive performance evaluation to obtain the final integrated model; based on the hysteresis effect indexes, relevant reference data are provided for the prediction of the next s periods;
Step 3 specifically comprises the following steps:
step 3-1: in the first layer of the Stacking ensemble model, a GAM model, an LSTM model, a GM(1, N) model, an ARIMA model, an XGBoost algorithm model, an RFR algorithm model, a BP neural network model, and an AdaBoost algorithm model form the prediction model pool, and the 4 regression algorithm models with the best predictive performance are selected from the pool as the Stacking first layer;
In this embodiment, a sliding window is used to divide the data into a training sequence, a validation sequence, and a test sequence. The GAM, LSTM, GM(1, N), ARIMA, and other models are each built on the training sequence; after validation, parameter optimization, and iterative updating, they form the first layer of the Stacking ensemble model.
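The sliding-window division can be sketched as below. The function name, the window sizes, and the step are hypothetical choices for illustration; the text does not state the window lengths used.

```python
def sliding_window_splits(series, train_size, val_size, test_size, step=1):
    """Yield (train, validation, test) slices from consecutive windows moved forward by `step`."""
    total = train_size + val_size + test_size
    for start in range(0, len(series) - total + 1, step):
        a = start + train_size
        b = a + val_size
        # The three parts are contiguous and keep their time order.
        yield series[start:a], series[a:b], series[b:start + total]

# Ten yearly observations, split 5 / 2 / 1, sliding by one year.
series = list(range(10))
splits = list(sliding_window_splits(series, train_size=5, val_size=2, test_size=1))
```

Each window keeps the validation and test points strictly after the training points, which avoids leaking future values into model fitting.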
Generalized Additive Model (GAM)
GAM is an extension of the generalized linear model, originally proposed by Hastie and Tibshirani, and can evaluate both linear and nonlinear associations of environmental factors, time, and other variables with health effects. It can also control for confounding caused by time-dependent variables (e.g., seasonality and long-term trends). GAM places few requirements on the sample and is widely applicable. Its expression is:
$Y = g(u) + \varepsilon$;
$g(u_i) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_i(x_i) + \dots + f_m(x_m)$;
where $f_i(x_i)$ is a smooth function of the prediction index $x_i$ and $g(u_i)$ is the link function. Because cancer morbidity and mortality follow a Poisson distribution, a Poisson regression model is adopted to establish the lung cancer disease burden and risk prediction model.
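For a Poisson response, the link function is conventionally the natural logarithm. Assuming that canonical choice, which the text does not state explicitly, the model for the expected count $\mu_t$ in period t would take the form below, where $x_{j,t}$ (a notation introduced here) is the value of index j in period t:

$$\log(\mu_t) = \beta_0 + \sum_{j=1}^{m} f_j(x_{j,t}), \qquad Y_t \sim \mathrm{Poisson}(\mu_t).$$

Under this form each smooth term acts multiplicatively on the expected count: a change of $\Delta$ in $f_j$ scales $\mu_t$ by $e^{\Delta}$.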
Long short-term memory model (LSTM)
The long short-term memory model (LSTM) is an improved recurrent neural network (RNN). It mitigates the vanishing-gradient and exploding-gradient problems that arise when handling long-term dependencies, so it predicts longer sequences more accurately than an ordinary RNN.
Each unit has an input gate, a forget gate, and an output gate.
Parameter definitions: $x_t$ is the information input at time t; $c_{t-1}$ is the network memory state at time t-1; $h_{t-1}$ is the information output at time t-1, which is also an input at time t. $i_t$, $f_t$, and $o_t$ are the input gate, forget gate, and output gate unit variables, respectively. $\sigma$ denotes the Sigmoid activation function; $\tanh$ denotes the tanh activation function; $\tilde{c}_t$ is the cell state update value at time t; $W_i$ denotes the input weight; $U_i$ denotes the output weight; $b_i$ denotes the bias.
The forget gate decides, from the previous output and the new input, which information to remove from the features stored in the hidden layer.
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
The input gate determines which new information is stored in the unit's feature information and used to update the cell state.
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
The output gate outputs the state information at the current time and decides the value of the next hidden state.
$h_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \times \tanh(c_t)$
Model training: the LSTM model is trained using the training data. During training, the input sequences are fed to the LSTM model so that it can learn the patterns and characteristics of the sequence, and the weights and biases of the model are adjusted using a loss function and an optimization algorithm to minimize the difference between the predicted output and the actual output.
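A single forward step of the gates described above can be sketched with NumPy. This is a sketch under assumptions: the candidate state and the cell state update follow the standard LSTM formulation, which the text names but does not write out; the weights are random and untrained, and the names `lstm_step`, `W`, and `b` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g] has shape (hidden, hidden + inputs); b[g] has shape (hidden,)."""
    hx = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ hx + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])      # input gate
    c_hat = np.tanh(W["c"] @ hx + b["c"])    # candidate cell state (assumed standard form)
    c_t = f_t * c_prev + i_t * c_hat         # cell state update (assumed standard form)
    o_t = sigmoid(W["o"] @ hx + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                 # hidden state, also the output at time t
    return h_t, c_t

# Run five made-up input vectors through one unit with random weights.
rng = np.random.default_rng(0)
hidden, inputs = 3, 2
W = {g: rng.normal(0, 0.5, (hidden, hidden + inputs)) for g in "fico"}
b = {g: np.zeros(hidden) for g in "fico"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(0, 1, (5, inputs)):
    h, c = lstm_step(x_t, h, c, W, b)
```

In practice the weights would be learned by backpropagation through time in a deep learning framework; the sketch only shows how the gates combine within one time step.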
GM(1, N) model
The gray prediction model is suited to problems with small sample sizes and limited information. The GM(1, N) model is the basic model of the multivariable gray system modeling method; it can analyze multiple factors as a whole and dynamically, and it reflects the dynamic relationship between the research variable sequence and the related factor sequences. The model contains one research variable and N-1 influencing factor variables. It is defined by a time response function and an inverse accumulation (restoration) formula, which are expressed in terms of the original research variable sequence and the new data sequence generated by first-order accumulation of the original sequence, where a is the system development coefficient and $b_i$ is the driving coefficient of each related factor.
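For orientation, the usual textbook form of GM(1, N) is given below, on the assumption that this is the form intended. The notation is introduced here: $x_1^{(0)}$ is the original research variable sequence, $x_i^{(1)}$ is the first-order accumulated sequence of variable i, and $z_1^{(1)}(k) = \tfrac{1}{2}\big(x_1^{(1)}(k) + x_1^{(1)}(k-1)\big)$ is the background value.

$$x_1^{(0)}(k) + a\, z_1^{(1)}(k) = \sum_{i=2}^{N} b_i\, x_i^{(1)}(k)$$

$$\hat{x}_1^{(1)}(k+1) = \left[ x_1^{(0)}(1) - \frac{1}{a} \sum_{i=2}^{N} b_i\, x_i^{(1)}(k+1) \right] e^{-ak} + \frac{1}{a} \sum_{i=2}^{N} b_i\, x_i^{(1)}(k+1)$$

$$\hat{x}_1^{(0)}(k+1) = \hat{x}_1^{(1)}(k+1) - \hat{x}_1^{(1)}(k)$$

The first line is the gray differential equation whose coefficients a and $b_i$ are estimated by least squares, the second is the time response function, and the third restores predictions to the original scale by differencing the accumulated predictions.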
ARIMA model
The stationarity of the sequence is judged from the time series plot and a stationarity test of the original data.
If the sequence is non-stationary, it must be made stationary through differencing or data transformation, and the stationarity of the differenced sequence is confirmed by a stationarity test.
The model type is preliminarily identified, and the model order determined, from the autocorrelation function (ACF) plot and the partial autocorrelation function (PACF) plot.
Depending on whether the original data sequence has a seasonal trend, the model is either a seasonal ARIMA(p, d, q)(P, D, Q)S or a non-seasonal ARIMA(p, d, q), where (p, d, q) and (P, D, Q) are the orders of the non-seasonal and seasonal autoregression (AR), differencing, and moving average (MA) terms, respectively, and S is the seasonal period.
The optimal model is selected according to the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
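The differencing and ACF steps can be sketched in plain Python. This only illustrates the identification stage and does not fit an ARIMA model; the function names and the toy series are hypothetical.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing to remove trend."""
    series = list(series)
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (undefined for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[k] * dev[k + lag] for k in range(n - lag)) / denom
            for lag in range(max_lag + 1)]

# A made-up quadratic trend becomes constant after two rounds of differencing.
series = [k * k for k in range(8)]
once = difference(series, d=1)    # still trending: 1, 3, 5, ...
twice = difference(series, d=2)   # constant, so d = 2 removes the trend
r = acf(once, max_lag=3)
```

A slowly decaying ACF, as in `r` for the once-differenced series, is the usual visual sign that further differencing is needed. Fitting the candidate orders and comparing AIC and BIC would normally be done with a statistics library.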
Other models
This embodiment also builds prediction models based on the XGBoost algorithm, the RFR algorithm, the BP neural network, AdaBoost, and other algorithms.
In this embodiment, model parameters are tuned with a hyperparameter optimization algorithm that combines grid search with cross-validation based on a rolling forecast origin; the rolling window technique ensures that enough base predictions are generated for model training. The hyperparameter intervals to be tested are combined into a multidimensional space, which is divided into a grid according to the search step of each interval, so that each grid point corresponds to one set of parameter values. The model is tested once for each grid point to obtain the evaluation index of that hyperparameter combination, and the hyperparameters with the best evaluation index are selected as the optimized hyperparameters of the prediction model, thereby improving predictive performance.
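Grid search with rolling-origin evaluation can be sketched as follows. The forecaster here is simple exponential smoothing, chosen only because it is small enough to show in full; it is a stand-in, not one of the models in the pool, and the function names, grid values, and series are hypothetical.

```python
from itertools import product

def ses_forecast(history, alpha, window):
    """One-step forecast by simple exponential smoothing over the last `window` points."""
    recent = history[-window:]
    level = recent[0]
    for value in recent[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def rolling_origin_mae(series, min_train, **params):
    """Mean absolute one-step error, moving the forecast origin forward one point at a time."""
    errors = [abs(series[t] - ses_forecast(series[:t], **params))
              for t in range(min_train, len(series))]
    return sum(errors) / len(errors)

def grid_search(series, grid, min_train):
    """Evaluate every grid point once and return (best_params, best_score)."""
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((rolling_origin_mae(series, min_train, **params), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score

# Made-up upward-trending series; each grid point is one (alpha, window) pair.
series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]
grid = {"alpha": [0.2, 0.5, 0.8], "window": [3, 6]}
best_params, best_score = grid_search(series, grid, min_train=6)
```

Every forecast uses only observations before the forecast origin, so the evaluation respects time order in a way that ordinary shuffled k-fold cross-validation would not.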
Time granularity optimization targets models with poor prediction performance: time series at different time scales are selected for prediction, including fine-grained prediction and coarse-grained prediction.
New data learning covers the historical time series supplemented with newly available data: new real data are added dynamically so that the model is updated through continued learning.
In this embodiment, the prediction performance of each model is evaluated as follows: each prediction model is tested on the test set, and indexes such as MER, MAPE, MAE, and RMSE are used to evaluate its performance; the smaller the index value, the higher the model accuracy.
Mean error rate (MER), the mean absolute error divided by the mean of the actual values:
$$MER = \frac{\frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|}{\frac{1}{n} \sum_{i=1}^{n} y_i}$$
Mean absolute percentage error (MAPE); prediction accuracy is considered good when MAPE is below 10% to 15%:
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$$
Mean absolute error (MAE):
$$MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$
Root mean squared error (RMSE), the square root of the mean squared difference between the predicted and actual values:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$
where $\hat{y}_i$ and $y_i$ are the fitted value and the actual value, respectively, and n is the number of samples.
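The four evaluation indexes can be computed as in the sketch below. The function names and the numbers are made up for illustration, and MER is taken as the mean absolute error divided by the mean actual value, following the verbal definition above.

```python
import math

def mae(actual, fitted):
    """Mean absolute error."""
    return sum(abs(f - a) for a, f in zip(actual, fitted)) / len(actual)

def mer(actual, fitted):
    """Mean error rate: MAE divided by the mean of the actual values."""
    return mae(actual, fitted) / (sum(actual) / len(actual))

def mape(actual, fitted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((f - a) / a) for a, f in zip(actual, fitted)) / len(actual)

def rmse(actual, fitted):
    """Root mean squared error."""
    return math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, fitted)) / len(actual))

# Made-up actual and fitted values.
actual = [100, 200, 300, 400]
fitted = [110, 190, 330, 380]
report = {"MER": mer(actual, fitted), "MAPE": mape(actual, fitted),
          "MAE": mae(actual, fitted), "RMSE": rmse(actual, fitted)}
```

For these numbers the absolute errors are 10, 10, 30, and 20, so MAE is 17.5, MER is 0.07, MAPE is 7.5%, and RMSE is the square root of 375, about 19.36.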
step 3-2: each parameter-optimized predictor is fitted on the validation set and the prediction set respectively; the prediction results on the validation set are combined to form a new training set, and the prediction results on the test set are combined by weighted averaging to form a new test set; both serve as input to the Stacking second layer;
step 3-3: a meta-learner is introduced into the second layer of the Stacking ensemble model and is trained by regression on the prediction results of the previous layer, which serve as its training set and test set; a linear regression model and a ridge regression model are taken as candidate meta-learners, and the final meta-learner is selected through evaluation of the prediction effect;
step 3-4: based on the hysteresis effect indexes, relevant reference data are provided for the prediction of the next s periods.
Step 4: the integrated model server visually displays the results obtained in step 3.
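Steps 3-2 and 3-3 can be sketched with NumPy as follows. This is a minimal sketch under assumptions: the four base learners are replaced by synthetic predictions (the true values plus noise), the base predictions are simply stacked as columns, both meta-learners use closed-form solutions, and the names and the ridge penalty of 1.0 are hypothetical.

```python
import numpy as np

def fit_linear(P, y, ridge=0.0):
    """Least-squares weights with intercept; ridge > 0 adds an L2 penalty on the slopes."""
    A = np.column_stack([np.ones(len(P)), P])
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0                      # the intercept is not penalized
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def predict(weights, P):
    return np.column_stack([np.ones(len(P)), P]) @ weights

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def choose_meta_learner(P_val, y_val, P_test, y_test, ridge=1.0):
    """Fit both candidates on validation-set base predictions; keep the lower test RMSE."""
    candidates = {"linear": fit_linear(P_val, y_val),
                  "ridge": fit_linear(P_val, y_val, ridge)}
    scores = {name: rmse(y_test, predict(w, P_test)) for name, w in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores

# Synthetic stand-ins for the predictions of four base learners of differing accuracy.
rng = np.random.default_rng(0)
y_val, y_test = rng.normal(50, 10, 30), rng.normal(50, 10, 10)
noise_scales = np.array([1.0, 2.0, 3.0, 4.0])
P_val = y_val[:, None] + rng.normal(0, 1, (30, 4)) * noise_scales
P_test = y_test[:, None] + rng.normal(0, 1, (10, 4)) * noise_scales
best, weights, scores = choose_meta_learner(P_val, y_val, P_test, y_test)
```

Each column of `P_val` plays the role of one base learner's predictions, so the meta-learner's weights show how much each base model contributes to the final forecast.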
The lung cancer disease burden risk early warning method based on ensemble learning solves the technical problem of providing more accurate reference data for predicting the lung cancer disease burden. The method fuses multi-source data to provide more comprehensive information and makes full use of the information from multiple prediction models: it combines the results of several models into one strong predictor, exploits the strengths of the different models, and reduces the uncertainty and bias of any single model, which improves prediction accuracy and stability and provides more accurate prediction reference data than a single prediction model. The models of the present invention have different characteristics and capabilities in processing time series data; combining them allows data with diverse characteristics to be handled and different data features and trends to be considered more comprehensively, thereby improving accuracy. By analyzing the hysteresis effects of the different prediction indexes, the temporal relationship between each index and the disease burden can be captured; taking the hysteresis effect into account allows the prediction model to be established more accurately, improves prediction accuracy, and makes it possible to provide prediction reference data over a longer time range. The method uses a rolling window technique to cross-validate the time series data when training both the individual prediction models and the meta-learning model, which helps with model parameter estimation.
Claims (7)
1. A lung cancer disease burden risk early warning method based on ensemble learning, characterized in that the method comprises the following steps:
step 1: establishing a database server, wherein the database server acquires disease burden data, meteorological data, air pollution data, regional economic data, and time feature data through the Internet, integrates and cleans the data to construct a lung cancer disease burden feature database, and visually displays the database data through charts to show the time-series characteristics of the disease and of the features;
step 2: establishing a model server, wherein the model server acquires the integrated and cleaned data from the database server, reduces and screens the prediction indexes through information entropy and principal component analysis, and measures the hysteresis effect of each prediction index on the lung cancer disease burden through gray correlation analysis;
constructing a prediction model pool on the training sequence, wherein the prediction model pool comprises a GAM model, an LSTM model, a GM(1, N) model, an ARIMA model, an XGBoost algorithm model, an RFR algorithm model, a BP neural network model, and an AdaBoost algorithm model; validating each model in the prediction model pool, optimizing its parameters, and iteratively updating it; evaluating the predictive performance of each model on the test set; and ranking the models by predictive performance;
step 3: establishing an integrated model server, wherein the integrated model server selects the 4 models with the best predictive performance from the prediction model pool as the first-layer base learners of Stacking ensemble learning; each predictor is fitted on the validation set and the prediction set respectively to form a new training set and a new test set, which serve as input to the meta-learner of the Stacking second layer; a linear regression model and a ridge regression model are taken as candidate meta-learners, and the better one is selected through predictive performance evaluation to obtain the final integrated model; based on the hysteresis effect indexes, relevant reference data are provided for the prediction of the next s periods;
step 4: the integrated model server visually displays the results obtained in step 3.
2. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 1, characterized in that: when step 1 is executed, integrating and cleaning the data specifically means cleaning abnormal data, missing data, duplicate data, and inconsistent data.
3. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 2, characterized in that: when step 1 is executed, missing data are filled by a mathematical statistical method such as the mean method, the regression method, or multiple imputation, variables with a missing proportion exceeding 10% are removed, and the data are integrated and cleaned through the steps of data analysis, definition of a cleaning strategy, data inspection, execution of data cleaning, data quality evaluation, and return of the clean data, to obtain standard data.
4. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 2, characterized in that: when step 1 is executed, visually displaying the database data through charts specifically comprises: collecting as much data as possible; after data mining and cleaning, organizing the data from different sources into primary indexes such as disease burden, meteorology, air pollution, economy, and other environmental data, and constructing a primary database for lung cancer disease burden risk early warning; performing descriptive statistical analysis of the distribution of environmental pollution, meteorological, and economic characteristics in the region using the mean, standard deviation, extreme values, and quartiles; and calculating the compound annual growth rate of the disease burden.
5. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 1, characterized in that: when step 2 is executed, the screening of the prediction indexes specifically comprises the following steps:
step 2-1: obtaining initial indexes, screened by importance, through expert interviews (subjective analysis) and a review of the literature and theory;
step 2-2: screening the initial indexes based on information entropy: calculating the comparison information entropy between each initial index and the lung cancer disease burden, eliminating indexes with low relevance to the disease burden, and eliminating redundant indexes that are highly correlated with one another;
step 2-3: screening important indexes, or extracting principal components as new indexes, based on principal component analysis, which specifically comprises the following steps:
step 2-3-1: constructing the index matrix $X = (x_{ij})_{n \times p}$, where $x_{np}$ represents the p-th index value of the n-th sample, and n and p are the numbers of rows (samples) and columns (indexes) of the matrix, respectively;
step 2-3-2: standardizing the matrix X to obtain Z;
step 2-3-3: calculating the correlation coefficient matrix of the standardized matrix Z, $R = \frac{Z^{T} Z}{n - 1}$, where n represents the number of samples and T denotes matrix transposition;
step 2-3-4: calculating the eigenvalues $\lambda_j$ of the correlation coefficient matrix R and the corresponding orthonormal eigenvectors $a_j$, and obtaining the principal component scores $F_i = a_{1i} x_1 + a_{2i} x_2 + \dots + a_{pi} x_p$, where i is the principal component number and p is the total number of indexes;
step 2-3-5: calculating the factor loads: the load of index $x_j$ on principal component $F_i$ is $l(F_i, x_j) = \sqrt{\lambda_i}\, a_{ji}$, which reflects the degree of correlation between $F_i$ and $x_j$ and indicates the importance of each variable within the principal component and its contribution to the result; important indexes are screened out by $|l(F_i, x_j)|$, where j is the index number and i is the principal component number;
step 2-3-6: when there are too many indexes, selecting k principal components as new indexes, where k is the smallest number for which the cumulative information contribution rate of the principal components reaches 80%, that is, $\sum_{j=1}^{k} \lambda_j \big/ \sum_{j=1}^{p} \lambda_j \ge 80\%$.
6. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 1, characterized in that: when step 2 is executed, the hysteresis effect of each prediction index on the lung cancer disease burden is measured through gray correlation analysis, which specifically comprises quantitatively comparing the geometric shapes of the research variable sequence and the related factor sequences to judge the degree of correlation between each related factor and the research variable, and analyzing the degree of influence and the hysteresis effect of each prediction index on lung cancer morbidity and mortality through Deng's correlation degree.
7. The lung cancer disease burden risk early warning method based on ensemble learning according to claim 1, characterized in that: step 3 specifically comprises the following steps:
step 3-1: in the first layer of the Stacking ensemble model, a GAM model, an LSTM model, a GM(1, N) model, an ARIMA model, an XGBoost algorithm model, an RFR algorithm model, a BP neural network model, and an AdaBoost algorithm model form the prediction model pool, and the 4 regression algorithm models with the best predictive performance are selected from the pool as the Stacking first layer;
step 3-2: each parameter-optimized predictor is fitted on the validation set and the prediction set respectively; the prediction results on the validation set are combined to form a new training set, and the prediction results on the test set are combined by weighted averaging to form a new test set; both serve as input to the Stacking second layer;
step 3-3: a meta-learner is introduced into the second layer of the Stacking ensemble model and is trained by regression on the prediction results of the previous layer, which serve as its training set and test set; a linear regression model and a ridge regression model are taken as candidate meta-learners, and the final meta-learner is selected through evaluation of the prediction effect;
step 3-4: based on the hysteresis effect indexes, relevant reference data are provided for the prediction of the next s periods.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310786560.3A CN116779172A (en) | 2023-06-30 | 2023-06-30 | Lung cancer disease burden risk early warning method based on ensemble learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310786560.3A CN116779172A (en) | 2023-06-30 | 2023-06-30 | Lung cancer disease burden risk early warning method based on ensemble learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116779172A (en) | 2023-09-19 |
Family
ID=88007871
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310786560.3A Pending CN116779172A (en) | 2023-06-30 | 2023-06-30 | Lung cancer disease burden risk early warning method based on ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116779172A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117094184A (en) * | 2023-10-19 | 2023-11-21 | 上海数字治理研究院有限公司 | Modeling method, system and medium of risk prediction model based on intranet platform |
CN117094184B (en) * | 2023-10-19 | 2024-01-26 | 上海数字治理研究院有限公司 | Modeling method, system and medium of risk prediction model based on intranet platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113919448B (en) | Method for analyzing influence factors of carbon dioxide concentration prediction at any time-space position | |
Ma et al. | A Lag-FLSTM deep learning network based on Bayesian Optimization for multi-sequential-variant PM2. 5 prediction | |
CN110726694A (en) | Characteristic wavelength selection method and system of spectral variable gradient integrated genetic algorithm | |
CN115495991A (en) | Rainfall interval prediction method based on time convolution network | |
Middya et al. | Pollutant specific optimal deep learning and statistical model building for air quality forecasting | |
CN116779172A (en) | Lung cancer disease burden risk early warning method based on ensemble learning | |
CN110852496A (en) | Natural gas load prediction method based on LSTM recurrent neural network | |
CN115542429A (en) | XGboost-based ozone quality prediction method and system | |
CN114358435A (en) | Pollution source-water quality prediction model weight influence calculation method of two-stage space-time attention mechanism | |
Sun et al. | Spatial-temporal prediction of air quality based on recurrent neural networks | |
CN114372707A (en) | High-cold-wetland degradation degree monitoring method based on remote sensing data | |
CN114595861A (en) | MSTL (modeling, transformation, simulation and maintenance) and LSTM (least Square TM) model-based medium-and-long-term power load prediction method | |
CN114429077A (en) | Time sequence multi-scale analysis method based on quantum migration | |
CN115456245A (en) | Prediction method for dissolved oxygen in tidal river network area | |
CN115879607A (en) | Electric energy meter state prediction method, system, equipment and storage medium | |
CN114862032A (en) | XGboost-LSTM-based power grid load prediction method and device | |
Li et al. | A neural networks based method for multivariate time-series forecasting | |
Sharma et al. | Forecasting and prediction of air pollutants concentrates using machine learning techniques: the case of India | |
Wang et al. | The prediction model for haze pollution based on stacking framework and feature extraction of time series images | |
CN116720080A (en) | Homologous meteorological element fusion inspection method | |
Asaei-Moamam et al. | Air quality particulate-pollution prediction applying GAN network and the Neural Turing Machine | |
CN117217419A (en) | Method and system for monitoring full life cycle carbon emission of industrial production | |
CN115935283B (en) | Drought cause tracing method based on multi-element nonlinear causal analysis | |
CN114638039B (en) | Structural health monitoring characteristic data interpretation method based on low-rank matrix recovery | |
CN115145903A (en) | Data interpolation method based on production process |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||