CN117114184A - Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device - Google Patents


Info

Publication number
CN117114184A
Authority
CN
China
Prior art keywords
carbon emission
data
gdp
urban
factors
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311071418.7A
Other languages
Chinese (zh)
Inventor
罗帅
边疆
王洋
刘宁
李浩然
韩悦
张毅
高毅
王森
***
班全
李娜
王坤
路菲
甘智勇
艾邓鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Tianjin Electric Power Co Ltd
Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, State Grid Tianjin Electric Power Co Ltd, and Electric Power Research Institute of State Grid Tianjin Electric Power Co Ltd
Priority to CN202311071418.7A
Publication of CN117114184A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and device for urban carbon emission influence factor feature extraction and medium- and long-term prediction. Based on an "economy-energy-carbon emission" balance system, urban carbon emission influence factors are selected and their historical data are acquired. The acquired data are preprocessed to obtain a high-quality data set. On this data set, the Generalized Divisia Index Method is used to analyze the mechanism of action and the contribution of each influence factor and to identify the key factors affecting urban carbon emission. Each key factor is then predicted as a univariate series with an ensemble empirical mode decomposition-long short-term memory network (EEMD-LSTM) method. Finally, using the predicted influence factors, an XGBoost network optimized by the particle swarm algorithm is used to construct a medium- and long-term urban carbon emission prediction model. The method jointly considers factors such as energy consumption and economic growth, together with the interactions among multiple factors, to achieve accurate carbon emission prediction.

Description

Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device
Technical Field
The invention relates to the field of carbon emission prediction and analysis, and in particular to a method and device for extracting features of urban carbon emission influence factors and for medium- and long-term carbon emission prediction.
Background
Climate warming has attracted wide attention because it damages the ecological environment and threatens human survival and development, and a coordinated global response has become a necessity. To meet carbon emission reduction targets and to improve energy efficiency and environmental protection, the driving factors of carbon emission need to be explored and analyzed in depth, and a medium- and long-term carbon emission prediction model that accounts for multiple influencing factors needs to be established, providing a scientific basis for formulating and implementing emission reduction policies.
Studies have shown that carbon emissions are jointly affected by many factors, including population, economy, technological progress and industrial structure. The mainstream methods for studying carbon emission influencing factors are the Logarithmic Mean Divisia Index (LMDI) decomposition method and regression analysis based on the STIRPAT model. In the prior literature, GDP, per capita energy consumption, urbanization rate and carbon dioxide emission intensity have been used to decompose and predict China's carbon emission factors, with the contributions of the different factors analyzed by an error decomposition method; grey models and support vector regression have been used to predict the effects of energy consumption, industrial structure and population growth, leading to emission reduction policies centered on energy structure adjustment and technological progress; time series models and grey relational models have been used to predict and analyze China's carbon emissions and their influencing factors, finding that energy consumption, population size, economic growth rate and energy structure have the most significant effects; and grey relational analysis combined with time series analysis has been used to study the relationship between China's carbon emissions and factors such as energy structure, industrial structure, economic growth and technological innovation, and to predict future emission trends.
However, the decomposition and prediction of carbon emission factors still face several problems. First, existing factor decomposition and prediction models suffer from overly simplistic factor sets, missing data and insufficient sample sizes. Second, the factors affecting carbon emissions are complex and interact with one another, so how to effectively extract and analyze the information carried by each factor still requires in-depth study.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a method for urban carbon emission influence factor feature extraction and medium- and long-term prediction that jointly considers factors such as energy consumption and economic growth, together with the interactions among multiple factors, so as to achieve accurate carbon emission prediction.
The invention collects indicators of energy consumption, economic growth rate, power production and carbon emission, and quantifies their contributions with the GDIM method in order to select strongly correlated influence factors. A particle swarm optimized XGBoost algorithm is then used to predict future carbon emission trends, which allows the temporal characteristics of the data to be identified and exploited and improves the accuracy and reliability of the model's estimates.
The specific scheme comprises the following steps:
S1: Select urban carbon emission influence factors based on an "economy-energy-carbon emission" balance system, and acquire historical data for the influence factors;
S2: Preprocess the data set acquired in S1 by identifying bad data, cleaning the data and completing missing values, to obtain a high-quality data set;
S3: Using the data from S2, apply the Generalized Divisia Index Method (GDIM) to analyze the mechanism of action and the contribution of each urban carbon emission influence factor, and identify the key influence factors affecting urban carbon emission;
S4: For the key influence factors identified in S3, apply an EEMD-LSTM (ensemble empirical mode decomposition with long short-term memory network) method for univariate prediction, so that each influence factor is accurately predicted over the medium and long term;
S5: Based on the influence factor predictions from S4, construct a medium- and long-term urban carbon emission prediction model using an XGBoost network optimized by the particle swarm algorithm (PSO-XGBoost).
In step S1, the urban carbon emission influence factors are selected on the basis of the "economy-energy-carbon emission" balance system and their historical data are acquired. The influence factors are selected as follows:
A Kaya identity is established. It is mainly used to decompose carbon emission influencing factors, decomposing carbon emission according to the carbon emission amount (C), total energy consumption (E), gross domestic product (GDP) and total population (P):
$$C = P \times \frac{GDP}{P} \times \frac{E}{GDP} \times \frac{C}{E}$$
The influence factors are defined as listed in the factor table given in the embodiment below.
In S2, the data set acquired in S1 is preprocessed: bad data are identified, the data are cleaned and missing values are completed, yielding a high-quality data set.
This mainly involves filling in missing monthly industrial product output data and removing economic data with obvious anomalies.
Data cleaning is carried out with the LOF (local outlier factor) and GAKNNI methods.
In S3, the Generalized Divisia Index Method (GDIM) is used to analyze the mechanism of action and the contribution of the urban carbon emission influence factors and to identify the key influence factors, specifically:
$$C = X_3 X_4$$
$$X_3 X_4 - X_5 X_6 = 0$$
$$X_3 X_4 - X_1 X_2 = 0$$
$$X_3 - X_1 X_8 = 0$$
$$X_1 - X_5 X_7 = 0$$
where $X = (X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)$ and the carbon emission is a function $C(X)$ of these factors, whose contributions to the change in carbon emission are to be determined.
$X_1 = GDP$ is the regional real GDP; $X_2 = C/GDP$ is the carbon emission per unit of GDP; $X_3 = E$ is the energy consumption; $X_4 = C/E$ is the carbon emission per unit of energy consumed; $X_5 = P$ is the resident population; $X_6 = C/P$ is the per capita carbon emission; $X_7 = GDP/P$ is the per capita real GDP; and $X_8 = E/GDP$ is the energy consumption per unit of GDP.
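To make the factor definitions concrete, the following Python sketch computes $X_1$ to $X_8$ from the four base quantities and checks that the four GDIM constraints hold. The function names and the input figures are hypothetical placeholders chosen for illustration, not data from the patent.

```python
def kaya_factors(C, E, GDP, P):
    """Compute X1..X8 as defined above from carbon emission C, energy
    consumption E, regional real GDP and resident population P."""
    return {
        "X1": GDP,        # economic development level
        "X2": C / GDP,    # carbon emission per unit GDP
        "X3": E,          # energy consumption
        "X4": C / E,      # carbon emission per unit energy
        "X5": P,          # population size
        "X6": C / P,      # per capita carbon emission
        "X7": GDP / P,    # per capita real GDP
        "X8": E / GDP,    # energy consumption per unit GDP
    }


def constraint_residuals(X):
    """Residuals of the four GDIM constraints; each should be (numerically) zero."""
    return [
        X["X3"] * X["X4"] - X["X5"] * X["X6"],
        X["X3"] * X["X4"] - X["X1"] * X["X2"],
        X["X3"] - X["X1"] * X["X8"],
        X["X1"] - X["X5"] * X["X7"],
    ]


# Hypothetical figures in the factor table's units: C in 10^4 t, E in 10^4 tce,
# GDP in 10^8 yuan, P in 10^4 persons.
factors = kaya_factors(C=15000.0, E=8000.0, GDP=16000.0, P=1400.0)
print(factors["X3"] * factors["X4"])   # reproduces C, since C = X3 * X4
print(constraint_residuals(factors))   # all close to zero
```

Because every factor is a ratio of the same four base quantities, the constraints are identities; any nonzero residual in real data would point to inconsistent inputs.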
A Jacobian matrix is constructed from the GDIM constraints:
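The matrix itself does not survive in this text. Assuming the usual GDIM convention that each column of $\Phi_X$ is the gradient of one constraint, $\varphi_1 = X_3X_4 - X_5X_6$, $\varphi_2 = X_3X_4 - X_1X_2$, $\varphi_3 = X_3 - X_1X_8$ and $\varphi_4 = X_1 - X_5X_7$, with rows ordered $X_1$ to $X_8$, differentiating the constraints listed above gives:

$$\Phi_X = \begin{pmatrix}
0 & -X_2 & -X_8 & 1 \\
0 & -X_1 & 0 & 0 \\
X_4 & X_4 & 1 & 0 \\
X_3 & X_3 & 0 & 0 \\
-X_6 & 0 & 0 & -X_7 \\
-X_5 & 0 & 0 & 0 \\
0 & 0 & 0 & -X_5 \\
0 & 0 & -X_1 & 0
\end{pmatrix}$$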
According to the GDIM principle, the change in carbon emission $\Delta C$ can be decomposed into the sum of the contributions of the individual factors:
$$\Delta C[X \mid \Phi] = \int_{L} \nabla C^{T} \left( I - \Phi_X \Phi_X^{+} \right) dX$$
where $L$ denotes the time span; $I$ is the identity matrix; the superscript "+" denotes the generalized inverse; $\nabla C = (0, 0, X_4, X_3, 0, 0, 0, 0)^{T}$; and the column vectors of the Jacobian matrix $\Phi_X$ are linearly independent.
The change in carbon emission is thus decomposed into the sum of eight effects, $\Delta X_1$ through $\Delta X_8$. $\Delta X_1$, $\Delta X_3$ and $\Delta X_5$ are absolute-quantity effects that reflect, respectively, the influence of changes in economic development level, energy consumption and population size on energy-related carbon emission. The remaining five are relative-quantity effects representing, in order, the influence on the change in carbon emission of carbon emission per unit GDP ($\Delta X_2$), carbon emission per unit energy consumed ($\Delta X_4$), per capita carbon emission ($\Delta X_6$), per capita GDP ($\Delta X_7$) and energy consumption per unit GDP ($\Delta X_8$).
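A numerical sketch of the decomposition follows. It is an assumption, not the patent's stated procedure, that the path $L$ is obtained by interpolating C, E, GDP and P linearly between the start and end years and recomputing $X_1$ to $X_8$ at every point, so that the path always satisfies the constraints. The integral is approximated with the midpoint rule, and `numpy.linalg.pinv` serves as the generalized inverse. Function names and input figures are hypothetical.

```python
import numpy as np


def factors(C, E, GDP, P):
    """X1..X8 in the order used in the text."""
    return np.array([GDP, C / GDP, E, C / E, P, C / P, GDP / P, E / GDP])


def jacobian(X):
    """8x4 matrix whose columns are the gradients of the four constraints."""
    X1, X2, X3, X4, X5, X6, X7, X8 = X
    return np.array([
        [0.0, -X2, -X8, 1.0],
        [0.0, -X1, 0.0, 0.0],
        [X4, X4, 1.0, 0.0],
        [X3, X3, 0.0, 0.0],
        [-X6, 0.0, 0.0, -X7],
        [-X5, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, -X5],
        [0.0, 0.0, -X1, 0.0],
    ])


def gdim_decompose(start, end, n_steps=500):
    """Contributions of X1..X8 to C(end) - C(start); start/end are (C, E, GDP, P)."""
    start, end = np.asarray(start, float), np.asarray(end, float)
    path = np.array([factors(*(start + s * (end - start)))
                     for s in np.linspace(0.0, 1.0, n_steps + 1)])
    contrib = np.zeros(8)
    for a, b in zip(path[:-1], path[1:]):
        Xm = 0.5 * (a + b)                                     # midpoint of the step
        grad_C = np.array([0, 0, Xm[3], Xm[2], 0, 0, 0, 0.0])  # gradient of C = X3 * X4
        Phi = jacobian(Xm)
        proj = np.eye(8) - Phi @ np.linalg.pinv(Phi)           # I - Phi_X Phi_X^+
        contrib += (grad_C @ proj) * (b - a)                   # per-factor increments
    return contrib


# Hypothetical start/end years: (C, E, GDP, P) in the units of the factor table.
contrib = gdim_decompose((10000.0, 6000.0, 10000.0, 1300.0),
                         (12000.0, 7000.0, 15000.0, 1400.0))
for name, value in zip([f"X{j}" for j in range(1, 9)], contrib):
    print(name, round(value, 2))
print("sum", round(contrib.sum(), 6))
```

Because every constraint is bilinear in X, evaluating the Jacobian at each step's midpoint makes the projection leave the step essentially unchanged, so the eight contributions add up to C(end) minus C(start) up to rounding, which is a convenient sanity check.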
The EEMD-LSTM method used in S4 is as follows:
The invention uses EEMD to extract the overall trend, seasonal and random fluctuation features of each influencing factor. The data are decomposed by EEMD, a separate long short-term memory (LSTM) network is built for each decomposed component, and the component predictions are superposed to obtain the final prediction.
EEMD adds different white noise series to the initial time series and decomposes the power time-series data $x_{k,t}$ into a series of intrinsic mode function (IMF) components:
$$x_{k,t} = \sum_{i=1}^{n} imf_{k,i} + r_{k,n}$$
where $imf_{k,i}$ is the $i$-th IMF component obtained by decomposing the power consumption time series of the $k$-th user class with EEMD, and $r_{k,n}$ is the residual component left after decomposing the original time series.
EEMD decomposes the time series by connecting its local maxima and its local minima with cubic splines to form upper and lower envelopes and taking their mean $m_{j-1}$. Subtracting the envelope mean from the sequence (repeated until an IMF is obtained) yields $imf_{k,i}$, and removing that IMF from the previous remaining sequence, starting from the initial sequence $x_{k,t}$, gives the new remaining sequence $r_{k,i}$:
$$r_{k,i} = r_{k,i-1} - imf_{k,i}$$
where $r_{k,0} = x_{k,t}$ when $i = 1$.
The process is iterated, extracting the (i+1)-th IMF from the remaining sequence $r_{k,i}$, until the remaining sequence $r_{k,n}$ can no longer be decomposed. Finally, the IMFs obtained with the different white noise realizations are averaged, yielding the overall trend, seasonal and random fluctuation features of each influence factor.
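A compact NumPy/SciPy sketch of EMD sifting and EEMD ensemble averaging is given below. It assumes a fixed number of sifting passes instead of a formal stopping criterion, uses the series end points as extra spline knots (a crude boundary treatment), and aligns IMFs across noise trials by index; all names and default values are illustrative choices rather than the patent's settings.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def _envelope_mean(h):
    """Mean of the cubic-spline envelopes through the local maxima and minima,
    or None when there are too few extrema (h is then treated as a residual)."""
    t = np.arange(h.size)
    maxima = np.where((h[1:-1] > h[:-2]) & (h[1:-1] >= h[2:]))[0] + 1
    minima = np.where((h[1:-1] < h[:-2]) & (h[1:-1] <= h[2:]))[0] + 1
    if maxima.size < 2 or minima.size < 2:
        return None
    ends = [0, h.size - 1]  # end points added as knots: a simple boundary treatment
    upper_idx = np.concatenate(([ends[0]], maxima, [ends[1]]))
    lower_idx = np.concatenate(([ends[0]], minima, [ends[1]]))
    upper = CubicSpline(upper_idx, h[upper_idx])(t)
    lower = CubicSpline(lower_idx, h[lower_idx])(t)
    return 0.5 * (upper + lower)


def emd(x, max_imfs=6, n_sift=10):
    """Plain EMD: returns (imfs, residual) with imfs.sum(0) + residual == x."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if _envelope_mean(residual) is None:
            break                      # residual has too few extrema: stop
        h = residual.copy()
        for _ in range(n_sift):        # fixed number of sifting passes
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs.append(h)
        residual = residual - h        # r_i = r_{i-1} - imf_i
    return np.array(imfs).reshape(len(imfs), residual.size), residual


def eemd(x, n_trials=50, noise_std=0.2, max_imfs=6, seed=0):
    """EEMD: average the EMD of x plus independent white-noise realizations."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    scale = noise_std * x.std()
    imf_sum = np.zeros((max_imfs, x.size))
    res_sum = np.zeros(x.size)
    for _ in range(n_trials):
        imfs, res = emd(x + rng.normal(0.0, scale, x.size), max_imfs=max_imfs)
        imf_sum[: len(imfs)] += imfs   # IMFs aligned by index across trials
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials


t = np.linspace(0, 10, 400)
x = 0.5 * t + np.sin(2 * np.pi * t) + 0.3 * np.sin(2 * np.pi * 4 * t)
imfs, residual = eemd(x, n_trials=20, noise_std=0.2, max_imfs=5)
print(imfs.shape, residual.shape)
```

With `noise_std=0.0` the ensemble reduces to plain EMD, and `imfs.sum(axis=0) + residual` reproduces the input exactly because each round subtracts its IMF from the residual, mirroring $r_{k,i} = r_{k,i-1} - imf_{k,i}$. With noise, the reconstruction equals the input plus the average of the added noise, which shrinks as the number of trials grows.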
Training of the LSTM model is governed mainly by three gate structures: the input gate, the forget gate and the output gate. The input gate updates information by combining the current input with the output of the previous time step, and selectively stores the updated part in the cell state $c_t$. The forget gate determines how much of the cell state is forgotten or retained. The output gate computes the output at the current time step, i.e., the user's response load adjustment, from the updated cell state. The detailed process is shown in the attached drawings (FIG. 3).
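The gate mechanics described above can be sketched as a single forward LSTM step in NumPy. This follows the standard LSTM cell equations; the stacked weight layout, the function names and the random untrained weights in the usage lines are assumptions for illustration, and training (backpropagation through time) is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; rows of W, U, b are stacked as [input, forget, output, candidate]."""
    H = h_prev.size
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:H])            # input gate: how much of the candidate to write
    f = sigmoid(z[H:2 * H])       # forget gate: how much of c_prev to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])        # candidate cell content from x_t and h_prev
    c_t = f * c_prev + i * g      # updated cell state c_t
    h_t = o * np.tanh(c_t)        # output at time t
    return h_t, c_t


def lstm_forward(xs, W, U, b):
    """Run the cell over a sequence xs of shape (T, D) and return the final (h, c)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h, c


rng = np.random.default_rng(0)
D, H = 1, 8                                # one input series, 8 hidden units
W = rng.normal(0.0, 0.1, (4 * H, D))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
xs = rng.normal(size=(12, D))              # e.g. 12 past values of one IMF component
h, c = lstm_forward(xs, W, U, b)
print(h.shape, c.shape)
```

In the EEMD-LSTM scheme, one such network would be trained per component, a linear read-out would map the final hidden state to the next value, and the component forecasts would then be summed.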
In S5, an XGBoost network optimized by the particle swarm algorithm (PSO-XGBoost) is used to construct the urban carbon emission influence factor analysis and medium- and long-term prediction model. The PSO-XGBoost algorithm is as follows:
XGBoost expands the loss function to second order with a Taylor series and adds a regularization term, which yields an overall optimal solution while controlling the overall complexity of the model and effectively improving its generalization ability. To reduce over-fitting, the algorithm also samples features and processes them in parallel, which makes it run faster and its results more interpretable.
Assume a data set $D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^m, y_i \in \mathbb{R}, i = 1, 2, \dots, n\}$ of $n$ samples, each with $m$ features, where $\mathbb{R}^m$ and $\mathbb{R}$ denote the $m$-dimensional real vector space and the set of real numbers, respectively. The model prediction is the sum of $K$ regression trees:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $f_k$ is a regression tree, $K$ is the total number of regression trees, and $F$ is the space of regression trees.
The objective function is expressed as:
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
where $l$ is the loss function measuring the error between the predicted value $\hat{y}_i$ and the true value $y_i$, and $\Omega(f_k)$ is the regularization term.
XGBoost uses gradient boosting: a new regression tree is added in each iteration, so the result after the $t$-th iteration is:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
Combining the formulas above, the objective function at the $t$-th iteration is:
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$
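A second-order Taylor expansion is then performed, and the regularization term $\Omega(f_t)$ is added to prevent over-fitting:
$$Obj^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t), \quad \Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2$$
where $g_i$ and $h_i$ are the first and second derivatives of the loss with respect to $\hat{y}_i^{(t-1)}$; $\gamma$ is the penalty coefficient on the number of leaves; $T$ is the number of leaf nodes; $\omega$ is the leaf weight; and $\lambda$ is the weight penalty coefficient.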
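Minimizing the second-order objective above for a fixed tree structure gives the familiar XGBoost closed forms: with $G_j = \sum g_i$ and $H_j = \sum h_i$ over the samples in leaf $j$, the optimal leaf weight is $\omega_j^* = -G_j/(H_j + \lambda)$, and a split is scored by the gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$. The sketch below evaluates these for the squared loss $l = \frac{1}{2}(y - \hat{y})^2$, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$; the helper names and the toy values are hypothetical.

```python
import numpy as np


def squared_loss_grad_hess(y, y_prev):
    """g_i and h_i of l = 0.5 * (y - y_hat)^2 evaluated at y_hat = y_prev."""
    return y_prev - y, np.ones_like(y, dtype=float)


def leaf_weight(g, h, lam):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)


def split_gain(g, h, left, lam, gamma):
    """Gain of splitting a leaf into left (mask True) and right parts."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[left], h[left]) + score(g[~left], h[~left])
                  - score(g, h)) - gamma


y = np.array([1.0, 1.0, 3.0, 3.0])
g, h = squared_loss_grad_hess(y, np.zeros_like(y))
print(leaf_weight(g, h, lam=1.0))
print(split_gain(g, h, np.array([True, True, False, False]), lam=1.0, gamma=0.0))
```

Here the unsplit leaf would get weight 8/5 = 1.6 (shrunk from the mean 2 by $\lambda$), and the split has gain 4/15, about 0.267; setting $\gamma$ above that value would make the gain negative and suppress the split, which is how the leaf-count penalty limits tree growth.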
Particle swarm optimization (PSO) is a swarm intelligence algorithm inspired by the foraging behavior of bird flocks. PSO treats candidate solutions of the optimization problem as particles searching a finite-dimensional space; each particle has a position vector and a velocity vector, and the particles cooperate, moving toward better positions guided by their own best positions and the best position found by the swarm. Each particle's fitness is evaluated by a fitness function that measures the quality of its position, and all particles in the swarm follow the current best particle through the solution space.
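The update rule is not written out in the text. A common form, the inertia-weight PSO of Shi and Eberhart, is $v \leftarrow w v + c_1 r_1 (p_{best} - x) + c_2 r_2 (g_{best} - x)$ and $x \leftarrow x + v$, with $r_1, r_2$ drawn uniformly from $[0, 1]$. The sketch below implements that variant for minimization; the coefficient values, the clipping of positions to the bounds and the function name are assumptions.

```python
import numpy as np


def pso_minimize(fitness, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Inertia-weight PSO; returns (best position, best fitness)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # initial positions
    v = np.zeros((n_particles, dim))                          # initial velocities
    pbest = x.copy()                                          # individual best positions
    pbest_val = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()                    # swarm best position
    g_val = pbest_val.min()
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)                      # keep particles in bounds
        vals = np.array([fitness(p) for p in x])
        improved = vals < pbest_val
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < g_val:
            g_val = pbest_val.min()
            g = pbest[np.argmin(pbest_val)].copy()
    return g, g_val


# Toy fitness: squared distance to (1, -2).
best, val = pso_minimize(lambda p: float(np.sum((p - np.array([1.0, -2.0])) ** 2)),
                         lower=[-5, -5], upper=[5, 5])
print(best, val)
```

For PSO-XGBoost, the fitness would instead be a validation error of an XGBoost model built from the decoded particle position, and the bounds would cover the hyperparameters being tuned.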
Constructing the medium- and long-term urban carbon emission prediction model with the PSO-optimized XGBoost network comprises the following steps: initialize the model; train the XGBoost model; compute the fitness value of each particle; compute the individual best and swarm best values; iteratively update the optimal solution; if the termination condition is met, assign the optimal parameters to the XGBoost model, retrain it with these parameters and carry out the carbon emission prediction; otherwise, perform the model initialization again.
Based on the XGBoost principle and PSO theory, PSO is applied to optimize the parameters of XGBoost regression, giving a PSO-XGBoost regression model for medium- and long-term carbon emission prediction. The procedure is as follows:
(1) Clean, repair and standardize the carbon emission data and driving factor data obtained from statistical sources;
(2) Perform GDIM analysis on the preprocessed carbon emission data, select the important driving factors according to their contribution rates, and form a data set from the analysis results;
(3) Split the data into training and test sets at a ratio of 8:2, define a suitable fitness function, initialize the individual best and global best values of the particles, and optimize parameters such as learning_rate and max_depth;
(4) Update the particle velocities and positions generation by generation, compute each particle's fitness value, and keep updating the individual best and global best values until the termination condition is reached;
(5) Select the optimal parameter values, build the parameter-optimized XGBoost regression model, and train it on the training set.
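Steps (1), (3) and (5) leave the data handling and the particle encoding open. The sketch below assumes a two-dimensional particle (learning rate, maximum depth) decoded into XGBoost-style keyword arguments, a chronological rather than shuffled 8:2 split (a reasonable choice for yearly series, but not stated in the text), and z-score standardization fitted on the training part only. The search ranges in `BOUNDS` and all function names are hypothetical.

```python
import numpy as np

# Hypothetical search ranges for the two tuned hyperparameters.
BOUNDS = {"learning_rate": (0.01, 0.3), "max_depth": (2, 10)}


def decode_position(pos):
    """Map a particle position to hyperparameters (max_depth must be an integer)."""
    lr = float(np.clip(pos[0], *BOUNDS["learning_rate"]))
    depth = int(round(float(np.clip(pos[1], *BOUNDS["max_depth"]))))
    return {"learning_rate": lr, "max_depth": depth}


def chronological_split(X, y, train_ratio=0.8):
    """8:2 split that keeps time order: earlier rows train, later rows test."""
    n_train = int(round(len(X) * train_ratio))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def standardize(train, test):
    """Z-score both parts with statistics from the training part only."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0)
    sd[sd == 0] = 1.0                       # avoid division by zero for constant columns
    return (train - mu) / sd, (test - mu) / sd


X = np.arange(20.0).reshape(10, 2)          # toy feature matrix (10 periods, 2 factors)
y = np.arange(10.0)
X_tr, X_te, y_tr, y_te = chronological_split(X, y)
X_tr_s, X_te_s = standardize(X_tr, X_te)
print(decode_position([0.1, 5.4]), X_tr.shape, X_te.shape)
```

In a full pipeline, the fitness handed to a PSO routine would fit `xgboost.XGBRegressor(**decode_position(p))` on the training part and return a validation error such as RMSE; that step is omitted here to keep the sketch dependency-free.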
The invention also provides a device for urban carbon emission influence factor feature extraction and medium- and long-term prediction, comprising:
a selection module, configured to select urban carbon emission influence factors based on the "economy-energy-carbon emission" balance system and to acquire historical data for them;
a data processing module, configured to obtain a high-quality data set through data preprocessing;
an analysis module, configured to analyze the mechanism of action and the contribution of the urban carbon emission influence factors with the Generalized Divisia Index Method and to identify the key influence factors affecting urban carbon emission;
a prediction module, configured to predict each univariate series with the ensemble empirical mode decomposition-long short-term memory network method, and to construct the medium- and long-term urban carbon emission prediction model with the particle swarm optimized XGBoost network.
The technical solution provided by the invention has the following beneficial effects:
(1) Compared with the prior art, the method jointly considers factors such as energy consumption and economic growth, together with the interactions among multiple factors, and thereby achieves accurate carbon emission prediction.
(2) Compared with the regression prediction methods commonly used in the prior art, the method analyzes the carbon emission trend with a PSO-improved XGBoost algorithm and performs medium- and long-term carbon emission prediction. The feasibility and accuracy of the method were verified with actual research data.
In summary, the urban carbon emission influence factor feature extraction and medium- and long-term prediction method provided by the invention offers a systematic way to analyze the important drivers of carbon emission, achieves medium- and long-term carbon emission prediction, and supports the formulation of effective emission reduction strategies.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flow chart of an EEMD algorithm proposed by the present invention;
FIG. 3 is a diagram of training and prediction processes for an LSTM network;
FIG. 4 is a PSO-XGBoost carbon emission prediction flow chart;
FIG. 5 shows an example calculation in which the method of the invention is applied to actual data from Tianjin.
Detailed Description
The invention is further illustrated by the following examples, which are intended to be illustrative only and not limiting in any way.
Examples
A method for urban carbon emission influence factor feature extraction and medium- and long-term prediction comprises the following steps:
1: Analysis of carbon emission influencing factors based on GDIM;
To analyze the mechanism of action and the contribution of each driving factor affecting carbon emission, an index decomposition method is used to quantify the key factors and their degree of influence. The conventional LMDI method considers only one absolute-quantity factor and suffers from strong interdependence among factors, which limits the accuracy of its results, so the GDIM method is adopted to analyze the driving factors. The specific method is as follows:
101: Establishing the Kaya identity
The Kaya identity is mainly used to decompose carbon emission influencing factors; it decomposes carbon emission according to the carbon emission amount (C), total energy consumption (E), gross domestic product (GDP) and total population (P):
$$C = P \times \frac{GDP}{P} \times \frac{E}{GDP} \times \frac{C}{E}$$
102: The influence factors are defined as follows:
| Influencing factor | Meaning | Unit |
| --- | --- | --- |
| C | Carbon emission from energy consumption | 10^4 t |
| X1 = GDP | Regional real GDP (economic development level) | 10^8 yuan |
| X2 = C/GDP | Carbon emission per unit GDP (production carbon intensity) | t/10^4 yuan |
| X3 = E | Energy consumption | 10^4 tce (tonnes of standard coal equivalent) |
| X4 = C/E | Carbon emission per unit energy consumed (energy carbon intensity) | t/tce |
| X5 = P | Population (population size) | 10^4 persons |
| X6 = C/P | Per capita carbon emission | t/person |
| X7 = GDP/P | Per capita real GDP | 10^4 yuan/person |
2: Data cleaning;
201: Outlier detection
The LOF algorithm is used to detect anomalies in the feature data: the set of abnormal points is determined by computing the local outlier factor of every data point in the data set, and the check continues until all data to be checked have been processed;
Based on the historical feature data set, abnormal and missing data are completed with the GAKNNI algorithm until all data to be completed have been processed. For data completion, GAKNNI finds the K data points in the training data set that are spatially closest to the input sample and then evaluates how similar or dissimilar the development trends of these data are;
Using the LOF algorithm to detect anomalies in the feature data by computing the outlier factor of every data point and determining the set of abnormal points comprises the following:
First, the average density of the data points in the data set is analyzed, and the number of abnormal points $M_1$ and the anomaly set $D_1$ are determined from the density distribution. Then $M_2$ ($M_2 = M_1$) abnormal points and the anomaly set $D_2$ are determined by computing outlier factors, and $D_1$ and $D_2$ are taken together as the final outlier set.
Determining the abnormal points of the data set from the density distribution comprises the following steps:
compute the k-distance of a data object q, defined as the distance from q to the k-th closest point in the data set and denoted k-distance(q), where distance means the Euclidean (straight-line) distance;
find the data points whose distance to q is not greater than the k-distance, which form the k-distance neighborhood of q;
compute the reachability distance between p and q, where p and q are any two points in the data set;
compute the local reachability density of q, i.e., the inverse of the average reachability distance between q and the points in its neighborhood;
compute the outlier factor of each data point;
determine the outliers in the data from their outlier factors and delete them.
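The LOF steps above correspond to the standard local outlier factor computation. The NumPy sketch below assumes exactly k neighbours per point (ties at the k-distance are not expanded) and a brute-force distance matrix, which is adequate for yearly or monthly statistical series; `lof_scores` is a hypothetical name, and the combination with the density-based set $D_1$ is not shown.

```python
import numpy as np


def lof_scores(X, k=5):
    """Local outlier factor of every row of X (brute force, Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)                      # a point is not its own neighbour
    nn = np.argsort(D, axis=1)[:, :k]                # indices of the k nearest neighbours
    k_dist = D[np.arange(n), nn[:, -1]]              # k-distance(q)
    # reach-dist_k(q, o) = max(k-distance(o), d(q, o)) for each neighbour o of q
    reach = np.maximum(k_dist[nn], D[np.arange(n)[:, None], nn])
    lrd = 1.0 / reach.mean(axis=1)                   # local reachability density
    return lrd[nn].mean(axis=1) / lrd                # LOF(q): neighbours' lrd / own lrd


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)), [[8.0, 8.0]]])  # 50 inliers + 1 outlier
scores = lof_scores(X, k=5)
print(int(np.argmax(scores)), round(float(scores[-1]), 2))
```

Scores near 1 mean a point is about as dense as its neighbours, while markedly larger scores flag outliers; ranking by score and taking the top $M_2$ points would give the set $D_2$. Exact duplicate points can make a reachability mean zero, so real data may need de-duplication first.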
202: data completion
Completing abnormal and missing data with the GAKNNI algorithm based on the historical feature data set comprises the following:
KNN is used to find the K data points in the training data set that are spatially closest to the missing point; similarity weights for these K points are then derived from grey relational analysis, and the completed value of the missing point is the sum of the products of the K data points and their similarity weights.
The completed value of the missing point is obtained as follows:
establishing time sequence characteristic data of carbon emission and influence factors thereof to obtain a sample matrix T without missing items train Time series t of missing item samples lack
Calculating a sample matrix T without missing items train With missing item sample time series t lack Euclidean distance parameter d;
from T train Middle screening and t lack Nearest K similarity time sequences
Calculating t lack And (3) withGray correlation coefficients of K similarity time series in (a);
calculation ofSimilarity weight omega for K similarity times in a series k
And calculating the complement value y of the missing point.
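A minimal sketch of the GAKNNI steps, assuming that each row of `T_train` is a complete series, that `t_lack` marks its missing entries with NaN, that the grey relational coefficient uses the common distinguishing coefficient ρ = 0.5, and that the similarity weights are the normalised grey relational grades. The function name and these choices are illustrative assumptions; in practice the features would usually be normalised before the grey relational analysis.

```python
import numpy as np

def gaknni_impute(T_train, t_lack, K=3, rho=0.5):
    """Fill NaN entries of t_lack from its K nearest complete series (hypothetical helper)."""
    T_train = np.asarray(T_train, dtype=float)
    t = np.asarray(t_lack, dtype=float).copy()
    obs = ~np.isnan(t)                                   # positions observed in t_lack
    d = np.linalg.norm(T_train[:, obs] - t[obs], axis=1) # Euclidean distance d
    nearest = np.argsort(d)[:K]                          # K most similar series
    delta = np.abs(T_train[nearest][:, obs] - t[obs])    # (K, n_obs) absolute differences
    dmin, dmax = delta.min(), delta.max()                # assumes dmax > 0
    xi = (dmin + rho * dmax) / (delta + rho * dmax)      # grey relational coefficients
    r = xi.mean(axis=1)                                  # grey relational grade per series
    w = r / r.sum()                                      # similarity weights omega_k
    t[~obs] = w @ T_train[nearest][:, ~obs]              # y = sum_k omega_k * y_k
    return t, w
```

For example, with `T_train = [[1, 2, 3, 4], [1.1, 2.1, 3.1, 4.1], [5, 6, 7, 8], [0.9, 1.9, 2.9, 3.9]]`, `t_lack = [1, 2, nan, 4]` and K = 3, the exactly matching series receives the largest weight and the gap is filled with a value very close to 3.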
3: Key influencing factor selection
301: Construction of the Jacobian matrix
$$
\begin{cases}
C = X_3 X_4 \\
X_3 X_4 - X_5 X_6 = 0 \\
X_3 X_4 - X_1 X_2 = 0 \\
X_3 - X_1 X_8 = 0 \\
X_1 - X_5 X_7 = 0
\end{cases}
$$
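where the factor vector X = (X1, X2, X3, X4, X5, X6, X7, X8) consists of X1 = regional real GDP, X2 = carbon emissions per unit of GDP (C/GDP), X3 = energy consumption E, X4 = carbon emissions per unit of energy consumption (C/E), X5 = resident population P, X6 = per capita carbon emissions (C/P), X7 = per capita real GDP (GDP/P) and X8 = energy consumption per unit of GDP (E/GDP), and carbon emissions are expressed as the function C(X). Following GDIM, the Jacobian matrix of the four constraints is constructed; its rows correspond to X1 to X8 and its columns are the gradients of the constraints in the order listed above:

$$
\Phi_X =
\begin{pmatrix}
0 & -X_2 & -X_8 & 1 \\
0 & -X_1 & 0 & 0 \\
X_4 & X_4 & 1 & 0 \\
X_3 & X_3 & 0 & 0 \\
-X_6 & 0 & 0 & -X_7 \\
-X_5 & 0 & 0 & 0 \\
0 & 0 & 0 & -X_5 \\
0 & 0 & -X_1 & 0
\end{pmatrix}
$$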
302: Contribution decomposition
According to the GDIM principle, the change ΔC in carbon emissions can be decomposed into the sum of the contributions of the individual factors:

$$
\Delta C[X \mid \Phi] = \int_{L} \nabla C^{T} \left( I - \Phi_X \Phi_X^{+} \right) dX
$$

where L represents the time span (the path from the base period to the target period); I is the 8 × 8 identity matrix; "+" denotes the generalized inverse; ∇C = (0, 0, X4, X3, 0, 0, 0, 0)^T is the gradient of C = X3X4; and, if the column vectors of the Jacobian matrix Φ_X are linearly independent, Φ_X^+ = (Φ_X^T Φ_X)^{-1} Φ_X^T.
The change in carbon emissions is thus decomposed into the sum of eight effects: ΔX1, ΔX2, ΔX3, ΔX4, ΔX5, ΔX6, ΔX7 and ΔX8. ΔX1, ΔX3 and ΔX5 are absolute-quantity effects reflecting the influence of changes in economic development level, energy consumption and population scale, respectively, on energy consumption carbon emissions; the remaining five are relative-quantity factors representing, in order, the influence of carbon emissions per unit of GDP, carbon emissions per unit of energy consumed, per capita carbon emissions, per capita real GDP and energy consumption per unit of GDP on the change in carbon emissions.
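The decomposition can be evaluated numerically. The sketch below assumes that the path L linearly interpolates C, GDP, E and P between the base and target years and approximates the integral with the midpoint rule; both choices, and the function names, are assumptions rather than details given in the text. Because every constraint is bilinear, the eight contributions it returns add up to ΔC.

```python
import numpy as np

def factors(C, GDP, E, P):
    """Eight GDIM factors X1..X8 as defined in the text."""
    return np.array([GDP, C / GDP, E, C / E, P, C / P, GDP / P, E / GDP])

def grad_C(X):
    """Gradient of C = X3 * X4, i.e. (0, 0, X4, X3, 0, 0, 0, 0)."""
    g = np.zeros(8)
    g[2], g[3] = X[3], X[2]
    return g

def jacobian(X):
    """8 x 4 matrix Phi_X; column j is the gradient of constraint j."""
    X1, X2, X3, X4, X5, X6, X7, X8 = X
    return np.array([
        [0.0, -X2, -X8, 1.0],
        [0.0, -X1, 0.0, 0.0],
        [X4, X4, 1.0, 0.0],
        [X3, X3, 0.0, 0.0],
        [-X6, 0.0, 0.0, -X7],
        [-X5, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, -X5],
        [0.0, 0.0, -X1, 0.0],
    ])

def gdim(base, target, steps=100):
    """Contributions of X1..X8 to the change in C; base/target are (C, GDP, E, P)."""
    base, target = np.asarray(base, float), np.asarray(target, float)
    # assumed path: linear interpolation of C, GDP, E and P
    path = [factors(*(base + s * (target - base))) for s in np.linspace(0.0, 1.0, steps + 1)]
    contrib = np.zeros(8)
    for Xa, Xb in zip(path[:-1], path[1:]):
        Xm = 0.5 * (Xa + Xb)                                      # midpoint rule
        J = jacobian(Xm)
        row = grad_C(Xm) @ (np.eye(8) - J @ np.linalg.pinv(J))   # grad C^T (I - Phi Phi^+)
        contrib += row * (Xb - Xa)                               # factor-wise share of this step
    return contrib
```

For instance, `gdim((100.0, 1000.0, 50.0, 10.0), (120.0, 1300.0, 58.0, 11.0))` returns eight contributions whose sum is ΔC = 20 up to rounding error.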
4: Influencing factor prediction
401: Time-series feature extraction
White noise addition
With the EEMD method, different white noise series are added to the initial time series, and the electric power time-series data x_{k,t} are decomposed into a series of intrinsic mode function (IMF) components:

$$
x_{k,t} = \sum_{i=1}^{n} \mathrm{imf}_{k,i} + r_{k,n}
$$

where imf_{k,i} denotes the i-th intrinsic mode component obtained by applying EEMD to the electricity consumption time series of the k-th user class, and r_{k,n} is the residual component remaining after the original time series is decomposed.
Component determination
EEMD decomposes the time series by connecting the local maxima and the local minima with cubic splines to form upper and lower envelopes and taking their mean m_{j-1}; this mean envelope is subtracted to extract an IMF, and removing that IMF from the sequence gives the remaining sequence r_{k,i}. Thus, each time an IMF is extracted:

$$
r_{k,i} = r_{k,i-1} - \mathrm{imf}_{k,i}
$$

where r_{k,0} = x_{k,t} when i = 1.
The procedure is iterated, extracting the (i+1)-th IMF from the remaining sequence r_{k,i}, until the remaining sequence r_{k,n} can no longer be decomposed. Finally, the IMFs obtained under the different white-noise realisations are averaged, yielding the regularity component, the trend component and the random component.
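A compact EMD/EEMD sketch with NumPy and SciPy's `CubicSpline`. It is simplified and not the authors' code: the fixed number of sifting passes (instead of a formal IMF stopping criterion), pinning both envelopes to the series end points, the noise amplitude (20% of the series' standard deviation), the ensemble size and the rule used by the hypothetical `group_components` helper (first IMF as the random component, remaining IMFs as the regularity component, residue as the trend) are all assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def mean_envelope(x):
    """Mean of cubic-spline envelopes through local maxima/minima; None if too few extrema."""
    t = np.arange(len(x), dtype=float)
    d = np.diff(x)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # simplified boundary handling: pin both envelopes to the end points of the series
    up = np.r_[0, maxima, len(x) - 1]
    lo = np.r_[0, minima, len(x) - 1]
    upper = CubicSpline(t[up], x[up])(t)
    lower = CubicSpline(t[lo], x[lo])(t)
    return 0.5 * (upper + lower)

def emd(x, max_imfs=6, n_sift=10):
    """Plain EMD with a fixed number of sifting passes; returns (imfs, residue)."""
    r = np.asarray(x, dtype=float).copy()          # r_{k,0} = x_{k,t}
    imfs = np.zeros((max_imfs, len(r)))
    for i in range(max_imfs):
        if mean_envelope(r) is None:               # residue can no longer be decomposed
            break
        h = r.copy()
        for _ in range(n_sift):
            m = mean_envelope(h)
            if m is None:
                break
            h = h - m
        imfs[i] = h
        r = r - h                                  # r_{k,i} = r_{k,i-1} - imf_{k,i}
    return imfs, r

def eemd(x, trials=100, noise_ratio=0.2, max_imfs=6, seed=0):
    """Average the EMD of `trials` white-noise-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    imf_sum, res_sum = np.zeros((max_imfs, len(x))), np.zeros(len(x))
    for _ in range(trials):
        imfs, r = emd(x + rng.normal(0.0, noise_ratio * x.std(), len(x)), max_imfs)
        imf_sum += imfs
        res_sum += r
    return imf_sum / trials, res_sum / trials

def group_components(imfs, residue, n_random=1):
    """Hypothetical grouping into (regularity, trend, random) components."""
    return imfs[n_random:].sum(axis=0), residue, imfs[:n_random].sum(axis=0)
```

Each single EMD run reconstructs its input exactly (IMFs plus residue), while the ensemble average reconstructs the series up to the averaged noise.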
402: LSTM network prediction
The model is trained with the LSTM algorithm for multivariate time-series prediction:

$$
F_i = \mathrm{LSTM}_{\theta}\left(P_{regular,m},\ P_{trend,m},\ P_{random,m}\right)
$$

where F_i is the predicted value of the influencing factor, θ denotes the network parameters, and P_{regular,m}, P_{trend,m} and P_{random,m} are the regularity, trend and random components of the different influencing factors.
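To show how the three components enter the network, here is a NumPy forward pass of a single-layer LSTM with a linear read-out, together with a helper that builds sliding windows. The weights are random, so this is an untrained illustration; training would normally be done in a deep-learning framework. The window length, hidden size, the use of the sum of the components as the one-step-ahead target, and all names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyLSTM:
    """Single-layer LSTM forward pass with a linear read-out (weights random, i.e. untrained)."""

    def __init__(self, n_in=3, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.H = n_hidden
        # stacked rows: input gate, forget gate, output gate, candidate cell state
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.w_out = rng.normal(0.0, 0.1, n_hidden)

    def predict(self, window):
        """window: (T, 3) rows of [regular, trend, random]; returns the next-step value F_i."""
        H = self.H
        h, c = np.zeros(H), np.zeros(H)
        for x_t in window:
            z = self.W @ np.concatenate([x_t, h]) + self.b
            i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
            g = np.tanh(z[3 * H:])
            c = f * c + i * g              # cell state update
            h = o * np.tanh(c)             # hidden state
        return float(self.w_out @ h)

def make_windows(components, lookback):
    """components: (3, N). Inputs: (N - lookback, lookback, 3); target: next value of their sum."""
    X = np.asarray(components, dtype=float).T
    inputs = np.stack([X[s:s + lookback] for s in range(len(X) - lookback)])
    targets = X.sum(axis=1)[lookback:]
    return inputs, targets
```

With 30 time steps and a look-back of 6, `make_windows` yields 24 windows of shape (6, 3), each paired with the value that follows it.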
5: Medium-long-term carbon emission prediction model based on the PSO-XGBoost method
501: PSO-XGBoost model construction
By applying a second-order Taylor expansion to the loss function and introducing a regularization term, the XGBoost algorithm seeks an overall optimal solution while controlling the complexity of the model, which effectively improves its generalization ability. In addition, XGBoost supports feature (column) subsampling, which further helps prevent over-fitting, and evaluates candidate splits over features in parallel, which makes the algorithm run faster; the results are also more interpretable.
Assume a data set D = {(x_i, y_i) | x_i ∈ R^m, y_i ∈ R, i = 1, 2, …, n} consisting of n samples with m features each, where R^m and R denote the m-dimensional real vector space and the set of real numbers, respectively. The model predicts with an additive ensemble of trees:

$$
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F
$$

where f_k is a regression tree, K is the total number of regression trees, and F is the space of regression trees.
The objective function is expressed as:

$$
Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)
$$

where l is the loss function measuring the error between the predicted value ŷ_i and the true value y_i, and Ω(f_k) is the regularization term.
XGBoost uses gradient boosting: a new regression tree is added in each iteration, so that the result of the t-th iteration is

$$
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
$$

Combining the formulas above, the objective function of the t-th iteration is

$$
Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{const}
$$
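A second-order Taylor expansion is performed and the regularization term Ω(f_t) is added to prevent over-fitting:

$$
Obj^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t), \qquad \Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^{2}
$$

where g_i and h_i are the first and second derivatives of the loss with respect to ŷ_i^{(t-1)}; γ is the penalty coefficient on the number of leaves; T is the number of leaf nodes; ω_j is the weight of leaf j; and λ is the leaf-weight penalty coefficient.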
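Minimizing the expanded objective for a fixed tree structure gives the standard closed-form leaf weight ω_j* = -G_j / (H_j + λ) and objective value -½ Σ_j G_j² / (H_j + λ) + γT, where G_j and H_j sum g_i and h_i over the samples in leaf j. This closed form is a well-known consequence of the expansion above rather than something stated in the text; the worked example assumes squared-error loss, and the function name is hypothetical.

```python
import numpy as np

def leaf_weights_and_objective(y, y_prev, leaf, lam=1.0, gamma=0.0):
    """Optimal leaf weights and structure score for a fixed tree, squared-error loss.

    y: true values; y_prev: predictions after t-1 trees; leaf: leaf index of each sample.
    """
    y, y_prev, leaf = np.asarray(y, float), np.asarray(y_prev, float), np.asarray(leaf)
    g = y_prev - y                      # first derivative of 0.5 * (y - y_hat)^2
    h = np.ones_like(y)                 # second derivative
    T = int(leaf.max()) + 1             # number of leaves
    G = np.bincount(leaf, weights=g, minlength=T)
    H = np.bincount(leaf, weights=h, minlength=T)
    w = -G / (H + lam)                  # omega_j* = -G_j / (H_j + lambda)
    obj = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * T
    return w, obj
```

For y = (1, 2, 3, 4), previous predictions all zero, the first two samples in leaf 0 and the last two in leaf 1, and λ = 1, the leaf weights are 1.0 and 7/3 and the objective is -29/3.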
Combining the XGBoost principle with PSO theory, PSO is applied to optimize the parameters of the XGBoost regression model, yielding the PSO-XGBoost model for medium-long-term carbon emission regression prediction.
502: Carbon emission prediction
(1) Clean, repair and standardize the carbon emission data and driving factor data obtained from statistical sources;
(2) Perform GDIM analysis on the preprocessed carbon emission data, select the important driving factors according to their contribution rates, and form a data set from the analysis results;
(3) Split the data into a training set and a test set at a ratio of 8:2, set a suitable fitness function, initialize the individual best and global best values of the particles, and optimize learning_rate, max_depth and other parameters;
(4) Update the particle velocities and positions iteratively, computing each particle's fitness value and continuously updating the individual best and global best values until the iteration termination condition is reached;
(5) Select the optimal parameter values, construct the parameter-optimized XGBoost regression prediction model, and feed in the training set for model training.
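Steps (3) to (5) follow a standard particle swarm loop. The sketch below is a generic PSO minimizer written with NumPy; the inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), swarm size, iteration count and function name are assumptions, and the quadratic fitness used to check it only stands in for the XGBoost error.

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over the box [lower, upper]; returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    pos = rng.uniform(lower, upper, (n_particles, dim))   # initial particle positions
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()                                    # individual best positions
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()            # global best position
    gbest_val = pbest_val.min()
    for _ in range(n_iter):
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)            # keep particles inside the bounds
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        if pbest_val.min() < gbest_val:
            gbest_val = pbest_val.min()
            gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, gbest_val
```

To tune XGBoost, the fitness function could, as a hypothetical example, map a particle p to `xgboost.XGBRegressor(learning_rate=p[0], max_depth=int(round(p[1])))`, fit it on the training portion of the 8:2 split and return its prediction error on held-out data; the best particle then supplies the parameters for retraining in step (5).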
The overall flow is shown in fig. 1.
The invention further provides a device for urban carbon emission influencing factor feature extraction and medium-long-term prediction, comprising:
a selection module, configured to select urban carbon emission influencing factors based on an economy-energy-carbon emission balance system and acquire historical data of the influencing factors;
a data processing module, configured to obtain a high-quality data set through data preprocessing;
an analysis module, configured to analyse the mechanism of action and the contribution of each urban carbon emission influencing factor with the generalized Divisia index decomposition method and identify the key factors influencing urban carbon emissions;
a prediction module, configured to predict each single variable with an ensemble empirical mode decomposition-long short-term memory network method and to construct a medium-long-term urban carbon emission prediction model with an XGBoost network optimized by a particle swarm optimization algorithm.
The above embodiment was evaluated by simulation on actual data from Tianjin; the results are shown in Fig. 5. The model of the invention was used to make medium-long-term predictions of carbon emissions in Tianjin. The error between the predicted and true values on the test set stays at about 3%, and the model maintains good predictive ability even when the carbon emission data fluctuate sharply. The predicted values obtained with the method described here are therefore considered reasonable, practical and informative.
In summary, the embodiments of the invention provide a method for urban carbon emission influencing factor feature extraction and medium-long-term prediction that makes full use of historical information to predict the medium-long-term variation of carbon emissions accurately and reliably.
Unless otherwise specified, the embodiments of the invention do not limit the models of the devices used, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A method for urban carbon emission influencing factor feature extraction and medium-long-term prediction, characterized by comprising the following steps:
S1: selecting urban carbon emission influencing factors based on an economy-energy-carbon emission balance system, and acquiring historical data of the influencing factors;
S2: preprocessing the data acquired in S1 to obtain a high-quality data set;
S3: based on the high-quality data set of S2, analysing the mechanism of action and the contribution of each urban carbon emission influencing factor with the generalized Divisia index decomposition method, and identifying the key factors influencing urban carbon emissions;
S4: for the key influencing factors identified in S3, predicting each single variable with an ensemble empirical mode decomposition-long short-term memory network method;
S5: based on the influencing factor predictions obtained in S4, constructing a medium-long-term urban carbon emission prediction model with an XGBoost network optimized by a particle swarm optimization algorithm.
2. The method according to claim 1, characterized in that the urban carbon emission influencing factors selected in S1 comprise: energy consumption carbon emissions C, regional real gross domestic product GDP, carbon emissions per unit of GDP C/GDP, energy consumption E, carbon emissions per unit of energy consumption C/E, resident population P, per capita carbon emissions C/P, per capita real GDP (GDP/P), and energy consumption per unit of GDP E/GDP.
3. The method according to claim 2, characterized in that the data preprocessing in S2 comprises: detecting anomalies in the feature data with the LOF algorithm, and imputing the anomalous and missing data with the GAKNNI algorithm.
4. The method according to claim 3, characterized in that the generalized Divisia index decomposition method in S3 comprises a Jacobian matrix construction step and a contribution decomposition step, wherein the Jacobian matrix is constructed from the following system:
$$
\begin{cases}
C = X_3 X_4 \\
X_3 X_4 - X_5 X_6 = 0 \\
X_3 X_4 - X_1 X_2 = 0 \\
X_3 - X_1 X_8 = 0 \\
X_1 - X_5 X_7 = 0
\end{cases}
$$
where the factor vector X = (X1, X2, X3, X4, X5, X6, X7, X8) and carbon emissions are expressed as the function C(X);
the contribution decomposition decomposes the change ΔC in carbon emissions into the sum of the contributions of the individual factors:

$$
\Delta C[X \mid \Phi] = \int_{L} \nabla C^{T} \left( I - \Phi_X \Phi_X^{+} \right) dX
$$

where L represents the time span; I is the identity matrix; "+" denotes the generalized inverse; ∇C = (0, 0, X4, X3, 0, 0, 0, 0)^T; and, if the column vectors of the Jacobian matrix Φ_X are linearly independent, Φ_X^+ = (Φ_X^T Φ_X)^{-1} Φ_X^T;
X1 is the regional real GDP, X2 is the carbon emissions per unit of GDP C/GDP, X3 is the energy consumption E, X4 is the carbon emissions per unit of energy consumption C/E, X5 is the resident population P, X6 is the per capita carbon emissions C/P, X7 is the per capita real GDP, and X8 is the energy consumption per unit of GDP E/GDP;
the change in carbon emissions is decomposed into the sum of eight effects, ΔX1, ΔX2, ΔX3, ΔX4, ΔX5, ΔX6, ΔX7 and ΔX8, where ΔX1, ΔX3 and ΔX5 are absolute-quantity effects reflecting the influence of changes in economic development level, energy consumption and population scale, respectively, on energy consumption carbon emissions; the other five effects are relative-quantity factors representing, in order, carbon emissions per unit of GDP C/GDP, carbon emissions per unit of energy consumption C/E, per capita carbon emissions C/P, per capita real GDP and energy consumption per unit of GDP E/GDP.
5. The method according to claim 4, characterized in that the ensemble empirical mode decomposition-long short-term memory network method adopted in S4 specifically comprises:
obtaining the overall trend, seasonal and random fluctuation characteristics of each influencing factor by ensemble empirical mode decomposition: the data are decomposed by ensemble empirical mode decomposition, a long short-term memory network is established for each decomposed component to predict it, and the component predictions are superposed to obtain the final predicted value.
6. The method according to claim 5, characterized in that the ensemble empirical mode decomposition method comprises the following steps: adding different white noise to the initial time series, and decomposing the electric power time-series data x_{k,t} into a series of intrinsic mode function components,

$$
x_{k,t} = \sum_{i=1}^{n} \mathrm{imf}_{k,i} + r_{k,n}
$$

where imf_{k,i} denotes the i-th intrinsic mode component obtained by applying ensemble empirical mode decomposition to the electricity consumption time series of the k-th user class, and r_{k,n} is the residual component remaining after the original time series is decomposed;
the time-series data are decomposed by connecting the local maxima and the local minima with cubic splines to form upper and lower envelopes and taking their mean m_{j-1}; this mean envelope is subtracted to extract an IMF, and removing that IMF from the sequence gives the remaining sequence r_{k,i}, so that each time an IMF is extracted:

$$
r_{k,i} = r_{k,i-1} - \mathrm{imf}_{k,i}
$$

where r_{k,0} = x_{k,t} when i = 1;
the procedure is iterated, extracting the (i+1)-th IMF from the remaining sequence r_{k,i}, until the remaining sequence r_{k,n} can no longer be decomposed; the IMFs obtained under the different white-noise realisations are then averaged, yielding the overall trend, seasonal and random fluctuation characteristics of each influencing factor.
7. The method according to claim 6, characterized in that constructing, in S5, the medium-long-term urban carbon emission prediction model with the XGBoost network optimized by the particle swarm optimization algorithm comprises the following steps:
initializing a model;
training an XGBoost model;
calculating a fitness value of the particle;
calculating an individual optimal value and a group optimal value;
iteratively updating the optimal solution;
if the termination condition is met, assigning the optimal parameters to the XGBoost model, retraining the XGBoost model with the optimal parameters, and performing carbon emission prediction; if the condition is not met, performing model initialization again.
8. A device for urban carbon emission influencing factor feature extraction and medium-long-term prediction, characterized by comprising:
a selection module, configured to select urban carbon emission influencing factors based on an economy-energy-carbon emission balance system and acquire historical data of the influencing factors;
a data processing module, configured to obtain a high-quality data set through data preprocessing;
an analysis module, configured to analyse the mechanism of action and the contribution of each urban carbon emission influencing factor with the generalized Divisia index decomposition method and identify the key factors influencing urban carbon emissions;
a prediction module, configured to predict each single variable with an ensemble empirical mode decomposition-long short-term memory network method and to construct a medium-long-term urban carbon emission prediction model with an XGBoost network optimized by a particle swarm optimization algorithm.
CN202311071418.7A 2023-08-24 2023-08-24 Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device Pending CN117114184A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311071418.7A CN117114184A (en) 2023-08-24 2023-08-24 Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311071418.7A CN117114184A (en) 2023-08-24 2023-08-24 Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device

Publications (1)

Publication Number Publication Date
CN117114184A true CN117114184A (en) 2023-11-24

Family

ID=88799580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311071418.7A Pending CN117114184A (en) 2023-08-24 2023-08-24 Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device

Country Status (1)

Country Link
CN (1) CN117114184A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117371667A (en) * 2023-12-04 2024-01-09 中国长江电力股份有限公司 Analysis method of carbon emission influence factor and related equipment
CN117992806A (en) * 2024-04-07 2024-05-07 中清能源(杭州)有限公司 Carbon accounting method based on time sequence data analysis

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN117371667A (en) * 2023-12-04 2024-01-09 中国长江电力股份有限公司 Analysis method of carbon emission influence factor and related equipment
CN117371667B (en) * 2023-12-04 2024-03-12 中国长江电力股份有限公司 Analysis method of carbon emission influence factor and related equipment
CN117992806A (en) * 2024-04-07 2024-05-07 中清能源(杭州)有限公司 Carbon accounting method based on time sequence data analysis
CN117992806B (en) * 2024-04-07 2024-06-04 中清能源(杭州)有限公司 Carbon accounting method based on time sequence data analysis

Similar Documents

Publication Publication Date Title
CN105391083B (en) Wind power interval short term prediction method based on variation mode decomposition and Method Using Relevance Vector Machine
Angelov et al. A new type of simplified fuzzy rule-based system
CN109242223B (en) Quantum support vector machine evaluation and prediction method for urban public building fire risk
CN112488415A (en) Power load prediction method based on empirical mode decomposition and long-and-short-term memory network
CN117114184A (en) Urban carbon emission influence factor feature extraction and medium-long-term prediction method and device
CN113554466B (en) Short-term electricity consumption prediction model construction method, prediction method and device
CN106951611A (en) A kind of severe cold area energy-saving design in construction optimization method based on user's behavior
CN112100911B (en) Solar radiation prediction method based on depth BILSTM
CN116316599A (en) Intelligent electricity load prediction method
Zaidan et al. Predicting atmospheric particle formation days by Bayesian classification of the time series features
CN116169670A (en) Short-term non-resident load prediction method and system based on improved neural network
CN111222689A (en) LSTM load prediction method, medium, and electronic device based on multi-scale temporal features
CN115759415A (en) Power consumption demand prediction method based on LSTM-SVR
CN115456245A (en) Prediction method for dissolved oxygen in tidal river network area
CN114862032A (en) XGboost-LSTM-based power grid load prediction method and device
Damaševičius et al. Decomposition aided attention-based recurrent neural networks for multistep ahead time-series forecasting of renewable power generation
CN112926251B (en) Landslide displacement high-precision prediction method based on machine learning
CN114596726A (en) Parking position prediction method based on interpretable space-time attention mechanism
Asaei-Moamam et al. Air quality particulate-pollution prediction applying GAN network and the Neural Turing Machine
Wu et al. A forecasting model based support vector machine and particle swarm optimization
CN116822742A (en) Power load prediction method based on dynamic decomposition-reconstruction integrated processing
CN115713144A (en) Short-term wind speed multi-step prediction method based on combined CGRU model
Fu et al. Prediction of financial economic time series based on group intelligence algorithm based on machine learning
CN113723707A (en) Medium-and-long-term runoff trend prediction method based on deep learning model
Zhang et al. A Multi-Model Prediction Method for Coal Mine Gas Concentration with Hierarchical Structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination