CN112132321A - Method for predicting and analyzing forest fire based on machine learning - Google Patents

Method for predicting and analyzing forest fire based on machine learning

Info

Publication number
CN112132321A
CN112132321A CN202010865182.4A CN202010865182A CN112132321A CN 112132321 A CN112132321 A CN 112132321A CN 202010865182 A CN202010865182 A CN 202010865182A CN 112132321 A CN112132321 A CN 112132321A
Authority
CN
China
Prior art keywords
data, forest, fire, adopting, machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010865182.4A
Other languages
Chinese (zh)
Inventor
戴维序
彭玉泉
郭鉴威
史岩岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Xinde Zhitu Beijing Science And Technology Co ltd
Original Assignee
Aerospace Xinde Zhitu Beijing Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-08-25
Filing date
2020-08-25
Publication date
2020-12-25
Application filed by Aerospace Xinde Zhitu Beijing Science And Technology Co ltd
Priority to CN202010865182.4A
Publication of CN112132321A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • Molecular Biology (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a machine-learning-based method for the prediction and analysis of forest fires, and relates to the field of predictive analysis. The method applies several machine learning algorithms to predict the forest fire probability through big data analysis, and effectively avoids the excessive subjectivity, inconsistent evaluation standards and widely differing results of traditional evaluation methods.

Description

Method for predicting and analyzing forest fire based on machine learning
Technical Field
The invention relates to the field of predictive analysis, and in particular to the predictive analysis of forest fires based on machine learning.
Background
At present, research on fire risk evaluation relies mainly on semi-quantitative methods. In the fuzzy comprehensive evaluation method, the index method and the matter-element analysis method, for example, the evaluation indexes, their weights and their scores are usually determined from expert experience; the evaluation is predominantly linear and depends heavily on the subjective judgment and experience of individuals. Qualitative evaluation methods, such as safety checklists and preliminary hazard analysis, lack clear and measurable evaluation criteria. Quantitative evaluation methods, such as accident tree (fault tree) analysis, likewise rely on expert judgment as well as on the probability of each event occurring. The theory of forest fire risk evaluation is not yet mature, the evaluation standards are not uniform, and the results are noticeably subjective.
Moreover, existing prediction methods do not take the atmospheric environment or the flammability of vegetation into account, which leads to large deviations in the prediction results.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a forest fire prediction analysis system based on machine learning.
The invention is realized by the following technical scheme:
The method applies several machine learning algorithms to predict the forest fire probability through big data analysis, and comprises the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprise two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geographic and biological conditions, weather, rainfall and the surrounding environment; the dependent variable to be predicted consists of the historical fire records of the fire department.
Data processing: the raw data are cleaned and duplicate or redundant records are removed; the non-numerical data in the raw data are then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use and forest combustible material type, converting them into vectors that a computer can process;
for short text data, such as the fire hazard descriptions and report information in the historical records, One-Hot encoding is applied first, and Word2vec is then used to capture the associations among the words of the text and convert them into dense word vectors;
for long text data, an LDA topic model is used to generate the corresponding vectors for subsequent processing.
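The One-Hot step for categorical attributes can be illustrated with a minimal sketch. The category values below are hypothetical examples of forest combustible material types, and the helper name `one_hot_encode` is an assumption made for illustration; the patent does not specify an implementation.

```python
def one_hot_encode(values):
    """Map each distinct category to a 0/1 indicator vector."""
    categories = sorted(set(values))            # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1                   # exactly one position is "hot"
        vectors.append(row)
    return categories, vectors

# Hypothetical combustible-material types for four forest blocks.
fuel_types = ["litter", "shrub", "litter", "grass"]
categories, encoded = one_hot_encode(fuel_types)
```

Each forest block is thus represented by a vector whose length equals the number of distinct categories, with a single 1 marking the block's category.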
Dimension reduction and feature selection:
The Relief feature selection method is used to select the attributes closely related to fire occurrence, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
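As a rough illustration of the feature selection step, the sketch below combines a basic two-class Relief weighting with a variance filter. It assumes features already scaled to [0, 1]; the toy data, the variance threshold of 1e-3 and the rule "keep features with a positive Relief weight" are assumptions, not values from the patent. The deep belief network stage is not covered here.

```python
import random

def variances(X):
    """Population variance of each column of X (a list of equal-length rows)."""
    n = len(X)
    result = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        result.append(sum((v - mean) ** 2 for v in col) / n)
    return result

def relief_weights(X, y, n_iter=50, seed=0):
    """Basic two-class Relief: a feature gains weight when it differs on the
    nearest sample of the other class and loses weight when it differs on the
    nearest sample of the same class."""
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = [0.0] * n_features

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(len(X))
        hits = [k for k in range(len(X)) if k != i and y[k] == y[i]]
        misses = [k for k in range(len(X)) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(n_features):
            weights[j] += (abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])) / n_iter
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise, feature 2 is constant.
X = [[0.10, 0.5, 0.3], [0.20, 0.9, 0.3], [0.15, 0.1, 0.3],
     [0.90, 0.4, 0.3], [0.80, 0.8, 0.3], [0.85, 0.2, 0.3]]
y = [0, 0, 0, 1, 1, 1]

var = variances(X)
weights = relief_weights(X, y)
# Assumed rule: keep features with enough variance and a positive Relief weight.
selected = [j for j in range(len(X[0])) if var[j] >= 1e-3 and weights[j] > 0]
```

On this toy data only the first feature survives: the constant column is dropped by the variance filter and the noise column receives a negative Relief weight.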
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the predictions of the different classifiers is taken as the final prediction. The k-nearest neighbor algorithm uses a weighted voting strategy in which the voting weight of each neighbor is inversely proportional to its distance, which increases discrimination; a KD-tree (KDTree) is used as the search strategy to speed up the neighbor search. The data are normalized, the Euclidean distance is used as the distance measure, and the algorithm finally outputs the fire probability.
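The accuracy-weighted combination of the four classifiers can be sketched as follows. The accuracies and per-classifier probabilities are made-up numbers for a single forest block, and normalizing by the sum of the weights is an assumption about how the weighted average is formed.

```python
def weighted_fire_probability(probabilities, accuracies):
    """Average per-classifier fire probabilities, weighting each by its accuracy."""
    total = sum(accuracies.values())
    return sum(accuracies[name] * p for name, p in probabilities.items()) / total

# Hypothetical 10-fold cross-validation accuracies of the four classifiers.
cv_accuracy = {"knn": 0.80, "naive_bayes": 0.70, "random_forest": 0.90, "adaboost": 0.85}
# Hypothetical fire probabilities predicted by each classifier for one forest block.
predicted = {"knn": 0.60, "naive_bayes": 0.40, "random_forest": 0.75, "adaboost": 0.70}

fire_probability = weighted_fire_probability(predicted, cv_accuracy)
```

With these numbers the more accurate classifiers pull the result toward their own estimates, so the combined probability (about 0.62) sits above the plain mean of the four predictions.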
In the random forest, let n be the total number of attributes. A subset of attributes is randomly selected each time, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
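The attribute-subset rule can be sketched in a few lines. Rounding log2(n) down and keeping at least one attribute are assumptions, since the text does not say how a non-integer value is handled; the seed and the attribute count of 64 are likewise illustrative.

```python
import math
import random

def random_attribute_subset(n_attributes, rng):
    """Pick floor(log2 n) attribute indices (at least one) without replacement."""
    k = max(1, int(math.log2(n_attributes)))
    return sorted(rng.sample(range(n_attributes), k))

rng = random.Random(42)        # fixed seed, for a reproducible illustration only
subset = random_attribute_subset(64, rng)
```

For 64 attributes each draw considers 6 candidate attributes, which keeps the individual trees decorrelated while keeping each draw cheap.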
The AdaBoost model is trained on a small data sample to select the optimal number of individual classifiers and the optimal learning rate.
Model evaluation:
Model evaluation uses the error rate, the accuracy and the cost-sensitive error rate.
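The error rate is the ratio of the number of misclassified samples to the total number of samples, and is defined as
$$E(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) \neq y_i\big)$$
where D is the data set of m samples, f(x_i) is the predicted class of sample x_i, y_i is its true class, and \mathbb{I}(\cdot) equals 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples in the total number of samples,
$$\mathrm{acc}(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) = y_i\big) = 1 - E(f; D)$$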
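The three evaluation measures can be computed as in the sketch below. The labels and the cost values are hypothetical, and the cost-sensitive error rate is written in one common form (total misclassification cost divided by the number of samples), which is an assumption because the text does not spell out its definition.

```python
def error_rate(y_true, y_pred):
    """Fraction of samples whose predicted class differs from the true class."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples, i.e. 1 - error rate."""
    return 1.0 - error_rate(y_true, y_pred)

def cost_sensitive_error_rate(y_true, y_pred, cost):
    """Average misclassification cost; cost[(true, predicted)] prices each error type."""
    total = sum(cost[(t, p)] for t, p in zip(y_true, y_pred) if t != p)
    return total / len(y_true)

# 1 = fire, 0 = no fire (hypothetical labels for eight forest blocks).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
# Assumed costs: missing a fire is five times as costly as a false alarm.
cost = {(1, 0): 5.0, (0, 1): 1.0}

err = error_rate(y_true, y_pred)
acc = accuracy(y_true, y_pred)
cs_err = cost_sensitive_error_rate(y_true, y_pred, cost)
```

Here one missed fire and one false alarm give the same plain error rate contribution, but the cost-sensitive measure penalizes the missed fire far more heavily.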
Word2vec uses the CBOW model to generate the word vectors of short texts, and the average of a text's word vectors is used to represent the short-text variable.
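The averaging step can be sketched independently of how the word vectors are trained. The three-dimensional vectors below are invented stand-ins for CBOW output, and skipping words that have no vector is an assumption.

```python
def average_word_vectors(tokens, word_vectors):
    """Represent a short text by the mean of the vectors of its known words."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None                      # no known word: no representation
    dim = len(known[0])
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]

# Hypothetical word vectors, standing in for the output of a trained CBOW model.
word_vectors = {
    "dry": [0.2, 0.4, 0.0],
    "brush": [0.6, 0.0, 0.2],
    "blocked": [0.1, 0.8, 0.4],
}
text_vector = average_word_vectors(["dry", "brush", "near", "road"], word_vectors)
```

Every short text, whatever its length, is mapped to a vector of the same dimension, which is what allows it to be concatenated with the other numerical attributes.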
The naive Bayes model uses a Gaussian naive Bayes classifier.
The invention has the following beneficial effects: it effectively overcomes the excessive subjectivity, inconsistent evaluation standards and widely differing results of traditional evaluation methods; accumulated historical data and environmental conditions are incorporated into the prediction system, and sound data processing techniques are applied, providing an effective basis for the prediction of forest fires.
Drawings
FIG. 1 shows a model flow diagram according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and preferred embodiments.
As shown in the figure, the method for predicting and analyzing forest fires based on machine learning applies several machine learning algorithms to predict the forest fire probability through big data analysis, and comprises the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprise two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geographic and biological conditions, weather, rainfall and the surrounding environment; the dependent variable to be predicted consists of the historical fire records of the fire department.
Data processing: the raw data are cleaned and duplicate or redundant records are removed; the non-numerical data in the raw data are then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use and forest combustible material type, converting them into vectors that a computer can process;
for short text data, such as the fire hazard descriptions and report information in the historical records, One-Hot encoding is applied first, and Word2vec is then used to capture the associations among the words of the text and convert them into dense word vectors;
for long text data, an LDA topic model is used to generate the corresponding vectors for subsequent processing.
Dimension reduction and feature selection:
The Relief feature selection method is used to select the attributes closely related to fire occurrence, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the predictions of the different classifiers is taken as the final prediction. The k-nearest neighbor algorithm uses a weighted voting strategy in which the voting weight of each neighbor is inversely proportional to its distance, which increases discrimination; a KD-tree (KDTree) is used as the search strategy to speed up the neighbor search. The data are normalized, the Euclidean distance is used as the distance measure, and the algorithm finally outputs the fire probability.
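The distance-weighted k-nearest neighbor vote can be sketched as below, with min-max normalization and Euclidean distance. For brevity the sketch uses a brute-force neighbor search in place of the KD-tree named in the text, which affects only speed, not the result. The two features (temperature in °C, relative humidity in %) and all data values are hypothetical.

```python
import math

def min_max_scaler(rows):
    """Return a function that rescales each feature to [0, 1] using the ranges seen in rows."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(point):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(point, lows, highs)]
    return scale

def knn_fire_probability(train_X, train_y, query, k=3):
    """Share of the inverse-distance vote weight held by fire samples (label 1) among the k nearest."""
    nearest = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))[:k]
    exact = [label for d, label in nearest if d == 0.0]
    if exact:                      # a zero distance would divide by zero; let exact matches decide
        return sum(exact) / len(exact)
    fire_weight = sum(1.0 / d for d, label in nearest if label == 1)
    total_weight = sum(1.0 / d for d, _ in nearest)
    return fire_weight / total_weight

# Hypothetical records: [temperature, relative humidity], label 1 = fire occurred.
raw_X = [[35, 20], [33, 25], [30, 30], [18, 80], [20, 75], [22, 70]]
raw_y = [1, 1, 1, 0, 0, 0]

scale = min_max_scaler(raw_X)
train_X = [scale(row) for row in raw_X]
probability = knn_fire_probability(train_X, raw_y, scale([27, 45]), k=3)
```

For the query above, two of the three nearest neighbors are fire records and they are also the closer ones, so the weighted vote gives a fire probability of roughly 0.73 rather than the unweighted 2/3.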
In the random forest, let n be the total number of attributes. A subset of attributes is randomly selected each time, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
The AdaBoost model is trained on a small data sample to select the optimal number of individual classifiers and the optimal learning rate.
Model evaluation:
Model evaluation uses the error rate, the accuracy and the cost-sensitive error rate.
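The error rate is the ratio of the number of misclassified samples to the total number of samples, and is defined as
$$E(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) \neq y_i\big)$$
where D is the data set of m samples, f(x_i) is the predicted class of sample x_i, y_i is its true class, and \mathbb{I}(\cdot) equals 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples in the total number of samples,
$$\mathrm{acc}(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) = y_i\big) = 1 - E(f; D)$$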
Word2vec uses the CBOW model to generate the word vectors of short texts, and the average of a text's word vectors is used to represent the short-text variable. The naive Bayes model uses a Gaussian naive Bayes classifier.
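A Gaussian naive Bayes classifier of the kind named here can be sketched as follows: each class is described by a prior and by an independent normal distribution per feature. The training records, the feature choice (temperature, relative humidity) and the small variance floor of 1e-9 are assumptions for illustration.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9  # floor avoids zero variance
                     for col, m in zip(zip(*rows), means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_proba(model, x):
    """Posterior probability of each class under the naive independence assumption."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
        log_scores[label] = math.log(prior) + log_likelihood
    top = max(log_scores.values())                  # subtract the max for numerical stability
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: s / total for label, s in exp_scores.items()}

# Hypothetical records: [temperature, relative humidity], label 1 = fire occurred.
X = [[35, 20], [33, 25], [30, 30], [18, 80], [20, 75], [22, 70]]
y = [1, 1, 1, 0, 0, 0]

model = fit_gaussian_nb(X, y)
proba = predict_proba(model, [31, 28])
```

The classifier returns a probability for each class rather than a hard label, which is what makes it usable in an accuracy-weighted average with the other three classifiers.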
The invention has the following beneficial effects: a machine learning method is used to predict the probability of forest fire occurrence, establishing a quantitative forest fire risk assessment system. After the data are processed with One-Hot encoding, Word2vec and the LDA topic model, a deep belief network is used for dimensionality reduction; a Gaussian naive Bayes classifier, a k-nearest neighbor classifier, a random forest and an AdaBoost classifier are then constructed separately, with the classification accuracy of each used as its weight. This effectively avoids the excessive subjectivity, inconsistent evaluation standards and widely differing results of traditional evaluation methods.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also fall within the protection scope of the present invention.

Claims (3)

1. A method for predicting and analyzing forest fires based on machine learning, which applies several machine learning algorithms to predict the forest fire probability through big data analysis, comprising the following steps:
Data preprocessing: data preprocessing comprises data acquisition and data processing.
Data acquisition: the acquired data comprise two parts. The independent (predictor) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geographic and biological conditions, weather, rainfall and the surrounding environment; the dependent variable to be predicted consists of the historical fire records of the fire department.
Data processing: the raw data are cleaned and duplicate or redundant records are removed; the non-numerical data in the raw data are then encoded.
For categorical (fixed-type) data, One-Hot encoding is applied to the forest structure type, forest block use and forest combustible material type, converting them into vectors that a computer can process;
for short text data, such as the fire hazard descriptions and report information in the historical records, One-Hot encoding is applied first, and Word2vec is then used to capture the associations among the words of the text and convert them into dense word vectors;
for long text data, an LDA topic model is used to generate the corresponding vectors for subsequent processing.
Dimension reduction and feature selection:
The Relief feature selection method is used to select the attributes closely related to fire occurrence, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
Model training:
Four algorithms are used: k-nearest neighbor, naive Bayes, random forest and AdaBoost. 10-fold cross-validation is performed on the data, the accuracy of each classifier is used as its weight, and the weighted average of the predictions of the different classifiers is taken as the final prediction. The k-nearest neighbor algorithm uses a weighted voting strategy in which the voting weight of each neighbor is inversely proportional to its distance, which increases discrimination; a KD-tree (KDTree) is used as the search strategy to speed up the neighbor search. The data are normalized, the Euclidean distance is used as the distance measure, and the algorithm finally outputs the fire probability.
In the random forest, let n be the total number of attributes. A subset of attributes is randomly selected each time, with log2(n) attributes in the subset. The model is trained on a small data sample and the optimal parameters are selected, which determines the maximum depth of each decision tree and the number of decision trees.
The AdaBoost model is trained on a small data sample to select the optimal number of individual classifiers and the optimal learning rate.
Model evaluation:
Model evaluation uses the error rate, the accuracy and the cost-sensitive error rate.
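The error rate is the ratio of the number of misclassified samples to the total number of samples, and is defined as
$$E(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) \neq y_i\big)$$
where D is the data set of m samples, f(x_i) is the predicted class of sample x_i, y_i is its true class, and \mathbb{I}(\cdot) equals 1 when its argument is true and 0 otherwise.
The accuracy is the proportion of correctly classified samples in the total number of samples,
$$\mathrm{acc}(f; D) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\big(f(x_i) = y_i\big) = 1 - E(f; D)$$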
2. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein Word2vec uses the CBOW model to generate the word vectors of short texts, and the average of a text's word vectors is used to represent the short-text variable.
3. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein the naive Bayes model uses a Gaussian naive Bayes classifier.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010865182.4A CN112132321A (en) 2020-08-25 2020-08-25 Method for predicting and analyzing forest fire based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010865182.4A CN112132321A (en) 2020-08-25 2020-08-25 Method for predicting and analyzing forest fire based on machine learning

Publications (1)

Publication Number Publication Date
CN112132321A true CN112132321A (en) 2020-12-25

Family

ID=73848942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010865182.4A Pending CN112132321A (en) 2020-08-25 2020-08-25 Method for predicting and analyzing forest fire based on machine learning

Country Status (1)

Country Link
CN (1) CN112132321A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107085904A (en) * 2017-03-31 2017-08-22 上海事凡物联网科技有限公司 Forest fire danger class decision method and system based on single classification SVM
CN108921330A (en) * 2018-06-08 2018-11-30 新疆林科院森林生态研究所 A kind of forest management system
US20200242202A1 (en) * 2019-01-29 2020-07-30 Shenzhen Fugui Precision Ind. Co., Ltd. Fire development situation prediction device and method
CN110956187A (en) * 2019-11-28 2020-04-03 中国农业科学院农业信息研究所 Unmanned aerial vehicle image plant canopy information extraction method based on ensemble learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sun Liyan et al.: "Forest fire prediction method based on deep learning of meteorological factors", Journal of Forestry Engineering *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766133A (en) * 2021-01-14 2021-05-07 金陵科技学院 Automatic driving deviation processing method based on Relieff-DBN
CN113591873A (en) * 2021-05-26 2021-11-02 东南大学 Flame image classification method based on ensemble learning
CN113762337A (en) * 2021-07-29 2021-12-07 国网河北省电力有限公司经济技术研究院 Initial fire determination method, device, terminal and storage medium
CN117035197A (en) * 2023-08-25 2023-11-10 成都理工大学 Intelligent lost circulation prediction method with minimized cost
CN117035197B (en) * 2023-08-25 2024-06-04 成都理工大学 Intelligent lost circulation prediction method with minimized cost

Similar Documents

Publication Publication Date Title
CN112132321A (en) Method for predicting and analyzing forest fire based on machine learning
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN111967343B (en) Detection method based on fusion of simple neural network and extreme gradient lifting model
CN111708343B (en) Method for detecting abnormal behavior of field process behavior in manufacturing industry
CN112231562A (en) Network rumor identification method and system
CN112039903B (en) Network security situation assessment method based on deep self-coding neural network model
CN109740655B (en) Article scoring prediction method based on matrix decomposition and neural collaborative filtering
CN112735097A (en) Regional landslide early warning method and system
CN112131352A (en) Method and system for detecting bad information of webpage text type
CN111556016B (en) Network flow abnormal behavior identification method based on automatic encoder
CN111859010B (en) Semi-supervised audio event identification method based on depth mutual information maximization
CN110008699B (en) Software vulnerability detection method and device based on neural network
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN116307103A (en) Traffic accident prediction method based on hard parameter sharing multitask learning
CN112329974B (en) LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN111641608A (en) Abnormal user identification method and device, electronic equipment and storage medium
Mezei et al. Credit risk evaluation in peer-to-peer lending with linguistic data transformation and supervised learning
CN112395168A (en) Stacking-based edge side service behavior identification method
CN115004652A (en) Business wind control processing method and device, electronic equipment and storage medium
CN113435124A (en) Water quality space-time correlation prediction method based on long-time and short-time memory and radial basis function neural network
CN115438102A (en) Space-time data anomaly identification method and device and electronic equipment
CN115659244A (en) Fault prediction method, device and storage medium
CN115544272A (en) Attention mechanism-based chemical accident cause knowledge graph construction method
CN114881173A (en) Resume classification method and device based on self-attention mechanism
Rijal et al. Integrating Information Gain methods for Feature Selection in Distance Education Sentiment Analysis during Covid-19.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20201225