CN112132321A

CN112132321A - Method for predicting and analyzing forest fire based on machine learning

Info

Publication number: CN112132321A
Application number: CN202010865182.4A
Authority: CN
Inventors: 戴维序; 彭玉泉; 郭鉴威; 史岩岩
Original assignee: Aerospace Xinde Zhitu Beijing Science And Technology Co ltd
Current assignee: Aerospace Xinde Zhitu Beijing Science And Technology Co ltd
Priority date: 2020-08-25
Filing date: 2020-08-25
Publication date: 2020-12-25

Abstract

The invention provides a method for predicting and analyzing forest fires based on machine learning, in the field of predictive analysis. The method applies several machine learning algorithms to predict forest fire probability through big data analysis, and thereby avoids the excessive subjectivity, inconsistent evaluation standards and widely varying evaluation results of traditional evaluation methods.

Description

Method for predicting and analyzing forest fire based on machine learning

Technical Field

The invention relates to the field of predictive analysis, in particular to the field of predictive analysis of forest fires based on machine learning.

Background

At present, research on fire risk evaluation relies mainly on semi-quantitative methods. For example, in the fuzzy comprehensive evaluation method, the index method and the matter-element analysis method, the evaluation indexes, their weights and their scores are usually determined from expert experience; the evaluation is predominantly linear and depends heavily on the judgment and experience of individuals. Qualitative evaluation methods such as safety checklists and preliminary hazard analysis lack clear, measurable evaluation criteria. Quantitative evaluation methods such as accident tree (fault tree) analysis also rely on expert judgment to estimate the probability of each event. The theory for evaluating forest fire risk is not yet mature, the evaluation standards are not uniform, and the results are markedly subjective.

Moreover, existing prediction methods do not take the atmospheric environment or the flammability of vegetation into account, which leads to large deviations in the prediction results.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a forest fire prediction analysis system based on machine learning.

The invention is realized by the following technical scheme:

The method adopts several machine learning algorithms, predicts the forest fire probability through big data analysis, and comprises the following steps:

Data preprocessing: this comprises data acquisition and data processing.

Data acquisition: the acquired data comprise two parts. The predictor (independent) variables are drawn from several sources, such as forest information, fire-fighting facilities, local geographic and biological conditions, weather, rainfall and the surrounding environment. The dependent variable to be predicted consists of the historical fire records of the fire department.

Data processing: the original data are cleaned and duplicate or redundant records are removed; non-numerical data in the original data are then encoded.

For categorical data (forest structure type, forest block use and forest combustible material type), One-Hot encoding is adopted to convert the data into vectors that a computer can process and recognize;

for short text data, such as fire hazard records and report information in the historical records, One-Hot encoding is adopted, and Word2vec is used to capture the associations among the words of the text and convert them into dense word vectors;

for long text data, an LDA topic model is used to generate the corresponding vectors for subsequent processing.
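
By way of illustration only, the duplicate removal and One-Hot encoding described above might be sketched as follows. The record fields and category names are hypothetical and are not taken from the disclosure; the helper names are likewise assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def drop_duplicates(rows: Sequence[Tuple]) -> List[Tuple]:
    """Remove repeated records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def fit_one_hot(values: Sequence[str]) -> Dict[str, int]:
    """Map each distinct category to a column index (sorted for determinism)."""
    return {category: i for i, category in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Encode one category as a 0/1 vector; unseen categories map to all zeros."""
    vector = [0] * len(index)
    if value in index:
        vector[index[value]] = 1
    return vector


# Hypothetical records: (forest structure type, combustible material type).
records = [
    ("coniferous", "litter"),
    ("broadleaf", "shrub"),
    ("coniferous", "litter"),  # exact duplicate of the first record
]
records = drop_duplicates(records)
structure_index = fit_one_hot([r[0] for r in records])
encoded = [one_hot(r[0], structure_index) for r in records]
```

Mapping unseen categories to an all-zero vector is one possible design choice; the disclosure does not state how unknown categories are handled.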

Dimension reduction and feature selection:

Attributes closely related to fire occurrence are selected with the Relief feature selection method, attribute variables whose variance is below a threshold are deleted, and a deep belief network is then used for dimensionality reduction.
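
The feature selection step might be sketched as below, under stated assumptions: a simplified two-class Relief that visits every sample once (rather than a random subset), uses absolute feature differences, and expects features already scaled to [0, 1]. The toy data and the threshold value are hypothetical, and the deep belief network stage is not shown.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_mask(X: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """True for each attribute whose population variance reaches the threshold."""
    return [pvariance(column) >= threshold for column in zip(*X)]


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Simplified Relief: reward attributes that differ on the nearest sample
    of the other class (miss) and agree on the nearest same-class sample (hit)."""
    n_features = len(X[0])
    weights = [0.0] * n_features

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    for i, (xi, yi) in enumerate(zip(X, y)):
        hits = [x for k, (x, t) in enumerate(zip(X, y)) if t == yi and k != i]
        misses = [x for x, t in zip(X, y) if t != yi]
        if not hits or not misses:
            continue  # cannot score a sample without both a hit and a miss
        hit = min(hits, key=lambda x: sq_dist(xi, x))
        miss = min(misses, key=lambda x: sq_dist(xi, x))
        for j in range(n_features):
            weights[j] += abs(xi[j] - miss[j]) - abs(xi[j] - hit[j])
    return [w / len(X) for w in weights]


# Hypothetical data: attribute 0 separates the classes, attribute 1 is noise,
# attribute 2 is constant and carries no information.
X = [[0.0, 0.2, 0.5], [0.1, 0.9, 0.5], [0.9, 0.3, 0.5], [1.0, 0.8, 0.5]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
mask = variance_mask(X, threshold=0.01)
```

In this toy case the informative attribute receives the largest Relief weight and the constant attribute is flagged for deletion by the variance check.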

Model training:

Four algorithms are adopted: k-nearest neighbor, naive Bayes, random forest and AdaBoost. 10-fold cross validation is carried out on the data; the accuracy of each classifier is used as its weight, and the weighted average of the prediction results of the classifiers is taken as the final prediction result.

In the k-nearest neighbor algorithm, the voting strategy is weighted: the voting weight of each neighbor node is inversely proportional to its distance, which improves discrimination. A KD-tree is adopted as the search strategy to speed up the search. The data are normalized, the Euclidean distance is used as the distance measure, and the fire probability is finally output.
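
A minimal sketch of the distance-weighted k-nearest neighbor vote follows, assuming min-max normalization and a brute-force neighbor search in place of the KD-tree (which only changes search speed, not the result). The attribute values, the value of k and the small constant `eps` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def min_max_fit(X: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-attribute minimum and maximum of the training data."""
    columns = list(zip(*X))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Scale one sample to [0, 1] per attribute; constant attributes map to 0."""
    return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]


def knn_fire_probability(X_train, y_train, x, k: int = 3, eps: float = 1e-9) -> float:
    """Fire probability as the inverse-distance-weighted share of fire neighbors."""
    lo, hi = min_max_fit(X_train)
    X_norm = [min_max_apply(row, lo, hi) for row in X_train]
    x_norm = min_max_apply(x, lo, hi)
    # Brute-force search over Euclidean distances (a KD-tree would be faster).
    neighbours = sorted(
        (math.dist(row, x_norm), label) for row, label in zip(X_norm, y_train)
    )[:k]
    vote_weights = [1.0 / (d + eps) for d, _ in neighbours]
    fire_weight = sum(w for w, (_, label) in zip(vote_weights, neighbours) if label == 1)
    return fire_weight / sum(vote_weights)


# Hypothetical samples: (temperature in degrees C, relative humidity in %).
X_train = [[30, 10], [32, 12], [15, 80], [18, 75]]
y_train = [1, 1, 0, 0]  # 1 = fire recorded, 0 = no fire
p_hot_dry = knn_fire_probability(X_train, y_train, [31, 11])
p_cool_wet = knn_fire_probability(X_train, y_train, [16, 78])
```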

In the random forest, let n be the total number of attributes. An attribute subset is selected at random each time, with log2 n attributes in the subset. The model is trained on a small data sample to select the optimal parameters, namely the maximum depth of each decision tree and the number of decision trees.
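
The random attribute subset might be drawn as in the sketch below. Rounding log2 n down and enforcing a minimum of one attribute are assumptions, since the text does not say how a non-integer value is handled; the function names are hypothetical.

```python
import math
import random
from typing import List


def subset_size(n_attributes: int) -> int:
    """Number of attributes per subset: floor(log2 n), but at least 1."""
    return max(1, int(math.log2(n_attributes)))


def random_attribute_subset(n_attributes: int, rng: random.Random) -> List[int]:
    """Indices of a random attribute subset, drawn without replacement."""
    return sorted(rng.sample(range(n_attributes), subset_size(n_attributes)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
subset = random_attribute_subset(16, rng)
```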

The AdaBoost model is trained on a small data sample to select the optimal number of base classifiers and the learning rate.
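
The 10-fold split and the accuracy-weighted combination of the four classifiers described in the model training step might be sketched as follows. The classifier names, accuracies and per-classifier probabilities are hypothetical placeholders, and the interleaved fold assignment is an assumption.

```python
from typing import Dict, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def weighted_ensemble(probabilities: Dict[str, float], accuracies: Dict[str, float]) -> float:
    """Average the fire probabilities, weighting each classifier by its accuracy."""
    total = sum(accuracies[name] for name in probabilities)
    return sum(accuracies[name] * p for name, p in probabilities.items()) / total


# Hypothetical cross-validated accuracies and per-classifier fire probabilities.
accuracies = {"knn": 0.80, "naive_bayes": 0.70, "random_forest": 0.90, "adaboost": 0.85}
probabilities = {"knn": 0.60, "naive_bayes": 0.40, "random_forest": 0.80, "adaboost": 0.70}
final_probability = weighted_ensemble(probabilities, accuracies)
splits = list(k_fold_indices(20, k=10))
```

Dividing by the sum of the accuracies keeps the combined value within the range of the individual probabilities.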

Model evaluation:

The model evaluation employs the error rate, the accuracy and the cost-sensitive error rate.

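The error rate is the ratio of the number of misclassified samples to the total number of samples, and is defined as

$$E = \frac{1}{m}\sum_{i=1}^{m} \mathbb{I}\big(f(x_i) \neq y_i\big),$$

where m is the total number of samples, f(x_i) is the predicted class of sample x_i, y_i is its true class, and \mathbb{I}(\cdot) is the indicator function.

The accuracy is the proportion of correctly classified samples to the total number of samples, so that accuracy = 1 - E.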
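
The three evaluation measures might be computed as in the following sketch. The error rate and accuracy follow the definitions above; the cost-sensitive error rate is not defined in the text, so the version here (average misclassification cost with separate costs for missed fires and false alarms) is an assumption, as are the cost values and labels.

```python
from typing import Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of samples whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of correctly classified samples, i.e. 1 minus the error rate."""
    return 1.0 - error_rate(y_true, y_pred)


def cost_sensitive_error_rate(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    cost_missed_fire: float,
    cost_false_alarm: float,
) -> float:
    """Average misclassification cost (assumed definition, see lead-in)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += cost_missed_fire  # a fire occurred but was not predicted
        elif t == 0 and p == 1:
            total += cost_false_alarm  # a fire was predicted but did not occur
    return total / len(y_true)


# Hypothetical labels: 1 = fire, 0 = no fire.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
err = error_rate(y_true, y_pred)
acc = accuracy(y_true, y_pred)
cost_err = cost_sensitive_error_rate(y_true, y_pred, cost_missed_fire=5.0, cost_false_alarm=1.0)
```

Assigning a higher cost to a missed fire than to a false alarm reflects the asymmetry of the two mistakes; the 5:1 ratio is purely illustrative.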

Word2vec uses the CBOW model to generate word vectors for short texts, and the mean of the word vectors in a text is used to represent the short-text variable.
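
The averaging of word vectors might look like the sketch below. The two-dimensional embedding table is a hypothetical stand-in for the vectors that a trained CBOW Word2vec model would supply; skipping out-of-vocabulary words and returning a zero vector for an empty text are assumptions.

```python
from typing import Dict, List, Sequence


def average_word_vectors(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Represent a short text by the mean of its known word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical embeddings; real ones would come from a trained Word2vec model.
embeddings = {
    "blocked": [1.0, 0.0],
    "exit": [0.0, 1.0],
    "hydrant": [1.0, 1.0],
}
text_vector = average_word_vectors(["blocked", "exit", "unknown"], embeddings, dim=2)
```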

The naive Bayes model employs a Gaussian naive Bayes classifier.
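
A Gaussian naive Bayes classifier models each attribute, within each class, as an independent normal distribution. The sketch below is a minimal illustration under that standard assumption; the variance floor, the function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import Dict, Sequence


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int], var_floor: float = 1e-9) -> Dict:
    """Estimate the class prior and per-attribute mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in columns],
            # The floor avoids division by zero for constant attributes.
            "var": [pvariance(c) + var_floor for c in columns],
        }
    return model


def predict_proba(model: Dict, x: Sequence[float]) -> Dict[int, float]:
    """Posterior probability of each class, computed in log space for stability."""
    log_post = {}
    for label, params in model.items():
        total = math.log(params["prior"])
        for v, m, s2 in zip(x, params["mean"], params["var"]):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = total
    top = max(log_post.values())
    unnormalised = {label: math.exp(v - top) for label, v in log_post.items()}
    z = sum(unnormalised.values())
    return {label: v / z for label, v in unnormalised.items()}


# Hypothetical samples: (temperature in degrees C, relative humidity in %).
X = [[30, 10], [32, 12], [31, 14], [15, 80], [18, 75], [16, 70]]
y = [1, 1, 1, 0, 0, 0]  # 1 = fire recorded, 0 = no fire
model = fit_gaussian_nb(X, y)
proba = predict_proba(model, [29, 15])
```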

The beneficial effects of the invention are as follows: the excessive subjectivity, inconsistent evaluation standards and widely varying evaluation results of traditional evaluation methods are effectively avoided; accumulated historical data and environmental conditions are incorporated into the prediction system, and systematic data processing is applied, which provides an effective basis for the prediction of forest fires.

Drawings

FIG. 1 shows a model flow diagram according to an embodiment of the invention.

Detailed Description

In order to make the technical solutions of the present invention better understood by those skilled in the art, the present invention will be further described in detail with reference to the accompanying drawings and preferred embodiments.

As shown in FIG. 1, the method for predicting and analyzing forest fires based on machine learning adopts several machine learning algorithms, predicts the forest fire probability through big data analysis, and comprises the following steps:

Dimension reduction and feature selection:

Model training:

Model evaluation:

Word2vec uses the CBOW model to generate word vectors for short texts, and the mean of the word vectors in a text is used to represent the short-text variable. The naive Bayes model employs a Gaussian naive Bayes classifier.

The beneficial effects of the invention are as follows: a machine learning method is used to predict the probability of forest fire occurrence, and a quantitative forest fire risk assessment system is established. After the data are processed with One-Hot encoding, Word2vec and the LDA topic model, a deep belief network is used for dimensionality reduction; classifiers are then constructed with a Gaussian naive Bayes classifier, the k-nearest neighbor algorithm, random forest and AdaBoost respectively, with the classification accuracy of each used as its weight. The excessive subjectivity, inconsistent evaluation standards and widely varying evaluation results of traditional evaluation methods are thereby effectively avoided.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A method for predicting and analyzing forest fires based on machine learning, which adopts several machine learning algorithms and predicts the forest fire probability through big data analysis, comprising the following steps:

Dimension reduction and feature selection:

Model training:

Model evaluation:

2. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein Word2vec uses the CBOW model to generate word vectors for short texts, and the mean of the word vectors in a text is used to represent the short-text variable.

3. The method for predicting and analyzing forest fires based on machine learning according to claim 1, wherein the naive Bayes model employs a Gaussian naive Bayes classifier.