CN118051845B

CN118051845B - Geospatial full coverage data generation method and device based on space variable parameter machine learning

Info

Publication number: CN118051845B
Application number: CN202410446330.7A
Authority: CN
Inventors: 高秉博; 王雨雪; 殷悦; 王辰怡; 刘燕青; 谢东凯; 姚晓闯; 杨建宇; 冯权泷
Original assignee: China Agricultural University
Current assignee: China Agricultural University
Priority date: 2024-04-15
Filing date: 2024-04-15
Publication date: 2024-06-18
Anticipated expiration: 2044-04-15
Also published as: CN118051845A

Abstract

The invention provides a geospatial full coverage data generation method and device based on space variable parameter machine learning, and relates to the technical field of geographic information science. The method comprises the following steps: partitioning a target area step by step and, after each partitioning, calculating the spatial layering heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state, based on the auxiliary variables and the target variable at each observation site in the target area; determining a target partition state based on the spatial layering heterogeneity; in the target partition state, constructing a space variable parameter machine learning model for each subarea in the target area; and performing interpolation prediction on the target variable at each preset point to be predicted in the target area, based on the space variable parameter machine learning models corresponding to the subareas, to obtain an interpolation prediction result and an uncertainty analysis result. With this method, an accurate spatial distribution map of geospatial full coverage data can be interpolated from limited observation site data.

Description

Geospatial full coverage data generation method and device based on space variable parameter machine learning

Technical Field

The invention relates to the technical field of geographic information science, in particular to a geographic space full coverage data generation method and device based on space variable parameter machine learning.

Background

Spatial interpolation is often used to convert measurements at discrete points into a continuous data surface that can be compared with the distribution patterns of other spatial phenomena. Observation sites for weather, soil and similar variables are generally expensive to build and limited in number, so continuous surface data cannot be acquired directly over a research area; instead, spatial interpolation is commonly used to extend the limited observations to the whole research area. As spatial interpolation has become more widely applied, the accuracy demanded of it has also increased. However, in complex and changeable geographical environments, when large-scale, high-spatial-resolution interpolation is performed, the target variable is affected by multiple auxiliary variables, and the relationship between the target variable and the auxiliary variables exhibits strong spatial heterogeneity, the interpolation results obtained by conventional spatial interpolation methods have low accuracy.

On the assumption that the relationship between the target variable and the auxiliary variables is locally stable in space, researchers have developed the geographically weighted regression model and the Bayesian spatially varying coefficient model, which use spatial autocorrelation to simulate spatially non-stationary relationships. However, the former depends heavily on a predefined spatial kernel function and is highly susceptible to collinearity, which degrades the accuracy of the interpolation result. The latter requires a predefined distribution for each coefficient of the model and a cross-covariance function for the spatially random part of the coefficients; these are difficult to set correctly, which likewise degrades the accuracy of the interpolation result.

Therefore, in a large-scale high-spatial-resolution interpolation scenario, how to interpolate a spatial distribution map of accurate weather or soil data according to limited observation site data is a current problem to be solved.

Disclosure of Invention

The invention provides a geospatial full coverage data generation method and device based on space variable parameter machine learning, which are used for solving the defect of low precision of spatial interpolation results such as meteorological or soil data and the like in a geospatial, and realizing high-precision spatial interpolation of the geospatial full coverage data.

The invention provides a geospatial full coverage data generation method based on space variable parameter machine learning, which comprises the following steps:

partitioning a target area step by step, and calculating the spatial layering heterogeneity of the relationship between each auxiliary variable and the target variable in the current partitioning state based on each auxiliary variable and the target variable in each observation site in the target area after each partitioning;

determining a target partition state based on the spatial layered heterogeneity of the relations between various auxiliary variables and target variables after each partition, wherein the target partition state corresponds to a plurality of subareas of the target area;

in the target partition state, constructing a space variable parameter machine learning model for each subarea in the target area, wherein each space variable parameter machine learning model is trained on the target variables and auxiliary variables of the observation sites within the corresponding spatial range and on the distance from each of those observation sites to the corresponding space variable parameter machine learning model; each space variable parameter machine learning model comprises position information and spatial range information, wherein the position information is the center coordinates of the subarea corresponding to the space variable parameter machine learning model, and the spatial range information is the size of the subarea corresponding to the space variable parameter machine learning model;

And respectively carrying out interpolation prediction on target variables of preset points to be predicted in the target area based on each space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result and an uncertainty analysis result.

According to the geospatial full coverage data generation method based on space variable parameter machine learning provided by the invention, each time of partitioning is followed by calculating the spatial layering heterogeneity of the relations between each type of auxiliary variable and the target variable in the current partitioning state based on each type of auxiliary variable and the target variable in each observation site in the target region, and the method comprises the following steps:

For each observation site in the target area, calculating bivariate local space autocorrelation coefficients between the target variable in the observation site and each auxiliary variable respectively;

after each time of partitioning the target area, the spatial layering heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partitioning state is calculated based on each bivariate local autocorrelation coefficient.

According to the geospatial full coverage data generation method based on space variant parameter machine learning provided by the invention, for each observation site in the target area, a bivariate local space autocorrelation coefficient between the target variable in the observation site and each auxiliary variable is calculated, and the method comprises the following steps:

for each observation site in the target area, calculating the bivariate local spatial autocorrelation coefficients corresponding to the observation site based on the auxiliary variables of the observation site and the target variables of a first preset number of observation sites adjacent to the observation site.

According to the geospatial full coverage data generation method based on space variable parameter machine learning provided by the invention, after partitioning the target area each time, the spatial layered heterogeneity of the relationships between the auxiliary variables and the target variables in the current partition state is calculated based on the local auto-correlation coefficients of the bivariate variables respectively, and the method comprises the following steps:

after each time of partitioning the target area, calculating local autocorrelation index variance values of relations between various auxiliary variables in each partition space and target variables respectively based on the bivariate local space autocorrelation coefficients corresponding to all observation stations in each partition space in the current partition state;

calculating global autocorrelation index variance values of relations between various auxiliary variables in the target area and the target variables respectively based on the bivariate local spatial autocorrelation coefficients corresponding to all the observation stations in the target area;

and respectively calculating the spatial layering heterogeneity of the relations between the auxiliary variables and the target variables of the target region in the current partition state based on the local autocorrelation index variance value and the global autocorrelation index variance value.

According to the geospatial full coverage data generation method based on space variable parameter machine learning provided by the invention, the target partition state is determined based on the spatial hierarchical heterogeneity of the relations between various auxiliary variables and target variables after each partition, and the method comprises the following steps:

Based on the spatial layering heterogeneity of the relation between the auxiliary variables and the target variables after each partition, respectively calculating the average spatial layering heterogeneity corresponding to the target area after each partition;

Constructing a first change curve based on the average spatial layering heterogeneity corresponding to the target area after each partition, or constructing a second change curve based on the difference value of the average spatial layering heterogeneity after two adjacent partitions;

And determining the partition state corresponding to the inflection point of the first change curve or the inflection point of the second change curve as a target partition state.

According to the geospatial full coverage data generation method based on space variable parameter machine learning provided by the invention, each space variable parameter machine learning model corresponding to each subarea is obtained based on training by the following method:

Acquiring target variables and auxiliary variables of all observation sites in the subarea;

determining the distance between each observation site in the subarea and the central point of the subarea;

Training a random forest model based on target variables and auxiliary variables of all observation sites in the subarea and distances from each observation site in the subarea to the central point of the subarea respectively, and taking the trained random forest model as the space variable parameter machine learning model corresponding to the subarea.
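As an illustration of the training inputs described above, the following minimal Python sketch assembles the feature rows for one subarea: the auxiliary variables of each observation site followed by the distance from that site to the subarea center. The site dictionary layout (`xy`, `aux`, `target`), the bounding-box representation and both function names are assumptions made for illustration and do not come from the patent; the resulting rows could then be passed to any random forest regressor.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed layout


def subarea_center(box: Box) -> Tuple[float, float]:
    """Center coordinates of an axis-aligned subarea."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2)


def build_training_rows(
    sites: List[Dict], center: Tuple[float, float]
) -> Tuple[List[List[float]], List[float]]:
    """Build (features, targets) for one subarea.

    Each feature row is the site's auxiliary variables plus one extra
    column: the Euclidean distance from the site to the subarea center.
    """
    features, targets = [], []
    for site in sites:
        distance = math.dist(site["xy"], center)
        features.append(list(site["aux"]) + [distance])
        targets.append(site["target"])
    return features, targets
```

Appending the distance as an ordinary feature is one simple way to let a tree-based model vary its response with position inside the subarea.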

According to the geospatial full coverage data generation method based on space variable parameter machine learning provided by the invention, the target variables of each preset point to be predicted in the target area are respectively subjected to interpolation prediction based on each space variable parameter machine learning model respectively corresponding to each subarea, so as to obtain a target interpolation result and an uncertainty analysis result, and the method comprises the following steps:

Predicting target variables of the points to be predicted by adopting a plurality of space-variant parameter machine learning models to obtain a plurality of interpolation prediction results, wherein the plurality of space-variant parameter machine learning models comprise a second preset number of space-variant parameter machine learning models adjacent to the points to be predicted;

And determining a target interpolation result and an uncertainty analysis result of the target variable of the point to be predicted according to a plurality of interpolation prediction results based on an inverse distance weighting mode.
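The inverse distance weighting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the weights are taken as the inverse of the distance from the point to be predicted to each model's center raised to a hypothetical `power`, and the uncertainty is reported as the weighted standard deviation of the individual predictions, which is only one possible choice of uncertainty measure.

```python
import math
from typing import List, Tuple


def idw_combine(
    point: Tuple[float, float],
    model_outputs: List[Tuple[Tuple[float, float], float]],
    power: float = 2.0,
) -> Tuple[float, float]:
    """Combine predictions from several neighboring models.

    model_outputs holds (model_center_xy, prediction) pairs.
    Returns (weighted estimate, weighted standard deviation).
    """
    raw = []
    for center, _ in model_outputs:
        # Guard against a zero distance when the point sits on a model center.
        d = max(math.dist(point, center), 1e-12)
        raw.append(1.0 / d ** power)
    total = sum(raw)
    weights = [w / total for w in raw]
    estimate = sum(w * pred for w, (_, pred) in zip(weights, model_outputs))
    spread = math.sqrt(
        sum(w * (pred - estimate) ** 2 for w, (_, pred) in zip(weights, model_outputs))
    )
    return estimate, spread
```

Models whose centers are closer to the point to be predicted receive larger weights, and a large spread signals that the neighboring models disagree at that point.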

The invention also provides a geospatial full coverage data generation device based on space variable parameter machine learning, which comprises:

The partitioning module is used for gradually partitioning the target area, and calculating the spatial layering heterogeneity of the relationship between various auxiliary variables and target variables in the current partitioning state based on various auxiliary variables and target variables in each observation site in the target area after each partitioning;

the determining module is used for determining a target partition state based on the space layering heterogeneity of the relations between various auxiliary variables and target variables after each partition, wherein the target partition state corresponds to a plurality of subareas of the target area;

The modeling module is used for constructing, in the target partition state, a space variable parameter machine learning model for each subarea in the target area, wherein each space variable parameter machine learning model is trained on the target variables and auxiliary variables of the observation sites within the corresponding spatial range and on the distance from each of those observation sites to the corresponding space variable parameter machine learning model; each space variable parameter machine learning model comprises position information and spatial range information, wherein the position information is the center coordinates of the subarea corresponding to the space variable parameter machine learning model, and the spatial range information is the size of the subarea corresponding to the space variable parameter machine learning model;

the prediction module is used for respectively carrying out interpolation prediction on target variables of each preset point to be predicted in the target area based on each space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result and an uncertainty analysis result.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the geospatial full coverage data generation method based on the space variant parameter machine learning when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a geospatial full coverage data generation method based on spatially varying parameter machine learning as described in any of the above.

The invention also provides a computer program product comprising a computer program which when executed by a processor implements a geospatial full coverage data generation method based on spatially varying parameter machine learning as described in any one of the above.

The geospatial full coverage data generation method and device based on space variable parameter machine learning provided by the invention address the situation in which attribute similarity and spatial autocorrelation exist simultaneously between a target variable and various auxiliary variables and the relationship between the target variable and the auxiliary variables is strongly heterogeneous. The target area is partitioned into a plurality of subareas based on spatial layered heterogeneity, and the points to be predicted in the target area are predicted based on the space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result. Because each space variable parameter machine learning model constructed for a subarea has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be simulated effectively. At the same time, because each space variable parameter machine learning model is trained on the target variable and auxiliary variables of the observation sites and on the distance from each observation site to the position of the model, local spatial variation in the relationship between the target variable and the auxiliary variables is modeled automatically, and the spatial correlation and differentiation patterns of multidimensional, mixed-type auxiliary variables are fully exploited. As a result, in large-scale, high-spatial-resolution interpolation scenarios, the target variable at each point to be predicted can be interpolated accurately from limited observation site data, yielding an accurate spatial distribution map of the geospatial full coverage data.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow diagram of a geospatial full coverage data generation method based on spatially varying parameter machine learning according to an embodiment of the present invention;

FIG. 2 is a second flow chart of a geospatial full coverage data generation method based on spatially varying parameter machine learning according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a first curve and a second curve constructed based on average spatial layered heterogeneity provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of each sub-region in a target partition state according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a geospatial full coverage data generating device based on spatially varying parameter machine learning according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

For the situation in which attribute similarity and spatial autocorrelation both exist between a target variable and the auxiliary variables and the relationship between the target variable and the auxiliary variables is strongly heterogeneous, the invention develops a geospatial full coverage data generation method based on space variable parameter machine learning that makes full use of the spatial correlation and differentiation patterns of multidimensional, mixed-type auxiliary variables, thereby improving the accuracy of large-scale, high-spatial-resolution interpolation mapping.

In order to solve the technical problems in the prior art, an embodiment of the present application provides a geospatial full coverage data generating method based on space variant parameter machine learning, and fig. 1 is one of flow diagrams of the geospatial full coverage data generating method based on space variant parameter machine learning provided in the embodiment of the present application, as shown in fig. 1, where the method includes:

Step 110: partitioning the target area step by step, and calculating the spatial layering heterogeneity of the relationship between each auxiliary variable and the target variable in the current partitioning state based on each auxiliary variable and the target variable in each observation site in the target area after each partitioning.

Specifically, fig. 2 is a second flow chart of the geospatial full coverage data generation method based on space variable parameter machine learning according to an embodiment of the present invention. As shown in fig. 2, the target area may be partitioned recursively based on the quadtree principle: the target area is first divided equally into four areas, and each of the four areas is then in turn divided into four blocks in a preset order, such as upper left, upper right, lower left and lower right. Each such division into four blocks counts as one partitioning of the target area, so that the partitioning proceeds step by step. After each partitioning, the spatial layering heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state is calculated; after each partitioning, one grid cell of the target area represents one partition space.
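The step-by-step quadtree partitioning can be sketched in Python as follows. This is one possible reading of the description, in which every division of a single block into four counts as one partitioning and blocks are divided in first-in, first-out order; the box representation and function names are illustrative assumptions.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), y grows upward


def split_into_four(box: Box) -> List[Box]:
    """Split a box into four equal blocks: upper left, upper right,
    lower left, lower right."""
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    return [
        (xmin, ymid, xmid, ymax),  # upper left
        (xmid, ymid, xmax, ymax),  # upper right
        (xmin, ymin, xmid, ymid),  # lower left
        (xmid, ymin, xmax, ymid),  # lower right
    ]


def stepwise_partition_states(target: Box, n_steps: int) -> List[List[Box]]:
    """Return the list of partition spaces after each partitioning step.

    Each step takes the oldest undivided block and replaces it with its
    four children, so blocks are divided in the preset order.
    """
    queue = deque([target])
    states = []
    for _ in range(n_steps):
        block = queue.popleft()
        queue.extend(split_into_four(block))
        states.append(list(queue))
    return states
```

With this reading, the number of partition spaces grows as 4, 7, 10 and so on, and the heterogeneity measure would be evaluated once per entry of the returned list.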

In one embodiment, calculating the spatial hierarchical heterogeneity of the relationships between each type of auxiliary variable and the target variable in the current partition state based on each type of auxiliary variable and the target variable in each observation site in the target area after each partition includes:

Specifically, as shown in fig. 2, the bivariate local spatial autocorrelation coefficients are determined based on the target variable and its influencing factor data, that is, the auxiliary variables that influence the target variable. The bivariate local spatial autocorrelation coefficient (bivariate LISA), where "bivariate" refers to the target variable and one type of auxiliary variable, measures the local heterogeneity of the correlation between the two variables. After the bivariate LISA values between the target variable (Y) and each auxiliary variable (X) have been calculated at all observation sites, the spatial pattern of the relationship between the target variable (Y) and each auxiliary variable (X) can be obtained from the spatial distribution of the bivariate LISA values.

In one embodiment, the calculating, for each of the observation sites in the target area, a bivariate local spatial autocorrelation coefficient between the target variable in the observation site and each of the auxiliary variables respectively includes:

Specifically, each observation site may have one target variable and multiple types of auxiliary variables. Before the bivariate local spatial autocorrelation coefficients between the target variable and the auxiliary variables are calculated, the target variable and the auxiliary variables at each observation site are standardized. Taking the target variable at one observation site as an example: the mean of the target variable over the target area is subtracted from the target variable of the observation site, and the difference is divided by the standard deviation of the target variable over the target area to obtain the standardized target variable.

After the target variable and each auxiliary variable have been standardized, the bivariate local Moran's I is calculated at each observation site and used as the bivariate local spatial autocorrelation coefficient. For each observation site, the bivariate local Moran's I is determined based on the auxiliary variable at the observation site and the target variable at a first preset number of observation sites neighboring the observation site. At a given observation site, each type of auxiliary variable corresponds to one bivariate local Moran's I with the target variable; that is, when the observation site has 5 types of auxiliary variables, 5 bivariate local Moran's I values are determined for the observation site.

Optionally, the bivariate local Moran's I $I_i$ at each observation site of the target area can be calculated by the following formula:

$$I_i = x_i \sum_{j=1}^{n} w_{ij}\, y_j \qquad (1)$$

where $I_i$ represents the bivariate local Moran's I of observation site $i$ in the target area; the value range of $I_i$ is [-1, 1], $I_i < 0$ indicates that the target variable is negatively correlated with the auxiliary variable, and $I_i > 0$ indicates that the target variable is positively correlated with the auxiliary variable; $x_i$ represents the value of a given type of auxiliary variable at observation site $i$; $y_j$ is the target variable value at observation site $j$; $n$ represents the number of neighboring observation sites spatially adjacent to observation site $i$ (i.e. the first preset number; for example, one adjacent observation site may be selected above, below, to the left of and to the right of observation site $i$, so that the first preset number $n$ may be 4); $w_{ij}$ represents the spatial weight matrix weighted by the distance between observation site $i$ and observation site $j$.
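The calculation of the bivariate local Moran's I can be sketched in Python as follows. The sketch assumes that the neighbors of a site are its `k` nearest sites, that the weights are inverse distances normalized to sum to one for each site, that standardization uses the population standard deviation, and that no two sites share the same coordinates; none of these choices is fixed by the text above.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def bivariate_local_moran(
    coords: Sequence[Tuple[float, float]],
    aux: Sequence[float],
    target: Sequence[float],
    k: int = 4,
) -> List[float]:
    """Bivariate local Moran's I per site: standardized auxiliary value at
    the site times the weighted sum of standardized target values at its
    k nearest neighbors."""
    zx, zy = standardize(aux), standardize(target)
    result = []
    for i in range(len(coords)):
        nearest = sorted(
            (math.dist(coords[i], coords[j]), j)
            for j in range(len(coords))
            if j != i
        )[:k]
        inv = [1.0 / d for d, _ in nearest]
        total = sum(inv)
        lag = sum((w / total) * zy[j] for w, (_, j) in zip(inv, nearest))
        result.append(zx[i] * lag)
    return result
```

Running the function once per type of auxiliary variable gives one coefficient per site and per auxiliary variable, as described above.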

In one embodiment, after each partitioning the target area, calculating spatial hierarchical heterogeneity of relationships between the auxiliary variables and the target variables in the current partition state based on the bivariate local autocorrelation coefficients respectively, including:

Specifically, as shown in fig. 2, the spatial layered heterogeneity corresponding to each type of auxiliary variable in the current partition state can be calculated by a geographical detector based on the bivariate local autocorrelation coefficients. As can be seen from the above embodiments, each observation site has multiple types of auxiliary variables, and each type of auxiliary variable corresponds to one bivariate local spatial autocorrelation coefficient. On this basis, after each partitioning of the target area, for each partition space in the current partition state, the local autocorrelation index variance values of the relationships between each type of auxiliary variable in the partition space and the target variable are calculated from the bivariate local spatial autocorrelation coefficients corresponding to all observation sites in the partition space. In addition, after each partitioning of the target area, the global autocorrelation index variance values of the relationships between each type of auxiliary variable in the target area and the target variable are calculated from the bivariate local spatial autocorrelation coefficients corresponding to all observation sites in the target area; that is, each type of auxiliary variable corresponds to one global autocorrelation index variance value.

After the local autocorrelation index variance values and the global autocorrelation index variance value of the relationship between each type of auxiliary variable and the target variable have been obtained, the spatial layering heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partition state of the target area is calculated from them as follows:

$$q_k = 1 - \frac{\sum_{h=1}^{L} N_h\, \sigma_{h,k}^{2}}{N\, \sigma_{k}^{2}} \qquad (2)$$

where $q_k$ represents the spatial layering heterogeneity corresponding to the $k$-th type of auxiliary variable; the value range of $q_k$ is [0, 1]; $q_k = 0$ indicates that the relationship between the target variable and the auxiliary variable has no apparent spatial pattern and is randomly distributed over spatial locations; $q_k = 1$ indicates that the relationship between the target variable and the auxiliary variable is consistent within the same partition space while differing between different partition spaces; $h$ represents a partition space; $L$ represents the number of partition spaces in the target area; $N_h$ represents the number of observation sites in partition space $h$; $N$ represents the number of observation sites in the target area; $\sigma_{h,k}^{2}$ represents the local autocorrelation index variance value corresponding to the $k$-th type of auxiliary variable in partition space $h$; $\sigma_{k}^{2}$ represents the global autocorrelation index variance value corresponding to the $k$-th type of auxiliary variable in the target area.
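A minimal Python sketch of this calculation follows, assuming formula (2) takes the usual geographical detector form: one minus the sum of each partition space's site count times its local variance, divided by the total site count times the global variance. The function name, the grouped-list input and the use of population variance are illustrative assumptions, and the global variance is assumed to be non-zero.

```python
from statistics import pvariance
from typing import List, Sequence


def spatial_layering_heterogeneity(lisa_by_partition: Sequence[Sequence[float]]) -> float:
    """Heterogeneity measure for one type of auxiliary variable.

    lisa_by_partition[h] holds the bivariate local autocorrelation
    coefficients of the observation sites that fall in partition space h.
    """
    all_values: List[float] = [v for group in lisa_by_partition for v in group]
    n_total = len(all_values)
    global_variance = pvariance(all_values)
    # Partition spaces without observation sites contribute nothing.
    within = sum(len(g) * pvariance(g) for g in lisa_by_partition if g)
    return 1.0 - within / (n_total * global_variance)
```

A value near 1 means the coefficients are nearly constant inside each partition space, while a value near 0 means the partition explains none of their variation.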

Step 120: and determining a target partition state based on the spatial hierarchical heterogeneity of the relations between various auxiliary variables and target variables after each partition, wherein the target partition state corresponds to a plurality of subareas of the target region.

Specifically, after each partitioning, whether to continue partitioning is decided based on the spatial layering heterogeneity of the relationships between the various auxiliary variables and the target variable. Once it is determined that no further partitioning is to be performed, the current partition state is taken as the final target partition state, and the target partition state corresponds to a plurality of subareas of the target area.

In one embodiment, the determining the target partition state based on the spatial hierarchical heterogeneity of the relationships between the auxiliary variables and the target variables after each partition includes:

Specifically, after each partitioning, the average spatial layered heterogeneity corresponding to the target area is calculated from the spatial layered heterogeneity calculated for each type of auxiliary variable. For example, when the observation sites in the target area have 5 types of auxiliary variables, the spatial layered heterogeneity values $q_1$, $q_2$, $q_3$, $q_4$ and $q_5$ of the 5 types of auxiliary variables can be calculated as in the above embodiment, and the average spatial layered heterogeneity $\bar{q}$ is then calculated from them.

Fig. 3 is a schematic diagram of a first change curve and a second change curve constructed based on the average spatial layered heterogeneity according to an embodiment of the present invention. As shown in fig. 3, a first change curve, which characterizes how the average spatial layered heterogeneity changes after each partitioning, may be constructed from the average spatial layered heterogeneity calculated after each partitioning. Alternatively, a second change curve, which characterizes the difference in average spatial layered heterogeneity between adjacent partitionings, may be constructed from the difference $\Delta\bar{q}$ between the average spatial layered heterogeneity after two adjacent partitionings. The partition state corresponding to the inflection point of the first change curve or the inflection point of the second change curve is determined as the target partition state. The inflection point of the first change curve is the point after which the average spatial layered heterogeneity tends to be stable under further partitioning; the inflection point of the second change curve is the point after which the difference in average spatial layered heterogeneity between two adjacent partitionings tends to be stable.
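The text does not specify how the inflection point is detected. The following Python sketch shows one simple possibility, under the assumption that "tends to be stable" means every later change in the average heterogeneity stays below a hypothetical tolerance `tol`; both the rule and the default tolerance are illustrative only.

```python
from typing import List, Sequence


def adjacent_differences(mean_q: Sequence[float]) -> List[float]:
    """Differences of the average heterogeneity between adjacent partitionings
    (the values plotted by the second change curve)."""
    return [later - earlier for earlier, later in zip(mean_q, mean_q[1:])]


def pick_target_partition(mean_q: Sequence[float], tol: float = 0.01) -> int:
    """Index of the first partition state after which all further changes
    in the average heterogeneity are smaller than tol."""
    if not mean_q:
        raise ValueError("mean_q must contain at least one partition state")
    diffs = adjacent_differences(mean_q)
    for index in range(len(mean_q)):
        if all(abs(d) < tol for d in diffs[index:]):
            return index
    return len(mean_q) - 1  # not reached: the last index always qualifies
```

If the curve never settles within the evaluated range, the function falls back to the last partition state, which suggests that more partitioning steps should be evaluated.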

According to the geospatial full coverage data generation method based on spatially varying parameter machine learning of this embodiment, the target partition state, together with the spatial position and spatial range of each partition space in that state, is determined recursively on the quadtree principle, using a geographic detector model applied to the autocorrelation between the target variable and the auxiliary variables, so that the spatially non-stationary relationship between the variables can be effectively simulated.

Step 130: in the target partition state, a spatially varying parameter machine learning model is constructed for each sub-region in the target area. Each model is trained on the target variables and auxiliary variables of the observation sites within its spatial range and on the distance from each of those observation sites to the model. Each model comprises position information and spatial range information: the position information is the center coordinates of the sub-region corresponding to the model, and the spatial range information is the size of that sub-region.

In one embodiment, each spatially-varying parameter machine learning model corresponding to each sub-region is obtained based on training by the following method:

Specifically, in the target partition state, the target area comprises a plurality of sub-regions. For each sub-region, a spatially varying parameter machine learning model with position information and a spatial range is set at the center of the sub-region, and the model corresponding to each sub-region is trained separately. As shown in fig. 2, the training process of the spatially varying parameter machine learning model corresponding to each sub-region is as follows:

The center point coordinates and the size of the sub-region are calculated. The center coordinates are taken as the position (S) of the corresponding spatially varying parameter machine learning model to be trained (here a random forest model, RF), and the size of the sub-region is taken as the spatial range (E) of that model. Fig. 4 is a schematic diagram of each sub-region in the target partition state according to the embodiment of the present invention. As shown in fig. 4, each rectangle represents the size of a sub-region, that is, the spatial range of the spatially varying parameter machine learning model to be trained; the large dot in each sub-region represents the center point of the sub-region, that is, the position of the model to be trained; and the small dots represent the distribution of the observation sites.

The target variables and auxiliary variables of all observation sites in the sub-region are obtained, and the distance between each observation site in the sub-region and the center point of the sub-region (that is, the position of the spatially varying parameter machine learning model to be trained) is determined. Taking the data of the observation sites covered by the sub-region as samples, the target variable, the auxiliary variables, and the distance from each observation site to the model to be trained are substituted into formula (3) to train the model:

$$Y = f_{S,E}(X, D) \tag{3}$$

where $Y$ represents the value of the target variable; $S$ and $E$ respectively represent the position information and the spatial range information of the spatially varying parameter machine learning model to be trained; $X$ represents the auxiliary variables affecting $Y$; and $D$ represents the distance from each observation site covered by the spatial range $E$ to the position $S$ of the model.

Adding this distance to the training data is intended to model local spatial variation in the relationship between the target variable and the auxiliary variables. That is, two points with identical auxiliary variable values may still have different target values if their distances from the position of the spatially varying parameter machine learning model differ, and the model learns to predict this possible difference from the observation site data.
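As a minimal sketch of how the training samples for one sub-region might be assembled, the Python below computes the center of a sub-region, keeps the observation sites it covers, and appends the distance to the center as an extra feature. The `Site` and `SubRegion` classes, their field names, and the axis-aligned rectangular bounds are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Site:
    x: float
    y: float
    aux: Sequence[float]  # auxiliary variable values at the site
    target: float         # observed target variable at the site


@dataclass
class SubRegion:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def center(self) -> Tuple[float, float]:
        """Position of the model: the center of the sub-region."""
        return ((self.xmin + self.xmax) / 2, (self.ymin + self.ymax) / 2)

    @property
    def size(self) -> Tuple[float, float]:
        """Spatial range of the model: width and height of the sub-region."""
        return (self.xmax - self.xmin, self.ymax - self.ymin)

    def covers(self, s: Site) -> bool:
        return self.xmin <= s.x <= self.xmax and self.ymin <= s.y <= self.ymax


def training_rows(region: SubRegion,
                  sites: Sequence[Site]) -> Tuple[List[List[float]], List[float]]:
    """Build feature rows [aux..., distance-to-center] and target values."""
    cx, cy = region.center
    features, targets = [], []
    for s in sites:
        if region.covers(s):
            d = math.hypot(s.x - cx, s.y - cy)  # distance from site to model
            features.append([*s.aux, d])
            targets.append(s.target)
    return features, targets


region = SubRegion(0.0, 0.0, 10.0, 10.0)
sites = [
    Site(2.0, 5.0, [1.0, 3.0], 7.5),
    Site(8.0, 9.0, [2.0, 4.0], 6.0),
    Site(12.0, 5.0, [9.0, 9.0], 1.0),  # outside the sub-region, ignored
]
features, targets = training_rows(region, sites)
print(features, targets)
```

The resulting `features` and `targets` could then be passed to any regressor; since the text names random forests, scikit-learn's `RandomForestRegressor().fit(features, targets)` would be one option.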

During training, to ensure that the spatially varying parameter machine learning model to be trained has enough training samples, its spatial range $E$ can be enlarged to increase the number of observation sites involved in its training. The degree of enlargement can be determined from a preset minimum amount of training data. For example, when training a model requires at least 100 observation sites within its range, the spatial range $E$ of the model is expanded until it contains at least 100 observation sites. This enlargement is reasonable because machine learning can build more complex models than linear models and can model local changes in the relationship. In addition, in the spatially varying coefficient model, partial overlap of the training data is also acceptable.
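The enlargement step can be sketched as follows. The sketch assumes a square range described by a half-width around the model position and returns the smallest half-width that covers the required number of sites; the text does not prescribe how the range is grown, so this rule and the function name are illustrative only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def expand_half_size(center: Point, half_size: float,
                     sites: Sequence[Point], min_sites: int) -> float:
    """Return the smallest half-width (not below `half_size`) of a square
    centered on `center` that covers at least `min_sites` observation sites."""
    if len(sites) < min_sites:
        raise ValueError("fewer observation sites than the required minimum")
    cx, cy = center
    # A site lies inside the square when its Chebyshev distance to the
    # center does not exceed the half-width.
    cheb = sorted(max(abs(x - cx), abs(y - cy)) for x, y in sites)
    return max(half_size, cheb[min_sites - 1])


sites = [(0.5, 0.5), (2.0, 0.0), (0.0, -3.0), (5.0, 5.0)]
print(expand_half_size((0.0, 0.0), 1.0, sites, min_sites=3))
```

Because neighbouring models are enlarged independently, their ranges may overlap, which matches the remark that partial overlap of training data is acceptable.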

Compared with existing interpolation methods, the spatially varying parameter machine learning model can use high-dimensional auxiliary variables and simulate the spatially non-stationary relationship between the target variable and the auxiliary variables while avoiding collinearity, and therefore has the following beneficial effects:

(1) To address the problem that the relationship between the target variable and the auxiliary variables is spatially non-stationary under the influence of complex terrain, human activities and similar conditions, two main parameters, spatial position and range, are set for the spatially varying parameter machine learning model corresponding to each sub-region. In addition, the distance between the model and each observation site is added as a key variable during training, so that local spatial variation in the relationship between the target variable and the auxiliary variables is simulated automatically, the law of spatial autocorrelation is properly exploited, and the prediction accuracy of the model is improved.

(2) To address the problems of factor collinearity in multiple linear regression models and the poor interpretability of machine learning, the spatially varying parameter machine learning model of this embodiment can effectively handle strong interactions among multidimensional auxiliary variables and can better simulate the nonlinear relationship between the target variable and the auxiliary variables.

Step 140: and respectively carrying out interpolation prediction on target variables of each preset point to be predicted in the target area based on each space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result and an uncertainty analysis result.

In one embodiment, the performing interpolation prediction on the target variable of each preset point to be predicted in the target area based on each spatial variable parameter machine learning model corresponding to each sub-area respectively to obtain a target interpolation result and an uncertainty analysis result includes:

Specifically, after determining the target partition state, interpolation prediction is performed on each to-be-predicted point in the target area based on each space variable parameter machine learning model corresponding to each sub-area in the target partition state, and the method for obtaining the target interpolation result specifically comprises the following steps:

Each point to be predicted is predicted with a plurality of spatially varying parameter machine learning models, yielding a plurality of interpolation prediction results. The plurality of models comprises a second preset number of models adjacent to the point to be predicted; for example, the 4 models nearest to the point to be predicted may be selected.

Further, the target interpolation result of the point to be predicted is determined from the interpolation prediction results by inverse distance weighting. In the geospatial full coverage data generation method based on spatially varying parameter machine learning, the target interpolation result of each point to be predicted is a linear combination of the interpolation prediction results of the spatially varying parameter machine learning models that are spatially adjacent to the point, that is, the target interpolation result $\hat{Y}(s_0)$ is:

$$\hat{Y}(s_0) = \sum_{i=1}^{n} w_i(u_0, v_0)\, \hat{Y}_i(s_0) \tag{4}$$

where $\hat{Y}(s_0)$ represents the target interpolation result of the point to be predicted $s_0$ (that is, the target variable predicted by the model); $\hat{Y}_i(s_0)$ represents the interpolation prediction result of the $i$-th spatially varying parameter machine learning model $M_i$ for the point $s_0$; $n$ represents the number of models used to predict the point $s_0$ (that is, the second preset number); $w_i(u_0, v_0)$ represents the weight corresponding to model $M_i$, that is, the estimated regression coefficient, which differs from place to place and is determined by inverse distance weighting; and $(u_0, v_0)$ represents the position coordinates of the point $s_0$.
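The combination step can be sketched in Python as follows: pick the models nearest to the point, obtain one prediction from each, and combine them with inverse-distance weights normalised to sum to one. Representing each model as a `(center, predict_fn)` pair, and letting `predict_fn` take only the point, are simplifying assumptions; a trained model would also receive the auxiliary variables and the distance feature.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Model = Tuple[Point, Callable[[Point], float]]  # (model position, predictor)


def predict_point(point: Point, models: Sequence[Model],
                  n: int = 4, power: float = 2.0) -> float:
    """Combine the predictions of the n nearest models by inverse distance."""
    ranked: List[Model] = sorted(
        models, key=lambda m: math.dist(point, m[0]))[:n]
    dists = [math.dist(point, center) for center, _ in ranked]
    preds = [predict(point) for _, predict in ranked]
    if dists[0] == 0.0:
        # The point coincides with a model position: use that model alone.
        return preds[0]
    raw = [d ** -power for d in dists]
    total = sum(raw)
    return sum(w / total * p for w, p in zip(raw, preds))


# Hypothetical models that each return a constant prediction.
models = [
    ((0.0, 1.0), lambda p: 1.0),
    ((0.0, -1.0), lambda p: 2.0),
    ((1.0, 0.0), lambda p: 3.0),
    ((-1.0, 0.0), lambda p: 4.0),
    ((10.0, 10.0), lambda p: 100.0),  # too far away to be among the 4 nearest
]
print(predict_point((0.0, 0.0), models))
```

In this example the four nearest models are equidistant from the point, so the result is simply the mean of their predictions.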

Alternatively, any of several kernel functions may be used to determine the weight of each spatially varying parameter machine learning model in formula (4). As shown in fig. 2, the embodiment of the present application applies cross-validation to five kernel functions to identify the kernel function with the smallest error for interpolation prediction. The five kernel functions are: nearest neighbor, equal weighting (EW), inverse distance weighting (IDW), Gaussian weighting (GW), and adaptive Gaussian weighting (GAW), as follows:

(1) Nearest neighbor: the predicted value of the spatially varying parameter machine learning model that is spatially closest to the point to be predicted is taken as the final target interpolation result.

(2) EW: the $n$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction, and the mean of their interpolation prediction results is taken as the final target interpolation result.

(3) IDW: the $n$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction, and the interpolation prediction results are inverse-distance weighted to obtain the final target interpolation result, where the weight $w_i$ of each model is given by:

$$w_i = \frac{d_i^{-p}}{\sum_{j=1}^{n} d_j^{-p}} \tag{5}$$

where $p$ represents an arbitrary positive real number, usually set to 2; $d_i$ represents the Euclidean distance from the spatially varying parameter machine learning model $M_i$ to the point to be predicted $s_0$; and $n$ represents the number of models used to predict the point $s_0$.

(4) GW: the $n$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction, and Gaussian weighting is applied to the interpolation prediction results, where the weight $w_i$ of each model is given by:

$$w_i = \exp\left(-\frac{d_i^2}{2h^2}\right) \tag{6}$$

where $(u_0, v_0)$ represents the position coordinates of the point to be predicted $s_0$; $d_i$ represents the Euclidean distance from the spatially varying parameter machine learning model $M_i$ to the point $s_0$, calculated as $d_i = \sqrt{(u_i - u_0)^2 + (v_i - v_0)^2}$; $h$ is a constant; and $(u_i, v_i)$ represents the position coordinates of model $M_i$.

(5) GAW: the $n$ spatially varying parameter machine learning models nearest to the point to be predicted are used for prediction, and adaptive Gaussian weighting is applied to obtain the final target interpolation result, according to the following formula:

(7)

where the formula is expressed in terms of the position coordinates of the point to be predicted, the Euclidean distance from each spatially varying parameter machine learning model to the point to be predicted, and an optimization parameter.

Cross-validation shows that the kernel function with the smallest error is inverse distance weighting (IDW). The weight corresponding to each spatially varying parameter machine learning model is therefore determined by inverse distance weighting.
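A kernel comparison of this kind might be sketched as below. The sketch covers four of the five kernels; the adaptive Gaussian kernel is omitted because its exact form is not recoverable from the text. The Gaussian form, the normalisation of all weights to sum to one, the default `bandwidth`, and the layout of `samples` are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

# One validation sample: (distances to the models, their predictions, truth).
Sample = Tuple[Sequence[float], Sequence[float], float]


def kernel_weights(kind: str, dists: Sequence[float],
                   power: float = 2.0, bandwidth: float = 1.0) -> List[float]:
    """Return normalised weights for one point under the chosen kernel."""
    n = len(dists)
    nearest = dists.index(min(dists))
    if kind == "nearest" or (kind == "idw" and min(dists) == 0.0):
        raw = [1.0 if i == nearest else 0.0 for i in range(n)]
    elif kind == "ew":
        raw = [1.0] * n
    elif kind == "idw":
        raw = [d ** -power for d in dists]
    elif kind == "gw":
        raw = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in dists]
    else:
        raise ValueError(f"unknown kernel: {kind}")
    total = sum(raw)
    return [r / total for r in raw]


def rmse_for_kernel(kind: str, samples: Sequence[Sample]) -> float:
    """Root-mean-square error of the weighted prediction over the samples."""
    squared = 0.0
    for dists, preds, truth in samples:
        weights = kernel_weights(kind, dists)
        estimate = sum(w * p for w, p in zip(weights, preds))
        squared += (estimate - truth) ** 2
    return math.sqrt(squared / len(samples))


def best_kernel(samples: Sequence[Sample],
                kinds: Sequence[str] = ("nearest", "ew", "idw", "gw")) -> str:
    """Pick the kernel with the smallest validation error."""
    return min(kinds, key=lambda k: rmse_for_kernel(k, samples))


# Hypothetical held-out point whose true value equals the nearest prediction.
samples = [([1.0, 5.0], [10.0, 20.0], 10.0)]
print(best_kernel(samples))
```

On real data the held-out samples would come from the observation sites themselves, with each site predicted by models trained without it.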

It will be appreciated that each spatially varying parameter machine learning model produces some error when predicting. The uncertainty of the final target interpolation result can therefore be measured from the uncertainty of the individual models; that is, the error variance of the final target interpolation result (the uncertainty analysis result) can be converted into a linear combination of the error variances of the individual models. The error variance $\sigma^2$ is given by:

$$\sigma^2 = V\left(\hat{Y} - Y\right) = V\left(\sum_{i=1}^{n} w_i \hat{Y}_i - Y\right) \tag{8}$$

where $V$ represents the variance statistic of the interpolation prediction results of the spatially varying parameter machine learning models used to predict the point to be predicted; $\hat{Y}$ represents the interpolation prediction result; $Y$ represents the true value; $n$ represents the number of models used to predict the point to be predicted; $\hat{Y}_i$ represents the interpolation prediction result of the $i$-th model; and $w_i$ represents the weight of the $i$-th model.

Since each spatially varying parameter machine learning model is built by independent training, the prediction errors of different models can be considered independent. On this basis, formula (8) can be further derived as formula (9):

$$\sigma^2 = \sum_{i=1}^{n} w_i^2 \sigma_i^2 \tag{9}$$

where $\sigma_i^2$ is the error variance of the interpolation prediction result of the $i$-th spatially varying parameter machine learning model.
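Under the independence assumption stated above, the error variance of a weighted combination is the sum of the squared weights times the individual error variances. A minimal Python sketch follows; the function name and the example numbers are hypothetical.

```python
from typing import Sequence


def combined_error_variance(weights: Sequence[float],
                            variances: Sequence[float]) -> float:
    """Error variance of a weighted sum of predictions with independent errors."""
    if len(weights) != len(variances):
        raise ValueError("weights and variances must have the same length")
    return sum(w * w * v for w, v in zip(weights, variances))


# Two equally weighted models, each with an error variance of 4.0.
print(combined_error_variance([0.5, 0.5], [4.0, 4.0]))
```

With equal weights, averaging two independent models halves the error variance, which is why combining several nearby models can reduce the uncertainty of the final result.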

Furthermore, a confidence interval of the spatial interpolation prediction can be determined according to a prediction result of the spatial variable parameter machine learning model on each point to be predicted in the target area.
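The text does not state how the confidence interval is constructed. One common choice, shown here purely as an assumption, is a symmetric interval around the prediction based on normally distributed errors; the function name and the default 95% level are illustrative.

```python
import math
from statistics import NormalDist
from typing import Tuple


def confidence_interval(prediction: float, error_variance: float,
                        level: float = 0.95) -> Tuple[float, float]:
    """Symmetric interval around a prediction, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    half_width = z * math.sqrt(error_variance)
    return prediction - half_width, prediction + half_width


low, high = confidence_interval(10.0, 4.0)
print(low, high)
```

If the prediction errors are clearly non-normal, an empirical interval derived from held-out residuals would be a more cautious alternative.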

The uncertainty of the prediction result is thus estimated: its error variance is given, and a confidence interval for the spatial interpolation prediction is derived from the prediction result, which further improves the interpretability of the spatially varying parameter machine learning model.

According to the spatial interpolation method based on spatially varying parameter machine learning described above, for the situation in which attribute similarity and spatial autocorrelation both exist between the target variable and the various auxiliary variables and the relationship between them is strongly heterogeneous, the target area is partitioned on the basis of spatial hierarchical heterogeneity to obtain a plurality of sub-regions, and the points to be predicted in the target area are predicted with the spatially varying parameter machine learning models corresponding to the sub-regions to obtain the target interpolation result. Because each model constructed for a sub-region has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be effectively simulated. In addition, because each model is trained on the target variables and auxiliary variables of the observation sites and on the distances from the observation sites to the position of the model, local spatial variation in the relationship between the target variable and the auxiliary variables can be modeled automatically, and the spatial correlation and heterogeneity of multidimensional, mixed-type auxiliary variables can be fully exploited. As a result, in large-scale, high-spatial-resolution interpolation scenarios, the target variable at each point to be predicted can be accurately interpolated from limited observation site data, yielding an accurate spatial distribution map corresponding to the geospatial full coverage data.

The geospatial full coverage data generating device based on space variant parameter machine learning provided by the invention is described below, and the geospatial full coverage data generating device based on space variant parameter machine learning described below and the geospatial full coverage data generating method based on space variant parameter machine learning described above can be correspondingly referred to each other.

Fig. 5 is a schematic structural diagram of a geospatial full coverage data generating apparatus based on spatially varying parameter machine learning according to an embodiment of the present invention, and as shown in fig. 5, the geospatial full coverage data generating apparatus 500 based on spatially varying parameter machine learning includes: a partitioning module 510, a determining module 520, a modeling module 530, and a predicting module 540;

The partitioning module 510 is configured to gradually partition a target area, and calculate spatial hierarchical heterogeneity of relationships between various auxiliary variables and target variables in a current partition state based on various auxiliary variables and target variables in each observation site in the target area after each partition;

A determining module 520, configured to determine a target partition state based on the spatial hierarchical heterogeneity of relationships between each type of auxiliary variable and a target variable after each partition, where the target partition state corresponds to a plurality of sub-regions of the target region;

The modeling module 530 is configured to construct, in the target partition state, a spatially varying parameter machine learning model for each sub-region in the target area, where each model is trained on the target variables and auxiliary variables of the observation sites within its spatial range and on the distance from each of those observation sites to the model; each model comprises position information and spatial range information, where the position information is the center coordinates of the sub-region corresponding to the model, and the spatial range information is the size of that sub-region;

The prediction module 540 is configured to perform interpolation prediction on target variables of each preset point to be predicted in the target area based on each spatial variable parameter machine learning model corresponding to each sub-area, so as to obtain a target interpolation result and an uncertainty analysis result.
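The data flow between the four modules can be sketched as a simple composition. The class name, the callable signatures, and the stub functions in the example are hypothetical; they only illustrate the order in which the modules hand results to one another.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FullCoverageGenerator:
    partition: Callable[[Any], Any]     # role of partitioning module 510
    determine: Callable[[Any], Any]     # role of determining module 520
    model: Callable[[Any, Any], Any]    # role of modeling module 530
    predict: Callable[[Any, Any], Any]  # role of prediction module 540

    def run(self, sites: Any, points: Any) -> Any:
        # 510: partition step by step and record the heterogeneity history.
        history = self.partition(sites)
        # 520: choose the target partition state from that history.
        target_state = self.determine(history)
        # 530: train one model per sub-region of the target partition state.
        models = self.model(target_state, sites)
        # 540: interpolate every point to be predicted.
        return self.predict(models, points)


# Stub callables standing in for the real modules.
generator = FullCoverageGenerator(
    partition=lambda sites: [len(sites)],
    determine=lambda history: history[-1],
    model=lambda state, sites: state * 2,
    predict=lambda models, points: [models for _ in points],
)
print(generator.run([1, 2, 3], ["a", "b"]))
```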

According to the spatial interpolation device based on spatially varying parameter machine learning described above, for the situation in which attribute similarity and spatial autocorrelation both exist between the target variable and the various auxiliary variables and the relationship between them is strongly heterogeneous, the target area is partitioned on the basis of spatial hierarchical heterogeneity to obtain a plurality of sub-regions, and the points to be predicted in the target area are predicted with the spatially varying parameter machine learning models corresponding to the sub-regions to obtain the target interpolation result. Because each model constructed for a sub-region has a specific position and range, the spatially non-stationary relationship between the target variable and the auxiliary variables can be effectively simulated. In addition, because each model is trained on the target variables and auxiliary variables of the observation sites and on the distances from the observation sites to the position of the model, local spatial variation in the relationship between the target variable and the auxiliary variables can be modeled automatically, and the spatial correlation and heterogeneity of multidimensional, mixed-type auxiliary variables can be fully exploited. As a result, in large-scale, high-spatial-resolution interpolation scenarios, the target variable at each point to be predicted can be accurately interpolated from limited observation site data, yielding an accurate spatial distribution map corresponding to the geospatial full coverage data.

In one embodiment, the partitioning module 510 is specifically configured to:

Calculating bivariate local space autocorrelation coefficients of the target variable in the observation site and the auxiliary variables respectively aiming at each observation site in the target area;

After each partitioning is carried out on the target area, the spatial layering heterogeneity of the relationship between the auxiliary variables and the target variables in the current partitioning state is calculated based on the bivariate local autocorrelation coefficients.

In one embodiment, the partitioning module 510 is specifically configured to:

After partitioning the target area each time, calculating local autocorrelation index variance values respectively corresponding to various auxiliary variables in each partition space based on the bivariate local space autocorrelation coefficients corresponding to all observation stations in each partition space in the current partition state;

In one embodiment, the determining module 520 is specifically configured to:

In one embodiment, each spatially-varying parameter machine learning model corresponding to each target partition space is obtained based on training by the following method:

In one embodiment, the prediction module 540 is specifically configured to:

Fig. 6 illustrates a physical schematic diagram of an electronic device, as shown in fig. 6, which may include: processor 610, communication interface (Communications Interface) 620, memory 630, and communication bus 640, wherein processor 610, communication interface 620, memory 630 communicate with each other via communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a geospatial full coverage data generation method based on spatially varying parameter machine learning, the method comprising:

And respectively carrying out interpolation prediction on target variables of each preset point to be predicted in the target area based on each space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result and an uncertainty analysis result.

Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

In another aspect, the present invention also provides a computer program product, where the computer program product includes a computer program, where the computer program can be stored on a non-transitory computer readable storage medium, where the computer program when executed by a processor can perform a geospatial full coverage data generation method based on spatially varying parameter machine learning provided by the above methods, where the method includes:

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the geospatial full coverage data generation method based on spatially varying parameter machine learning provided by the methods above, the method comprising:

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A geospatial full coverage data generation method based on spatially varying parameter machine learning, comprising:

Interpolation prediction is respectively carried out on target variables of all preset points to be predicted in the target area based on all space variable parameter machine learning models corresponding to all the subareas respectively, and a target interpolation result and an uncertainty analysis result are obtained; after each partition, based on various auxiliary variables and target variables in each observation site in the target area, calculating the spatial hierarchical heterogeneity of the relationship between various auxiliary variables and the target variables in the current partition state, including:

after each time of partitioning the target area, respectively calculating the spatial layering heterogeneity of the relationship between each type of auxiliary variable and the target variable in the current partitioning state based on each bivariate local autocorrelation coefficient; calculating, for each of the observation sites in the target area, bivariate local spatial autocorrelation coefficients between the target variable in the observation site and each of the auxiliary variables, respectively, including:

For each observation site in the target area, calculating a bivariate local space autocorrelation coefficient corresponding to the observation site based on auxiliary variables of the observation site and target variables of a first preset number of adjacent observation sites of the observation site; after each partitioning of the target area, calculating spatial hierarchical heterogeneity of relationships between the auxiliary variables and the target variables in the current partitioning state based on the bivariate local autocorrelation coefficients respectively, wherein the spatial hierarchical heterogeneity comprises:

Respectively calculating the spatial layering heterogeneity of the relations between the auxiliary variables and the target variables of the target region in the current partition state based on the local autocorrelation index variance value and the global autocorrelation index variance value; the determining the target partition state based on the spatial hierarchical heterogeneity of the relationships between each class of auxiliary variables and the target variables after each partition comprises:

2. The geospatial full coverage data generation method based on spatially varying parameter machine learning according to claim 1, wherein each spatially varying parameter machine learning model corresponding to each sub-region is obtained based on training by:

3. The geospatial full coverage data generating method based on space variant parameter machine learning according to claim 1, wherein the interpolating and predicting the target variable of each preset point to be predicted in the target area based on each space variant parameter machine learning model corresponding to each subregion respectively, to obtain a target interpolation result and an uncertainty analysis result, includes:

4. A geospatial full coverage data generation apparatus based on spatially varying parameter machine learning, comprising:

the prediction module is used for respectively carrying out interpolation prediction on target variables of each preset point to be predicted in the target area based on each space variable parameter machine learning model corresponding to each subarea to obtain a target interpolation result and an uncertainty analysis result;

The partition module is specifically configured to:

After each time of partitioning the target area, calculating the spatial layering heterogeneity of the relationship between the auxiliary variables and the target variables in the current partitioning state based on the local autocorrelation coefficients of the bivariate;

The partition module is specifically configured to:

For each observation site in the target area, calculating a bivariate local space autocorrelation coefficient corresponding to the observation site based on auxiliary variables of the observation site and target variables of a first preset number of adjacent observation sites of the observation site;

The partition module is specifically configured to:

Respectively calculating the spatial layering heterogeneity of the relations between the auxiliary variables and the target variables of the target region in the current partition state based on the local autocorrelation index variance value and the global autocorrelation index variance value;

the determining module is specifically configured to:

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the geospatial full coverage data generation method based on spatially varying parameter machine learning as claimed in any one of claims 1 to 3 when the program is executed by the processor.

6. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the geospatial full coverage data generation method based on spatially varying parameter machine learning as claimed in any one of claims 1 to 3.