CN115314133A

CN115314133A - Path loss data enhancement method and system based on matrix completion

Info

Publication number: CN115314133A
Application number: CN202211237155.8A
Authority: CN
Inventors: 温晓敏; 方胜良; 胡豪杰; 范友臣; 程东航; 徐照菁; 马昭; 王孟涛; 刘涵; 吴曙光
Original assignee: Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Current assignee: Peoples Liberation Army Strategic Support Force Aerospace Engineering University
Priority date: 2022-10-11
Filing date: 2022-10-11
Publication date: 2022-11-08
Anticipated expiration: 2042-10-11
Also published as: CN115314133B

Abstract

The invention relates to the technical field of information and communication engineering, and in particular discloses a path loss data enhancement method and system based on matrix completion. The method comprises: constructing a characteristic attribute data set from the characteristic attribute data of a transmitting base station and monitoring stations, and acquiring path loss measured values of a small number of monitoring stations; inputting the data in the characteristic attribute data set into an empirical prediction model to obtain empirical model path loss predicted values for the monitoring stations; constructing a path loss sparse low-rank matrix from the empirical model path loss predicted values and the path loss measured values of the small number of monitoring stations; completing the path loss sparse low-rank matrix to obtain a completed path loss matrix; combining the completed path loss matrix with the data in the characteristic attribute data set to form an enhanced data training set; and training a model on the enhanced data training set. The method avoids acquiring a large amount of measurement data and effectively expands the data set.

Description

Path loss data enhancement method and system based on matrix completion

Technical Field

The invention relates to the technical field of information and communication engineering, in particular to a path loss data enhancement method and system based on matrix completion.

Background

With the rapid development of wireless communication technology, mobile data traffic has grown explosively, placing higher demands on high-quality transmission in wireless communication networks. Modeling and predicting signal propagation path loss is an essential step in guiding the design and optimization of a wireless communication system and in ensuring that wireless service quality reaches an acceptable level. Radio waves propagating in the radio channels of urban environments undergo reflection, diffraction, scattering, and refraction in the medium, and this variety of propagation mechanisms makes path loss prediction very difficult.

Radio wave propagation prediction models fall into two main types: traditional prediction models and machine learning prediction models. Traditional prediction models mainly comprise empirical models, deterministic models, and semi-deterministic models. In recent decades, many researchers have shifted from traditional radio wave propagation path loss prediction models to data-driven machine learning methods (machine learning prediction models). Machine learning improves performance on a specific task using a large amount of data and a flexible model architecture, and its problems can be divided into classification and regression. Predicting radio wave propagation path loss is a typical regression task: through repeated iterative calculation of the model parameters on a large amount of data, a fitted function relating the path loss to its influencing parameters is found, so that the path loss can be predicted for new input parameters with excellent generalization performance.

Machine learning prediction models include ANN-based and non-ANN-based models. Obtaining the massive data needed for machine learning experiments is the primary problem facing many researchers. Training data are commonly obtained in two ways: measurement and simulation. Measurement requires a large number of measured path loss values, and such acquisition work is often difficult to carry out; many studies therefore turn to existing computational prediction models to generate the training data.

At present, generating synthetic data with an electromagnetic solver is a relatively computationally efficient approach, and a commonly used solver is the ray tracing (RT) solver. However, in data sets generated by a deterministic-model solver, the environmental input features are all theoretical quantities, such as the number of reflecting surfaces, the electromagnetic parameters of the propagation medium, and the maximum obstacle height. Such theoretical data sets can only be used under specific environmental conditions, and the calculation requires substantial hardware support. Researchers have also proposed two mechanisms for expanding the training data set: the first reuses original data in a new scene or a different frequency band; the second generates training samples with a traditional model based on prior information, with the drawback that the prior information must itself be obtained from measured data. Some researchers directly represent the path loss distribution heat map of the investigated geographic area as a matrix with missing elements; this is essentially image restoration, and its drawback is that it cannot properly reflect the factors that influence radio wave propagation.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide a method for enhancing path loss data based on matrix completion, in which only a small number of path loss measurement values of monitoring stations are used, thereby avoiding obtaining a large number of measurement data, effectively expanding a data set, and facilitating the improvement of model training accuracy.

It is a second object of the present invention to provide a path loss data enhancement system based on matrix completion.

The first technical scheme adopted by the invention is as follows: a path loss data enhancement method based on matrix completion comprises the following steps:

S100: constructing a characteristic attribute data set based on characteristic attribute data of the transmitting base station and the monitoring stations, and acquiring path loss measured values of a small number of monitoring stations;

S200: inputting the data in the characteristic attribute data set into an empirical prediction model for calculation so as to obtain an empirical model path loss predicted value of the monitoring stations;

S300: constructing a path loss sparse low-rank matrix based on the empirical model path loss predicted value and the path loss measured values of a small number of monitoring stations;

S400: completing the path loss sparse low-rank matrix to obtain a completed path loss matrix; combining the completed path loss matrix with data in the characteristic attribute data set to form an enhanced data training set;

S500: training a machine learning prediction model based on the enhanced data training set.

Preferably, the characteristic attribute data of the transmitting base station in step S100 include the transmission power and the longitude, latitude, altitude, and main lobe direction of the transmitting antenna; the characteristic attribute data of the monitoring stations include latitude, longitude, altitude, vegetation type, and building height and density.

Preferably, the step S100 further includes performing dimension reduction preprocessing on the feature attribute data set.

Preferably, the dimension reduction preprocessing comprises the following sub-steps:

(1) Carrying out standardization preprocessing on the characteristic attributes of all dimensions in the characteristic attribute data set;

(2) Calculating a covariance matrix to analyze correlation between the feature attribute data;

(3) And screening out main features in the feature attribute data set based on the correlation among the feature attribute data, thereby realizing the dimension reduction of the feature attribute data set.

Preferably, the empirical prediction model in step S200 includes any one of a free space propagation model, an ECC-33 empirical model, an Okumura-Hata empirical model, and a COST-231 Hata empirical model.

Preferably, the path loss sparse low rank matrix in step S300 is constructed by:

constructing a matrix based on the empirical model path loss predicted values and the path loss measured values of a small number of monitoring stations, wherein one row of the matrix holds the path loss measured values, and randomly zeroing the data in that row according to the data incomplete ratio to form the path loss sparse low-rank matrix.

Preferably, completing the path loss sparse low-rank matrix in step S400 includes the following sub-steps:

S410: segmenting the path loss sparse low-rank matrix according to a set truncation length to obtain a plurality of truncated matrices;

S420: solving an approximate matrix of each of the plurality of truncated matrices based on a loss function;

S430: splicing the approximate matrices of the plurality of truncated matrices to obtain the completed path loss matrix.

Preferably, combining the completed path loss matrix with data in the characteristic attribute data set comprises:

extracting all data of the first row of the completed path loss matrix, and combining the data with the characteristic attribute data of the transmitting base station and the monitoring stations to obtain the enhanced data training set.
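The combination step can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patented implementation: the function name `build_training_set`, the array layout (one row of features per monitoring station), and the sample numbers are all assumptions made for the example.

```python
import numpy as np

def build_training_set(features, completed):
    """Append the completed first row (measured path loss) to the station features."""
    features = np.asarray(features, dtype=float)      # shape (n, m): one row per station
    targets = np.asarray(completed, dtype=float)[0]   # first row of the completed matrix
    if features.shape[0] != targets.shape[0]:
        raise ValueError("one path loss value is needed per monitoring station")
    return np.column_stack([features, targets])       # shape (n, m + 1)

# Hypothetical values: 3 stations, 2 features, 4 path loss rows
features = [[1.2, 30.0], [0.8, 12.0], [2.5, 45.0]]
completed = [[118.0, 110.5, 131.2],
             [120.1, 112.0, 133.0],
             [117.4, 109.8, 130.1],
             [101.3,  97.6, 108.9]]
train = build_training_set(features, completed)
```

Each row of `train` then holds the features of one station followed by its path loss value, which is the usual layout for a regression training set.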

Preferably, the machine learning prediction model in step S500 includes one or more of a BP neural network model, an SVM regression prediction model and a decision tree regression prediction model.

The second technical scheme adopted by the invention is as follows: a path loss data enhancement system based on matrix completion comprises an acquisition module, a calculation module, a path loss sparse low-rank matrix construction module, a completion module and a training module;

the acquisition module is used for constructing a characteristic attribute data set based on the characteristic attribute data of the transmitting base station and the monitoring stations and acquiring path loss measured values of a small number of monitoring stations;

the calculation module is used for inputting the data in the characteristic attribute data set into an empirical prediction model for calculation so as to obtain an empirical model path loss prediction value of the monitoring station;

the path loss sparse low-rank matrix construction module is used for constructing a path loss sparse low-rank matrix based on the empirical model path loss predicted value and path loss measured values of a small number of monitoring stations;

the completion module is used for completing the path loss sparse low-rank matrix to obtain a completed path loss matrix; combining the completed path loss matrix with data in the characteristic attribute data set to form an enhanced data training set;

the training module is used for training a machine learning prediction model based on the enhanced data training set.

The beneficial effects of the above technical scheme are that:

(1) To address the difficulty of obtaining measured path loss data, the invention discloses a path loss data enhancement method based on matrix completion. First, the monitoring stations at which the path loss is to be obtained and their relevant characteristic attributes are determined from the topographic and geomorphic information recorded in map data, and the parameters of the transmitting base station are obtained at the same time. Next, several sets of path loss data are calculated with empirical path loss prediction models for common urban environments. A sparse low-rank matrix is then built by combining these data with the small number of measured path loss values obtained at the monitoring stations. Finally, the values of the monitoring stations without measured data are filled in by matrix decomposition completion, effectively expanding the training data of the machine learning model.

(2) The path loss data enhancement method based on matrix completion only uses path loss measured values of a small number of monitoring stations, avoids the difficulty in obtaining a large amount of measured data, effectively expands a data set, and is beneficial to improving the accuracy of model training.

(3) The method constructs a sparse low-rank matrix by combining a small amount of actually measured path loss values through the existing urban environment path loss empirical model, and fulfills the aim of data enhancement through segmented matrix completion.

Drawings

Fig. 1 is a schematic flowchart of a method for enhancing path loss data based on matrix completion according to an embodiment of the present invention;

Fig. 2 is a schematic diagram comparing the data of different prediction models with measured data according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the construction and piecewise completion of a matrix with missing path loss values according to an embodiment of the present invention, where Fig. 3 (a) is a schematic diagram of a path loss sparse low-rank matrix, and Fig. 3 (b) is a schematic diagram of the sparse matrices obtained by piecewise truncation of the path loss sparse low-rank matrix;

Fig. 4 is a schematic diagram of a measured city scene according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of the distribution of measured path loss values of transmitting base station 1 according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of the distribution of measured path loss values of transmitting base station 2 according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of the distribution of measured path loss values of transmitting base station 3 according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of the distribution of measured path loss values of transmitting base station 4 according to an embodiment of the present invention;

Fig. 9 is a distribution diagram of the measured path loss values of a small number of monitoring stations of transmitting base station 1 according to an embodiment of the present invention;

Fig. 10 is a distribution diagram of the measured path loss values of a small number of monitoring stations of transmitting base station 2 according to an embodiment of the present invention;

Fig. 11 is a distribution diagram of the measured path loss values of a small number of monitoring stations of transmitting base station 3 according to an embodiment of the present invention;

Fig. 12 is a distribution diagram of the measured path loss values of a small number of monitoring stations of transmitting base station 4 according to an embodiment of the present invention;

Fig. 13 is a schematic structural diagram of a path loss data enhancement system based on matrix completion according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following detailed description of the embodiments and the accompanying drawings are provided to illustrate the principles of the invention, but are not intended to limit the scope of the invention, i.e., the invention is not limited to the preferred embodiments described, but the scope of the invention is defined by the claims.

In the description of the present invention, it is to be noted that, unless otherwise specified, "a plurality" means two or more; the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance; the specific meaning of the above terms in the present invention can be understood as appropriate to those of ordinary skill in the art.

Example one

As shown in fig. 1, the method for enhancing path loss data based on matrix completion provided by the present invention mainly includes the following steps:

S100: constructing a characteristic attribute data set based on characteristic attribute data of a transmitting base station and monitoring stations (receiving stations), and acquiring path loss measured values of a small number of monitoring stations;

The known characteristic attribute data (parameters) of the transmitting base station include the transmission power and the longitude, latitude, altitude, and main lobe direction of the transmitting antenna. The characteristic attribute data of the monitoring stations are acquired from map data containing elevation, land cover, building, and vegetation elements, and include latitude, longitude, altitude, vegetation type, building height and density, and the like. The path loss measured values of a small number of monitoring stations are acquired through fixed spectrum monitoring stations or mobile vehicle-mounted monitoring devices.

Further, in an embodiment, the method further includes performing dimensionality reduction preprocessing on the feature attribute data set, so that an enhanced data set formed by combining the complementary path loss matrix at a later stage has a low dimensionality, and therefore the training calculation efficiency of the path loss machine learning prediction model is improved.

Correlation among the feature attribute data (multicollinearity) makes the feature space unstable, and a high-dimensional data set also causes low computational efficiency, heavy storage pressure, and weak generalization. These problems can be addressed by using feature extraction to reduce the dimension. PCA, LDA, and KernelPCA can each transform the feature attribute data set into a new feature subset of lower dimension. PCA and LDA are common feature dimension reduction methods, for unsupervised and supervised data respectively, while KernelPCA handles data that are not linearly separable. Since data enhancement only requires retaining the features that best describe the data (the principal components) rather than features for classification, PCA is used here to reduce the dimension of the feature attribute data.

PCA (Principal Component Analysis) is a dimension reduction method for continuous attribute features. It constructs an orthogonal transformation of the original data so that the basis of the new space removes the correlation present in the original space; a few new feature variables, the so-called principal components, then explain most of the information in the original data. This reduces the dimension of high-dimensional data, removes redundant information, and improves the efficiency of the model training algorithm.

The dimension reduction preprocessing of the feature attribute data set comprises the following sub-steps:

(1) Carrying out standardized preprocessing on the characteristic attributes of all dimensions in the characteristic attribute data set;

Assume that path loss data are collected at $n$ monitoring stations in a given area and that each monitoring station has $m$ feature attributes. The $m$ feature attributes of the $n$ monitoring stations then form a matrix $X$, written as:

$$X = [x_1, x_2, \dots, x_n]^{T} \in R^{n \times m}$$

where $x_i$ is the vector of the $m$ feature attribute values of the $i$-th monitoring station, written as:

$$x_i = [x_{i1}, x_{i2}, \dots, x_{im}]^{T}$$

where $x_{ij}$ is the value of the $j$-th feature attribute of the $i$-th monitoring station, $i=1,2,\dots,n$, $j=1,2,\dots,m$, and $T$ denotes the matrix transpose.

Based on the matrix $X$, the sample mean and the sample standard deviation of each feature attribute are calculated.

First, the mean $\bar{x}_j$ of the $j$-th feature attribute over the $n$ monitoring stations is calculated as:

$$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$$

Then, the standard deviation $\sigma_j$ of the $j$-th feature attribute over the $n$ monitoring stations is calculated as:

$$\sigma_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij}-\bar{x}_j\right)^{2}}$$

The standardized feature matrix $\tilde{X}$ is written as:

$$\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \dots, \tilde{x}_n]^{T} \in R^{n \times m}$$

where $\tilde{X}$ is the standardized feature matrix; $m$ is the number of feature attributes; $n$ is the number of monitoring stations; and $\tilde{x}_i$ is the standardized vector of the $m$ feature attribute values of the $i$-th monitoring station:

$$\tilde{x}_i = [\tilde{x}_{i1}, \tilde{x}_{i2}, \dots, \tilde{x}_{im}]^{T}$$

where $\tilde{x}_{ij}$ is the standardized value of the $j$-th feature attribute of the $i$-th monitoring station:

$$\tilde{x}_{ij} = \frac{x_{ij}-\bar{x}_j}{\sigma_j}$$

where $x_{ij}$ is the value of the $j$-th feature attribute of the $i$-th monitoring station, $i=1,2,\dots,n$, $j=1,2,\dots,m$; $\bar{x}_j$ is the mean of the $j$-th feature attribute over the $n$ monitoring stations; and $\sigma_j$ is the standard deviation of the $j$-th feature attribute over the $n$ monitoring stations.
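The standardization sub-step can be illustrated with a short NumPy sketch. It assumes the sample standard deviation (denominator n - 1); the function name `standardize` and the sample values are hypothetical and only serve the example.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: subtract each feature's mean, divide by its sample std."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)   # assumption: sample standard deviation (n - 1)
    std[std == 0] = 1.0           # guard: a constant feature stays at 0 after centering
    return (X - mean) / std, mean, std

# Hypothetical data: 4 monitoring stations (rows), 3 feature attributes (columns)
X = np.array([[100.0, 12.0, 0.2],
              [150.0, 30.0, 0.5],
              [120.0, 18.0, 0.3],
              [180.0, 24.0, 0.8]])
X_std, mean, std = standardize(X)
```

After this step every feature column has zero mean and unit sample standard deviation, so attributes with large numeric ranges (such as altitude) do not dominate the covariance analysis.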

(2) Calculating a covariance matrix to analyze the correlation between the feature attribute data;

The covariance matrix is given by:

$$S_{m \times m} = \begin{bmatrix} s_{11} & \cdots & s_{1m} \\ \vdots & \ddots & \vdots \\ s_{m1} & \cdots & s_{mm} \end{bmatrix}$$

where $S_{m \times m}$ is the covariance matrix; $s_{pq}$ is the covariance between the $p$-th and the $q$-th feature attributes, $p,q=1,2,\dots,m$; and $m$ is the number of feature attributes. The covariance $s_{pq}$ is calculated as:

$$s_{pq} = \frac{1}{n-1}\sum_{k=1}^{n}\left(\tilde{x}_{kp}-\bar{\tilde{x}}_{p}\right)\left(\tilde{x}_{kq}-\bar{\tilde{x}}_{q}\right)$$

where $s_{pq}$ is the covariance between the $p$-th and the $q$-th feature attributes; $n$ is the number of monitoring stations; $\tilde{x}_{kp}$ and $\tilde{x}_{kq}$ are the standardized values of the $p$-th and the $q$-th feature attributes of the $k$-th monitoring station; and $\bar{\tilde{x}}_{p}$ and $\bar{\tilde{x}}_{q}$ are the means of the $p$-th and the $q$-th feature attributes over the $n$ monitoring stations.

The larger the covariance $s_{pq}$, the greater the dispersion between the $p$-th and the $q$-th feature attributes and the richer the information they contain. From the perspective of linear coordinate projection, principal component analysis projects a data set with many correlated features onto a coordinate system with fewer, less correlated features; the eigenvectors of the covariance matrix are the coordinate directions with the smallest correlation and the largest variance.
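As a rough illustration of this sub-step, the sketch below computes the covariance matrix of standardized features and its eigen-decomposition with NumPy. The synthetic data, the random seed, and the variable names are assumptions for the example only.

```python
import numpy as np

# Hypothetical standardized feature matrix: 50 stations, 4 feature attributes
rng = np.random.default_rng(0)
F = rng.normal(size=(50, 4))
F[:, 3] = 0.9 * F[:, 0] + 0.1 * F[:, 3]            # make attribute 4 correlate with attribute 1
F = (F - F.mean(axis=0)) / F.std(axis=0, ddof=1)   # z-score each column

S = np.cov(F, rowvar=False)            # m x m covariance matrix (columns are variables)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh is suited to symmetric matrices
order = np.argsort(eigvals)[::-1]      # sort eigenpairs by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```

Because the columns are standardized, the diagonal of `S` is 1 and the eigenvalues sum to the number of attributes; a dominant first eigenvalue signals strongly correlated attributes.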

(3) Screening out main features in the feature attribute data sets based on the correlation among the feature attribute data to form main feature sets, thereby realizing the dimension reduction of the feature attribute data sets;

Suppose the eigenvalues of the covariance matrix $S_{m \times m}$ are $\lambda_1,\lambda_2,\dots,\lambda_i,\dots,\lambda_m$, satisfying $\lambda_1>\lambda_2>\dots>\lambda_i>\dots>\lambda_m$. A threshold $th$ is set and the cumulative contribution rate of the features (components) is calculated. If the cumulative contribution rate of the first $k$ features (components) is greater than or equal to the threshold $th$, the insignificant features (components) after the $k$-th feature are discarded; that is, the $m$-dimensional feature attribute data set is reduced to a $k$-dimensional main feature set, thereby achieving dimension reduction.

The contribution rate of a feature attribute is the percentage of the information in the feature attribute data set that it represents. The contribution rate $\eta_k$ of the $k$-th feature attribute of the initial feature attribute data set is calculated as:

$$\eta_k = \frac{\lambda_k}{\sum_{i=1}^{m}\lambda_i}$$

where $\eta_k$ is the contribution rate of the $k$-th feature attribute of the feature attribute data set; $\lambda_k$ is the eigenvalue corresponding to the $k$-th eigenvector; $m$ is the number of feature attributes; and $\lambda_i$ is the eigenvalue corresponding to the $i$-th eigenvector.

The cumulative contribution rate $\Psi_k$ of the first $k$ features is calculated as:

$$\Psi_k = \frac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{m}\lambda_i}$$

where $\Psi_k$ is the cumulative contribution rate of the first $k$ feature attributes; $m$ is the number of feature attributes; and $\lambda_i$ is the eigenvalue corresponding to the $i$-th eigenvector.
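The threshold rule can be sketched as follows. The function name `select_components`, the eigenvalues, and the threshold 0.85 are illustrative assumptions, not values from the embodiment.

```python
import numpy as np

def select_components(eigvals, th):
    """Return the smallest k whose cumulative contribution rate reaches th."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # decreasing eigenvalues
    contrib = lam / lam.sum()                               # contribution rate of each component
    cumulative = np.cumsum(contrib)                         # cumulative contribution rate
    k = int(np.searchsorted(cumulative, th)) + 1            # first index with cumulative >= th
    return min(k, lam.size), contrib, cumulative

# Hypothetical eigenvalues of a 4 x 4 covariance matrix
k, contrib, cumulative = select_components([2.4, 1.1, 0.3, 0.2], th=0.85)
```

With these numbers the first component explains 60 % and the first two explain 87.5 %, so two components are kept and the remaining two are discarded.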

S200: inputting the data in the characteristic attribute data set into an empirical prediction model for calculation so as to obtain the empirical model path loss predicted values of the monitoring stations;

The path loss is calculated with empirical prediction models of radio wave propagation path loss in urban environments, yielding several sets of empirical path loss prediction data. Four empirical prediction models are used: the free space propagation model, the ECC-33 empirical model, the Okumura-Hata empirical model, and the COST-231 Hata empirical model.

The free space propagation model is given by:

$$PL_1(dB) = 32.45 + 20\log_{10} d + 20\log_{10} f$$

where $PL_1(dB)$ is the path loss predicted value (unit: dB) calculated by the free space propagation model; $d$ is the distance (unit: km) from the receiving point (monitoring station) to the transmitting base station; and $f$ is the carrier frequency of the transmitting base station (unit: MHz).
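A minimal sketch of the free space calculation is given below. It assumes the commonly used form of the free space path loss for distance in km and frequency in MHz, with the constant 32.45; the function name and the 900 MHz example are illustrative.

```python
import math

def free_space_path_loss_db(d_km, f_mhz):
    """Free space path loss in dB for distance in km and carrier frequency in MHz."""
    if d_km <= 0 or f_mhz <= 0:
        raise ValueError("distance and frequency must be positive")
    # Assumed standard form: 32.45 + 20 log10(d) + 20 log10(f)
    return 32.45 + 20.0 * math.log10(d_km) + 20.0 * math.log10(f_mhz)

# Hypothetical link: 1 km and 2 km from a 900 MHz transmitter
pl_1km = free_space_path_loss_db(1.0, 900.0)
pl_2km = free_space_path_loss_db(2.0, 900.0)
```

Doubling the distance adds about 6 dB, which is a quick sanity check for any implementation of this model.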

The ECC-33 empirical model is given by:

$$PL_2(dB) = A_{FS} + A_{BM} - G_B - G_R$$

where $PL_2(dB)$ is the path loss predicted value (unit: dB) calculated by the ECC-33 empirical model; $A_{FS}$ is the free space attenuation; $A_{BM}$ is the basic median path loss; $G_B$ is the antenna height correction factor; $G_R$ is the receiver gain factor; $d$ is the distance (unit: km) from the receiving point to the transmitting base station; $f$ is the carrier frequency of the transmitting base station (unit: MHz); $h_b$ is the antenna height of the transmitting base station (unit: m); and $h_{ue}$ is the effective height of the receiving antenna (unit: m).

The Okumura-Hata empirical model (GSM 900 MHz) is given by:

$$PL_3(dB) = 69.55 + 26.16\log_{10} f - 13.82\log_{10} h_b - a(h_m) + \left(44.9 - 6.55\log_{10} h_b\right)\log_{10} d + C_m$$

where $PL_3(dB)$ is the path loss predicted value (unit: dB) calculated by the Okumura-Hata empirical model; $d$ is the distance (unit: km) from the receiving point to the transmitting base station; $f$ is the carrier frequency of the transmitting base station (unit: MHz); $h_b$ is the antenna height of the transmitting base station (unit: m); $h_{ue}$ is the effective height of the receiving antenna (unit: m); $a(h_m)$ is the receiving antenna correction parameter; and $C_m$ is the clutter correction parameter, with a value of 3 dB.

The COST-231 Hata empirical model is given by:

$$PL_4(dB) = 46.3 + 33.9\log_{10} f - 13.82\log_{10} h_b - a(h_m) + \left(44.9 - 6.55\log_{10} h_b\right)\log_{10} d + C_m$$

where $PL_4(dB)$ is the path loss predicted value (unit: dB) calculated by the COST-231 Hata empirical model; $d$ is the distance (unit: km) from the receiving point to the transmitting base station; $f$ is the carrier frequency of the transmitting base station (unit: MHz); $h_b$ is the antenna height of the transmitting base station (unit: m); $h_{ue}$ is the effective height of the receiving antenna (unit: m); $a(h_m)$ is the receiving antenna correction parameter; and $C_m$ is the clutter correction parameter, with a value of 3 dB.

The Okumura-Hata empirical model and the COST-231 Hata empirical model are calculation formulas of different frequency bands of the same model.
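Since the two models share one structure and differ only in their leading coefficients, they can be sketched with a single function. The sketch below uses the widely published Hata coefficients and makes two assumptions that are not confirmed by the text: the receiving antenna correction `a_hm` takes the small/medium-city form, and the clutter term `c_m` is added to both variants because the variable lists above mention it for both. All function and parameter names are hypothetical.

```python
import math

def a_hm(f_mhz, h_ue):
    """Receiving antenna correction, small/medium-city form (assumed variant)."""
    lf = math.log10(f_mhz)
    return (1.1 * lf - 0.7) * h_ue - (1.56 * lf - 0.8)

def hata_path_loss_db(d_km, f_mhz, h_b, h_ue, c_m=3.0, cost231=False):
    """Okumura-Hata (default) or COST-231 Hata path loss in dB."""
    # Only the constant and the frequency coefficient differ between the two bands
    a0, a1 = (46.3, 33.9) if cost231 else (69.55, 26.16)
    return (a0 + a1 * math.log10(f_mhz)
            - 13.82 * math.log10(h_b)
            - a_hm(f_mhz, h_ue)
            + (44.9 - 6.55 * math.log10(h_b)) * math.log10(d_km)
            + c_m)

# Hypothetical link: 30 m base station antenna, 1.5 m receiving antenna
pl_900 = hata_path_loss_db(1.0, 900.0, 30.0, 1.5, c_m=0.0)
pl_900_far = hata_path_loss_db(5.0, 900.0, 30.0, 1.5, c_m=0.0)
gap_1800 = (hata_path_loss_db(1.0, 1800.0, 30.0, 1.5, cost231=True)
            - hata_path_loss_db(1.0, 1800.0, 30.0, 1.5))
```

In practice the Okumura-Hata branch would be used for the lower band (for example 900 MHz) and the COST-231 branch for the higher band; `gap_1800` only shows how far the two coefficient sets differ when evaluated at the same frequency.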

S300: constructing a path loss sparse low-rank matrix based on the path loss predicted value of the empirical model and the path loss measured values of a small number of monitoring stations;

For the path loss prediction task based on a machine learning model, a sparse low-rank matrix containing missing path loss values is first constructed. Its construction borrows the "collaborative filtering" idea of rating prediction and recommender systems: monitoring (receiving) points with similar feature attributes are assumed to have, with high probability, similar path loss values under the same prediction model. Fig. 2 shows the path loss values obtained at more than 1000 monitoring stations (monitoring points) by four methods (actual measurement, free space propagation model, ECC-33 empirical model, and COST-231 Hata empirical model); as can be seen from Fig. 2, this assumption is consistent with the observed data.

The path loss sparse low-rank matrix is constructed from the path loss measured values of a small number of monitoring stations and the empirical model path loss predicted values as follows:

Based on the above assumption and data analysis, the path loss values of $n$ monitoring stations are obtained by $z$ methods (the actual measurement and the empirical model calculations), i.e., the empirical model path loss predicted values and the path loss measured values of a small number of monitoring stations. A matrix is constructed from these path loss values, in which one row holds the path loss measured values; the data in that row are randomly zeroed according to a user-defined data incomplete ratio to form the path loss sparse low-rank matrix. The data incomplete ratio is the ratio of the number of monitoring stations selected to provide path loss measured values to the total number of monitoring stations; for example, if there are 1000 monitoring stations in total and only 50 are selected to provide path loss measured values, the data incomplete ratio is 50:1000;

As shown in Fig. 3 (a), the present invention arranges the path loss values associated with the characteristic attribute data set into a matrix whose first row holds the path loss measured values; the elements of the first row that have no measured data are set to null, forming the path loss sparse low-rank matrix $A$, $A \in R^{z \times n}$. Here $z$ is, for example, 4, i.e., rows 2, 3 and 4 of the matrix hold the path loss predicted values calculated by three empirical models: the free space propagation model, ECC-33, and Hata. Each column of the matrix represents one monitoring station of the investigated region, and the rows hold the four path loss values of that station: the first row holds the path loss measured value (a blank position means the station has no measured value), the second row the predicted value calculated by the ECC-33 empirical model, the third row the predicted value calculated by the Hata empirical model, and the fourth row the predicted value calculated by the free space propagation model.
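The construction described above can be sketched as follows. This is an illustration under stated assumptions: missing entries are marked with NaN plus a boolean mask (the text speaks of zeroing or nulling them), the model rows are random stand-ins rather than real model outputs, and the names `build_sparse_path_loss_matrix` and `keep_ratio` are hypothetical.

```python
import numpy as np

def build_sparse_path_loss_matrix(measured, predictions, keep_ratio, seed=0):
    """Stack measured values (row 0) on model predictions and hide most of row 0."""
    measured = np.asarray(measured, dtype=float)
    A = np.vstack([measured, np.asarray(predictions, dtype=float)])   # shape (z, n)
    n = measured.size
    n_keep = int(round(keep_ratio * n))            # stations that keep their measurement
    rng = np.random.default_rng(seed)
    keep = rng.choice(n, size=n_keep, replace=False)
    mask = np.zeros(A.shape, dtype=bool)
    mask[1:, :] = True                             # empirical model rows are fully known
    mask[0, keep] = True                           # only a few measured values are known
    return np.where(mask, A, np.nan), mask

# Hypothetical data: 20 stations, three empirical model rows
rng = np.random.default_rng(1)
n = 20
measured = 110.0 + 10.0 * rng.random(n)
predictions = measured + rng.normal(0.0, 3.0, size=(3, n))   # stand-ins for model outputs
A_sparse, mask = build_sparse_path_loss_matrix(measured, predictions, keep_ratio=0.25)
```

Here `keep_ratio=0.25` plays the role of the data incomplete ratio: 5 of the 20 stations keep their measured value and the other 15 entries of the first row are left to be completed.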

S400: completing the path loss sparse low-rank matrix to form a completed path loss matrix; combining the completed path loss matrix with data in the characteristic attribute data set to form an enhanced data training set;

(1) Completing the path loss sparse low-rank matrix comprises the following substeps:

The present invention fills in the investigation points that lack path loss measured values by matrix decomposition. To obtain high-quality expanded data, the path loss sparse low-rank matrix of Fig. 3 (a) is completed segment by segment, with every $D$ monitoring stations forming one group (i.e., according to the set truncation length), as shown in Fig. 3 (b).

If a non-zero matrix $A$ with $z$ rows and $n$ columns, $A \in R^{z \times n}$, has no missing values, the matrix can be expressed as the product of two matrices, i.e., a full-rank decomposition of the matrix is performed:

In the formula, the rank of the non-zero matrix $A$ is $\operatorname{rank}(A) = h$, with $h < \min\{z, n\}$; $U$ and $V$ are two real matrices of rank $h$; $T$ is the matrix transpose symbol; $R$ is the set of real numbers.
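One conventional way to write the full-rank decomposition described above is sketched below. The dimensions assigned to $U$ and $V$ and the placement of the transpose are assumptions chosen to agree with the stated rank condition; the original formula may arrange them differently.

```latex
A = U V^{T}, \qquad
U \in R^{z \times h}, \quad V \in R^{n \times h}, \qquad
\operatorname{rank}(A) = h < \min\{z, n\}
```

Under this convention, element $(i, j)$ of $A$ is the inner product of row $i$ of $U$ with row $j$ of $V$.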

If elements of matrix $A$ are missing, $A$ cannot be decomposed directly according to the above formula. In this case an approximate solution method, namely gradient descent, can be used to solve for $U$ and $V$, yielding a completion matrix that serves as an approximation of the matrix $A$ containing missing elements.

To fit the true path loss values as accurately as possible using matrix decomposition, the constructed sparse low-rank matrix is truncated once every $D$ monitoring stations, where $D$ is the truncation window width, generating $W$ truncated path loss sparse matrices in total, indexed $1, 2, \ldots, W$. Each truncated matrix has $k$ columns, where $k = D$ or $k$ equals the number of monitoring stations remaining after the $n$ monitoring stations are divided into groups of $D$. If the total number of monitoring stations $n$ is an integer multiple of $D$, i.e., $n \% D = 0$, then $n = W \cdot D$; if $n \% D \neq 0$, then $n$ equals $W \cdot D$ plus this remainder, which is less than $D$. The symbol "%" denotes the remainder operation.

Specifically, the path loss sparse matrix is the splicing (concatenation) of the truncated path loss sparse matrices.
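The truncation step can be illustrated with a short Python sketch. The function name `truncate_columns` and the list-of-rows matrix representation are assumptions made for this example; the sketch simply cuts the matrix into consecutive column groups of width $D$, with a final shorter group holding the `n % D` remaining stations when $n$ is not a multiple of $D$.

```python
def truncate_columns(matrix, width):
    """Split a z x n matrix (list of rows) into consecutive column blocks.

    Every block has `width` columns except possibly the last one, which
    holds the n % width remaining columns.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    n = len(matrix[0])
    return [[row[start:start + width] for row in matrix]
            for start in range(0, n, width)]


# Hypothetical example: 2 rows, 7 stations, truncation window width D = 3.
blocks = truncate_columns([[1, 2, 3, 4, 5, 6, 7],
                           [8, 9, 10, 11, 12, 13, 14]], 3)
```

Each block keeps all rows, so every truncated matrix still pairs the measured row with the empirical-model rows for its group of stations.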

A loss function is defined for each truncated path loss sparse matrix. The goal of solving it is to find the matrices $U$ and $V$ that minimize the loss function. In the loss function, $\|\cdot\|$ denotes the 2-norm; $U$ and $V$ are the two real matrices of rank $h$ to be solved; the two subscripts denote the rows and the columns of the matrix, respectively, and select the corresponding row or column of $U$ and $V$ and the corresponding element of each bias vector; $T$ is the matrix transpose symbol; and $\beta$ is the penalty-term coefficient of the loss function. In addition to the terms in $U$ and $V$, the loss function contains a bias for the monitoring stations, a bias for the path loss calculation methods, and a global offset of the matrix. The loss is evaluated on the non-empty elements of the truncated matrix, and the global offset is computed from these elements, where $P$ is the number of non-empty elements in the truncated matrix.
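The quantities listed above match the ingredients of the biased matrix factorization objective that is common in the matrix completion literature. A hedged reconstruction in that standard form is given below. The symbols $\kappa$ (index pairs of the non-empty elements), $a_{ij}$, $\mu$, $b_i$, $c_j$, $u_i$ and $v_j$, as well as the grouping of the penalty term, are assumptions and may differ from the original formula.

```latex
L = \sum_{(i,j) \in \kappa} \left[ \left( a_{ij} - \mu - b_i - c_j - u_i^{T} v_j \right)^2
    + \beta \left( \|u_i\|_2^2 + \|v_j\|_2^2 + b_i^2 + c_j^2 \right) \right],
\qquad
\mu = \frac{1}{P} \sum_{(i,j) \in \kappa} a_{ij}
```

Here $a_{ij}$ is a non-empty element of the truncated matrix, $u_i$ and $v_j$ are the length-$h$ vectors associated with row $i$ (a path loss calculation method) and column $j$ (a monitoring station), $b_i$ and $c_j$ are the corresponding biases, and $\mu$ is the global offset taken as the mean of the $P$ non-empty elements.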

An auxiliary term is defined first; then, according to the gradient descent formula, the elements of $U$ and $V$ are updated iteratively. The update formulas act on the values of the elements in a given row of matrix $U$ and in a given column of matrix $V$; they involve an update coefficient (the step size) and $\beta$, the penalty-term coefficient of the loss function.
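A gradient descent completion of one truncated matrix might look like the sketch below. It is an illustration under stated assumptions rather than the patented procedure: it uses stochastic gradient descent on a biased matrix factorization model (prediction = global offset + method bias + station bias + inner product of two length-`rank` factor vectors), absorbs the constant factor 2 of the gradient into the step size, and the names `complete_block`, `lr`, `beta`, `epochs` and all default values are hypothetical.

```python
import random


def complete_block(block, rank=2, lr=0.01, beta=0.01, epochs=500, seed=0):
    """Fill the None entries of a z x k block by biased matrix factorization (SGD)."""
    z, k = len(block), len(block[0])
    observed = [(i, j, block[i][j]) for i in range(z) for j in range(k)
                if block[i][j] is not None]
    mu = sum(a for _, _, a in observed) / len(observed)  # global offset
    rng = random.Random(seed)
    u = [[rng.uniform(-0.05, 0.05) for _ in range(rank)] for _ in range(z)]
    v = [[rng.uniform(-0.05, 0.05) for _ in range(rank)] for _ in range(k)]
    b = [0.0] * z  # bias per path loss calculation method (row)
    c = [0.0] * k  # bias per monitoring station (column)

    def predict(i, j):
        return mu + b[i] + c[j] + sum(u[i][f] * v[j][f] for f in range(rank))

    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j, a in observed:
            err = a - predict(i, j)
            # Gradient steps with penalty coefficient beta.
            b[i] += lr * (err - beta * b[i])
            c[j] += lr * (err - beta * c[j])
            for f in range(rank):
                u_if, v_jf = u[i][f], v[j][f]
                u[i][f] += lr * (err * v_jf - beta * u_if)
                v[j][f] += lr * (err * u_if - beta * v_jf)

    # Keep observed values; use the model only where data were missing.
    return [[block[i][j] if block[i][j] is not None else predict(i, j)
             for j in range(k)] for i in range(z)]
```

Subtracting the global offset before fitting keeps the residuals small, which helps a fixed step size remain stable for path loss values on the order of 100 dB; in practice the step size and number of epochs would need tuning for the data at hand.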

S430: Splicing the approximation matrices of the truncated matrices to obtain the completed path loss matrix;

The overall goal is to find an approximation matrix of the path loss sparse low-rank matrix. Through the truncation operation, the problem is converted into solving for the approximation matrix of each truncated matrix; these approximation matrices are then spliced together to obtain the approximation matrix of the path loss sparse low-rank matrix. That is, the completed path loss matrix is the splicing of the completion matrices of the truncated matrices with indices $1, 2, \ldots, W$.

(2) Combining the completed path loss matrix with data in the feature attribute dataset, comprising:

Extracting all data of the first row of the completed path loss matrix, and combining the extracted data with the characteristic attribute data of the transmitting base station and the monitoring stations (receiving stations) to obtain the enhanced data training set.
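The splicing and combination steps can be sketched as follows. The function names `splice_blocks` and `build_enhanced_training_set`, the list-of-rows matrix layout and the (features, target) pair format are assumptions made for illustration only.

```python
def splice_blocks(blocks):
    """Concatenate completed column blocks back into one z x n matrix."""
    z = len(blocks[0])
    return [[value for block in blocks for value in block[i]] for i in range(z)]


def build_enhanced_training_set(features, completed):
    """Pair each station's feature vector with its first-row path loss value.

    features  -- n feature vectors (base station and monitoring station attributes)
    completed -- completed z x n path loss matrix; row 0 holds measured/completed values
    """
    first_row = completed[0]
    if len(features) != len(first_row):
        raise ValueError("one feature vector per monitoring station is required")
    return [(list(x), y) for x, y in zip(features, first_row)]


# Hypothetical example: two completed blocks covering three stations.
completed = splice_blocks([[[101.0, 103.5], [104.0, 106.0]],
                           [[99.2], [102.0]]])
training_set = build_enhanced_training_set([[0.1, 0.7], [0.4, 0.2], [0.9, 0.5]],
                                           completed)
```

Only the first row is used as the regression target, so the empirical-model rows act purely as side information during completion.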

Based on the designed sparse low-rank matrix and the truncation operation in the matrix completion process, the algorithm pseudocode is described as follows:

Machine learning requires large amounts of data, but obtaining path loss data by field measurement remains expensive; in particular, in some sensitive areas, the time from approval of a radio data collection application to the measurement of raw data is excessively long. Therefore, the method fills in the measured values of the monitoring stations in the sparse low-rank matrix by matrix decomposition completion, which effectively expands the training data of the machine learning model, resolves the contradiction between high data demand and difficult data acquisition, and is of great significance for improving the accuracy of the machine learning algorithm.

S500: Training a machine learning prediction model based on the enhanced data training set; the machine learning prediction model comprises one or more of a BP neural network model, an SVM regression prediction model and a decision tree regression prediction model.

The invention discloses a path loss data enhancement method based on matrix completion that addresses the difficulty of obtaining measured path loss data. The method first determines, from the topographic and geomorphic information recorded in map data, the monitoring stations at which path loss is to be obtained and their characteristic attribute data (including latitude, longitude, altitude, vegetation type, building height and density, etc.), and at the same time obtains the parameters of the transmitting base station. Several sets of path loss predicted values are then calculated using empirical path loss prediction models commonly used for various urban environments. The path loss predicted values are combined with the small number of measured path loss values obtained at the monitoring stations to build a sparse low-rank matrix. Finally, the values of the monitoring stations without measured data are filled in by matrix decomposition completion, achieving effective expansion and enhancement of the training data of the machine learning model.

The beneficial effects of the technical solution of the invention are explained below with reference to simulation experiments:

To verify the effectiveness of the invention, the characteristic dimensions shown in Table 1 are adopted for the measured monitoring stations and the transmitting base station;

Table 1 Characteristic attribute data

The path loss values were collected in the city of Fortaleza-CE, and the test scene shown in FIG. 4 was obtained from the latitude and longitude coordinates. The signal strength at 2327 street points in the urban area was received and measured using an Agilent E6474A, with the measured values generated by four different transmitting base stations operating separately. FIGS. 5-8 show the distribution of measured path loss values in the area when each of the four base stations operates, i.e., the path loss attenuation distribution for the different base station positions; the arrow direction indicates the direction of maximum radiation of the transmitting base station antenna in the cell; dark colors represent large path loss values and light colors represent small path loss values. The transmitting frequency of the transmitting base station is 853.71 MHz.

As shown in FIGS. 9-12, 95% of the original monitoring station data are removed at random to simulate the small amount of monitoring station data obtained by actual measurement; for example, with 1000 monitoring stations, 95% are randomly removed and only the remaining small number of monitoring stations are treated as measured.

The empirical models reflect the characteristic attributes between the monitoring station (receiving point) and the transmitting base station. Constructing the sparse low-rank matrix together with the path loss values calculated by the empirical models therefore allows the missing path loss values of the receiving points to be supplemented well, so the constructed sparse low-rank matrix is reasonable, effective and reliable. Matrix completion generates more data from the limited measured data, i.e., forms an enhanced data training set, which increases the number and diversity of training samples, improves the robustness of the machine learning model, and avoids the over-fitting problem. The training sets are defined as follows:

the limited measured data training set: on the basis of the measured data training set, part of the measured data is randomly discarded in proportion, and the remaining data are used to simulate the limited measured data that can be collected under various limiting factors;

the measured data training set: the measured data set is divided into a training set and a test set in proportion, and the training set is called the measured data training set;

the enhanced data training set: the data training set expanded by the matrix decomposition completion method.

Example two

FIG. 13 shows a path loss data enhancement system based on matrix completion according to an embodiment of the present invention, which includes an acquisition module, a calculation module, a path loss sparse low-rank matrix construction module, a completion module, and a training module;

the training module is configured to train a machine learning prediction model based on the enhanced data training set.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A path loss data enhancement method based on matrix completion is characterized by comprising the following steps:

S200: inputting the data in the characteristic attribute data set into an empirical prediction model for calculation so as to obtain an empirical model path loss prediction value of the monitored site;

2. The method for enhancing path loss data according to claim 1, wherein the characteristic attribute data of the transmitting base station in step S100 include the transmitting power and the longitude, latitude, altitude and main lobe direction of the transmitting antenna; the characteristic attribute data of the monitoring station include latitude, longitude, altitude, vegetation type, and building height and density.

3. The method for enhancing path loss data according to claim 1, wherein the step S100 further comprises performing a dimensionality reduction preprocessing on the feature attribute data set.

4. The method of path loss data enhancement according to claim 3, wherein the dimensionality reduction pre-processing comprises the sub-steps of:

(1) Carrying out standardization preprocessing on the feature attributes of all dimensions in the feature attribute data set;

5. The method of claim 1, wherein the empirical prediction model in step S200 comprises any one of a free space propagation model, an ECC-33 empirical model, an Okumura-Hata empirical model, and a COST-231 Hata empirical model.

6. The method of enhancing path loss data according to claim 1, wherein the path loss sparse low-rank matrix in step S300 is constructed by:

constructing a matrix based on the empirical model path loss predicted values and the path loss measured values of a small number of monitoring stations, wherein one row of the matrix holds the path loss measured values, and randomly setting data in that row to zero according to the data incompleteness ratio to form a path loss sparse low-rank matrix.

7. The method for enhancing path loss data according to claim 1, wherein the step S400 of completing the path loss sparse low-rank matrix comprises the sub-steps of:

8. The method of path loss data enhancement according to claim 7, wherein combining the complemented path loss matrix with data in the feature attribute dataset comprises:

and extracting all data in the first row of the completed path loss matrix, and combining the data with the characteristic attribute data of the transmitting base station and the monitoring station to obtain an enhanced data training set.

9. The method of claim 1, wherein the machine learning prediction model in step S500 comprises one or more of a BP neural network model, an SVM regression prediction model, and a decision tree regression prediction model.

10. A path loss data enhancement system based on matrix completion is characterized by comprising an acquisition module, a calculation module, a path loss sparse low-rank matrix construction module, a completion module and a training module;