CN106503714B

CN106503714B - Method for identifying city functional area based on point of interest data

Info

Publication number: CN106503714B
Application number: CN201610887062.8A
Authority: CN
Inventors: 蒋云良; 董墨萱; 刘勇
Original assignee: Huzhou University
Current assignee: Huzhou University
Priority date: 2016-10-11
Filing date: 2016-10-11
Publication date: 2020-01-03
Anticipated expiration: 2036-10-11
Also published as: CN106503714A

Abstract

The invention provides a method for identifying urban functional areas based on point of interest data, realized by the following steps: step one, map segmentation: rasterizing the map; step two, finding the base station to which each interest point belongs: finding the base station closest to the interest point; step three, calculating the interest point distribution characteristics of each base station; step four, clustering: carrying out fuzzy clustering analysis on the matrix obtained in step three to obtain different clustering results; step five, identifying the urban functional areas: calculating, on the map, the distribution overlap rate between the interest points having a characteristic category and each of the clustering results obtained in step four, and thereby identifying the function of the clustered base stations. The method can identify the function of urban areas, whether they are tourist areas, working areas or residential areas; the results are basically consistent with reality, and the identification is made at a more generalized level.

Description

Method for identifying city functional area based on point of interest data

Technical Field

The invention relates to the field of big data analysis, in particular to a method for identifying a city functional area based on interest point data.

Background

With rapid economic development, a series of urban problems has followed, and these problems are particularly serious in some provincial cities and metropolises. As a product of urbanization in developing countries, "urban diseases" manifest as traffic congestion, housing shortages, water shortages, energy shortages, environmental deterioration, employment difficulties and so on; they burden cities, may even restrict their development, and can harm the physical and mental health of citizens.

In recent years, experts and scholars have used various kinds of heterogeneous big data for urban computing in order to address the problems caused by urbanization. Urban computing is an interdisciplinary subject and a new field of computer science that takes the city as its setting and integrates urban planning, transportation, energy, environment, sociology, economics and other disciplines. More specifically, urban computing addresses the challenges facing cities (e.g., environmental degradation, traffic congestion, increased energy consumption, lagging planning) by continuously acquiring, integrating and analyzing a variety of heterogeneous big data in cities. City planning is one of the main applications of urban computing. A precondition for planning a city is to understand the city and the distribution of its functional areas. An urban functional area is an area whose land use function, use intensity, land use direction and benchmark land price are substantially consistent, and whose degree of intensive use and use potential are basically the same, such as a cultural and educational area, a business area or a residential area.

At present, scholars at home and abroad mainly use mobile phone data, floating car data, POI data and the like to study urban functional areas. Among these, POI data is widely used in the discovery of urban functional areas. POI stands for point of interest. In a GIS system, one POI record may be a residential community, a store, a bus station, and so on. One POI record comprises parameters such as name, longitude and latitude, detailed address, POI category and contact phone. In recent years, research on discovering urban functional areas from POI data has mainly included the following: Yuan et al. proposed the DRoF framework (Discovers Regions of Different Functions), built from taxi GPS trajectory data and regional POI data; Du Run et al., when solving for the stopping points of mobile phones that switch cells irregularly, used the topic category with the largest number of POIs as the topic of a cell in order to merge adjacent cells; and another study used bus IC card swipe data together with POI data to construct an urban functional area identification model (DZoF).

The position information of mobile phone base stations is often combined with Voronoi (Thiessen) polygons to divide a city into basic units. Research that divides the study area by mobile phone base stations mainly includes the following: Jameson L. Toole et al., when using the dynamic data generated by mobile phone users to identify the relation between land use and dynamic population, used the location information of base stations to divide the map into regions; Víctor Soto and Enrique Frías-Martínez, in proposing a technique that automatically identifies and delimits land use from the information generated by the mobile phone base station network, likewise used the position information of base stations to divide the map into regions.

In addition, POI data covers a comprehensive range of types, touches every aspect of city life, and is very convenient to crawl, whereas other kinds of data are often difficult to obtain. At present, the mobile phone base stations of the three operators cover essentially all of China. Moreover, to better serve the public, operators deploy base stations according to population density and urban planning. That is, in densely populated areas with many high-rise buildings the base stations are arranged relatively densely, while in relatively open areas the number of base stations is correspondingly smaller.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method for identifying urban functional areas based on interest point data, which can identify the function of each area of a city by using interest point data. To this end, the invention provides the following technical scheme:

a method for identifying a city functional area based on point of interest data comprises the following steps:

step one, map segmentation: rasterizing the map, and numbering all grids; dividing the map according to the position of the mobile phone base station, calculating the distance between each grid and the base station, and specifying that the grid belongs to the base station closest to the grid, so as to obtain a grid number list closest to each base station and a grid number matrix G occupied by each base station;

step two, searching a base station to which the interest point belongs: finding a base station closest to the interest point, and judging that the interest point belongs to the base station to obtain all interest point lists belonging to each base station;

step three, calculating the distribution characteristics of the interest points of each base station: classifying and counting the interest points of each base station according to the parameter of the interest point category in the data of the interest point list of each base station, namely respectively counting the number of the interest points of different categories of each base station to obtain an interest point category distribution matrix D of each base station; combining the interest point category distribution matrix D with the occupied grid number matrix G, and processing the interest point category distribution matrix D by adopting a normalization method to obtain a matrix finally used for analysis, wherein the matrix is named as Y;

step four, clustering: carrying out fuzzy clustering analysis on the matrix Y in the third step to obtain different clustering results;

step five, identifying the urban functional areas: calculating, on the map, the distribution overlap rate between the interest points having a characteristic category and each of the clustering results obtained in step four, and thereby identifying the function of the clustered base stations.

On the basis of the technical scheme, the invention can also adopt the following further technical scheme:

in the third step, the normalization processing method is as follows:

normalizing the interest point category distribution matrix D and the occupied grid number matrix G separately by using formula (1), so that both matrices fall in the interval [0,1], and then combining the two normalized results by using formula (2):

$$x_i' = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$

$$Y = A \cdot e^{-X} \tag{2}$$

in formula (1), $\{x_i\}$ is the sample set, $x_i$ is a component of a sample in the sample set, $x_{\max}$ is the maximum value of that component over all samples in the sample set, and $x_{\min}$ is the minimum value of that component over all samples in the sample set;

in formula (2), Y is the matrix finally used for analysis, with dimension n × m; A is the matrix obtained by normalizing the interest point category distribution matrix D according to formula (1), with dimension n × m; X is the matrix obtained by normalizing the occupied grid number matrix G according to formula (1), with dimension 1 × n; n is the number of base stations and m is the number of interest point categories.
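The normalization and combination described above can be sketched with NumPy as follows. This is a minimal illustration rather than the patent's implementation: the function names `min_max` and `build_y` are hypothetical, D is normalized per category column (one reading of "each component over all samples"), and X is assumed to hold one value per base station that is applied to every category in that base station's row.

```python
import numpy as np

def min_max(a, axis=None):
    """Min-max normalization into [0, 1], as in formula (1).
    A constant input is mapped to 0 to avoid dividing by zero."""
    a = np.asarray(a, dtype=float)
    lo = a.min(axis=axis, keepdims=True)
    hi = a.max(axis=axis, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (a - lo) / span

def build_y(D, G):
    """Combine category counts D (n x m) with grid counts G (length n)
    as Y = A * exp(-X), formula (2)."""
    A = min_max(D, axis=0)   # assumption: each category column scaled over all base stations
    X = min_max(G)           # one normalized grid count per base station
    # Assumption: X is broadcast along the category axis of each row.
    return A * np.exp(-X)[:, None]

# Toy data: 3 base stations, 2 categories.
D = np.array([[0, 10], [5, 0], [10, 5]])
G = np.array([100, 200, 300])
Y = build_y(D, G)
```

Because the exponent is negative, a base station that occupies many grids has its counts damped, which keeps large, sparsely populated cells from dominating the clustering input.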

In step four, the fuzzy clustering analysis adopts the fuzzy C-means clustering algorithm, which divides all vectors into C clusters and finds the cluster center of each cluster such that the sum of within-cluster variances is minimized;

clustering with the fuzzy C-means algorithm yields, for each base station i, a list of probabilities (membership degrees) of belonging to the different clusters; the cluster with the largest probability is then taken as the class of base station i, which gives a list of the classes to which all base stations belong, and this list is the clustering result.
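A compact sketch of this clustering step is given below. It uses the standard fuzzy C-means update rules with Euclidean distance; the fuzzifier value of 2, the random initialization, the stopping tolerance and the function names are assumptions for illustration, since the text does not specify them.

```python
import numpy as np

def fuzzy_c_means(Y, c, fuzzifier=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means. Returns (centers, U), where U[i, k] is the
    membership of row i of Y in cluster k and each row of U sums to 1."""
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((Y.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # random initial memberships
    for _ in range(n_iter):
        W = U ** fuzzifier
        centers = (W.T @ Y) / W.sum(axis=0)[:, None]   # weighted cluster centers
        dist = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)          # guard against zero distance
        inv = dist ** (-2.0 / (fuzzifier - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def hard_labels(U):
    """Class of each base station = cluster with the largest membership."""
    return U.argmax(axis=1)

# Toy data: two well-separated groups of three rows each.
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, U = fuzzy_c_means(toy, c=2)
labels = hard_labels(U)
```

The `hard_labels` step corresponds to extracting, for each base station, the class with the maximum probability.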

In step five, the overlap rate between the distribution on the map of the interest points whose category is "s" and that of the base stations whose cluster category is "n" is calculated; the inputs are the list of grids in which the interest points of category "s" are located and the list of grid numbers covered by the base stations of cluster category "n", and the output is the overlap rate.

In the fifth step, a specific method for calculating the overlapping rate of the distribution of the interest point with the interest point category of s and the base station with the cluster category of n on the map is as follows:

step1, finding out the grid numbers of the points of interest with the type of the points of interest as s according to the longitude and latitude of each point of interest;

step2, enlarging the area according to the characteristics of s, namely expanding in the four directions east, south, west and north into a square area centered on each grid number obtained in Step1, to obtain all grid numbers within the enlarged area;

step3, counting all non-repeated grid numbers obtained in Step2, and marking the set as S;

step4, finding the grid numbers covered by cluster category n according to the numbers of the base stations whose cluster category is n and the list of grid numbers covered by each base station, and recording the set as N;

step5, calculating the grid overlap rate (overlapRatio) according to formula (3), namely the overlap rate between the grid number set S of interest point category s and the grid number set N covered by cluster category n;
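Steps 1 to 5 can be sketched with Python sets as below. Formula (3) is not reproduced in this text, so the ratio used here, the size of the intersection divided by the size of the union, is only an assumed definition chosen for illustration; the function names, the use of (row, column) pairs instead of single grid numbers, and the absence of clipping at the map border are likewise assumptions.

```python
def expand(cells, k):
    """Step2/Step3: replace each (row, col) cell by the k x k square centered
    on it (k odd) and return the set of distinct cells."""
    r = k // 2
    return {(row + dr, col + dc)
            for row, col in cells
            for dr in range(-r, r + 1)
            for dc in range(-r, r + 1)}

def overlap_ratio(S, N):
    """Step5 under an ASSUMED definition: |S & N| / |S | N|.
    The patent's formula (3) may use a different denominator."""
    union = S | N
    return len(S & N) / len(union) if union else 0.0

# Toy example: one POI of category s in cell (5, 5), enlarged to 3 x 3.
S = expand({(5, 5)}, 3)
# Cells covered by base stations of cluster category n: rows 6..9, columns 0..9.
N = {(r, c) for r in range(6, 10) for c in range(10)}
ratio = overlap_ratio(S, N)
```

Using sets makes the "non-repeated grid numbers" requirement of Step3 automatic, since overlapping windows around nearby interest points contribute each cell only once.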

in step one, the map is divided by the positions of the mobile phone base stations, using the method of finding the base station closest to each grid.

Owing to the above technical scheme, the invention has the following beneficial effects: the method for identifying urban functional areas from interest point data can identify the function of urban areas, whether they are tourist areas, working areas or residential areas; the results are basically consistent with reality, and the identification is made at a more generalized level.

Description of the figures

Fig. 1 shows the study area in Hangzhou used in the embodiment of the present invention.

Fig. 2 shows the base station division result for the area of Fig. 1.

Fig. 3 shows the clustering result with clustering parameter C = 4.

Fig. 4 is the urban master plan of Hangzhou for 2001-2020.

Fig. 5 is a projection of the "residential area" clustering result onto the urban master plan of Hangzhou.

Fig. 6 is a projection of the "tourist area" clustering result onto a Baidu map.

Fig. 7 is a heat map of the distribution of interest points whose "interest point category" is "work".

Detailed Description

As shown in the figure, a method for identifying a city functional area based on point of interest data includes the following steps:

step one, map segmentation: rasterizing the map, and numbering all grids; dividing the map according to the position of the mobile phone base station, calculating the distance between each grid and the base station, and specifying that the grid belongs to the base station closest to the grid, so as to obtain a grid number list closest to each base station and a grid number matrix G occupied by each base station; and partitioning the map by using the position of the mobile phone base station by adopting a method of searching the base station closest to the grid.
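The attribution rule in step one (each grid belongs to its closest base station) can be sketched as a brute-force nearest-neighbour search. The text does not state which distance measure is used, so the cosine-scaled degree distance below is an assumption, as are the function names; the same function could also assign interest points to base stations in step two.

```python
import numpy as np

def nearest_station(points, stations):
    """For each (lon, lat) point, return the index of the closest base station.
    Both inputs are arrays of shape (k, 2) in degrees."""
    points = np.asarray(points, dtype=float)
    stations = np.asarray(stations, dtype=float)
    # Assumption: scale longitude by cos(latitude) so that degree differences
    # approximate ground distance over a city-sized area.
    scale = np.array([np.cos(np.radians(stations[:, 1].mean())), 1.0])
    diff = (points[:, None, :] - stations[None, :, :]) * scale
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def grid_counts(owner, n_stations):
    """Number of grids attributed to each base station (the entries of G)."""
    return np.bincount(owner, minlength=n_stations)

# Toy example: two base stations and three grid centers.
stations = np.array([[120.10, 30.10], [120.30, 30.30]])
grid_centers = np.array([[120.11, 30.11], [120.29, 30.31], [120.12, 30.09]])
owner = nearest_station(grid_centers, stations)
G = grid_counts(owner, len(stations))
```

The pairwise distance array grows with the number of grids times the number of base stations, so a study area with millions of grids would need chunked processing or a spatial index rather than this direct form.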

the normalization processing method comprises the following steps:

$$Y = A \cdot e^{-X} \tag{2}$$

Step four, clustering: carrying out fuzzy clustering analysis on the matrix Y obtained in step three to obtain different clustering results; the fuzzy clustering analysis adopts the fuzzy C-means clustering algorithm, which divides all vectors into C clusters and finds the cluster center of each cluster such that the sum of within-cluster variances is minimized;

The specific method for calculating the overlapping rate of the distribution of the interest points with the interest point category of s and the base station with the cluster category of n on the map is as follows:

Here the interest point category "s" denotes a characteristic value of the "interest point category" parameter, for example "residence" or "work"; the base stations with cluster category "n" are the base stations labeled "n" in the clustering result. The magnification in Step2 is determined by the characteristics of "s". For example, an interest point of the "residence" category is generally a residential building, whose footprint is generally about 30 m × 30 m = 900 m². If the area of one grid is 9.6 m × 11.1 m, an interest point of the "residence" category should be enlarged to nine grids, i.e., a 3 × 3 square area centered on the grid in which the interest point is located.

The functional area identification method provided by the invention is verified by taking the range of a single mobile phone base station as a unit area and using the point of interest data of a certain area in Hangzhou city.

Step one: map partitioning

A rectangular area of Hangzhou, Zhejiang, spanning longitude 120.040° to 120.410° and latitude 30.090° to 30.400°, as shown in Fig. 1, is selected as the study area. The area is divided into grids of 0.0001° × 0.0001° (about 9.6 m × 11.1 m), and urban unit areas are delimited by the grid attribution calculation method using the longitude and latitude data of an operator's mobile phone base stations in Hangzhou; the division result is shown in Fig. 2.
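The grid dimensions implied by these bounds, and the quoted cell size, can be checked with a short sketch. The row-major numbering scheme, the function names and the figure of roughly 111,320 m per degree of latitude are assumptions for illustration; the text does not say how grids are numbered.

```python
import math

LON_MIN, LON_MAX = 120.040, 120.410
LAT_MIN, LAT_MAX = 30.090, 30.400
CELL = 0.0001  # grid size in degrees

N_COLS = round((LON_MAX - LON_MIN) / CELL)  # grids per row
N_ROWS = round((LAT_MAX - LAT_MIN) / CELL)  # grids per column

def grid_number(lon, lat):
    """Row-major grid number of a point (assumed numbering scheme)."""
    col = math.floor((lon - LON_MIN) / CELL)
    row = math.floor((lat - LAT_MIN) / CELL)
    if not (0 <= col < N_COLS and 0 <= row < N_ROWS):
        raise ValueError("point lies outside the study area")
    return row * N_COLS + col

def cell_size_m(lat_deg):
    """Approximate (width, height) of one grid in metres at a given latitude,
    assuming about 111,320 m per degree of latitude."""
    m_per_deg = 111_320.0
    width = CELL * m_per_deg * math.cos(math.radians(lat_deg))
    height = CELL * m_per_deg
    return width, height

width, height = cell_size_m((LAT_MIN + LAT_MAX) / 2)
```

Under these assumptions the area holds 3700 × 3100 grids, and the cell size at the middle latitude comes out close to the 9.6 m × 11.1 m quoted above.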

As described above, after reading the present disclosure, those skilled in the art can make various other modifications without creative mental labor according to the technical solutions and concepts of the present disclosure, and all of them are within the protection scope of the present disclosure.

Step two: finding the base station to which each interest point belongs

Baidu interest point data is widely used in China, and its distribution in urban space is basically consistent with actual conditions, which supports the accuracy and reliability of the data; the Baidu interest point data within the study area is therefore extracted for the study. The data contains more than 110,000 interest point records within the study area, each including parameters such as the name, longitude and latitude, detailed address, interest point category and contact telephone. In this study the interest point data is processed according to the "interest point category" parameter and divided into 16 categories, including shopping, work, residence, tourism, college and university culture and education, primary and middle schools and kindergartens, medical treatment, culture and entertainment, life services, financial services, automobile services, stations, parking lots, food, and hotels.

Step three: calculating the distribution characteristics of the points of interest of each base station

The base station number is denoted by i and the interest point category by j, where i = 1, 2, 3, … and j = 1, 2, 3, …, 16. The result is the number of interest points of category j belonging to base station i; Table 1 lists this count distribution. Finally, the result of Table 1 is combined with the number of grids occupied by each base station and processed by the normalization method to obtain the matrix Y used for the subsequent analysis.
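The counting that produces the distribution of Table 1 can be sketched as follows. The function name and the toy indices are hypothetical, and the indices are zero-based here, whereas the text numbers base stations and categories from 1.

```python
import numpy as np

def category_counts(station_idx, category_idx, n_stations, n_categories=16):
    """Count matrix D: D[i, j] is the number of interest points of category j
    attributed to base station i (zero-based indices)."""
    D = np.zeros((n_stations, n_categories), dtype=int)
    # np.add.at accumulates correctly even when an (i, j) pair repeats.
    np.add.at(D, (np.asarray(station_idx), np.asarray(category_idx)), 1)
    return D

# Toy example: six interest points spread over three base stations.
station_idx = [0, 0, 1, 2, 2, 2]
category_idx = [0, 0, 3, 15, 15, 1]
D = category_counts(station_idx, category_idx, n_stations=3)
```

Plain fancy-index assignment such as `D[i, j] += 1` would count a repeated pair only once, which is why the unbuffered `np.add.at` is used.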

TABLE 1

Step four: clustering: and carrying out base station clustering analysis on the result matrix Y according to the clustering method provided by the invention. Taking the parameter C as 4, namely dividing the research area into 4 different functional areas, and finally visualizing the analysis result, wherein the result is shown in fig. 3.

Step five: identifying urban functional areas

Three characteristic values of the "interest point category" parameter, "residence", "work" and "tourism", are selected to identify the function of the base stations. The overlap rate is calculated for the clustering result according to the overlap rate calculation method of step five, and the results are shown in Table 2. When enlarging the area of an interest point, actual conditions are taken into account: interest points of the "residence" and "work" categories are enlarged to about 30 m × 30 m, i.e., a 3 × 3 square area centered on the grid to which each interest point belongs, and interest points of the "tourism" category are enlarged to about 90 m × 90 m, i.e., a 9 × 9 square area centered on the grid to which each interest point belongs.

Overlap ratio (%)	Color 4	Color 3	Color 1	Color 2
Work	1.37	1.69	1.86	0.49
Residence	0.30	0.93	4.65	0.08
Tourism	0.14	0.17	0.36	0.88

TABLE 2

From the overlap rate results in Table 2, it can first be determined that the color 1 region in Fig. 3 is the "residential zone" and the color 2 region is the "tourist zone", because the overlap rates of "residence" with color 1 (4.65%) and of "tourism" with color 2 (0.88%) are much higher than with the other colors. Secondly, the maximum overlap rate for the "work" category also falls in the color 1 region, but since the overlap rate of "residence" with color 1 is much higher than with any other color region, color 1 should clearly be the "residential zone" rather than the "work zone". In addition, the overlap rates of "work" with the color 4 and color 3 regions (1.37% and 1.69%) are close to that with color 1, so one of the color 4 and color 3 regions must have the "work" function. In practice, the "residential zone" is often inseparable from the "work zone" and the two are often geographically adjacent; in Fig. 3 the color 3 region is mostly adjacent to the color 1 region, whereas the color 4 region is mostly adjacent to the color 3 and color 2 regions, so the color 3 region should be the "work zone". Finally, area A in Fig. 3 is the famous West Lake scenic area of Hangzhou, including West Lake, Longjing, Lingyin and so on; its terrain is mostly mountainous, so apart from interest points of the "tourism" category, interest points of other categories are rarely found there, and some base stations have essentially no interest points within their range. In this area, apart from color 2, color 4 occupies slightly more than the remaining colors; since the color 2 region has already been identified as the "tourist zone", the color 4 region is the "sparsely visited zone", where few interest points are distributed.

From the above analysis, the identification results for the city regions in Fig. 3 are as follows: the color 1 region is the "residential zone"; the color 2 region is the "tourist zone"; the color 3 region is the "work zone"; and the color 4 region is the "sparsely visited zone".
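The first part of this reasoning, reading the largest overlap rates off Table 2, can be reproduced with a few lines of NumPy. This sketch only covers the numerical comparison; the variable names are illustrative, and the adjacency and terrain arguments that separate color 3 from color 4 are not modeled.

```python
import numpy as np

colors = ["Color 4", "Color 3", "Color 1", "Color 2"]
categories = ["Work", "Residence", "Tourism"]
# Overlap rates (%) from Table 2: rows are categories, columns are colors.
overlap = np.array([
    [1.37, 1.69, 1.86, 0.49],
    [0.30, 0.93, 4.65, 0.08],
    [0.14, 0.17, 0.36, 0.88],
])

# For each category, the color region with the highest overlap rate.
best_color = {cat: colors[k] for cat, k in zip(categories, overlap.argmax(axis=1))}

# For each color region, the category with the highest overlap rate.
dominant = {colors[k]: categories[r] for k, r in enumerate(overlap.argmax(axis=0))}
```

Both "Work" and "Residence" peak in color 1, and both color 4 and color 3 are dominated by "Work", which is why the overlap rates alone cannot settle the last two labels and the text falls back on spatial adjacency and the characteristics of the West Lake area.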

By applying the method of the invention, the goodness of fit of each functional area in the embodiment is as follows:

(1) Goodness of fit of the "residential zone"

Fig. 4 is the urban master plan of Hangzhou for 2001-2020, and Fig. 5 is a projection, at the same longitude and latitude, of the distribution of the residential zones identified by the method of the present invention onto that master plan. The black parts of the figure, i.e., the areas identified as residential zones by the method of the present invention, are distributed in basically the same way as the residential land in the master plan. The identification result for the "residential zone" is therefore basically consistent with reality.

(2) Goodness of fit of the "tourist zone"

As shown in Fig. 6, the projection of the "tourist zone" identification result onto a Baidu map also substantially conforms to reality. The experimental result accurately identifies the function of the base stations covering scenic spots such as the "good tea culture village", "Xixi Wetland", the "West Lake scenic area" and "Xianghu Lake".

(3) Goodness of fit of the "work zone"

The "work zone" identified by the method of the present invention substantially coincides with the distribution of "land for public administration and public service facilities", "land for commercial service facilities" and "industrial land", which can be defined as "work zone", in the urban master plan of Hangzhou for 2001-2020 in Fig. 4. Comparing the heat map of "work" interest points in Fig. 7 with the color 3 region of Fig. 3 likewise shows that the "work zone" identified by the method of the present invention substantially coincides with reality.

Taking the goodness-of-fit analyses (1), (2) and (3) together, the identification of urban area functions by the method of the present invention for identifying urban functional areas from point of interest data is basically consistent with reality.

Claims

1. A method for identifying a city functional area based on interest point data is characterized by comprising the following steps:

step three, calculating the distribution characteristics of the interest points of each base station: classifying and counting the interest points of each base station according to the parameter of the interest point category in the data of the interest point list of each base station to obtain an interest point category distribution matrix D of each base station; combining the interest point category distribution matrix D with the occupied grid number matrix G, and processing by adopting a normalization method to obtain a matrix Y finally used for analysis;

2. The method for identifying urban functional areas based on point of interest data as claimed in claim 1, wherein in step three, the normalization processing method is as follows:

$$Y = A \cdot e^{-X} \tag{2}$$

3. The method according to claim 1, wherein in step four, the fuzzy clustering analysis uses C-means clustering algorithm to divide all vectors into C clusters and find the clustering center of each cluster, so that the sum of variance in the clusters is minimized;

4. The method as claimed in claim 1, wherein in step five, the overlap rate between the distribution on the map of the interest points whose category is "s" and that of the base stations whose cluster category is "n" is calculated; the inputs are the list of grids in which the interest points of category "s" are located and the list of grid numbers covered by the base stations of cluster category "n", and the output is the overlap rate.

5. The method as claimed in claim 4, wherein in the fifth step, the specific method for calculating the overlapping rate of the distribution of the interest point with the "s" interest point category and the base station with the "n" clustering category on the map is as follows:

step1: finding out the grid numbers of the interest points with the interest point category as s according to the longitude and latitude of each interest point;

step2: amplifying the area according to the characteristics of the interest points with the interest point category of s to obtain all grid numbers in the amplified area, wherein the amplified area is obtained by amplifying the area to a square area in four directions of south, east, west and north by taking the grid number obtained by Step1 as the center;

step3: counting all non-repeated grid number sets obtained at Step2, and recording as S;

step4: according to the numbers of the base stations whose cluster category is n and the list of grid numbers covered by each base station, finding the set of grid numbers covered by the base stations of cluster category n, and recording it as N;

step5: calculating the grid overlap rate overlapRatio according to formula (3); the grid overlap rate is the overlap rate between the grid number set S of interest point category s and the grid number set N covered by the base stations of cluster category n;

6. The method as claimed in claim 1, wherein in step one, the map is divided by the positions of the mobile phone base stations by searching for the base station nearest to each grid.