CN109117861A - A multi-level cluster analysis method for point sets that takes spatial position into account - Google Patents
- Publication number: CN109117861A (application CN201810696862.0A)
- Authority: CN (China)
- Prior art keywords: distance, spatial, feature, space, following formula
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a multi-level cluster analysis method for point sets that takes spatial position into account, comprising the following six steps: (1) preliminary judgment of whether spatial clustering exists, based on a graded statistical map; (2) accurate judgment of whether spatial clustering exists, based on spatial autocorrelation; (3) accurate judgment of the spatial clustering type, based on spatial autocorrelation; (4) accurate delineation of the spatial clustering region, based on spatial autocorrelation; (5) accurate delineation of the distribution of clustering anomalies, based on spatial autocorrelation; (6) accurate location of the points contained in the spatial clustering region, based on a clustering algorithm. The method uses a progressive multi-level judgment structure, so that if one level is not satisfied there is no need to proceed to the next. The levels are closely related and progressive, matching people's cognitive needs and habits. The method therefore not only computes "correctly" and "accurately" but also computes "fast" and "well".
Description
Technical field
The present invention relates to the technical fields of computer science and geographic information science, and more particularly to a multi-level cluster analysis method for point sets that takes spatial position into account.
Background art
In the real world, things often follow the objective law that "birds of a feather flock together." This has given rise to a series of models, methods, and algorithms for data clustering, which is especially evident in computer science and technology. The top ten machine learning algorithms include the C4.5 algorithm, the K-means algorithm, the support vector machine (SVM) algorithm, the Apriori association algorithm, the expectation-maximization (EM) algorithm, the PageRank algorithm, the AdaBoost iterative algorithm, the K-nearest neighbor (KNN) algorithm, the naive Bayes (NB) algorithm, and the classification and regression trees (CART) algorithm. Of these, the K-means algorithm and the EM algorithm are clustering algorithms. Clustering, a typical unsupervised machine learning method that works on unlabeled data, includes partition-based clustering algorithms (typically K-means), hierarchy-based clustering methods (typically the DIANA and AGNES algorithms), density-based clustering algorithms (typically DBSCAN), and probability-based clustering methods (typically the EM algorithm). New clustering methods for these kinds of traditional data (often multidimensional or high-dimensional data) are continually being designed and are gradually being widely studied and applied.
It is worth noting that 80% of the data in the real world is closely related to a spatial domain (or spatial position, or geographic location). In other words, most real-life data and information are closely tied to spatial position. Tens of thousands of items of data or information are generated every hour or every minute, and all of them carry spatial position features. Analyzing such data or information for clusters from a spatial perspective (spatial clustering for short) is therefore very valuable work.
Following the clustering methods and algorithms for traditional data (in computer science and technology), existing spatial clustering methods and algorithms (in geographic information science) are likewise roughly classified as follows: partition-based spatial clustering methods, hierarchy-based spatial clustering methods, density-based spatial clustering methods, grid-based spatial clustering methods, and so on. Typically, two-dimensional spatial clustering can be regarded as cluster analysis of data with only two dimensions (that is, only an X coordinate and a Y coordinate, or the two attribute columns longitude and latitude), and the result can be displayed intuitively on a two-dimensional spatial domain (typically a plane map). Likewise, three-dimensional spatial clustering can be regarded as cluster analysis of data with only three dimensions (that is, only X, Y, and Z coordinates, or the three attribute columns longitude, latitude, and height), and the result can be displayed intuitively on a three-dimensional spatial domain (typically a three-dimensional sphere).
Besides spatial clustering models and methods, spatial aggregation can also be analyzed by judging spatial autocorrelation. Spatial autocorrelation arises from the First Law of Geography (Tobler's First Law, or Tobler's First Law of Geography): "Everything is related to everything else, but near things are more related than distant things." Under this law, the correlation of things or attributes in their spatial distribution has the following possible outcomes (as shown in Fig. 1):
(1) Positive spatial correlation: neighboring regions have the same or similar attribute values, as shown in Fig. 1(a). In other words, if the attribute value of a variable is high around high places and low around low places, the correlation is called positive, which indicates that the attribute value has a spatial diffusion characteristic.
(2) Negative spatial correlation: neighboring regions have different attribute values, as shown in Fig. 1(b). In other words, if the attribute value is low around high places and high around low places, the correlation is called negative, which indicates that the attribute value has a spatial polarization characteristic.
(3) No spatial correlation: the attribute of the variable is randomly distributed in space, so spatial autocorrelation is not evident, as shown in Fig. 1(c).
Although spatial clustering methods, spatial autocorrelation models, and the like can all assist the analysis of spatial aggregation, spatial clustering methods remain the most frequently used. Spatial clustering is therefore described in detail here before the content of the present invention is set out.
A spatial clustering method can be expressed in the following general form, as shown in formula (1):
SDCA = (S, m, d, Dz, Ag, q) (1)
where SDCA is the abbreviation of Spatial Data Clustering Analysis;
S (the initial of Spatial) denotes the spatial dataset, S = {O1, O2, ..., On};
m (for number) denotes the total number of data objects in the spatial dataset;
d (the initial of dimension) denotes the dimension of the spatial dataset;
Dz denotes the similarity measure used for the cluster analysis; in spatial clustering, similarity is usually measured by a spatial distance that can be visualized, whereas in traditional clustering the distances across hundreds or thousands of dimensions usually cannot be visualized;
Ag (short for Algorithm) denotes the specific algorithm used for the cluster analysis, described in detail below;
q (for Convergence) denotes the termination condition (or completion condition, or convergence condition) of the spatial clustering algorithm; some clustering algorithms obtain the cluster result directly after a finite number of operations, whereas others iterate continuously until convergence to obtain the final cluster result.
To date there are thousands of research papers on clustering, and the traditional clustering algorithm system essentially revolves around the parameter Ag above. In general, Ag can be divided into the following six classes:
(1) Partition-based methods: the dataset is randomly divided into k subsets, and an iterative relocation technique then tries to move data objects from one cluster to another so as to continually improve cluster quality, for example the K-means algorithm;
(2) Hierarchy-based methods: the given set of data objects is decomposed hierarchically; depending on how the hierarchy is formed, these divide into agglomerative and divisive methods, for example the agglomerative algorithm based on minimum distance;
(3) Density-based methods: clusters are generated according to the density of neighborhood objects or some density function, so that each class must contain at least a certain number of points within a given region, for example the DBSCAN algorithm;
(4) Grid-based methods: the object space is quantized into a finite number of cells that form a grid structure, and all clustering operations are performed on the grid, which greatly increases clustering speed, for example the STING algorithm;
(5) Model-based methods: a model is assumed for each class, and the best fit of the data to the given model is sought, for example the COBWEB algorithm;
(6) Probability-based methods: clustering methods based on probability estimation, for example the expectation-maximization (EM) algorithm and kernel density estimation.
In addition, many methods are hybrids, such as clustering algorithms that combine grid-based and density-based approaches. The clustering algorithms mentioned above are shown in Fig. 2.
The various types of clustering algorithms above each have their own characteristics in terms of time complexity, space complexity, scalability, cluster shape, independence from input order, noise handling, and the data types they can process. Specifically:
(1) Algorithm efficiency: many clustering algorithms cluster well on small datasets of fewer than 200 data objects, but a large dataset may contain millions, tens of millions, or even more objects and records. Although sampling can reduce the amount of data to be processed, it can affect the clustering result and may even produce wrong results. Highly scalable clustering algorithms are therefore needed;
(2) Ability to handle different attribute types: many clustering methods can only cluster numeric input data. In practical data mining, however, input data types are diverse and non-ideal, so the ability of different clustering algorithms to handle different data types and scales must be considered;
(3) Ability to handle noisy data: most real-world databases contain isolated points, missing values, unknown data, or erroneous data. Some clustering methods are sensitive to such data, which may lead to low-quality cluster results;
(4) Ability to handle high-dimensional data: a database or data warehouse may contain several dimensions or attributes. Many clustering methods are good at processing low-dimensional data, possibly involving only two or three dimensions. Clustering data objects in a high-dimensional space is very challenging, especially because such data may be very sparse and highly skewed;
(5) Interpretability and usability: users want cluster results that can be explained, understood, and used. In other words, clustering may need to be tied to specific semantic interpretations and applications;
(6) Domain knowledge needed to determine input parameters: many clustering methods require the user to input certain parameters, such as the desired number of classes, and the cluster result is very sensitive to these parameters. In practice the parameters are generally difficult to determine, especially for datasets containing high-dimensional objects;
(7) Sensitivity to input order: some clustering algorithms are very sensitive to the order in which data is input. For the same dataset, feeding the data to an algorithm in a different order may produce completely different cluster results, which is usually undesirable;
(8) Ability to find clusters of arbitrary shape: many clustering methods determine the cluster result from distance, and algorithms based on distance metrics tend to find spherical classes of similar scale and density. A cluster, however, may take any shape, such as linear, ring-shaped, concave, or other complex irregular shapes.
Unlike judging spatial aggregation with a single spatial clustering algorithm, and unlike judging it with a single spatial autocorrelation model, the present invention proposes a multi-level spatial clustering method that takes spatial position into account. The method is multi-level, and its levels are progressive. It is suitable for judging the spatial aggregation of point-set objects, and its judgment process is intuitive and progressive, matching people's cognitive needs, as explained below.
Summary of the invention
The technical problem to be solved by the present invention is to provide a multi-level cluster analysis method for point sets that takes spatial position into account. The method uses a progressive multi-level judgment structure, so that if one level is not satisfied there is no need to proceed to the judgment of the next level, which makes it "compute fast." At the same time, the multi-level judgment mode matches people's habit of understanding from the shallow to the deep, which makes it "compute well." The levels are progressive and match people's cognitive needs and habits. The method thus not only "computes correctly" and "computes accurately" but also "computes fast" and "computes well," offering a new approach to judging spatial aggregation.
The technical solution adopted by the present invention is a multi-level cluster analysis method for point sets that takes spatial position into account, comprising the following steps:
Step 1: preliminary judgment of whether spatial clustering exists, based on a graded statistical map: the graded statistical mapping method is used to judge whether spatial aggregation is suspected to exist;
Step 2: accurate judgment of whether spatial clustering exists, based on spatial autocorrelation: the global Moran's I coefficient, one of the spatial autocorrelation coefficients, is used to determine whether spatial aggregation really exists;
Step 3: accurate judgment of the spatial clustering type, based on spatial autocorrelation: if spatial aggregation really exists, the global Getis-Ord coefficient, one of the spatial autocorrelation coefficients, is used to judge whether the clustering type is high-value clustering or low-value clustering;
Step 4: accurate delineation of the spatial clustering region, based on spatial autocorrelation: for high-value clustering, the local Getis-Ord coefficient is used to accurately delineate the specific region of the high-value cluster; for low-value clustering, the local Getis-Ord coefficient is likewise used to accurately delineate the specific region of the low-value cluster;
Step 5: accurate delineation of the distribution of clustering anomalies, based on spatial autocorrelation: the local Moran's I coefficient is used to accurately delineate the abnormal areas other than high-value or low-value aggregation;
Step 6: accurate location of the points contained in the spatial clustering region, based on a clustering algorithm: a spatial clustering algorithm is used to accurately locate the points that the spatial clustering region contains.
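The early-exit structure of the six steps above can be sketched as follows. This is only an illustrative skeleton: the function name `multilevel_cluster_analysis` and its six callable parameters are hypothetical names introduced here, and each callable stands in for the corresponding step rather than implementing it.

```python
from typing import Callable, Optional, Sequence


def multilevel_cluster_analysis(
    points: Sequence,
    suspected: Callable[[Sequence], bool],            # step 1 (hypothetical hook)
    confirmed: Callable[[Sequence], bool],            # step 2 (hypothetical hook)
    cluster_type: Callable[[Sequence], str],          # step 3, returns "high" or "low"
    delineate: Callable[[Sequence, str], list],       # step 4 (hypothetical hook)
    find_outliers: Callable[[Sequence], list],        # step 5 (hypothetical hook)
    locate_points: Callable[[Sequence, list], list],  # step 6 (hypothetical hook)
) -> Optional[dict]:
    """Run the six levels in order and stop as soon as a level is not satisfied."""
    # Step 1: preliminary check on the graded statistical map.
    if not suspected(points):
        return None
    # Step 2: accurate existence check (global Moran's I in the text).
    if not confirmed(points):
        return None
    # Step 3: high-value or low-value clustering (global Getis-Ord in the text).
    kind = cluster_type(points)
    # Step 4: delineate the cluster regions (local Getis-Ord in the text).
    regions = delineate(points, kind)
    # Step 5: delineate anomalies (local Moran's I in the text).
    outliers = find_outliers(points)
    # Step 6: locate the points inside the cluster regions.
    members = locate_points(points, regions)
    return {"type": kind, "regions": regions, "outliers": outliers, "members": members}
```

Because the later, more expensive levels are only reached when the earlier ones succeed, a dataset with no suspected aggregation costs a single call to the first hook.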
The beneficial effects of the present invention are as follows. Unlike judging spatial aggregation with a single spatial clustering algorithm, and unlike calling a single spatial autocorrelation model, in this method the levels are closely related and progressive, matching people's cognitive needs and habits. The successive levels answer, in turn: "Does clustering exist?" (yes or no); "If clustering exists, what type is it?" (high-value clustering or low-value clustering); "How is the specific region of the cluster delineated?" (the specific aggregation region is given); and "Which points does that region contain?" (the points contained in the aggregation region are given). If one level is not satisfied, there is no need to proceed to the next, which makes the method "compute fast." At the same time, the multi-level judgment mode matches people's habit of understanding from the shallow to the deep, which makes it "compute well." In summary, the multi-level cluster analysis method for point sets proposed by the present invention both "computes correctly" and "computes accurately," and also "computes fast" and "computes well."
Preferably, in step 1, the graded statistical mapping method is as follows: based on the statistics of each regional division unit, grades are divided according to the density, intensity, or development level of the phenomenon; then, according to grade, each region on the map is filled with a color of a different shade or with hatching of a different density, so as to show the differences between the regional division units.
Preferably, in step 1, the graded statistical mapping method uses three indicators: PN, PD, and HR, where PN denotes the number of people in the specific population, PD denotes the density of the specific population, and HR denotes the proportion of the specific population in the total population. The three indicators are computed as shown in formulas (1) and (2), where aa(i) denotes the area of the i-th administrative division and cn(i) denotes the population base of the i-th administrative division.
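As an illustration of the indicators above, the following sketch assumes the natural reading of the definitions, namely PD(i) = PN(i) / aa(i) and HR(i) = PN(i) / cn(i); formulas (1) and (2) themselves are not reproduced in this text, so these forms are an assumption. The equal-interval grading in `equal_interval_grades` is likewise only one possible way to divide grades and is not prescribed by the method; both function names are hypothetical.

```python
def density_and_rate(pn, area, population):
    """Return (PD, HR) per administrative division.

    Assumed forms: PD(i) = PN(i) / aa(i), HR(i) = PN(i) / cn(i).
    """
    pd = [p / a for p, a in zip(pn, area)]
    hr = [p / c for p, c in zip(pn, population)]
    return pd, hr


def equal_interval_grades(values, n_grades=5):
    """Assign each value a grade 0..n_grades-1 using equal-width intervals.

    Equal-interval classification is an assumption made for illustration.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_grades
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_grades - 1) for v in values]
```

The grades can then be mapped to color shades or hatching densities when the map is drawn.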
Preferably, in step 2, the global Moran's I coefficient is computed as shown in formulas (3) and (4), where z_i is the deviation of the attribute value of feature i from its mean, w_{i,j} is the spatial weight between feature i and feature j, n is the total number of features, and S_0 is the sum of all spatial weights.
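A minimal sketch of the global Moran's I computation, assuming the standard form of the statistic, I = (n / S_0) * (sum_i sum_j w_{i,j} z_i z_j) / (sum_i z_i^2) with S_0 = sum_i sum_j w_{i,j}, which is consistent with the symbol definitions above. Formulas (3) and (4) are not reproduced in this text, so this correspondence is an assumption. The function name is hypothetical, and the weight matrix is taken as a plain list of lists with a zero diagonal.

```python
def global_morans_i(x, w):
    """Global Moran's I for attribute values x and spatial weight matrix w.

    w[i][j] is the weight between features i and j; w[i][i] is expected to be 0.
    """
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                      # deviations from the mean
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)                    # zero if all values are equal
    return (n / s0) * num / den
```

Values above the expected value under randomness, -1 / (n - 1), suggest clustering and values below it suggest dispersion; a significance test (z-score or permutation) would normally accompany the coefficient and is omitted here.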
Preferably, in step 3, the global Getis-Ord coefficient is computed as shown in formula (5), where x_i and x_j are the attribute values of feature i and feature j, w_{i,j} is the spatial weight between feature i and feature j, n is the total number of features in the dataset, and the condition j ≠ i indicates that feature i and feature j cannot be the same feature.
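A minimal sketch of the global Getis-Ord coefficient (General G), assuming its standard form G = (sum_i sum_{j≠i} w_{i,j} x_i x_j) / (sum_i sum_{j≠i} x_i x_j), which matches the symbol definitions above; formula (5) is not reproduced in this text, so this is an assumption. The helper `expected_general_g` uses the standard expectation sum_i sum_{j≠i} w_{i,j} / (n (n - 1)). Both function names are hypothetical.

```python
def general_g(x, w):
    """Global Getis-Ord General G for non-negative values x and weights w."""
    n = len(x)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:  # feature i and feature j must differ
                num += w[i][j] * x[i] * x[j]
                den += x[i] * x[j]
    return num / den


def expected_general_g(w):
    """Expected General G under the null hypothesis of no spatial clustering."""
    n = len(w)
    total = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))
```

An observed G above the expected value points toward high-value clustering and one below it toward low-value clustering; in practice the difference is judged with a z-score, which this sketch leaves out.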
Preferably, in step 4, the local Getis-Ord coefficient is computed as shown in formulas (6) to (8), where x_j is the attribute value of feature j, w_{i,j} is the spatial weight between feature i and feature j, and n is the total number of features.
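A minimal sketch of the local Getis-Ord statistic, assuming the standard Gi* form: Gi* = (sum_j w_{i,j} x_j - X * sum_j w_{i,j}) / (S * sqrt((n * sum_j w_{i,j}^2 - (sum_j w_{i,j})^2) / (n - 1))), with X = sum_j x_j / n and S = sqrt(sum_j x_j^2 / n - X^2). Formulas (6) to (8) are not reproduced in this text, so treating them as this standard form is an assumption. The function name is hypothetical, and the weight matrix here is assumed to include each feature's own weight w[i][i], as is conventional for Gi*.

```python
import math


def getis_ord_gi_star(x, w):
    """Return the Gi* score of every feature (a z-score-like value).

    Assumes the values in x are not all equal and that no row of w weights
    every feature equally, since either case makes the denominator zero.
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    scores = []
    for i in range(n):
        w_sum = sum(w[i])
        w_sq_sum = sum(v * v for v in w[i])
        num = sum(w[i][j] * x[j] for j in range(n)) - mean * w_sum
        den = s * math.sqrt((n * w_sq_sum - w_sum * w_sum) / (n - 1))
        scores.append(num / den)
    return scores
```

Large positive scores mark candidate high-value (hot spot) regions and large negative scores mark candidate low-value (cold spot) regions; the significance threshold to apply is left to the user.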
Preferably, in step 5, the local Moran's I coefficient is computed as shown in formula (9), where x_i is the attribute value of feature i, X is the mean of the corresponding attribute, w_{i,j} is the spatial weight between feature i and feature j, and n is the total number of features.
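A minimal sketch of the local Moran's I computation, assuming the standard Anselin form: I_i = ((x_i - X) / S_i^2) * sum_{j≠i} w_{i,j} (x_j - X), with S_i^2 = sum_{j≠i} (x_j - X)^2 / (n - 1). Formula (9) is not reproduced in this text, so this correspondence is an assumption, and the function name is hypothetical.

```python
def local_morans_i(x, w):
    """Return the local Moran's I value of every feature."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]  # deviations from the mean
    result = []
    for i in range(n):
        # Variance term computed over all features except i.
        s2 = sum(z[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Spatially weighted sum of the neighbors' deviations.
        lag = sum(w[i][j] * z[j] for j in range(n) if j != i)
        result.append(z[i] / s2 * lag)
    return result
```

A negative I_i indicates a feature whose value differs from its neighbors (a high value among low ones or the reverse), which is the kind of anomaly step 5 looks for; a positive I_i indicates a feature that resembles its neighbors. Significance testing is omitted.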
Preferably, the global Moran's I coefficient, the global Getis-Ord coefficient, the local Getis-Ord coefficient, and the local Moran's I coefficient involved in the above steps all require a spatial weight matrix. The strategies that can be used to compute the spatial weight matrix include: the inverse distance strategy, the inverse distance squared strategy, the fixed distance strategy, the zone of indifference strategy, the K-nearest-neighbor strategy, the edge contiguity strategy, the edge-and-corner contiguity strategy, and the Delaunay triangulation strategy.
Preferably, in step 6 the spatial clustering algorithm is a bottom-up agglomerative algorithm, which comprises the following steps:
(1) treat the n spatial points as n subgroups, each containing exactly one spatial point, and compute the relationships between these n subgroups according to the selected cluster merging criterion;
(2) merge two subgroups into one new subgroup according to the cluster merging criterion, which leaves n-1 subgroups;
(3) recompute the pairwise clustering statistics among the n-1 subgroups and continue merging according to the same criterion, which leaves n-2 subgroups;
(4) repeat the above steps until all subgroups have been merged into one large subgroup.
Preferably, the similarity between spatial points in the bottom-up agglomerative algorithm is measured with a distance metric criterion. The distance metric criterion is as follows: let each vector x have m dimensions, indexed by j; the distances between two different vectors $x_s$ and $x_t$ are then calculated as follows:

(1) Minkowski distance, calculated as in formula (11):

$$d_{st} = \left(\sum_{j=1}^{m} |x_{sj} - x_{tj}|^{p}\right)^{1/p} \tag{11}$$

where p is the Minkowski exponent;

(2) City block distance. When p = 1, the Minkowski distance reduces to the city block distance, calculated as in formula (12):

$$d_{st} = \sum_{j=1}^{m} |x_{sj} - x_{tj}| \tag{12}$$

In two-dimensional space the city block distance is calculated as in formula (13):

$$d_{st} = |x_s - x_t| + |y_s - y_t| \tag{13}$$

In three-dimensional space the city block distance is calculated as in formula (14):

$$d_{st} = |x_s - x_t| + |y_s - y_t| + |z_s - z_t| \tag{14}$$

(3) Euclidean distance. When p = 2, the Minkowski distance reduces to the ordinary Euclidean distance, calculated as in formula (15):

$$d_{st} = \sqrt{\sum_{j=1}^{m} (x_{sj} - x_{tj})^{2}} \tag{15}$$

In two-dimensional space the Euclidean distance is calculated as in formula (16):

$$d_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2} \tag{16}$$

In three-dimensional space the Euclidean distance is calculated as in formula (17):

$$d_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2} \tag{17}$$

(4) Chebyshev distance. When p tends to infinity, the Minkowski distance reduces to the Chebyshev distance, calculated as in formula (18):

$$d_{st} = \max_{j} |x_{sj} - x_{tj}| \tag{18}$$

In two-dimensional space the Chebyshev distance is calculated as in formula (19) (see Figure 6(c)):

$$d_{st} = \max(|x_s - x_t|,\ |y_s - y_t|) \tag{19}$$

In three-dimensional space the Chebyshev distance is calculated as in formula (20):

$$d_{st} = \max(|x_s - x_t|,\ |y_s - y_t|,\ |z_s - z_t|) \tag{20}$$
Preferably, the cluster merging criterion decides, from the distance between two subgroups, whether the two subgroups should be merged; if they can be merged, the two subgroups are selected and merged. Let the two subgroups be subgroup r and subgroup s, containing $n_r$ and $n_s$ objects respectively, and let $x_{ri}$ denote the i-th object of subgroup r. The cluster merging criterion can then be set as follows:

Single linkage uses the similarity matrix or distance matrix of the data and defines the between-class distance as the minimum distance between the data of the two classes. Single linkage can be expressed as formula (21):

$$d(r,s) = \min\big(\mathrm{dist}(x_{ri}, x_{sj})\big),\quad i \in (1,\dots,n_r),\ j \in (1,\dots,n_s) \tag{21}$$

Complete linkage uses the similarity matrix or distance matrix of the data and defines the between-class distance as the maximum distance between the data of the two classes. Complete linkage can be expressed as formula (22):

$$d(r,s) = \max\big(\mathrm{dist}(x_{ri}, x_{sj})\big),\quad i \in (1,\dots,n_r),\ j \in (1,\dots,n_s) \tag{22}$$

Group average linkage uses the similarity matrix or distance matrix of the data and defines the between-class distance as the average of the pairwise distances between the data of the two classes. Group average linkage can be expressed as formula (23):

$$d(r,s) = \frac{1}{n_r n_s}\sum_{i=1}^{n_r}\sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj}) \tag{23}$$

Centroid distance starts from the distance matrix and the original data and defines the distance as the two-dimensional Euclidean distance, measured between an individual and the centroid of a group or between the centroids of two groups. The centroid distance is expressed as formula (24):

$$d(r,s) = \lVert \bar{x}_r - \bar{x}_s \rVert_2,\quad \bar{x}_r = \frac{1}{n_r}\sum_{i=1}^{n_r} x_{ri} \tag{24}$$

Ward linkage tends to minimize, at each step, the increment of the within-group sum of squared deviations. Ward linkage can be expressed in simplified form as formula (25):

$$d(r,s) = \sqrt{\frac{2 n_r n_s}{n_r + n_s}}\ \lVert \bar{x}_r - \bar{x}_s \rVert_2 \tag{25}$$

Median linkage gives equal weight to the two parts that make up a group when computing its centroid, so the computed centroid is the average of the centroids of the two parts. Median linkage can be expressed as formula (26):

$$d(r,s) = \lVert \tilde{x}_r - \tilde{x}_s \rVert_2,\quad \tilde{x}_r = \tfrac{1}{2}(\tilde{x}_p + \tilde{x}_q) \text{ when } r \text{ was formed by merging } p \text{ and } q \tag{26}$$

Weighted average linkage applies to the distances, when computing the between-class distance, a weight equal to the reciprocal of the number of members in the class. Weighted average linkage can be expressed as formula (27):

$$d(r,s) = \frac{d(p,s) + d(q,s)}{2},\quad r \text{ formed by merging } p \text{ and } q \tag{27}$$
Brief description of the drawings
Figure 1 is an intuitive schematic diagram of spatial correlation, where (a) is positive correlation, (b) is negative correlation and (c) is no correlation;
Figure 2 is a classification diagram of the various spatial clustering algorithms;
Figure 3 is a comparison of the advantages, disadvantages and other characteristics of the various spatial clustering algorithms;
Figure 4 is the overall technical roadmap;
Figure 5 is a schematic diagram of the weight decay trends of the strategies used to compute the spatial weight matrix for the spatial autocorrelation coefficients;
Figure 6 is a schematic diagram of the distance metric criteria used in hierarchical clustering;
Figure 7 is a schematic diagram of the subgroup merging criteria used in hierarchical clustering;
Figure 8 is a map of the location of Ningbo City and of the administrative divisions it contains;
Figure 9 is a hierarchy diagram of China's administrative divisions;
Figure 10 is the detailed composition of Ningbo City's administrative divisions (11 districts/counties/county-level cities and 153 subdistricts/towns/townships in total);
Figure 11 is a diagram of the categories of infectious diseases in China (2 Class A, 26 Class B, 11 Class C);
Figure 12 shows the graded statistical maps of the indices at the district/county/county-level city level of Ningbo City (11 units in total);
Figure 13 shows the graded statistical maps of the indices at the subdistrict/town/township level of Ningbo City (153 units in total);
Figure 14 shows the hot spot and cold spot analysis results of the indices at the subdistrict/town/township level of Ningbo City (153 units in total);
Figure 15 shows the cluster and outlier analysis results of the indices at the subdistrict/town/township level of Ningbo City (153 units in total);
Figure 16 shows the hierarchical spatial clustering results for infectious disease patients in Ningbo City;
Figure 17 shows the spatial aggregation analysis results based on kernel density estimation.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments, so that those skilled in the art can implement it by referring to the description. The scope of protection of the invention is not limited to the specific embodiments.
The present invention relates to a multi-level cluster analysis method for point sets that takes spatial position into account. It differs from approaches that judge spatial aggregation with a single spatial clustering algorithm, and from approaches that call a single spatial autocorrelation model. It is a multi-level cluster analysis method whose levels are closely related and progressive, which matches the way people come to understand a problem. The method is a spatial aggregation judgment method for point sets.
Specifically, the progressive levels of the proposed method answer the following key questions in turn: "Does clustering exist?" (the answer is Yes or No); "If clustering exists, what type of clustering is it?" (the answer is high-value clustering or low-value clustering); "How is the specific region of the cluster delimited?" (the specific aggregation regions are given); and "Which points does the cluster region contain?" (the points contained in each aggregation region are given).
Note that any algorithm generally has to be assessed on four points: whether its result is correct (effectiveness), whether it is accurate (efficiency), whether it is computed quickly (quick), and whether it is satisfactory to use (satisfaction). The multi-level cluster analysis method proposed here rests on a solid mathematical foundation and mathematical proof, which ensure that its results are correct and accurate. It uses a progressive multi-level judgment structure, so that if one level is not satisfied the next level need not be evaluated, which makes it fast. The multi-level judgment mode also matches people's habit of moving from the superficial to the deep, which makes it satisfactory to use. In summary, the proposed method is correct, accurate, fast and satisfactory, and it offers a new approach to judging spatial aggregation.
The multi-level cluster analysis method for point sets that takes spatial position into account is implemented through the following six steps:
Step 1: preliminary judgment of the existence of spatial clustering based on a graded statistical map: the graded statistical map method is used to judge whether spatial aggregation is suspected to exist;
Step 2: accurate judgment of the existence of spatial clustering based on spatial autocorrelation: the global Moran's I coefficient, one of the spatial autocorrelation coefficients, is used to judge whether spatial aggregation really exists;
Step 3: accurate judgment of the type of spatial clustering based on spatial autocorrelation: if spatial aggregation really exists, the global Getis-Ord coefficient is used to judge whether it is high-value clustering or low-value clustering;
Step 4: accurate delimitation of the spatial clustering region based on spatial autocorrelation: for high-value clustering, the local Getis-Ord coefficient is used to delimit the specific region of the high-value cluster; for low-value clustering, the local Getis-Ord coefficient is likewise used to delimit the specific region of the low-value cluster;
Step 5: accurate delimitation of aggregation anomalies based on spatial autocorrelation: the local Moran's I coefficient is used to delimit the abnormal regions that lie outside the high-value or low-value aggregation;
Step 6: accurate location of the points contained in the spatial clustering region based on a clustering algorithm: a spatial clustering algorithm is used to locate the points that the spatial clustering region contains.
Each step is described in detail below.
Step 1: Preliminary judgment of the existence of spatial clustering based on a graded statistical map
The content of this step is to judge whether spatial aggregation is suspected to exist. Its goal is a preliminary judgment of whether spatial clustering exists, and its means is the graded statistical map method.
The graded statistical map method uses three indices: PN, PD and HR. PN is the number of people in the specific population (abbreviation of Number of Particular people); PD is the density of the specific population (abbreviation of Density of Particular people); HR is the proportion of the specific population in the total population (abbreviation of Rate of ad-Hoc people).
Calculating these indices also requires the following parameters: let the area of the i-th administrative division be $aa(i)$ (abbreviation of Administrative Area) and its census population be $cn(i)$ (abbreviation of Census Number). The indices are calculated as in formulas (1) and (2):

$$PD(i) = \frac{PN(i)}{aa(i)} \tag{1}$$

$$HR(i) = \frac{PN(i)}{cn(i)} \tag{2}$$

In particular, the two relative indices PD and HR are considered more reliable than the absolute index PN. For the same number of people in the specific population, some administrative divisions are centrally located and economically developed, so they are small in area but densely populated, with the specific population concentrated; other administrative divisions are remote and economically less developed, so they are large in area but sparsely populated, with few people in the specific population. The three indices are displayed using the graded statistical map method of cartography, typically with graduated colors. Inspecting the graded statistical maps of these indices gives a preliminary judgment of whether spatial aggregation exists.
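As an illustration of step 1, the following Python sketch computes PD and HR from PN, area and census population, then assigns each unit a grade for mapping. The function names, the sample figures and the equal-interval grading rule are assumptions made for this example only; the text does not prescribe a particular classification scheme.

```python
def compute_indices(units):
    """units: list of dicts with keys 'name', 'pn' (specific-population count),
    'aa' (area) and 'cn' (census population). Returns PN, PD and HR per unit."""
    result = []
    for u in units:
        pd_value = u["pn"] / u["aa"]   # PD: people per unit area
        hr_value = u["pn"] / u["cn"]   # HR: share of the census population
        result.append({"name": u["name"], "PN": u["pn"], "PD": pd_value, "HR": hr_value})
    return result


def equal_interval_grade(values, n_classes=5):
    """Assign each value a grade 0..n_classes-1 using equal-interval breaks
    (an assumed grading rule; other break schemes are equally possible)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_classes
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


# Hypothetical units: A and B have the same PN but very different areas.
units = [
    {"name": "A", "pn": 120, "aa": 10.0, "cn": 60000},
    {"name": "B", "pn": 120, "aa": 400.0, "cn": 30000},
    {"name": "C", "pn": 30, "aa": 50.0, "cn": 20000},
]
indices = compute_indices(units)
pd_grades = equal_interval_grade([r["PD"] for r in indices], n_classes=3)
```

Units A and B share the same PN yet fall into different PD grades, which is the reason the relative indices are preferred over the absolute one.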
Step 2: Accurate judgment of the existence of spatial aggregation based on spatial autocorrelation
The content of this step is to judge accurately whether spatial aggregation exists. Its purpose is an accurate judgment of the existence of spatial aggregation, and its means is the global Moran's I coefficient, one of the spatial autocorrelation coefficients.
In other words, the global Moran's I coefficient is used to judge whether spatial autocorrelation exists; the answer is Yes or No (Yes means spatial autocorrelation exists, No means it does not).
The global Moran's I coefficient is calculated as in formulas (3) and (4):

$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^2} \tag{3}$$

$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \tag{4}$$

where $z_i = x_i - \bar{X}$ is the deviation of the attribute value of feature i from the mean, $w_{i,j}$ is the spatial weight between feature i and feature j, n is the total number of features, and $S_0$ is the sum of all spatial weights.
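The global Moran's I statistic can be sketched in a few lines of Python. The helper `chain_weights` and the sample values are hypothetical; the weights describe units arranged in a row with binary contiguity, which is only one of the weighting strategies discussed later. The significance test that turns the statistic into a Yes or No answer is not shown.

```python
def chain_weights(n):
    """Binary contiguity weights for n units in a row (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def global_morans_i(x, w):
    """Global Moran's I for attribute values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                       # deviations from the mean
    s0 = sum(w[i][j] for i in range(n) for j in range(n))
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


w = chain_weights(4)
i_trend = global_morans_i([1, 2, 3, 4], w)   # similar values are adjacent
i_alt = global_morans_i([1, 4, 1, 4], w)     # dissimilar values are adjacent
```

In this toy layout the smoothly increasing series gives a positive value and the alternating series gives a negative one, matching the positive and negative correlation cases of Figure 1.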
Step 3: Accurate judgment of the type of spatial clustering based on spatial autocorrelation
The content of this step is: if spatial clustering exists, judge accurately whether it is high-value clustering or low-value clustering. Its purpose is to judge whether the clustering is high-value or low-value, and its means is the global Getis-Ord coefficient, one of the spatial autocorrelation coefficients.
In other words, the global Getis-Ord coefficient answers whether the spatial autocorrelation is of high values or of low values, that is, whether the result is a high-value cluster or a low-value cluster.
The global Getis-Ord coefficient is calculated as in formula (5):

$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, x_i x_j}{\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j}, \quad \forall j \neq i \tag{5}$$

where $x_i$ and $x_j$ are the attribute values of feature i and feature j, $w_{i,j}$ is the spatial weight between feature i and feature j, n is the total number of features in the data set, and $\forall j \neq i$ indicates that feature i and feature j cannot be the same feature. If the spatial weights are binary (0 and 1) or are all less than 1, the value of the global Getis-Ord coefficient always lies between 0 and 1.
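A minimal Python sketch of the global Getis-Ord statistic follows. The row layout, the helper `chain_weights` and the sample values are assumptions for illustration. The comparison baseline used here, the sum of the weights divided by n(n-1), is the value the statistic takes when every feature has the same attribute value.

```python
def chain_weights(n):
    """Binary contiguity weights for n units in a row (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def general_g(x, w):
    """Global Getis-Ord General G; pairs with i == j are excluded."""
    n = len(x)
    num = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    den = sum(x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    return num / den


w = chain_weights(4)
g_flat = general_g([1, 1, 1, 1], w)      # no variation: baseline value
g_high = general_g([10, 9, 1, 1], w)     # the two high values are neighbours
g_split = general_g([10, 1, 1, 10], w)   # the two high values are far apart
```

With binary weights all three results lie between 0 and 1. The value rises above the baseline when high values sit next to each other and falls below it when they are separated.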
Step 4: Accurate delimitation of the spatial clustering region based on spatial autocorrelation
The content of this step is: if a high-value (or low-value) cluster exists, delimit its specific region accurately. Its purpose is to delimit accurately the specific region of the high-value (or low-value) aggregation, and its means is the local Getis-Ord coefficient, one of the spatial autocorrelation coefficients.
In other words, the local Getis-Ord coefficient is used to detect where the specific region of the high-value (or low-value) cluster lies, and it can mark out exactly which regions belong to it.
The local Getis-Ord coefficient is calculated as in formulas (6) to (8):

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j - \bar{X}\sum_{j=1}^{n} w_{i,j}}{S\sqrt{\dfrac{n\sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}} \tag{6}$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n} \tag{7}$$

$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2} \tag{8}$$

where $x_j$ is the attribute value of feature j, $w_{i,j}$ is the spatial weight between feature i and feature j, and n is the total number of features.
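The local Getis-Ord statistic is a z-score computed per feature, and a short Python sketch makes the computation concrete. The helper `band_weights`, the choice to include each feature in its own neighbourhood, and the sample values are assumptions for this example.

```python
import math


def band_weights(n):
    """Weights for n units in a row: each unit neighbours itself and the
    units directly beside it (hypothetical layout, self-weight included)."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]


def getis_ord_gi_star(x, w):
    """Local Getis-Ord Gi* z-score for every feature.
    Assumes no row of w is constant across all n features (else the
    denominator is zero) and that x is not constant."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean * mean)
    scores = []
    for i in range(n):
        sw = sum(w[i])                      # sum of weights in row i
        sw2 = sum(v * v for v in w[i])      # sum of squared weights in row i
        num = sum(w[i][j] * x[j] for j in range(n)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        scores.append(num / den)
    return scores


x = [10, 9, 1, 1, 1]
scores = getis_ord_gi_star(x, band_weights(5))
```

Large positive scores mark candidate hot spots (high-value clusters) and large negative scores mark candidate cold spots (low-value clusters); in this toy row the first unit gets a positive score and the last a negative one.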
Step 5: Accurate delimitation of aggregation anomalies based on spatial autocorrelation
The content of this step is to delimit accurately the abnormal regions that lie outside the high-value (or low-value) aggregation. Its purpose is to delimit accurately the abnormal regions other than the conventional aggregation regions, and its means is the local Moran's I coefficient, one of the spatial autocorrelation coefficients.
In other words, the local Moran's I coefficient (LISA, Local Indicators of Spatial Association), besides giving the specific regions of conventional clusters (called clusters: high-high clusters and low-low clusters), also gives the specific extent of abnormal cases (called outliers: high-low and low-high).
The local Moran's I coefficient is calculated as in formula (9):

$$I_i = \frac{x_i - \bar{X}}{S_i^2}\sum_{j=1,\, j\neq i}^{n} w_{i,j}\,(x_j - \bar{X}) \tag{9}$$

where $x_i$ is the attribute value of feature i, $\bar{X}$ is the mean of the corresponding attribute, $w_{i,j}$ is the spatial weight between feature i and feature j, n is the total number of features, and $S_i^2 = \sum_{j \neq i}(x_j - \bar{X})^2 / (n-1)$.
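The local Moran's I statistic and the cluster or outlier label it implies can be sketched as follows. The labelling rule (signs of the feature's deviation and of its weighted neighbour deviation) and all names are assumptions for illustration; a real analysis would also test each value for statistical significance, which is omitted here.

```python
def chain_weights(n):
    """Binary contiguity weights for n units in a row (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def local_morans_i(x, w):
    """Return (I_i, label) for each feature. Labels: HH and LL are clusters,
    HL and LH are outliers. Significance testing is not performed."""
    n = len(x)
    mean = sum(x) / n
    out = []
    for i in range(n):
        s2 = sum((x[j] - mean) ** 2 for j in range(n) if j != i) / (n - 1)
        lag = sum(w[i][j] * (x[j] - mean) for j in range(n) if j != i)
        z = x[i] - mean
        label = ("H" if z >= 0 else "L") + ("H" if lag >= 0 else "L")
        out.append((z / s2 * lag, label))
    return out


# A single low value surrounded by high values.
results = local_morans_i([10, 10, 1, 10, 10], chain_weights(5))
```

The middle unit receives a negative statistic and the label LH, a low value surrounded by high values, which is the kind of outlier this step is meant to expose.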
In summary, the global Moran's I coefficient judges whether spatial autocorrelation exists, and its answer is Yes or No (Yes means spatial autocorrelation exists, No means it does not). The global Getis-Ord coefficient answers whether the spatial autocorrelation is of high values or of low values, that is, a high-value cluster or a low-value cluster. The local Getis-Ord coefficient detects where the specific region of the high-value (or low-value) cluster lies and can mark out exactly which regions belong to it. The local Moran's I coefficient (LISA), besides giving the specific regions of clusters (high-high clusters and low-low clusters), gives the specific extent of outliers (high-low and low-high). The four coefficients are progressive.
Note that the global Moran's I coefficient, the global Getis-Ord coefficient, the local Getis-Ord coefficient and the local Moran's I coefficient (LISA) all require a spatial weight matrix. The spatial weight matrix can be computed with the following strategies:
(1) Inverse distance (ID): the influence of one feature on another decreases as the distance between them increases. In other words, nearby neighboring features influence the computation for the target feature more than distant features do (see Figure 5(a));
(2) Inverse distance squared (IDS): similar to inverse distance, but with a steeper slope, so the influence falls off faster and only the nearest neighbors of the target feature have a significant influence on its computation;
(3) Fixed distance band (FDB): each feature is analyzed in the context of its neighboring features. Neighboring features within the specified critical distance are assigned a weight of 1 and influence the computation for the target feature; neighboring features outside the critical distance are assigned a weight of 0 and have no influence on the computation for the target feature (see Figure 5(b));
(4) Zone of indifference (ZI): a combination of the inverse distance and fixed distance band strategies. Features within the specified critical distance of the target feature are assigned a weight of 1 and influence the computation for the target feature; beyond the critical distance, the weight (and the influence of the neighboring feature on the computation for the target feature) decreases as the distance increases (see Figure 5(c));
(5) K nearest neighbors (KNN): the K closest features are included in the analysis, where K is a specified numeric parameter;
(6) Edge contiguity (Contiguity Edges Only, CEO): only neighboring polygon features that share a boundary or overlap influence the computation for the target feature;
(7) Edge-and-corner contiguity (Contiguity Edges Corners, CEC): polygon features that share a boundary, share a node, or overlap influence the computation for the target feature;
(8) Delaunay triangulation (DT): a mesh of non-overlapping triangles is first created from the feature centroids; features associated with triangle nodes that share an edge are then treated as neighbors.
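Several of the distance-based strategies above can be sketched as weight functions in Python. All names are hypothetical. The decay used for the zone of indifference beyond the critical distance (threshold divided by distance) is an assumption chosen only because it is continuous at the threshold; the text says only that the weight decreases with distance.

```python
import math


def inverse_distance(d, power=1):
    """ID (power=1) or IDS (power=2): weight falls with distance."""
    return 1.0 / d ** power if d > 0 else 0.0


def fixed_distance_band(d, threshold):
    """FDB: weight 1 inside the critical distance, 0 outside."""
    return 1.0 if 0 < d <= threshold else 0.0


def zone_of_indifference(d, threshold):
    """ZI: weight 1 inside the critical distance, decaying beyond it.
    The threshold/d decay is an assumed form, not taken from the text."""
    if d <= 0:
        return 0.0
    return 1.0 if d <= threshold else threshold / d


def weight_matrix(points, kernel):
    """Apply a distance kernel to every ordered pair of distinct points."""
    n = len(points)
    return [[0.0 if i == j else kernel(math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def knn_weight_matrix(points, k):
    """KNN: each point gives weight 1 to its k nearest other points."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = 1.0
    return w


points = [(0, 0), (1, 0), (3, 0), (7, 0)]
w_id = weight_matrix(points, inverse_distance)
w_ids = weight_matrix(points, lambda d: inverse_distance(d, power=2))
w_fdb = weight_matrix(points, lambda d: fixed_distance_band(d, threshold=2))
w_zi = weight_matrix(points, lambda d: zone_of_indifference(d, threshold=2))
w_knn = knn_weight_matrix(points, k=1)
```

The contiguity and Delaunay strategies need polygon or triangulation geometry and are left out of this sketch. Note that the KNN matrix is generally not symmetric.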
Step 6: Accurate location of the points in the spatial clustering region based on a clustering algorithm
The content of this step is to locate accurately, with a clustering algorithm, the points that the spatial clustering region contains. Its purpose is to locate accurately the point set contained in the spatial clustering region, and its means is a spatial clustering algorithm. Here, the hierarchical clustering algorithm among the spatial clustering algorithms is used.
Hierarchical clustering algorithms are of two kinds: the bottom-up agglomerative algorithm (AGNES, Agglomerative Nesting) and the top-down divisive algorithm (DIANA, Divisive Analysis). The bottom-up agglomerative algorithm is used here, and it proceeds as follows:
(1) treat the n spatial points as n subgroups, each containing exactly one spatial point, and compute the relationships between these n subgroups according to the selected cluster merging criterion. The cluster merging criterion is the reason (or standard) for choosing which two subgroups to merge; it includes the shortest distance criterion, the maximum distance criterion, the average distance criterion, the centroid distance criterion, the criterion of the increment of the sum of squared deviations, and so on (described in detail below);
(2) merge two subgroups (points) into one new subgroup according to this criterion (for example smallest distance, smallest sum of squared deviations, or smallest increment of the sum of squared deviations), which leaves n-1 subgroups;
(3) recompute the pairwise clustering statistics among the n-1 subgroups and continue merging according to the same criterion, which leaves n-2 subgroups;
(4) repeat the above steps until all subgroups have been merged into one large subgroup.
The process of successive merging can also be expressed as a clustering dendrogram, which shows clearly how close or distant the subgroups are from one another.
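Steps (1) to (4) can be sketched as a naive Python loop. This is an illustrative, unoptimized version that recomputes all pairwise subgroup distances at every step; the function name, the record format of the merge history and the sample points are assumptions. Passing `min` gives the shortest distance criterion and `max` the maximum distance criterion.

```python
import math


def agglomerate(points, linkage=min):
    """Bottom-up merging. Returns the merge history as tuples
    (id_a, id_b, distance, size_of_new_subgroup). Original points have
    ids 0..n-1; each new subgroup gets the next free id."""
    clusters = {i: [i] for i in range(len(points))}   # step (1): n singletons
    history = []
    next_id = len(points)
    while len(clusters) > 1:                          # step (4): repeat
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # steps (2), (3)
        history.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return history


points = [(0, 0), (0, 1), (5, 0), (5, 2), (10, 10)]
history = agglomerate(points, linkage=min)
```

The merge history holds exactly the information a dendrogram draws: which two subgroups were joined at each step and at what distance. Stopping the loop early, or cutting the history at a chosen distance, yields a partition into several clusters.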
Any hierarchical clustering algorithm (indeed any clustering algorithm) involves judging the similarity between spatial points. In spatial clustering algorithms, similarity is measured by spatial distance, which is the distance metric criterion. The distance metric criterion is as follows: let each vector x have m dimensions, indexed by j; the distances between two different vectors $x_s$ and $x_t$ are then calculated as follows (here the vectors $x_s$ and $x_t$ represent spatial point $x_s$ and spatial point $x_t$):
(1) Minkowski distance
The Minkowski distance is a general distance; the city block distance, the Euclidean distance and the Chebyshev distance are all special cases of it. The Minkowski distance is calculated as in formula (11):

$$d_{st} = \left(\sum_{j=1}^{m} |x_{sj} - x_{tj}|^{p}\right)^{1/p} \tag{11}$$

where p is the Minkowski exponent.
(2) City block distance
When p = 1, the Minkowski distance reduces to the city block distance, also called the Manhattan distance or taxicab distance. The city block distance is calculated as in formula (12):

$$d_{st} = \sum_{j=1}^{m} |x_{sj} - x_{tj}| \tag{12}$$

In two-dimensional space the city block distance is calculated as in formula (13) (see Figure 6(a)):

$$d_{st} = |x_s - x_t| + |y_s - y_t| \tag{13}$$

In three-dimensional space the city block distance is calculated as in formula (14):

$$d_{st} = |x_s - x_t| + |y_s - y_t| + |z_s - z_t| \tag{14}$$

(3) Euclidean distance
When p = 2, the Minkowski distance reduces to the ordinary Euclidean distance, calculated as in formula (15):

$$d_{st} = \sqrt{\sum_{j=1}^{m} (x_{sj} - x_{tj})^{2}} \tag{15}$$

In two-dimensional space the Euclidean distance is calculated as in formula (16) (see Figure 6(b)):

$$d_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2} \tag{16}$$

In three-dimensional space the Euclidean distance is calculated as in formula (17):

$$d_{st} = \sqrt{(x_s - x_t)^2 + (y_s - y_t)^2 + (z_s - z_t)^2} \tag{17}$$

(4) Chebyshev distance
When p tends to infinity, the Minkowski distance reduces to the Chebyshev distance, also called the chessboard distance. The Chebyshev distance is calculated as in formula (18):

$$d_{st} = \max_{j} |x_{sj} - x_{tj}| \tag{18}$$

In two-dimensional space the Chebyshev distance is calculated as in formula (19) (see Figure 6(c)):

$$d_{st} = \max(|x_s - x_t|,\ |y_s - y_t|) \tag{19}$$

In three-dimensional space the Chebyshev distance is calculated as in formula (20):

$$d_{st} = \max(|x_s - x_t|,\ |y_s - y_t|,\ |z_s - z_t|) \tag{20}$$
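The relationship between the Minkowski distance and its three special cases can be checked with a short Python sketch. The function names and the sample points are illustrative only.

```python
import math


def minkowski(xs, xt, p):
    """General Minkowski distance with exponent p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(xs, xt)) ** (1.0 / p)


def city_block(xs, xt):
    """Special case p = 1."""
    return sum(abs(a - b) for a, b in zip(xs, xt))


def euclidean(xs, xt):
    """Special case p = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, xt)))


def chebyshev(xs, xt):
    """Limit as p tends to infinity."""
    return max(abs(a - b) for a, b in zip(xs, xt))


s, t = (0.0, 0.0), (3.0, 4.0)
d1 = city_block(s, t)       # 3 + 4
d2 = euclidean(s, t)        # sqrt(9 + 16)
d_inf = chebyshev(s, t)     # max(3, 4)
d_large_p = minkowski(s, t, 50)   # approaches the Chebyshev distance
```

The functions accept points of any dimension, so the two-dimensional and three-dimensional formulas are covered by the same code.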
It is different from distance metric described above, the effect of " Cluster merging criterion " in algorithm is: according to two sons
The distance between group judges whether the two subgroups should merge, and can set two subgroups is subgroup r and subgroup s respectively, two
Object number is respectively nr and ns in subgroup.So, " Cluster merging criterion " can be specifically arranged as follows:
For single chain (single linkage), also known as arest neighbors (nearest neighbor) method, such as Fig. 7 (a)
It is shown.This method uses the similarity matrix or distance matrix of data, and defining between class distance is data between two classes
Minimum range (as shown in two points of line in 7 (a) in figure).This method does not consider class formation, and there may be at random for it
Classification, especially in the case where big data, it is possible to create reel chain (long chaining) phenomenon, list is chain can be as follows
Formula (21) expression:
D (r, s)=min (dist (xri,xsj)),i∈(i,...,nr),j∈(1,...,ns) (21)
For complete chain (complete linkage), also known as farthest neighbour (furthest neighbor) method is such as schemed
Shown in 7 (b).This method equally uses the similarity matrix or distance matrix of data, but define between class distance be two classes it
Between data maximum distance (as shown in two points of line in Fig. 7 (b)).This method does not consider class formation, this method equally
Tend to find some compact classification.It is entirely chain formula (22) to express as follows:
D (r, s)=max (dist (xri,xsj)),i∈(1,...,nr),j∈(1,...,ns) (22)
For a group average linkage (group average linkage), also known as UPGMA (Unweighted Pair-Group
Method using the Average approach), as shown in Fig. 7 (c).This method equally uses the similarity of data
Matrix or distance matrix, but it is (more in such as attached drawing 7 (c) to define the average value that between class distance is between class distance data distance two-by-two
To shown in line).So the classification of generation has preferable robustness, tendency as it can be seen that this method considers the structure of class
Calculating Jie Yu Unit in two small classes of combination variance, distance it is chain and it is complete it is chain between.Group average linkage can also be following public
Formula (23) expression:
For centroid distance (centroid linkage), also known as UPGMC (Unweighted Pair-Group Method
Using Centroid approach), as shown in Fig. 7 (d).Different from previous methods, this method is from distance matrix and original
Data are set out, and general definition distance is two-dimentional Euclidean distance, this distance is individual and the quality distance of group or the matter of group and group
Heart distance (as shown in dotted line is given directions jointly in Fig. 7 (d)).It is of course also possible to using other distance measuring methods, but may
The concept elaboration for initial data " mass center " can be lacked, this method considers the structure of class.The generally following formula of centroid distance
(24) it expresses:
It is chain for Ward, also known as sum of squares of deviations method of addition (error sum of square criterion).This
Method tends to make in each step the increment of the sum of squares of deviations in group minimum, as shown in Fig. 7 (f).It is worth noting that,
Ward method has stronger Fundamentals of Mathematics (can be described in detail later).In contrast, it is known as sum of squares approach there are one method
(sum of square), it is chain to be similar to Ward, but its sum of squares of deviations based on each class rather than sum of squares of deviations
Increment, as shown in Fig. 7 (e).The formula expression of Ward method is complex, but has distinct feature and (have balance
The feature of Number of Subgroups, explained later), expression can be simplified by formula (25) as follows:
For median linkage, also known as WPGMC (Weighted Pair-Group Method using Centroid approach): unlike the UPGMC method above, when the centroid of a group is computed, the two parts that were merged to form the group are given equal weight. In other words, the computed centroid is actually the mean of the centroids of the two parts that form the group. The median distance can be expressed by the following formula (26):
For weighted average linkage, also known as WPGMA (Weighted Pair-Group Method using Average approach): this is similar to median linkage, but when the distance between classes is computed, each distance is weighted by the reciprocal of the number of members in the class. Weighted average linkage can be expressed by the following formula (27):
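For reference, the standard textbook forms of the five criteria described above can be written as follows. This is a sketch under assumed notation and may differ in detail from formulas (23) to (27) as drawn in the original: S and T are two groups with n_S and n_T members, U is the group formed by merging S and T, R is any other group, d(i, j) is the distance between points i and j, \bar{x}_S is the centroid of S, and m_S is the center carried forward by median linkage.

```latex
\begin{align*}
d_{\mathrm{avg}}(S,T) &= \frac{1}{n_S n_T}\sum_{i\in S}\sum_{j\in T} d(i,j) \\
d_{\mathrm{cen}}(S,T) &= \lVert \bar{x}_S-\bar{x}_T \rVert, \qquad \bar{x}_S=\frac{1}{n_S}\sum_{i\in S}x_i \\
\Delta_{\mathrm{Ward}}(S,T) &= \frac{n_S n_T}{n_S+n_T}\,\lVert \bar{x}_S-\bar{x}_T \rVert^{2} \\
d_{\mathrm{med}}(R,U) &= \lVert m_R-m_U \rVert, \qquad m_U=\tfrac{1}{2}\,(m_S+m_T) \\
d_{\mathrm{wavg}}(R,U) &= \tfrac{1}{2}\,\bigl[d(R,S)+d(R,T)\bigr]
\end{align*}
```

The first three depend on group sizes, whereas the last two treat the two merged parts equally regardless of how many points each contains.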
Based on the above, and in particular taking the two-dimensional Euclidean distance as the "distance (similarity) measurement criterion" and the minimum increment of the sum of squared deviations (i.e., Ward linkage) as the "cluster merging criterion", the calculation process of this agglomerative hierarchical clustering algorithm (generally called the Ward clustering method) is as follows:
where the centroids of subgroup S and subgroup T are calculated by the following formulas (28) to (31):
For any subgroup T, the sum of squared deviations within the subgroup is given by the following formula (32):
When two subgroups S and T are merged into a new subgroup U, the distance between an existing subgroup R and the new subgroup U is given by the following formulas (33) to (37):
n_U = n_S + n_T (33)
When two subgroups S and T are merged into a new subgroup U, the resulting increment in the sum of squared deviations is calculated by the following formula (38):
When subgroups S and T are merged into a new subgroup U, the increment in the sum of squared deviations between subgroups R and U is obtained recursively by formula (39):
When a subgroup is adjusted (point j is moved from subgroup S to subgroup T), the resulting increment is calculated by the following formula (40):
Now let subgroup R denote subgroup S with point j removed. The adjustment scheme above can then be summarized as deciding whether point j belongs to subgroup R or to subgroup T, calculated as shown in the following formulas (41) to (43):
n_R = n_S - 1 (41)
As formula (43) shows, when the distance between point j and subgroup R equals the distance between point j and subgroup T, point j is assigned to subgroup R rather than subgroup T only if subgroup R contains fewer points than subgroup T. In other words, the "hierarchical clustering method based on the increment of the sum of squared deviations" (i.e., the Ward method) has the distinctive property of "balancing the number of points across subgroups, other conditions being equal".
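The merging criterion and the balancing property can be illustrated with a short sketch. It assumes the standard Ward increment n_S n_T / (n_S + n_T) times the squared centroid distance, which may differ in notation from formulas (28) to (43); the function names and sample points are hypothetical.

```python
from math import dist

def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sse(points):
    """Sum of squared deviations of the points from their centroid."""
    c = centroid(points)
    return sum(dist(p, c) ** 2 for p in points)

def ward_increment(s, t):
    """Increase in the total sum of squared deviations if s and t are merged."""
    n_s, n_t = len(s), len(t)
    return n_s * n_t / (n_s + n_t) * dist(centroid(s), centroid(t)) ** 2

def ward_clusters(points, k):
    """Agglomerate single points until k subgroups remain (naive search)."""
    groups = [[p] for p in points]
    while len(groups) > k:
        pairs = [(a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        # Merge the pair whose union adds the least to the total SSE.
        a, b = min(pairs,
                   key=lambda ab: ward_increment(groups[ab[0]], groups[ab[1]]))
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups

# Balancing property: point j is equally far (5 units) from both centroids,
# but joining the smaller subgroup R costs less than joining the larger T.
j = [(0.0, 0.0)]
R = [(5.0, 1.0), (5.0, -1.0)]                            # 2 points
T = [(-5.0, 1.0), (-5.0, -1.0), (-4.0, 0.0), (-6.0, 0.0)]  # 4 points
cost_R = ward_increment(j, R)
cost_T = ward_increment(j, T)
```

With equal centroid distances, cost_R is smaller than cost_T, so the point is drawn toward the subgroup with fewer members. The exhaustive pair search is for illustration only and scales poorly; a production analysis of about 5000 points would use an optimized library routine.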
Here, the spatial distribution cluster analysis of infectious disease patients in Ningbo City, Zhejiang Province, is taken as a specific example to illustrate the method proposed by the present invention.
Ningbo, abbreviated "Yong", is a sub-provincial city, a city specifically designated in the state plan, and the world's fourth-largest port city. Ningbo lies on the southeast coast, in the middle section of mainland China's coastline and on the southern wing of the Yangtze River Delta. To the east, the Zhoushan Archipelago forms a natural barrier; to the north lies Hangzhou Bay; to the west it adjoins Shengzhou, Xinchang, and Shangyu of Shaoxing City; to the south it borders Sanmen Bay and connects with Sanmen and Tiantai of Taizhou. Fig. 8 shows the spatial position of Ningbo City within China and Zhejiang Province and the spatial division of its administrative districts.
China's administrative divisions can be divided into multiple levels, generally: national (i.e., China), provincial (e.g., Zhejiang Province), prefecture level (e.g., Ningbo City), district/county/county-level city level (labeled Level 1 here, abbreviated Lv1), street/town/township level (labeled Level 2, abbreviated Lv2), community/residential area/administrative village level (labeled Level 3, abbreviated Lv3), spatial grid level (labeled Level 4, abbreviated Lv4), and the plane X and Y coordinates of individuals (labeled Level 5, abbreviated Lv5), as shown in Fig. 9.
Based on the above, the administrative divisions of Ningbo City are described in detail, as shown in Fig. 10. The complete administrative division of Ningbo City comprises 6 districts (Haishu, Jiangdong, Jiangbei, Zhenhai, Yinzhou, and Beilun Districts), 3 county-level cities (Cixi, Yuyao, and Fenghua), and 2 counties (Ninghai and Xiangshan). The details are as follows:
(1) Haishu District contains 8 streets: Nanmen Street, Jiangxia Street, Ximen Street, Yuehu Street, Gulou Street, Baiyun Street, Duantang Street, and Wangchun Street;
(2) Jiangdong District contains 8 streets: Baizhang Street, Minglou Street, Baihe Street, Dongliu Street, Dongsheng Street, Dongjiao Street, Fuming Street, and Xinming Street;
(3) Jiangbei District contains 7 streets and 1 town: Zhongma Street, Baisha Street, Kongpu Street, Wenjiao Street, Hongtang Street, Zhuangqiao Street, Yongjiang Street, and Cicheng Town;
(4) Zhenhai District contains 4 streets and 2 towns: Luotuo Street, Zhuangshi Street, Jiaochuan Street, Zhaobaoshan Street, Jiulonghu Town, and Xiepu Town;
(5) Yinzhou District contains 7 streets, 17 towns, and 1 township: Xiaying Street, Zhonggongmiao Street, Shiqi Street, Meixu Street, Zhonghe Street, Shounan Street, Panhuo Street, Zhanqi Town, Xianxiang Town, Tangxi Town, Dongqianhu Town, Dongwu Town, Wuxiang Town, Qiu'ai Town, Yunlong Town, Hengxi Town, Jiangshan Town, Gaoqiao Town, Hengjie Town, Jishigang Town, Gulin Town, Dongqiao Town, Yinjiang Town, Zhangshui Town, and Longguan Township;
(6) Beilun District contains 7 streets, 2 towns, and 1 township: Xiaogang Street, Qijiashan Street, Xinqi Street, Daqi Street, Xiapu Street, Chaiqiao Street, Daxie Street, Chunxiao Town, Baifeng Town, and Meishan Township;
(7) Cixi City contains 5 streets and 15 towns: Zonghan Street, Kandun Street, Hushan Street, Gutang Street, Baishalu Street, Zhouxiang Town, Changhe Town, Andong Town, Chongshou Town, Henghe Town, Xinpu Town, Shengshan Town, Xiaolin Town, Kuangyan Town, Fuhai Town, Qiaotou Town, Guanhaiwei Town, Zhangqi Town, Longshan Town, and Tianyuan Town;
(8) Yuyao City contains 6 streets, 14 towns, and 1 township: Ditang Street, Langxia Street, Yangming Street, Lanjiang Street, Fengshan Street, Lizhou Street, Huangjiabu Town, Linshan Town, Mushan Town, Simen Town, Mazhu Town, Xiaocao'e Town, Liangnong Town, Zhangting Town, Lubu Town, Dalan Town, Simingshan Town, Sanqishi Town, Hemudu Town, Dayin Town, and Luting Township;
(9) Ninghai County contains 4 streets, 11 towns, and 3 townships: Yuelong Street, Taoyuan Street, Meilin Street, Qiaotouhu Street, Changjie Town, Liyang Town, Yishi Town, Chalu Town, Qiantong Town, Sangzhou Town, Huangtan Town, Dajiahe Town, Qiangjiao Town, Xidian Town, Shenzhen Town, Huchen Township, Chayuan Township, and Yuexi Township;
(10) Fenghua City contains 5 streets and 6 towns: Jinping Street, Yuelin Street, Jiangkou Street, Xiwu Street, Xiaowangmiao Street, Xikou Town, Shangtian Town, Chunhu Town, Qiucun Town, Dayan Town, and Song'ao Town;
(11) Xiangshan County contains 3 streets, 10 towns, and 5 townships: Dandong Street, Danxi Street, Juexi Street, Shipu Town, Xizhou Town, Hepu Town, Xianxiang Town, Qiangtou Town, Sizhoutou Town, Dingtang Town, Tuci Town, Daxu Town, Xinqiao Town, Dongchen Township, Xiaotang Township, Huangbi'ao Township, Maoyang Township, and Gaotangdao Township.
In other words, as shown by the dotted portion of Fig. 9, Ningbo City comprises 11 districts/counties/county-level cities (Lv1) and 153 streets/towns/townships (Lv2) in total. In fact, each street/town/township can be divided more finely into communities/residential areas/administrative villages (Lv3). Typical examples: (1) Nanmen Street of Haishu District contains 11 communities (such as Chenglang Community, Liujin Community, Wan'an Community, Hongqi Community, Zhoujiang'an Community, Chaoyang Community, and Chezhan Community); (2) Cicheng Town of Jiangbei District contains 6 communities (such as Guxiang Community and Jingming Community), 5 residential areas (such as Cidong Residential Area and Miaoshan Residential Area), and 37 administrative villages (such as Cihu Village and Dongshan Village). More precisely, the level of detail (LOD, Levels of Detail) of the administrative division can be refined further to community spatial grids (Lv4), and finally to the plane X and Y coordinates of each individual (Lv5). Note that "community/residential area/administrative village (Lv3)" and "community spatial grid (Lv4)" are handled in subsequent processing in essentially the same way as "street/town/township (Lv2)" above, and their content is too detailed to be of much research value here, so they are omitted.
In summary, the LOD of the administrative division comprises 5 levels: Lv1 (district/county/county-level city), Lv2 (street/town/township), Lv3 (community/residential area/administrative village), Lv4 (community spatial grid), and Lv5 (individual spatial coordinates). As stated above, the actual processing mainly involves 3 of these levels: Lv1 (district/county/county-level city, 11 in Ningbo City), Lv2 (street/town/township, 153 in Ningbo City), and Lv5 (individual spatial coordinates, about 5000 patient records in the 2011 Ningbo City data). For convenience in what follows, Lv1 is called the "district level", Lv2 the "street level", and Lv5 the "individual level".
Meanwhile the case where providing China's infectious disease disease herein explanation.According to " People's Republic of China's Prevention of Infectious Diseases
Method " and " law on the prevention and control of infectious diseases implementing regulations " for the infectious disease type in China be divided into category A infectious disease, Category B notifiable disease, third
This three categories of class infectious disease.More careful disease classification is specific as follows (as shown in Fig. 11):
(1) it is directed to category A infectious disease, can be subdivided into the plague, cholera, totally 2 kinds;
(2) it is directed to Category B notifiable disease, can be subdivided into virus hepatitis, morbilli, pertussis, stranguria syndrome, malaria, syphilis steps on
Leather heat, anthrax, infectiousness atypia hepatitis, diphtheria, neo-nataltetenus(NNT), AIDS, brucellosis, scarlet fever, rabies,
Typhoid and paratyphoid, meningococal meningitis, Hemorrhagic fever, human hepatic stellate cell, popular B-mode brain
Inflammation, bacillary and amebic dysentery, pulmonary tuberculosis, leptospirosis, snail fever, polio, totally 25 kinds;
(3) it is directed to Class C infectious disease, can be subdivided into influenza, mumps, acute hemorrhagic conjunctivitis,
Rubeola, leprosy, popular and matlazahuatl, kala-azar, echinococcosis, filariasis remove cholera, bacillary and amoeba
Property dysentery, the infectious diarrhea disease other than Typhoid and paratyphoid, hand-foot-and-mouth disease, totally 11 kinds;
In particular, the infectious disease patient data used here cover the whole of Ningbo City (the spatial scale) for the year 2011 (the time scale), with about 5000 patient records in total across the various infectious diseases. The infectious disease types actually involved are marked with ticks and numbers in Fig. 11. As Fig. 11 shows, only 11 of China's 38 notifiable infectious diseases appear here (none in Class A, 7 in Class B, and 4 in Class C; the numbers in Fig. 11 rank them by patient count in descending order). This is mainly attributable to many years of infectious disease prevention work by China, Zhejiang Province, and Ningbo City, as a result of which many of the above diseases have been completely eliminated, eradicated, or kept to very small numbers in Ningbo City (shown with strikethrough in Fig. 11). The details are as follows:
(1) In 1940, during the War of Resistance Against Japan, Ningbo City and the surrounding cities were struck by plague, cholera, and anthrax; the vast majority of these pathogens were released by the Japanese army. These diseases were not brought under effective control until the founding of New China in 1949;
(2) Poliomyelitis was not eradicated in Ningbo City until 1991, and it does not appear in the case data here;
(3) For highly pathogenic avian influenza, the first 2 cases in Ningbo City were not detected until January 2014, and it likewise does not appear in the case data here;
(4) Japanese encephalitis was not brought under effective control in Ningbo City until 2010, with the annual number of patients kept in single digits; this is consistent with the case data here;
(5) The situation for dengue fever is similar to that for Japanese encephalitis;
(6) Diphtheria has not been found in Ningbo City since 1989, and it does not appear in the case data here;
(7) Schistosomiasis and malaria were eliminated in Ningbo City in 1972 and 1989, respectively, and neither appears in the case data here;
(8) Leprosy was eradicated in Ningbo City in 1997, and it does not appear in the case data here;
(9) For typhus, there have been no related reports in Ningbo City in recent years, which is consistent with the case data here;
(10) The situation for kala-azar and echinococcosis is similar to that for typhus; in particular, the former was eradicated in China in 1958, and no case of the latter has been found in Zhejiang Province. This is consistent with the case data here;
(11) Filariasis was eliminated in Ningbo City in 1997, and it does not appear in the case data here.
Meanwhile the Eradication of above section infectious disease or elimination completely will also be attributed to the fact that China " planned immunization " works.
Specifically, China since founding of New, has reinforced work in terms of disease prevention and cure, rigid and absolute enforcement immunization work, each province
Actively implement.Wherein, Zhejiang Province is specifically included for the immunization procedures of children: for preventing the second of virus B hepatitis
Liver vaccine (HBV), for preventing BCG vaccine lungy (BCG), the spinal cord ash vaccine (PV) for guarding against poliomyelities, use
Vaccine (DPT) is broken, for preventing morbilli, rubeola, the popular parotid gland in one hundred days of prevention pertussis, diphtheria, neo-nataltetenus(NNT)
Scorching numb cheek wind vaccine (MMR), the Vaccinum Encephalitidis Epidemicae (JEV) for preventing Japanese Type-B encephalitis, meningococal meningitis epidemic meningitis
Vaccine (MCV), the Aimmugen (HAV) for preventing viral hepatitis type A.
The above describes (1) how China's administrative division structure is implemented in Ningbo City, Zhejiang Province, and (2) which of China's infectious disease types are actually involved in the present case.
Notably, before the spatial clustering analysis of infectious disease patients in Ningbo City in 2011 is carried out, the following preprocessing is applied so that the calculation and description are more precise: the index PNID in the following formula (44) replaces PN in formula (1) above, the index PDID in formula (44) replaces PD in formula (1), and the index HRID in the following formula (45) replaces HR in formula (2). Compared with the three original indices, the three substituted indices are more meaningful for practical application.
Specifically, PNID denotes the number of infectious disease patients and is the abbreviation of Patient Number of Infectious Disease; PDID denotes the density of infectious disease patients and is the abbreviation of Patient Density of Infectious Disease; HRID denotes the proportion of infectious disease patients in the total population and is the abbreviation of Hospitalization Rate of Infectious Disease. The indices PDID and HRID are calculated as shown in the following formulas (44) and (45), respectively:
Apart from the above, the remaining formulas (formulas (3) to (43)) are unchanged.
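A minimal sketch of the two derived indices follows. It assumes, from the definitions of aa(i) (area of the i-th administrative division) and cn(i) (its population), that PDID is the patient count divided by the area and HRID is the patient count divided by the population. The function name and the sample numbers are hypothetical and are not taken from the Ningbo data.

```python
def derived_indices(pnid, area, population):
    """Return (PDID, HRID) for one administrative division.

    pnid       -- number of infectious disease patients in the division
    area       -- area of the division, aa(i)
    population -- population of the division, cn(i)
    """
    pdid = pnid / area          # patients per unit area (assumed form of (44))
    hrid = pnid / population    # patients per resident (assumed form of (45))
    return pdid, hrid

# Hypothetical division: 50 patients, 25.0 square kilometres, 100000 residents.
pdid, hrid = derived_indices(50, 25.0, 100000)
```

Normalizing by area and by population makes divisions of different size comparable, which the raw count PNID alone does not.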
Applying step 1 (preliminary judgment of the existence of spatial clustering based on choropleth maps) yields preliminary visualizations of the three indices at the district/county/county-level city level (11 units; Fig. 12(a)-(c)) and at the street/town/township level (153 units; Fig. 13(a)-(c)), together with enlarged views of the central area (Fig. 13(d)-(f)). The corresponding statistics are listed in Table 1. From these it can be preliminarily judged that the citywide spatial distribution of infectious disease patients in Ningbo City in 2011 appears to be spatially clustered, and that the clustering appears to be concentrated in the central urban area.
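The grading behind such a choropleth map can be sketched as below. The classification scheme is not specified in the text, so equal-interval breaks are used here purely as an assumption; the function name and sample values are hypothetical.

```python
def equal_interval_grades(values, n_grades=5):
    """Assign each value a grade from 1 (lowest) to n_grades (highest).

    The value range is cut into n_grades intervals of equal width; each
    grade would then be drawn with a progressively darker fill colour.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)        # no variation: a single grade
    width = (hi - lo) / n_grades
    # The maximum value falls on the upper edge, so clamp it to n_grades.
    return [min(int((v - lo) / width) + 1, n_grades) for v in values]

# Hypothetical index values for six regional units.
grades = equal_interval_grades([0, 10, 20, 30, 40, 50], n_grades=5)
```

Other schemes (quantiles or natural breaks) could be substituted without changing the rest of the workflow, since step 1 only gives a visual, preliminary impression.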
Applying step 2 (accurate judgment of the existence of spatial clustering based on spatial autocorrelation) yields an accurate judgment of whether the three indices are spatially clustered at the street/town/township level (153 units), using the global Moran I coefficient (columns 2, 4, and 6 of Table 2). The result is "clustering exists (confidence level 99%)". The accurate judgment of step 2 thus confirms the preliminary judgment of step 1.
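The global Moran I coefficient can be sketched in a few lines. The sketch assumes the widely used form I = (n / S0) * sum_ij(w_ij z_i z_j) / sum_i(z_i^2), with z_i the deviation from the mean; the significance test (z-score and confidence level) is omitted, and the row-of-units weight matrix is a hypothetical stand-in for the real street/town/township adjacency.

```python
def global_morans_i(x, w):
    """Global Moran's I for attribute values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]                       # deviations from the mean
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    s0 = sum(w[i][j] for i, j in pairs)               # sum of all spatial weights
    cross = sum(w[i][j] * z[i] * z[j] for i, j in pairs)
    return (n / s0) * cross / sum(zi * zi for zi in z)

def chain_weights(n):
    """Hypothetical binary contiguity weights for n units arranged in a row."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(4)
clustered = global_morans_i([1, 1, 5, 5], w)     # similar values are neighbours
alternating = global_morans_i([1, 5, 1, 5], w)   # dissimilar values are neighbours
```

The first arrangement gives a positive coefficient (clustering) and the second a negative one (dispersion), which is the distinction step 2 relies on.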
Applying step 3 (accurate judgment of the spatial clustering type based on spatial autocorrelation) further yields an accurate judgment of the clustering type of the three indices at the street/town/township level (153 units), using the global Getis-Ord coefficient (columns 3, 5, and 7 of Table 2). The result is "high-value clustering (confidence level 99%)". The accurate judgment of step 3 refines the judgment of step 2.
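A sketch of the global Getis-Ord coefficient (General G) is given below, assuming the common form G = sum_ij(w_ij x_i x_j) / sum_ij(x_i x_j) over all pairs with j different from i, and non-negative attribute values. The significance test is again omitted, and the weight matrix is a hypothetical row of units.

```python
def general_g(x, w):
    """Getis-Ord General G for non-negative values x and weight matrix w."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weighted = sum(w[i][j] * x[i] * x[j] for i, j in pairs)
    total = sum(x[i] * x[j] for i, j in pairs)
    return weighted / total

def chain_weights(n):
    """Hypothetical binary contiguity weights for n units arranged in a row."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w = chain_weights(4)
high_together = general_g([1, 1, 5, 5], w)   # the two high values are neighbours
high_apart = general_g([5, 1, 1, 5], w)      # the two high values are separated
```

Both inputs contain the same values, but G is larger when the high values sit next to each other; this is why G, unlike Moran I, can tell high-value clustering from low-value clustering.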
Applying step 4 (accurate division of spatial cluster regions based on spatial autocorrelation) yields an accurate division of the high-value cluster regions (high-high clusters) and low-value cluster regions (low-low clusters) of the three indices at the street/town/township level (153 units), as shown in Fig. 14(a)-(c). The specific regional delimitation of step 4 makes the judgment of step 3 concrete.
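The local statistic used for this delimitation can be sketched as follows, assuming the usual Gi* form in which the weighted local sum is compared with its expectation and scaled by the global standard deviation. Each unit is treated as its own neighbour (self weight 1), which is an assumption of this sketch, as are the sample values.

```python
from math import sqrt

def getis_ord_gi_star(x, w, i):
    """Gi* z-score of unit i for values x and weights w (w[i][i] included)."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(v * v for v in x) / n - mean ** 2)   # global standard deviation
    sum_w = sum(w[i])
    sum_w2 = sum(v * v for v in w[i])
    numerator = sum(w[i][j] * x[j] for j in range(n)) - mean * sum_w
    denominator = s * sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator

def chain_weights_with_self(n):
    """Hypothetical weights: each unit neighbours itself and adjacent units."""
    return [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

x = [10, 10, 1, 1, 1]
w = chain_weights_with_self(len(x))
hot = getis_ord_gi_star(x, w, 0)    # unit surrounded by high values
cold = getis_ord_gi_star(x, w, 4)   # unit surrounded by low values
```

A large positive score marks a candidate high-value cluster and a large negative score a candidate low-value cluster; in practice the scores are compared against a significance threshold before a region is delimited.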
Applying step 5 (accurate division of spatial clustering outliers based on spatial autocorrelation) yields the clusters and outlier areas calculated for the PNID field (Fig. 15(a) and (d); Table 3, sorted by LMI ZScore in descending order). Similarly, clusters and outlier areas are obtained for the HRID field (Fig. 15(c) and (f); Table 4, also sorted by LMI ZScore in descending order) and for the PDID field (Fig. 15(b) and (e); table omitted for reasons of space). From these three tables, the streets/towns/townships with the most pronounced high-high clusters are: Dongliu Street and Minglou Street of Jiangdong District; Wenjiao Street of Jiangbei District; Shiqi Street, Jiangshan Town, and Gulin Town of Yinzhou District; and Jiaochuan Street, Zhuangshi Street, Luotuo Street, and Zhaobaoshan Street of Zhenhai District. In addition, one low-high outlier is found: Jiangxia Street of Haishu District. Step 5 further supplements step 4.
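The local Moran I coefficient and the resulting cluster/outlier labels can be sketched as below. The sketch assumes the common form I_i = (x_i - mean) / S_i^2 * sum_j w_ij (x_j - mean), with S_i^2 computed over all units other than i. The quadrant labels ignore statistical significance, which a real analysis would test (for example through the LMI ZScore) before reporting a cluster or outlier; the data are hypothetical.

```python
def local_morans_i(x, w, i):
    """Local Moran's I of unit i for values x and spatial weights w."""
    n = len(x)
    mean = sum(x) / n
    others = [j for j in range(n) if j != i]
    s2 = sum((x[j] - mean) ** 2 for j in others) / (n - 1)
    lag = sum(w[i][j] * (x[j] - mean) for j in others)   # neighbourhood deviation
    return (x[i] - mean) / s2 * lag

def quadrant(x, w, i):
    """Label unit i as HH, LL, HL, or LH relative to the mean (no significance)."""
    n = len(x)
    mean = sum(x) / n
    lag = sum(w[i][j] * (x[j] - mean) for j in range(n) if j != i)
    own = "H" if x[i] > mean else "L"
    neighbours = "H" if lag > 0 else "L"
    return own + neighbours

def chain_weights(n):
    """Hypothetical binary contiguity weights for n units arranged in a row."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

x = [9, 9, 1, 9, 9]            # one low unit surrounded by high units
w = chain_weights(len(x))
middle_i = local_morans_i(x, w, 2)
middle_label = quadrant(x, w, 2)
edge_label = quadrant(x, w, 0)
```

The middle unit receives a negative coefficient and the label LH (a low value among high neighbours), the same kind of outlier reported for the single low-high street above, while the edge unit is labelled HH.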
Applying step 6 (precise positioning of spatial cluster regions based on a clustering algorithm), the "two-dimensional Euclidean distance" is used as the "distance measurement criterion" while different "cluster merging criteria" are applied; the results are shown in Fig. 16(a)-(g). Among them, the Ward hierarchical clustering method (Fig. 16(g)), which has the distinctive property of "balancing the number of points across subgroups", is selected. It gives the following 4 first-level density centers (denoted C1, C2, C3, C4) and 6 second-level density centers (denoted c1, c2, c3, c4, c5, c6), as shown in Table 5. The details are as follows:
For C1, be located at the station area of Reuter in the trick Golconda street of Zhenhai District, Shengli road community, Zong Pu bridge community, after
Street community, suitable grand community, the community Xi Menshequ, Bai Long, the prison community ?.For C2, it is located at the street Ming Lou of Jiangdong District
With the community Jing Jia, the community Xu Jia, the community Xu Rong, the community Jin Yuan, the mill the Dong Liu community, Tai Koo Shing society of the intersection in the street Dong Liu
Area, community of living in peace, center community.For C3, be located at Yinzhou District Shi ?street Shi ?community, east community, new district society
Area, the village stone ?, the village Tang Xi, the village Hou Cang, the village Yue She, the village Che Hedu, the village Lian Feng.For C4, it is located at the town Jiang Shan of Yinzhou District
The age of the small town community, the residential block Jiang Shan, the residential block Shi Shan, the village Qiang Nong, Dongguang village, the village Yu Jia, the upper village Zhang Cun, Hou Mao, Chen Jiatuan
Village, the village Fan Shidu.
For c1, be located at the street Jiao Chuan of Zhenhai District the community Yu Fan, near a river community, five communities Li Pai, clear water Pu village,
The village Zhong Guanlu.For c2, it is located at the community Zhuan Shi, the area of emerging village Reuter, connection Xing Cun in the street Zhuan Shi of Zhenhai District.For c3,
Positioned at the community Sheng Jia in the camel street of Zhenhai District, the village Luo Xing, village of respecting virtue, Jinhua village, excess-three village.For c4, it is located at the north of the Changjiang River
The mill the Shuan Dong community in the culture and education street in area, community of cultivating people of ability, the community Cui Dong, the community great Zha, the community Bei Anqinsen.For c5, position
In the West Lake community in the town Gu Lin of Yinzhou District, the residential block Gu Lin, the village Gu Lin, the village Shi Jia, the village Guo Xia, the village Dai Jia, the village Feng Li, total
Village, the village Bu Zheng, the village Bao Jia, the village Zhe Jiao, the village Zhong Yi, the West Gang Cun, Song Yan Wang Cun.For c6, it is located at the Dan Dongjie of Xiangshan County
The park community of road and red West Street road intersection, East Street community, Tashan Mountain community, enterprising village, village, the village Qi Chun outside east gate, and
North Road community, newly built community, north gate village, square well head village, the village Wu Feng.
The results calculated in step 6 are largely consistent with the results of step 5 above, and the former refine the latter.
Table 1. Preliminary statistics of infectious disease patients by administrative division (for reasons of space, only Haishu, Jiangdong, Jiangbei, and Zhenhai are listed)
Table 2. Global Moran I and global Getis-Ord indices calculated for the infectious disease data
Table 3. Statistics of clusters and outlier areas calculated for the PNID field (sorted by LMI ZScore in descending order)
Table 4. Statistics of clusters and outlier areas calculated for the HRID field (sorted by LMI ZScore in descending order)
Table 5. Kernel density estimation results, which are consistent with the spatial clustering results
This patent was supported by: the Open Research Fund Program of Key Laboratory of Digital Mapping and Land Information Application Engineering, NASG (National Administration of Surveying, Mapping and Geoinformation) (Grant No. GCWD201801); the National Natural Science Foundation of China (Grant No. 41601428); the scientific research start-up project of Ningbo Institute of Technology, Zhejiang University (project title: design and modeling of unified real estate registration in China using LADM, taking Shenzhen City, Guangdong Province, and Ningbo City, Zhejiang Province, as examples); the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Land and Resources (KF-2016-02-001); the 2016 Zhejiang Province merit-based funding for postdoctoral research projects (project title: LADM-based modeling of unified real estate registration in China, taking Zhejiang Province as an example); the Open Research Fund of Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (15I03); and Ningbo Innovative Team: The intelligent big data engineering application for life and health (Grant No. 2016C11024).
This patent was also supported by the Ministry of Education Humanities and Social Sciences Research Youth Fund (project title: safety monitoring methods for crowds with involuntary behavior, fusing indoor location services and video analysis; project number: 16YJCZH112) and the Ningbo Natural Science Foundation (project title: acquisition and mining methods for massive traffic data based on a NoSQL cloud database; project number: 2017A610118).
Claims (11)
1. A multi-level cluster analysis method for point sets that takes spatial position into account, characterized by comprising the following steps:
Step 1: preliminary judgment of the existence of spatial clustering based on choropleth maps: judging, by the choropleth mapping method, whether spatial clustering appears to exist;
Step 2: accurate judgment of the existence of spatial clustering based on spatial autocorrelation: judging, by the global Moran I coefficient among the spatial autocorrelation coefficients, whether spatial clustering indeed exists;
Step 3: accurate judgment of the spatial clustering type based on spatial autocorrelation: if spatial clustering indeed exists, judging, by the global Getis-Ord coefficient among the spatial autocorrelation coefficients, whether the clustering type is high-value clustering or low-value clustering;
Step 4: accurate division of spatial cluster regions based on spatial autocorrelation: in the case of high-value clustering, accurately delimiting the specific regions of high-value clustering by the local Getis-Ord coefficient among the spatial autocorrelation coefficients; in the case of low-value clustering, accurately delimiting the specific regions of low-value clustering by the local Getis-Ord coefficient among the spatial autocorrelation coefficients;
Step 5: accurate division of clustering outliers based on spatial autocorrelation: accurately marking off, by the local Moran I coefficient among the spatial autocorrelation coefficients, the outlier areas other than high-value clusters or low-value clusters;
Step 6: accurate positioning of the points contained in spatial cluster regions based on a clustering algorithm: accurately locating, by a spatial clustering algorithm, the points contained in the spatial cluster regions.
2. The multi-level cluster analysis method for point sets that takes spatial position into account according to claim 1, characterized in that: in step 1, the choropleth mapping method is as follows: based on the statistics of each regional division unit, grades are divided according to the density, intensity, or development level of the phenomenon; then, according to grade, each region on the map is filled with color of a different shade or with hatching of a different density, so as to show the differences between the regional division units.
3. The multi-level cluster analysis method for point sets that takes spatial position into account according to claim 1 or claim 2, characterized in that: the choropleth mapping method is applied to three indices, namely index PN, index PD, and index HR, where PN denotes the number of people in a specific group, PD denotes the density of the specific group, and HR denotes the proportion of the specific group in the total population; the three indices are calculated as shown in the following formulas (1) and (2):
where aa(i) denotes the area of the i-th administrative division, and cn(i) denotes the population of the i-th administrative division.
4. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: in step 2, the global Moran's I coefficient is calculated as shown in formulas (3) and (4):
$$I = \frac{n}{S_0}\,\frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, z_i z_j}{\sum_{i=1}^{n} z_i^{2}} \tag{3}$$
$$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j} \tag{4}$$
where $z_i$ is the deviation of the attribute value of feature $i$ from its mean, $(x_i - \bar{X})$; $w_{i,j}$ is the spatial weight between feature $i$ and feature $j$; $n$ is the total number of features; and $S_0$ is the sum of all spatial weights.
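A minimal Python sketch of the global Moran's I of claim 4, assuming the standard definition I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), which is consistent with the symbols defined above. The function name `global_morans_i`, the list-of-lists weight matrix and the four-feature chain are illustrative assumptions.

```python
def global_morans_i(x, w):
    """Global Moran's I for attribute values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]              # deviations from the mean
    s0 = sum(sum(row) for row in w)          # sum of all spatial weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical example: four features on a line with binary neighbour weights
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
trend = global_morans_i([1, 2, 3, 4], chain)          # similar neighbours
alternating = global_morans_i([1, -1, 1, -1], chain)  # dissimilar neighbours
```

A positive value indicates that similar values tend to be neighbours (aggregation), while a negative value indicates that neighbouring values tend to differ.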
5. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: in step 3, the global Getis-Ord coefficient is calculated as shown in formula (5):
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{i,j}\, x_i x_j}{\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j}, \quad \forall j \neq i \tag{5}$$
where $x_i$ and $x_j$ are the attribute values of feature $i$ and feature $j$, $w_{i,j}$ is the spatial weight between feature $i$ and feature $j$, $n$ is the total number of features in the data set, and $\forall j \neq i$ indicates that feature $i$ and feature $j$ cannot be the same feature.
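A minimal Python sketch of the global Getis-Ord coefficient of claim 5, assuming the standard General G definition: the weighted sum of products x_i * x_j over all pairs with j != i, divided by the unweighted sum of the same products. The function name `general_g` and the sample data are illustrative assumptions.

```python
def general_g(x, w):
    """Global Getis-Ord General G; pairs with i == j are excluded."""
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weighted = sum(w[i][j] * x[i] * x[j] for i, j in pairs)
    total = sum(x[i] * x[j] for i, j in pairs)
    return weighted / total


# Hypothetical example: four features on a line with binary neighbour weights
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
g = general_g([1, 2, 3, 4], chain)
```

In practice the observed G is compared with its expected value under spatial randomness to decide whether the pattern is high-value or low-value aggregation; that significance test is outside this sketch.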
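6. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: in step 4, the local Getis-Ord coefficient is calculated as shown in formulas (6) to (8):
$$G_i^{*} = \frac{\sum_{j=1}^{n} w_{i,j}\, x_j - \bar{X}\sum_{j=1}^{n} w_{i,j}}{S\sqrt{\dfrac{n\sum_{j=1}^{n} w_{i,j}^{2} - \left(\sum_{j=1}^{n} w_{i,j}\right)^{2}}{n-1}}} \tag{6}$$
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n} \tag{7}$$
$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^{2}}{n} - \bar{X}^{2}} \tag{8}$$
where $x_j$ is the attribute value of feature $j$, $w_{i,j}$ is the spatial weight between feature $i$ and feature $j$, and $n$ is the total number of features.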
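A minimal Python sketch of the local Getis-Ord coefficient of claim 6, assuming the standard Gi* statistic, in which each feature's weighted neighbourhood sum is compared with the global mean and scaled by the global standard deviation. The function name `local_gi_star` and the sample data are illustrative assumptions; each feature is assumed to be its own neighbour (w_ii = 1), and the sketch does not guard against a row whose weights are all equal, which would make the denominator zero.

```python
import math


def local_gi_star(x, w):
    """Return the Gi* z-score of every feature."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    scores = []
    for i in range(n):
        sw = sum(w[i])                       # sum of weights in row i
        sw2 = sum(v * v for v in w[i])       # sum of squared weights in row i
        num = sum(w[i][j] * x[j] for j in range(n)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores


# Hypothetical example: high values at one end of a five-feature chain
values = [10, 10, 1, 1, 1]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
scores = local_gi_star(values, weights)
```

Large positive scores mark candidate hot spots (high-value aggregation) and large negative scores mark candidate cold spots (low-value aggregation).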
7. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: in step 5, the local Moran's I coefficient is calculated as shown in formula (9):
$$I_i = \frac{x_i - \bar{X}}{S_i^{2}} \sum_{j=1,\, j \neq i}^{n} w_{i,j}\left(x_j - \bar{X}\right) \tag{9}$$
where $x_i$ is the attribute value of feature $i$, $\bar{X}$ is the mean of the corresponding attribute, $w_{i,j}$ is the spatial weight between feature $i$ and feature $j$, and $n$ is the total number of features.
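A minimal Python sketch of the local Moran's I of claim 7, assuming the commonly used form I_i = ((x_i - mean) / S_i^2) * sum over j != i of w_ij * (x_j - mean), with S_i^2 taken as the sum of squared deviations of all features other than i divided by n - 1. That choice of S_i^2, the function name `local_morans_i` and the sample data are assumptions.

```python
def local_morans_i(x, w):
    """Return the local Moran's I value of every feature."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    result = []
    for i in range(n):
        # Assumed variance term: deviations of all features except i
        s2 = sum(z[j] ** 2 for j in range(n) if j != i) / (n - 1)
        lag = sum(w[i][j] * z[j] for j in range(n) if j != i)
        result.append(z[i] / s2 * lag)
    return result


# Hypothetical example: one high value surrounded by low values
chain5 = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
local_i = local_morans_i([1, 1, 10, 1, 1], chain5)
```

A negative I_i, as obtained for the middle feature here, flags a feature whose value differs from its neighbours, which corresponds to the anomalous regions that step 5 separates from high-value and low-value clusters.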
8. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: the global Moran's I coefficient, the global Getis-Ord coefficient, the local Getis-Ord coefficient and the local Moran's I coefficient involved in steps 1 to 6 all require the calculation of a spatial weight matrix; the strategies that can be used to build the spatial weight matrix include: the inverse distance strategy, the inverse distance squared strategy, the fixed distance strategy, the zone of indifference strategy, the K-nearest-neighbor strategy, the edge contiguity strategy, the edge-and-corner contiguity strategy, and the Delaunay triangulation strategy.
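Three of the weighting strategies named in claim 8 can be sketched in plain Python. The sketch below covers inverse distance (and inverse distance squared through `power=2`), fixed distance and K nearest neighbours; the function names, the binary weights and the sample points are illustrative assumptions. The contiguity, zone of indifference and Delaunay strategies need polygon or triangulation data and are not shown. The points are assumed to be distinct, since coincident points would cause a division by zero in the inverse distance weights.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def inverse_distance_weights(points, power=1):
    """w_ij = 1 / d_ij**power for i != j; power=2 gives inverse distance squared."""
    d = pairwise_distances(points)
    n = len(points)
    return [[0.0 if i == j else 1.0 / d[i][j] ** power for j in range(n)]
            for i in range(n)]


def fixed_distance_weights(points, threshold):
    """w_ij = 1 when feature j lies within the threshold distance of i."""
    d = pairwise_distances(points)
    n = len(points)
    return [[1.0 if i != j and d[i][j] <= threshold else 0.0 for j in range(n)]
            for i in range(n)]


def knn_weights(points, k):
    """w_ij = 1 when j is one of the k nearest neighbours of i (not symmetric)."""
    d = pairwise_distances(points)
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nearest:
            w[i][j] = 1.0
    return w


pts = [(0, 0), (1, 0), (3, 0)]  # hypothetical point set
```

Any of the resulting matrices can be passed as the weight matrix to the global and local statistics of the preceding claims.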
9. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 1, characterized in that: in step 6, the spatial clustering algorithm is a bottom-up merging (agglomerative) algorithm, which comprises the following steps:
(1) treat the n spatial points as n subgroups, each containing exactly one spatial point, and then calculate the relationships between these n subgroups according to the selected cluster merging criterion;
(2) merge the pair of subgroups selected by the cluster subgroup merging criterion into one new subgroup, which leaves n-1 subgroups;
(3) recalculate the inter-class statistic between every pair of the n-1 subgroups and merge subgroups again according to the same criterion, which leaves n-2 subgroups;
(4) repeat the above steps until all subgroups have been merged, finally forming one large subgroup.
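The four steps of claim 9 can be sketched as a short, unoptimized Python routine. It assumes Euclidean distance between points and takes the merging criterion as a parameter: `min` corresponds to the minimum-distance criterion and `max` to the maximum-distance criterion of claim 11. The function name `agglomerate` and the sample points are hypothetical.

```python
import math


def agglomerate(points, linkage=min):
    """Bottom-up merging; returns the list of merges as (group_a, group_b, distance)."""
    clusters = [[i] for i in range(len(points))]   # step (1): n singleton subgroups
    merges = []
    while len(clusters) > 1:                       # step (4): repeat until one subgroup
        best = None
        for a in range(len(clusters)):             # steps (2)/(3): find the closest pair
            for b in range(a + 1, len(clusters)):
                d = linkage(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


pts = [(0, 0), (1, 0), (10, 0)]  # hypothetical spatial points
history = agglomerate(pts)
```

The merge history has n-1 entries; cutting it at a chosen distance yields the point membership of each spatial aggregation region.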
10. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 9, characterized in that: the bottom-up merging algorithm includes a similarity judgment between spatial points, in which the similarity between spatial points is measured by a distance metric criterion, the distance metric criterion being as follows: let each vector x have several dimensions, indexed by j; the distances between two different individual vectors $x_s$ and $x_t$ are then calculated as follows:
(1) Minkowski distance, calculated as shown in formula (11):
$$d_{st} = \left(\sum_{j} \left|x_{sj} - x_{tj}\right|^{p}\right)^{1/p} \tag{11}$$
where p denotes the Minkowski exponent;
(2) city block distance: when p = 1, the Minkowski distance reduces to the city block distance, calculated as shown in formula (12):
$$d_{st} = \sum_{j} \left|x_{sj} - x_{tj}\right| \tag{12}$$
In two-dimensional space, the city block distance can further be calculated by formula (13):
$$d_{st} = |x_s - x_t| + |y_s - y_t| \tag{13}$$
In three-dimensional space, the city block distance can further be calculated by formula (14):
$$d_{st} = |x_s - x_t| + |y_s - y_t| + |z_s - z_t| \tag{14}$$
(3) Euclidean distance: when p = 2, the Minkowski distance reduces to the ordinary Euclidean distance, calculated as shown in formula (15):
$$d_{st} = \sqrt{\sum_{j} \left(x_{sj} - x_{tj}\right)^{2}} \tag{15}$$
In two-dimensional space, the Euclidean distance can further be calculated by formula (16):
$$d_{st} = \sqrt{(x_s - x_t)^{2} + (y_s - y_t)^{2}} \tag{16}$$
In three-dimensional space, the Euclidean distance can further be calculated by formula (17):
$$d_{st} = \sqrt{(x_s - x_t)^{2} + (y_s - y_t)^{2} + (z_s - z_t)^{2}} \tag{17}$$
(4) Chebyshev distance: when p tends to infinity, the Minkowski distance reduces to the Chebyshev distance, calculated as shown in formula (18):
$$d_{st} = \max_{j} \left|x_{sj} - x_{tj}\right| \tag{18}$$
In two-dimensional space, the Chebyshev distance can further be calculated by formula (19) (as shown in Fig. 6(c)):
$$d_{st} = \max\left(|x_s - x_t|,\ |y_s - y_t|\right) \tag{19}$$
In three-dimensional space, the Chebyshev distance can further be calculated by formula (20):
$$d_{st} = \max\left(|x_s - x_t|,\ |y_s - y_t|,\ |z_s - z_t|\right) \tag{20}$$
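The distance metrics of claim 10 can be sketched in Python as follows. The Minkowski function uses the standard p-norm of the coordinate differences, and the other three are its special cases for p = 1, p = 2 and p tending to infinity. The function names and the sample points are illustrative assumptions.

```python
import math


def minkowski(xs, xt, p):
    """Minkowski distance with exponent p between two coordinate tuples."""
    return sum(abs(a - b) ** p for a, b in zip(xs, xt)) ** (1.0 / p)


def city_block(xs, xt):
    """Special case p = 1."""
    return sum(abs(a - b) for a, b in zip(xs, xt))


def euclidean(xs, xt):
    """Special case p = 2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, xt)))


def chebyshev(xs, xt):
    """Limit of the Minkowski distance as p tends to infinity."""
    return max(abs(a - b) for a, b in zip(xs, xt))


s, t = (0, 0), (3, 4)  # hypothetical two-dimensional points
distances = (city_block(s, t), euclidean(s, t), chebyshev(s, t))
```

The same functions work unchanged for three-dimensional points, since they iterate over however many coordinates the tuples contain.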
11. The multi-level cluster analysis method for point sets taking spatial position into account according to claim 9, characterized in that: the cluster subgroup merging criterion decides, from the distance between two subgroups, whether the two subgroups should be merged; if they can be merged, the two subgroups are selected and merged. Let the two subgroups be subgroup r and subgroup s, containing $n_r$ and $n_s$ objects respectively; the cluster merging criterion can then be specified as follows:
For single linkage, the similarity matrix or distance matrix of the data is used, and the inter-class distance is defined as the minimum distance between the data of the two classes; single linkage can be expressed by formula (21):
$$D(r,s) = \min\left(\mathrm{dist}(x_{ri}, x_{sj})\right),\quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s) \tag{21}$$
For complete linkage, the similarity matrix or distance matrix of the data is used, and the inter-class distance is defined as the maximum distance between the data of the two classes; complete linkage can be expressed by formula (22):
$$D(r,s) = \max\left(\mathrm{dist}(x_{ri}, x_{sj})\right),\quad i \in (1,\ldots,n_r),\ j \in (1,\ldots,n_s) \tag{22}$$
For group average linkage, the similarity matrix or distance matrix of the data is used, and the inter-class distance is defined as the average of the pairwise distances between the data of the two classes; group average linkage can be expressed by formula (23):
$$D(r,s) = \frac{1}{n_r n_s}\sum_{i=1}^{n_r}\sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj}) \tag{23}$$
For centroid distance, which works from the distance matrix and the original data, the distance is defined as a two-dimensional Euclidean distance, namely the distance between an individual and the centroid of a group or between the centroids of two groups; centroid distance is expressed by formula (24):
$$D(r,s) = \left\lVert \bar{x}_r - \bar{x}_s \right\rVert_2,\quad \bar{x}_r = \frac{1}{n_r}\sum_{i=1}^{n_r} x_{ri} \tag{24}$$
For Ward linkage, the aim is to minimize, at each step, the increase in the within-group sum of squared deviations; Ward linkage is expressed in simplified form by formula (25).
For median linkage, when the centroid of a group is calculated, the two parts that make up the group are given equal weight, that is, the calculated centroid is actually the average of the centroids of the two parts forming the group; median linkage can be expressed by formula (26).
For weighted average linkage, a weight equal to the reciprocal of the number of members in a class is applied to the distance when the inter-class distance is calculated; weighted average linkage can be expressed by formula (27).
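Several of the merging criteria of claim 11 can be sketched in Python for two subgroups given as lists of coordinate tuples. Single, complete and group average linkage follow the definitions stated in the claim. The Ward function uses a form found in common clustering libraries, sqrt(2 * n_r * n_s / (n_r + n_s)) times the centroid distance; this is an assumption, as is every function name and the sample data. Median and weighted average linkage are defined recursively over the merge history and are not shown.

```python
import math


def centroid(group):
    """Coordinate-wise mean of a list of points."""
    n = len(group)
    return tuple(sum(c) / n for c in zip(*group))


def single_linkage(r, s):
    return min(math.dist(a, b) for a in r for b in s)


def complete_linkage(r, s):
    return max(math.dist(a, b) for a in r for b in s)


def average_linkage(r, s):
    return sum(math.dist(a, b) for a in r for b in s) / (len(r) * len(s))


def centroid_distance(r, s):
    return math.dist(centroid(r), centroid(s))


def ward_distance(r, s):
    # Assumed form: scaled centroid distance, as used by common libraries
    nr, ns = len(r), len(s)
    return math.sqrt(2.0 * nr * ns / (nr + ns)) * centroid_distance(r, s)


group_r = [(0, 0), (0, 2)]  # hypothetical subgroup r
group_s = [(3, 0), (3, 2)]  # hypothetical subgroup s
```

At each merge step, the pair of subgroups with the smallest value of the chosen criterion is the pair that gets merged.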
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810696862.0A CN109117861B (en) | 2018-06-29 | 2018-06-29 | Point set multi-level aggregative property analysis method considering spatial position |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109117861A true CN109117861A (en) | 2019-01-01 |
CN109117861B CN109117861B (en) | 2022-01-25 |
Family
ID=64822365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810696862.0A Expired - Fee Related CN109117861B (en) | 2018-06-29 | 2018-06-29 | Point set multi-level aggregative property analysis method considering spatial position |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109117861B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705606A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Spatial K-means clustering method based on Spark distributed memory calculation |
CN111403048A (en) * | 2020-03-18 | 2020-07-10 | 唐宓 | Unknown infectious disease early warning and tracing method |
CN111813905A (en) * | 2020-06-17 | 2020-10-23 | 平安科技(深圳)有限公司 | Corpus generation method and device, computer equipment and storage medium |
CN113075648A (en) * | 2021-03-19 | 2021-07-06 | 中国舰船研究设计中心 | Clustering and filtering method for unmanned cluster target positioning information |
CN113298302A (en) * | 2021-05-18 | 2021-08-24 | 昆明理工大学 | Irregular shape space-time scanning method aiming at disease prediction |
CN113299388A (en) * | 2021-05-12 | 2021-08-24 | 吾征智能技术(北京)有限公司 | System for cross-modal medical biological characteristic cognitive diseases based on fever with rash |
CN113449749A (en) * | 2020-03-25 | 2021-09-28 | 日日顺供应链科技股份有限公司 | Goods space height determining method and system |
CN117648657A (en) * | 2023-12-13 | 2024-03-05 | 青岛市建筑设计研究院集团股份有限公司 | Urban planning multi-source data optimization processing method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102426806A (en) * | 2011-11-07 | 2012-04-25 | 同济大学 | Regional rail network UAV cruise method based on dynamic cell division |
US9430499B2 (en) * | 2014-02-18 | 2016-08-30 | Environmental Systems Research Institute, Inc. | Automated feature extraction from imagery |
Non-Patent Citations (4)
Title |
---|
CHANG-BIN YU ET.AL: "BEHAVIOR ANALYSIS OF EPIDEMIOLOGICAL PATIENTS FOR MEDICAL SITE TREATMENT FROM A SPATIAL PERSPECTIVE", 《2017 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS》 * |
ERIC FOX ET.AL: "Spatial Analysis of High Resolution Aerial Photographs to Analyze the Spread of Mountain Pine Beetle Infestations", 《JOURNAL OF SUSTAINABLE DEVELOPMENT》 * |
李俊磊 等: "相似度计算及其在数据挖掘中的应用", 《电脑知识与技术》 * |
查文婷 等: "基于GIS技术分析长沙市2006-2013年手足口病流行病学特征分析", 《中华疾病控制杂志》 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705606A (en) * | 2019-09-12 | 2020-01-17 | 武汉大学 | Spatial K-means clustering method based on Spark distributed memory calculation |
CN111403048A (en) * | 2020-03-18 | 2020-07-10 | 唐宓 | Unknown infectious disease early warning and tracing method |
CN113449749A (en) * | 2020-03-25 | 2021-09-28 | 日日顺供应链科技股份有限公司 | Goods space height determining method and system |
CN113449749B (en) * | 2020-03-25 | 2023-02-17 | 日日顺供应链科技股份有限公司 | Goods space height determining method and system |
CN111813905A (en) * | 2020-06-17 | 2020-10-23 | 平安科技(深圳)有限公司 | Corpus generation method and device, computer equipment and storage medium |
CN111813905B (en) * | 2020-06-17 | 2024-05-10 | 平安科技(深圳)有限公司 | Corpus generation method, corpus generation device, computer equipment and storage medium |
CN113075648A (en) * | 2021-03-19 | 2021-07-06 | 中国舰船研究设计中心 | Clustering and filtering method for unmanned cluster target positioning information |
CN113075648B (en) * | 2021-03-19 | 2024-05-17 | 中国舰船研究设计中心 | Clustering and filtering method for unmanned cluster target positioning information |
CN113299388A (en) * | 2021-05-12 | 2021-08-24 | 吾征智能技术(北京)有限公司 | System for cross-modal medical biological characteristic cognitive diseases based on fever with rash |
CN113299388B (en) * | 2021-05-12 | 2023-09-29 | 吾征智能技术(北京)有限公司 | Cross-modal medical biological characteristic cognitive disease system based on fever with rash |
CN113298302B (en) * | 2021-05-18 | 2022-06-28 | 昆明理工大学 | Irregular shape space-time scanning method aiming at disease prediction |
CN113298302A (en) * | 2021-05-18 | 2021-08-24 | 昆明理工大学 | Irregular shape space-time scanning method aiming at disease prediction |
CN117648657A (en) * | 2023-12-13 | 2024-03-05 | 青岛市建筑设计研究院集团股份有限公司 | Urban planning multi-source data optimization processing method |
CN117648657B (en) * | 2023-12-13 | 2024-05-14 | 青岛市建筑设计研究院集团股份有限公司 | Urban planning multi-source data optimization processing method |
Also Published As
Publication number | Publication date |
---|---|
CN109117861B (en) | 2022-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109117861A (en) | A kind of multi-level cluster analysis method of point set for taking spatial position into account | |
Yao et al. | A co-location pattern-mining algorithm with a density-weighted distance thresholding consideration | |
CN104063466A (en) | Virtuality-reality integrated three-dimensional display method and virtuality-reality integrated three-dimensional display system | |
Zhang et al. | Understanding urban dynamics from massive mobile traffic data | |
CN106981092B (en) | Priority-Flood-based internal flow domain extraction method | |
Li et al. | Dynamic changes of land use/cover and landscape pattern in a typical alpine river basin of the Qinghai‐Tibet Plateau, China | |
CN106326637A (en) | Link prediction method based on local effective path degree | |
CN109271441A (en) | A kind of visualization clustering method of high dimensional data and system | |
Ying et al. | Review of tourism ecological security from the perspective of ecological civilization construction | |
Nguyen et al. | An improved density-based approach to spatio-textual clustering on social media | |
Chen | Delineating the spatial boundaries of megaregions in China: A city network perspective | |
Deng et al. | Identification of urban spatial structure of pearl river delta urban agglomeration based on multisource spatial data | |
Li et al. | Prediction of urban domestic water consumption considering uncertainty | |
Bing et al. | Pre-Trained semantic embeddings for POI categories based on multiple contexts | |
Yang et al. | Classifying urban functional zones by integrating POIs, Place2vec, and LDA | |
Wang et al. | Shrinking or expanding? City spatial distribution and simulation analyses based on regionalization along the Yellow River | |
CN108427672B (en) | Method, terminal device and the computer readable storage medium of character translation | |
Deng et al. | General multidimensional cloud model and its application on spatial clustering in Zhanjiang, Guangdong | |
Xu et al. | Study on the spatial distribution characteristics of traditional villages in Chongqing and their influencing factors | |
Teng et al. | Survey on visualization layout for big data | |
Chang et al. | Landmark‐based summarized messages for flood warning | |
CN106033618B (en) | The automatic identification method in basin in a kind of dem data | |
Li et al. | Application of three-dimensional GIS to water resources | |
CN116933146B (en) | Classification system creation method and device for digital twin space entity | |
Redlich | Quantitative Analysis of Geomasking Methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20220125 |