CN114492590A - Boundary channel generation method and device based on track clustering - Google Patents

Publication number
CN114492590A
Authority
CN
China
Prior art keywords
track
data
target
candidate
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111640835.XA
Other languages
Chinese (zh)
Inventor
张小康
王元卓
程伯群
王汝平
赵俊霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Science And Technology Big Data Research Institute
Original Assignee
China Science And Technology Big Data Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Science And Technology Big Data Research Institute
Priority to CN202111640835.XA
Publication of CN114492590A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a border channel generation method and device based on track clustering. The method comprises: obtaining historical track data of a target user and border line longitude and latitude data; preprocessing the historical track data to obtain candidate track data; calculating based on the candidate track data and the border line longitude and latitude data to obtain target track data; clustering the target track data to obtain candidate track clusters; screening the candidate track clusters based on a preset strategy to obtain target track clusters; and extracting a center line of the target track clusters and storing position information of the center line according to a preset format. By processing and analyzing the historical track data of the target user together with the border line longitude and latitude data, the method makes it possible to monitor the track information of target personnel and cross-border behavior that follows an abnormal behavior pattern, which supports fields such as personnel monitoring and border area control.

Description

Boundary channel generation method and device based on track clustering
Technical Field
The disclosure relates to the technical field of data processing, in particular to a border channel generation method and device based on track clustering.
Background
In general, a border area is understood to be the range of regions adjacent to a national border, where illegal border-crossing behavior often has adverse consequences.
Therefore, how to obtain the channel routes of a border area has become a technical problem to be solved.
Disclosure of Invention
In order to solve the above technical problem, the present disclosure provides a border channel generation method and device based on track clustering.
In a first aspect, an embodiment of the present disclosure provides a border channel generation method based on track clustering, including:
acquiring historical track data and border line longitude and latitude data of a target user;
preprocessing historical track data to obtain candidate track data;
calculating based on the candidate track data and the border line longitude and latitude data to obtain target track data;
clustering target track data to obtain candidate track clusters;
screening the candidate track clusters based on a preset strategy to obtain a target track cluster;
and extracting the central line of the target track cluster, and storing the position information of the central line according to a preset format.
In a second aspect, an embodiment of the present disclosure provides a border channel generation apparatus based on track clustering, including:
the first acquisition module is used for acquiring historical track data and border line longitude and latitude data of a target user;
the first processing module is used for preprocessing the historical track data to obtain candidate track data;
the computing module is used for computing based on the candidate track data and the border line longitude and latitude data to obtain target track data;
the second processing module is used for clustering the target track data to obtain candidate track clusters;
the screening module is used for screening the candidate track clusters based on a preset strategy to obtain target track clusters;
and the extraction module is used for extracting the central line of the target track cluster and storing the position information of the central line according to a preset format.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
In the embodiment of the disclosure, historical track data of a target user and border line longitude and latitude data are obtained; the historical track data is preprocessed to obtain candidate track data; calculation is performed based on the candidate track data and the border line longitude and latitude data to obtain target track data; the target track data is clustered to obtain candidate track clusters; the candidate track clusters are screened based on a preset strategy to obtain target track clusters; and a center line of the target track clusters is extracted and its position information is stored according to a preset format. By processing and analyzing the historical track data of the target user together with the border line longitude and latitude data, cross-border behavior of target personnel that follows an abnormal behavior pattern can be monitored, which supports fields such as personnel monitoring and border area control.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a border channel generation method based on track clustering according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a method for preprocessing historical track data according to an embodiment of the present disclosure;
Fig. 3 is a schematic flow chart of another border channel generation method based on track clustering according to an embodiment of the present disclosure;
Fig. 4 is a schematic flow chart of a method for clustering target track data according to an embodiment of the present disclosure;
Fig. 5 is a general flow diagram of a border channel generation method based on track clustering according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of a border channel generation apparatus based on track clustering according to the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Fig. 1 is a schematic flow chart of a border channel generation method based on track clustering according to an embodiment of the present disclosure, including:
step 101, obtaining historical track data and border line longitude and latitude data of a target user.
The target user can be selected and set according to the needs of the application scenario; for example, a user with a history of crossing the border is taken as the target user. Different target users can be represented by user identifiers, that is, each target user can be uniquely identified by a user identifier. The historical track data refers to the set of track points through which the target user moved, one for each time point within a historical time period. The border line longitude and latitude data refers to the longitude and latitude of the line that separates one region from another.
In the embodiment of the present disclosure, there are various ways to obtain the historical track data of the target user. For example, track data recorded over a historical time period by a device with a GPS (Global Positioning System) positioning function carried by the target user is used as the historical track data of the target user; as another example, the positions at which the target user's mobile phone number connected to the mobile network are used as the historical track data of the target user. Likewise, there are various ways to obtain the border line longitude and latitude data, for example, by using GIS (Geographic Information System) processing software such as QGIS (Quantum GIS) or ArcGIS to extract the longitude and latitude of the border line as the border line longitude and latitude data.
The above-mentioned manner of obtaining the historical track data and the border line longitude and latitude data of the target user is only an example, and the specific manner of obtaining the historical track data and the border line longitude and latitude data of the target user is not limited in the present disclosure.
Step 102, preprocessing the historical track data to obtain candidate track data.
In the embodiment of the disclosure, the acquired historical track data is preprocessed in order to improve the accuracy, integrity and consistency of the data. In one embodiment, the historical track data is serialized, that is, sorted in time order, to obtain a track sequence corresponding to the target user, and abnormal values whose latitude is zero or null are then removed from the track sequence to obtain the candidate track data. In another embodiment, the historical track data is serialized, that is, sorted in time order, to obtain a series of track points corresponding to the target user, and track points that fall within a preset time interval of the previous adjacent track point and lie closer to it than a preset distance threshold are then removed to obtain the candidate track data.
As an example, as shown in Fig. 2, the historical track data is first serialized, that is, sorted in time order, to obtain a track sequence corresponding to the target user; abnormal values whose latitude is zero or null are then removed from the track sequence, repeated track points are de-duplicated, and the candidate track data is obtained by this screening.
The two ways of preprocessing the historical trajectory data to obtain the candidate trajectory data are only examples, and the specific way of preprocessing the historical trajectory data to obtain the candidate trajectory data is not limited in the present disclosure.
It should be noted that the track sequence is composed of a series of track points, and each track point includes a user identifier, a longitude, a latitude, and a time.
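The preprocessing described for step 102 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TrackPoint` type and the `preprocess` function are hypothetical names, the input is assumed to hold the points of a single target user, and "repeated" is taken here to mean an exact duplicate of all four attributes.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class TrackPoint(NamedTuple):
    """One track point: user identifier, longitude, latitude, time."""
    user_id: str
    lon: Optional[float]
    lat: Optional[float]
    time: datetime


def preprocess(points):
    """Sort by time, drop zero/null latitudes, drop exact duplicates."""
    ordered = sorted(points, key=lambda p: p.time)  # serialization step
    seen = set()
    candidates = []
    for p in ordered:
        if p.lat is None or p.lat == 0:
            continue  # abnormal value: latitude is null or zero
        if p in seen:
            continue  # exact repeat of an earlier track point
        seen.add(p)
        candidates.append(p)
    return candidates
```

The result is the time-ordered candidate track data for that user; the time-and-distance rule for near-duplicates from the second embodiment is not covered by this sketch.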
Step 103, calculating based on the candidate track data and the border line longitude and latitude data to obtain target track data.
In the embodiment of the disclosure, the target track data is obtained by calculation on the candidate track data and the border line longitude and latitude data. Specifically, the longitude and latitude great circle distance between the candidate track data and the border line longitude and latitude data is calculated; histogram statistics are performed on the candidate track data to obtain a weight for each track point in the candidate track data; and the track points whose great circle distance is less than or equal to a preset distance threshold and whose weight is greater than or equal to a preset weight threshold are taken from the candidate track data as the target track data, which further improves the accuracy of the center points obtained later.
Step 104, clustering the target track data to obtain candidate track clusters.
Clustering processing refers to clustering and merging adjacent, similarly classified areas by using a morphological operator. The clustering algorithm can be selected and set as needed; for example, the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm is used to divide the target track data into different track clusters according to the density reachability principle.
In the embodiment of the disclosure, candidate track clusters are obtained by clustering the target track data. Specifically, one track point is randomly selected from the target track data as a starting track point, and all track points within a preset distance range of the starting track point are obtained. If the number of these track points is greater than or equal to a preset number threshold, the starting track point and these track points form a candidate track cluster, the starting track point is marked as visited, and the same operation is repeated for every track point in the candidate track cluster that is not yet marked as visited. If the number of these track points is smaller than the preset number threshold, the starting track point is marked as a noise point. Once all track points in the candidate track cluster are marked as visited, the remaining unvisited points are processed in the same way until all track points in the target track data have been visited, which finally yields the candidate track clusters.
Step 105, screening the candidate track clusters based on a preset strategy to obtain a target track cluster.
To ensure that the generated border channel is reasonable, the candidate track clusters are screened through a preset strategy to obtain a target track cluster. Specifically, the preset strategy comprises one or more of the following: screening for track sequences formed by a number of users greater than a preset number threshold; screening for track sequences whose longest great circle distance is not smaller than a preset distance threshold; and screening for track sequences whose time span is not less than a preset number-of-days threshold.
As an example, the number of users corresponding to each track cluster in the candidate track clusters is obtained, and track clusters with the number of users greater than a preset number threshold are obtained as target track clusters.
Step 106, extracting the center line of the target track cluster, and storing the position information of the center line according to a preset format.
The center line of the target track cluster is the track line of the target personnel near the border. Specifically, extracting the center line means setting a search window, translating it along the latitude coordinate of the target track cluster, recording the set of track points covered during the movement, sequentially extracting the track center point coordinates within the search window to generate a track center point coordinate set H, and drawing the center line of the target track cluster from the data in H. The preset format refers to converting the longitude and latitude coordinates of the center line of the target track cluster into a floating point type with a fixed number of digits.
In the embodiment of the disclosure, a search window of a preset size is set and translated along the latitude coordinate of the target track cluster; the starting latitude of the search window is recorded as s and the ending latitude as e, and the track point data set $\Omega_i$ covered during the movement is collected. The track center point coordinates within the search window are then extracted in turn by sliding the search window to generate the track center point coordinate set H, and the center line is drawn from the data in H. The longitude and latitude coordinates of the center line are converted into a floating point type with a fixed number of digits, the coordinates of the center point of the center line are calculated, and finally the geographical position code and the Chinese address of the center point are queried and stored in database form.
It should be noted that the track point data set $\Omega_i$ is calculated as $\Omega_i = \{(x, y) \mid s_i \le x \le e_i,\ y \in R\}$, where $\Omega_i$ represents the set of longitude and latitude coordinates inside the window, $s_i$ is the starting abscissa of the window and $e_i$ is its ending abscissa. The calculation formula of the track center point coordinate set H is as follows:
[Formula image BDA0003443347590000061 is not reproduced in this text.]
where $|\Omega_j|$ denotes the number of track points in the set, $lat_i$ denotes latitude, and $lon_i$ denotes longitude.
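The formula for H is only available as an image that is not reproduced in this text, so the following is a sketch under an assumed reading: each window contributes one center point whose longitude and latitude are the arithmetic means over the track points in that window, which is consistent with the symbols $|\Omega_j|$, $lat_i$ and $lon_i$ defined above. The function name `extract_center_line`, the non-overlapping window step, and the half-open interval (used so that no point is counted twice) are all assumptions for illustration.

```python
def extract_center_line(points, window):
    """points: (lon, lat) pairs; returns one (lon, lat) center per non-empty window.

    The window slides along latitude in steps equal to its own size.
    """
    if not points:
        return []
    lat_min = min(lat for _, lat in points)
    lat_max = max(lat for _, lat in points)
    centers = []
    k = 0
    while lat_min + k * window <= lat_max:
        s = lat_min + k * window  # starting latitude of the window
        e = s + window            # ending latitude of the window
        omega = [(lon, lat) for lon, lat in points if s <= lat < e]
        if omega:
            mean_lon = sum(lon for lon, _ in omega) / len(omega)
            mean_lat = sum(lat for _, lat in omega) / len(omega)
            centers.append((mean_lon, mean_lat))
        k += 1
    return centers
```

Joining the returned centers in order of increasing latitude gives a polyline that plays the role of the center line in this sketch.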
The border channel generation scheme based on track clustering provided by the embodiment of the disclosure acquires historical track data of a target user and border line longitude and latitude data, preprocesses the historical track data to obtain candidate track data, calculates based on the candidate track data and the border line longitude and latitude data to obtain target track data, clusters the target track data to obtain candidate track clusters, screens the candidate track clusters based on a preset strategy to obtain the target track clusters, extracts a center line of the target track clusters, and stores position information of the center line according to a preset format. By processing and analyzing the historical track data of the target user together with the border line longitude and latitude data, cross-border behavior that follows an abnormal behavior pattern can be monitored and the related personnel can be managed and controlled in time, which further improves the safety of border line management.
Fig. 3 is a schematic flow chart of another border channel generation method based on track clustering according to an embodiment of the present disclosure, including:
step 301, obtaining historical track data and border line longitude and latitude data of a target user.
It should be noted that step 301 is the same as step 101 described above, and specific reference is made to the description of step 101, and details are not described here.
Step 302, arranging the historical track data according to a time sequence to obtain a track sequence corresponding to the target user, and deleting track points with zero or null latitude and repeated track points in the track sequence to obtain candidate track data.
In the embodiment of the disclosure, the historical track data of each target user is sorted in time order to obtain the corresponding track sequence. The track sequence contains four valid attributes: user identifier, longitude, latitude and time. If the track sequence of a target user has n track points, the track sequence is an n × 4 matrix. Track points whose latitude is zero or null, as well as repeated track points, are then deleted from the track sequence to obtain the candidate track data.
Specifically, repeated track points arise for the following reason: the historical track data of the target user is subject to interference and inconsistency in factors such as how, how often and where it is generated, and its acquisition is not standardized. As a result, historical track data from the same source may repeat in time and historical track data from different sources may overlap in time, producing repeated track points that affect the mining of the target user's historical track. In a specific embodiment of the present disclosure, repeated points may be removed according to the periodic characteristics of data acquisition. For example, a fixed time interval is set for historical track data from the same source, such as call ticket (call detail record) data; with the fixed time interval set to 10 minutes, a track point that falls within 10 minutes of the previous adjacent track point and is less than 500 meters from it is regarded as a repeated point and is removed.
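The 10-minute, 500-meter rule can be illustrated with the short sketch below. It is a simplified reading rather than the patented implementation: `drop_repeats` and `great_circle_m` are hypothetical names, timestamps are assumed to be seconds, the Earth radius is an assumed mean value, and each point is compared against the most recently kept point, which is one possible interpretation of "previous adjacent track point".

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def great_circle_m(lon1, lat1, lon2, lat2):
    """Great circle distance in meters between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    c = (math.cos(lon1 - lon2) * math.cos(lat1) * math.cos(lat2)
         + math.sin(lat1) * math.sin(lat2))
    return EARTH_RADIUS_M * math.acos(max(-1.0, min(1.0, c)))  # clamp rounding error


def drop_repeats(points, max_gap_s=600, min_dist_m=500):
    """points: time-sorted (timestamp_s, lon, lat) tuples from one source."""
    kept = []
    for t, lon, lat in points:
        if kept:
            prev_t, prev_lon, prev_lat = kept[-1]
            close_in_time = t - prev_t <= max_gap_s
            close_in_space = great_circle_m(prev_lon, prev_lat, lon, lat) < min_dist_m
            if close_in_time and close_in_space:
                continue  # treated as a repeated point
        kept.append((t, lon, lat))
    return kept
```

A point is dropped only when both conditions hold, so a user who stays in one place still produces one point per interval longer than `max_gap_s`.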
Step 303, performing longitude and latitude great circle distance calculation on the candidate track data and the border line longitude and latitude data to obtain a great circle distance, performing histogram statistics on the candidate track data to obtain a weight corresponding to each track point in the candidate track data, and obtaining track points with the great circle distance being less than or equal to a preset distance threshold value and the weight being greater than or equal to a preset weight threshold value from the candidate track data as target track data.
Specifically, the great circle distance is obtained by longitude and latitude great circle distance calculation between the candidate track data and the border line longitude and latitude data, and the track data of all target users who appeared near the border within a period of time can be screened out according to the great circle distance. Histogram statistics on the candidate track data assign a weight to each track point in the candidate track data, and the track points at which target users appeared with high frequency within a period of time can be screened out according to the weights. Together, the two conditions screen out the target track data.
In a specific embodiment of the present disclosure, the preset distance threshold is set to A and the preset weight threshold is set to B. If a track point in the candidate track data has a great circle distance less than or equal to A and a weight greater than or equal to B, it is determined to be a high-frequency track point of target personnel in the area near the border and is taken as target track data.
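The thresholding with A and B can be sketched as below. The text does not say how the histogram is binned, so this example assumes a simple grid obtained by rounding coordinates and uses the number of points per grid cell as the weight. The names `histogram_weights` and `select_target_points` are hypothetical, and the distances to the border are assumed to have been computed beforehand.

```python
from collections import Counter


def histogram_weights(points, decimals=3):
    """Weight of each (lon, lat) point = number of points in its grid cell."""
    cells = [(round(lon, decimals), round(lat, decimals)) for lon, lat in points]
    counts = Counter(cells)
    return [counts[cell] for cell in cells]


def select_target_points(points, border_distances, max_distance, min_weight):
    """Keep points with distance <= A (max_distance) and weight >= B (min_weight)."""
    weights = histogram_weights(points)
    return [
        p
        for p, d, w in zip(points, border_distances, weights)
        if d <= max_distance and w >= min_weight
    ]
```

The cell size set by `decimals` controls how aggressively nearby points are pooled into one histogram bin, so it would need tuning against the positioning accuracy of the data source.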
The longitude and latitude great circle distance calculation formula is as follows:
$\mathrm{Haversine}(X, Y) = R \times \arccos\left(\cos(x_1 - y_1)\cos x_2 \cos y_2 + \sin x_2 \sin y_2\right)$;
where Haversine denotes the haversine formula, R is the radius of the earth, $x_1$ and $y_1$ are the longitudes of the two points X and Y, and $x_2$ and $y_2$ are their latitudes.
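A direct transcription of this expression is sketched below. Although the text names it Haversine, the expression as written has the form of the spherical law of cosines, and the code follows the expression rather than the name. The function names, the mean Earth radius of 6371 km, the degree inputs, and the choice to measure distance to the nearest border vertex are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(x, y):
    """x and y are (longitude, latitude) pairs in degrees."""
    x1, x2 = math.radians(x[0]), math.radians(x[1])  # longitude, latitude of X
    y1, y2 = math.radians(y[0]), math.radians(y[1])  # longitude, latitude of Y
    c = (math.cos(x1 - y1) * math.cos(x2) * math.cos(y2)
         + math.sin(x2) * math.sin(y2))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, c)))


def distance_to_border_km(point, border_points):
    """Smallest great circle distance from a track point to any border vertex."""
    return min(great_circle_km(point, b) for b in border_points)
```

For a densely sampled border line, the nearest-vertex distance is a reasonable approximation of the distance to the line itself; for sparse border data a point-to-segment distance would be more accurate.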
Step 304, randomly selecting one track point from the target track data as a starting track point, acquiring all track points within a preset distance range of the starting track point, and, if the number of these track points is greater than or equal to a preset number threshold, forming a candidate track cluster from the starting track point and these track points and marking the starting track point as visited.
Specifically, the preset distance is set to D and the preset number threshold is set to E. One track point F is randomly selected from the target track data as the starting track point, and all track points within distance D of F are obtained, that is, all track points contained in the circle centered at F with radius D. If the number of these track points is greater than or equal to E, the starting track point F and the obtained track points form a candidate track cluster, and the starting track point F is marked as visited.
Step 305, repeating the previous operation for all track points in the candidate track cluster that are not marked as visited; if the number of track points within the preset distance range is smaller than the preset number threshold, marking the starting track point as a noise point; and once all track points in the candidate track cluster are marked as visited, continuing to repeat the operation on the unvisited points until all track points in the target track data have been visited, so as to obtain the candidate track clusters.
That is, the operation of step 304 is repeated for every track point in the candidate track cluster that is not marked as visited. If the number of track points within distance D is less than E, the starting track point is marked as a noise point. Once all track points in the candidate track cluster are marked as visited, the operation of step 304 is repeated on the remaining unvisited points until all track points in the target track data have been visited, which yields the candidate track clusters.
In the embodiment of the present disclosure, the target track data is clustered as shown in Fig. 4. First, the target track data, the preset distance and the preset number threshold are input. The target track data is scanned and an arbitrary track point P is selected, and it is determined whether P has already been assigned to a candidate track cluster or marked as a noise point. If not, the number of track points within the preset distance range of P is compared with the preset number threshold. If that number is smaller than the preset number threshold, P is marked as a boundary point or a noise point. If that number is greater than or equal to the preset number threshold, P is marked as visited and a candidate track cluster is established, and all track points within the preset distance range of P are added to the candidate track cluster. The unmarked points Q within the preset distance range are then checked in the same way. Finally, the operation is repeated until all track points in the target track data have been visited, which yields the candidate track clusters.
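The clustering flow described above follows the general shape of DBSCAN, which can be sketched in plain Python as below. This is an illustrative sketch, not the patented procedure: the function name `dbscan` and the `NOISE` label are hypothetical, points are scanned in input order rather than selected at random, the neighbor count includes the point itself, and planar Euclidean distance is used by default where a real application would pass a great circle distance function.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts, dist=math.dist):
    """Return one label per point: cluster id 0, 1, ... or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:
                queue.extend(more)  # core point: keep expanding the cluster
    return labels
```

Here `eps` corresponds to the preset distance D and `min_pts` to the preset number threshold E. The neighbor search is quadratic in the number of points, so a spatial index would be needed for large track sets.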
Step 306, screening the candidate track clusters by one or more of the following: track sequences formed by a number of users greater than a preset number threshold, track sequences whose longest great circle distance is not smaller than a preset distance threshold, and track sequences whose time span is not less than a preset number-of-days threshold.
In the embodiment of the present disclosure, there are many preset strategies for obtaining a target track cluster. For example, the preset number threshold is set to L, and when the number of users is greater than L, the track sequence is a target track cluster; in other embodiments, the preset distance threshold is set to M, and when the longest great circle distance is not less than M, the track sequence is a target track cluster; in still other embodiments, the preset number-of-days threshold is set to N, and when the time span is not less than N, the track sequence is a target track cluster.
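Two of these screening rules can be sketched as follows. The function `screen_clusters` and the dictionary layout of a track point are hypothetical; a threshold left as `None` is simply not applied, which is one way to express "one or more of" the strategies. The longest great circle distance rule (threshold M) is omitted from this sketch for brevity.

```python
from datetime import datetime


def screen_clusters(clusters, min_users=None, min_days=None):
    """Keep clusters that pass every threshold that is not None.

    Each cluster is a non-empty list of dicts with "user_id" and "time" keys.
    """
    kept = []
    for cluster in clusters:
        users = {p["user_id"] for p in cluster}
        times = [p["time"] for p in cluster]
        span_days = (max(times) - min(times)).days
        if min_users is not None and len(users) <= min_users:
            continue  # number of users must be greater than L
        if min_days is not None and span_days < min_days:
            continue  # time span must be not less than N days
        kept.append(cluster)
    return kept
```

Note the asymmetry taken from the text: the user count test is strict (greater than L), while the time span test is inclusive (not less than N).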
Step 307, generating a track center point coordinate set according to the formula, and extracting the center line of the target track cluster.
In the embodiment of the disclosure, the size of the search window is set to h and the window is translated along the latitude coordinate of the target track cluster. The track point data set G covered during the movement is collected according to the starting and ending abscissas of the window. The track center point coordinates for the search window of size h are then extracted in turn by sliding the search window to generate the track center point coordinate set, and the center line of the target track cluster is drawn from the data in that set.
Step 308, converting the longitude and latitude coordinates of the center line into a floating point type with a fixed number of digits, calculating the coordinates of the center point of the center line, querying the geographical position code and the Chinese address of the center point, and storing them in database form.
To make the channel data persistent, the mined channel data, namely the longitude and latitude coordinates of the center line, is converted into a floating point type with a fixed number of digits. Each piece of channel data is labeled, and the track sequence is stored as a multi-point (MultiPoint) type. The coordinates of the center point of the center line are calculated, and the geographical position code and the Chinese address of the center point are queried and stored in database form.
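One possible shape for such a stored record is sketched below. The text does not specify the serialization, so this example assumes WKT `MULTIPOINT` text, six decimal digits, and a center point taken as the mean of the center line coordinates. The function name `to_channel_record` and the dictionary keys are hypothetical, and the lookup of the geographical position code and address is left out because it depends on an external geocoding service.

```python
def to_channel_record(label, center_line, digits=6):
    """Build a storable record from a non-empty list of (lon, lat) pairs."""
    coords = [(round(lon, digits), round(lat, digits)) for lon, lat in center_line]
    # WKT MultiPoint text with a fixed number of decimal digits per coordinate.
    wkt = "MULTIPOINT (" + ", ".join(
        f"({lon:.{digits}f} {lat:.{digits}f})" for lon, lat in coords
    ) + ")"
    n = len(coords)
    center = (
        round(sum(lon for lon, _ in coords) / n, digits),
        round(sum(lat for _, lat in coords) / n, digits),
    )
    return {"label": label, "geometry": wkt, "center": center}
```

A record of this form can be written to most spatial databases, since WKT is a widely supported interchange format for point geometries.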
Step 309, detecting the position information of the center line according to a preset period, and, when the position information of the center line has been updated, acquiring the updated position information of the center point so as to update the position information of the center line.
In the embodiment of the present disclosure, to ensure the accuracy of the center line position information, the position information of the center line is checked according to a preset period. For example, if the preset period is set to 3 days, the position information of the center line is checked every 3 days; when an update to the position information of the center line is detected, the updated position information of the center point is acquired and the position information of the center line is updated accordingly.
The border channel generation scheme based on track clustering provided by the embodiment of the disclosure proceeds as follows. Historical track data of a target user and border line longitude and latitude data are obtained. The historical track data is arranged in time order to obtain a track sequence corresponding to the target user, and track points whose latitude is zero or null, as well as repeated track points, are removed from the track sequence to obtain candidate track data. The longitude and latitude great circle distance between the candidate track data and the border line longitude and latitude data is calculated, histogram statistics are performed on the candidate track data to obtain a weight for each track point, and the track points whose great circle distance is less than or equal to a preset distance threshold and whose weight is greater than or equal to a preset weight threshold are taken from the candidate track data as target track data. One track point is randomly selected from the target track data as a starting track point and all track points within a preset distance range of it are obtained; if their number is greater than or equal to a preset number threshold, the starting track point and these track points form a candidate track cluster and the starting track point is marked as visited. The same operation is repeated for all track points in the candidate track cluster that are not marked as visited; if the number of track points is smaller than the preset number threshold, the starting track point is marked as a noise point; and once all track points in the candidate track cluster are marked as visited, the operation is repeated on the unvisited points until all track points in the target track data have been visited, which yields the candidate track clusters. The candidate track clusters are screened by one or more of the following: track sequences formed by a number of users greater than a preset number threshold, track sequences whose longest great circle distance is not smaller than a preset distance threshold, and track sequences whose time span is not less than a preset number-of-days threshold. A track center point coordinate set is generated according to the formula and the center line of the target track cluster is extracted. The longitude and latitude coordinates of the center line are converted into a floating point type with a fixed number of digits, the coordinates of the center point of the center line are calculated, and the geographical position code and the Chinese address of the center point are queried and stored in database form. The position information of the center line is checked according to a preset period, and when it has been updated, the updated position information of the center point is acquired to update the position information of the center line. Clustering the target track data reveals the travel patterns of target personnel; storing the center line position information in a preset format keeps the channel data persistent; and checking and updating the center line position information at a preset period keeps the channel data accurate and avoids border security problems caused by center line position information that is not updated in time.
Fig. 5 is a general flow diagram of a border channel generation method based on track clustering according to an embodiment of the present disclosure. The method includes obtaining historical track data and border line longitude and latitude data, preprocessing the historical track data to obtain candidate track data, calculating based on the candidate track data and the border line longitude and latitude data to obtain target track data, processing the target track data through clustering to obtain candidate track clusters, screening the candidate track clusters to obtain target track clusters, extracting a center line of the target track clusters to obtain channel data, that is, the center line position information, and storing the channel data.
Fig. 6 is a schematic structural diagram of a border channel generation apparatus based on track clustering according to the present disclosure, the apparatus includes a first obtaining module 601, a first processing module 602, a calculating module 603, a second processing module 604, a filtering module 605, and an extracting module 606, wherein,
the first obtaining module 601 is configured to obtain historical track data and border line longitude and latitude data of a target user;
a first processing module 602, configured to pre-process historical trajectory data to obtain candidate trajectory data;
a calculation module 603, configured to perform calculation based on the candidate trajectory data and the border line longitude and latitude data to obtain target trajectory data;
a second processing module 604, configured to perform clustering processing on the target trajectory data to obtain candidate trajectory clusters;
the screening module 605 is configured to screen the candidate trajectory cluster based on a preset policy to obtain a target trajectory cluster;
and an extracting module 606, configured to extract a center line of the target track cluster, and store position information of the center line according to a preset format.
Optionally, the first processing module 602 is specifically configured to:
arranging the historical track data according to a time sequence to obtain a track sequence corresponding to a target user; the track sequence consists of a series of track points, and each track point comprises a user identifier, longitude, latitude and time;
deleting track points with zero or null latitude and repeated track points in the track sequence to obtain candidate track data; wherein a track point of the target user whose distance from the previous adjacent track point, within a preset time interval, is smaller than a preset distance threshold is regarded as a repeated track point.
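As a non-limiting illustration, the preprocessing described above may be sketched in Python as follows. The `TrackPoint` type, the function names, and the default thresholds (60 seconds, 10 meters) are hypothetical and do not come from the disclosure. The sketch also assumes that a zero or null longitude is treated like a zero or null latitude, and that each point is compared against the previous retained point of the same user.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed value)


@dataclass
class TrackPoint:
    user_id: str
    lon: Optional[float]  # longitude in degrees, may be missing
    lat: Optional[float]  # latitude in degrees, may be missing
    t: float              # timestamp in seconds


def great_circle_m(a: TrackPoint, b: TrackPoint) -> float:
    """Great-circle distance in meters between two valid points (haversine form)."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlat = lat2 - lat1
    dlon = math.radians(b.lon - a.lon)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def preprocess(points: List[TrackPoint],
               max_gap_s: float = 60.0,
               min_move_m: float = 10.0) -> List[TrackPoint]:
    """Sort one user's points by time, then drop invalid and repeated points."""
    ordered = sorted(points, key=lambda p: p.t)  # the track sequence
    kept: List[TrackPoint] = []
    for p in ordered:
        # Drop points whose coordinates are null or zero.
        if p.lat is None or p.lon is None or p.lat == 0 or p.lon == 0:
            continue
        # Drop repeated points: close in both time and space to the previous kept point.
        if kept:
            prev = kept[-1]
            if p.t - prev.t <= max_gap_s and great_circle_m(prev, p) < min_move_m:
                continue
        kept.append(p)
    return kept
```

The function expects the points of a single target user; grouping by user identifier would be done by the caller.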
Optionally, the calculating module 603 is specifically configured to:
carrying out longitude and latitude great circle distance calculation on the candidate track data and the border line longitude and latitude data to obtain a great circle distance;
performing histogram statistics on the candidate track data to obtain a weight corresponding to each track point in the candidate track data;
and obtaining track points with the great circle distance less than or equal to a preset distance threshold and the weight greater than or equal to a preset weight threshold from the candidate track data as target track data.
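As a non-limiting illustration, the selection of target track data may be sketched in Python as follows. The great-circle function uses the arccos form given in claim 4. The disclosure does not specify how the histogram statistics are computed or how the distance to the border line is measured, so the grid-cell count used as the weight, the minimum distance to border line vertices, and all names and default thresholds below are assumptions.

```python
import math
from collections import Counter
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0     # mean Earth radius (assumed value)
Point = Tuple[float, float]  # (longitude, latitude) in degrees


def great_circle_km(p: Point, q: Point) -> float:
    """R * arccos(cos(lon1 - lon2) cos(lat1) cos(lat2) + sin(lat1) sin(lat2))."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    c = (math.cos(lon1 - lon2) * math.cos(lat1) * math.cos(lat2)
         + math.sin(lat1) * math.sin(lat2))
    # Clamp so rounding error cannot push the argument outside acos's domain.
    return EARTH_RADIUS_KM * math.acos(max(-1.0, min(1.0, c)))


def histogram_weights(points: List[Point], cell_deg: float = 0.01) -> List[int]:
    """Assumed weighting: number of points that fall in the same grid cell."""
    cells = [(math.floor(lon / cell_deg), math.floor(lat / cell_deg))
             for lon, lat in points]
    counts = Counter(cells)
    return [counts[c] for c in cells]


def select_targets(points: List[Point], border: List[Point],
                   max_km: float = 5.0, min_weight: int = 2) -> List[Point]:
    """Keep points that are near the border line and sufficiently frequent."""
    if not border:
        return []
    weights = histogram_weights(points)
    return [
        p for p, w in zip(points, weights)
        if w >= min_weight
        and min(great_circle_km(p, b) for b in border) <= max_km
    ]
```

A production version would measure the distance to border line segments rather than only to vertices; the vertex distance is used here to keep the sketch short.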
Optionally, the second processing module 604 is specifically configured to:
randomly selecting one track point from the target track data as a starting track point, acquiring all track points within a preset distance range from the starting track point, if the number of all track points is larger than or equal to a preset number threshold value, forming a candidate track cluster by the starting track point and all track points, and marking the starting track point as visited;
repeating the previous step of operation to process all track points which are not marked as visited in the candidate track cluster;
if the number of all track points is smaller than a preset number threshold, marking the starting track point as a noise point;
if all track points in the candidate track cluster are marked as visited, the operation is continuously repeated to process points which are not visited until all track points in the target track data are visited, and the candidate track cluster is obtained.
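As a non-limiting illustration, the clustering steps above may be sketched in Python as follows. The steps read like a density-based clustering procedure of the DBSCAN kind, although the disclosure does not name it; the sketch follows that reading. The function and parameter names (`cluster_tracks`, `eps`, `min_pts`) are hypothetical, and a planar Euclidean distance is used as a placeholder for the preset distance measure.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def euclidean(p: Point, q: Point) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])


def cluster_tracks(points: Sequence[Point], eps: float, min_pts: int,
                   dist: Callable[[Point, Point], float] = euclidean,
                   seed: int = 0) -> List[int]:
    """Return one label per point: a cluster index, or NOISE."""
    labels: Dict[int, int] = {}
    visited = set()

    def neighbors(i: int) -> List[int]:
        # All other points within the preset distance range of point i.
        return [j for j in range(len(points))
                if j != i and dist(points[i], points[j]) <= eps]

    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # random choice of starting track points
    next_cluster = 0
    for start in order:
        if start in visited:
            continue
        visited.add(start)
        near = neighbors(start)
        if len(near) < min_pts:
            labels[start] = NOISE  # may be relabeled if a later cluster reaches it
            continue
        labels[start] = next_cluster
        queue = list(near)
        while queue:  # process unvisited points of the candidate cluster
            j = queue.pop()
            if labels.get(j, NOISE) == NOISE:
                labels[j] = next_cluster
            if j in visited:
                continue
            visited.add(j)
            more = neighbors(j)
            if len(more) >= min_pts:
                queue.extend(more)
        next_cluster += 1
    return [labels[i] for i in range(len(points))]
```

The neighbor search here is quadratic in the number of points; a spatial index would normally replace it for large track sets.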
Optionally, the screening module 605 is specifically configured to:
screening the candidate track clusters by one or more of the following conditions: the number of users forming the track sequences is larger than a preset number threshold; the longest great circle distance of the track sequences is not smaller than a preset distance threshold; and the time span of the track sequences is not less than a preset number-of-days threshold.
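As a non-limiting illustration, the screening conditions may be sketched in Python as follows. The names and the representation of a cluster as a list of `TrackPoint` records are hypothetical. The sketch assumes that the longest great circle distance means the largest pairwise distance inside the cluster, and a condition is skipped when its threshold is left as `None`, which is one way to express "one or more of".

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class TrackPoint:
    user_id: str
    lon: float  # degrees
    lat: float  # degrees
    t: float    # timestamp in seconds


def great_circle_km(a: TrackPoint, b: TrackPoint, radius_km: float = 6371.0) -> float:
    """Great-circle distance in kilometers (haversine form)."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlon = math.radians(b.lon - a.lon)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def keep_cluster(cluster: List[TrackPoint],
                 min_users: Optional[int] = None,
                 min_length_km: Optional[float] = None,
                 min_days: Optional[float] = None) -> bool:
    """Return True if the cluster passes every enabled screening condition."""
    if not cluster:
        return False
    # Number of distinct users must be strictly larger than the threshold.
    if min_users is not None and len({p.user_id for p in cluster}) <= min_users:
        return False
    # Longest pairwise great-circle distance must not be smaller than the threshold.
    if min_length_km is not None:
        longest = max((great_circle_km(a, b) for a, b in combinations(cluster, 2)),
                      default=0.0)
        if longest < min_length_km:
            return False
    # Time span in days must not be less than the threshold.
    if min_days is not None:
        span_days = (max(p.t for p in cluster) - min(p.t for p in cluster)) / 86400.0
        if span_days < min_days:
            return False
    return True
```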
Optionally, the extracting module 606 is specifically configured to:
setting a search window with a preset size and translating it along the latitude coordinate of the target track cluster, recording the starting latitude of the search window as s and the ending latitude as e; the track point data set $\Omega_i$ collected during the movement is calculated according to the following formula: $\Omega_i = \{(x, y) \mid s_i \le x \le e_i,\ y \in \mathbb{R}\}$, wherein $\Omega_i$ represents the set of longitude and latitude coordinates in the window, and the window has starting abscissa $s_i$ and ending abscissa $e_i$;
sliding the search window, sequentially extracting the coordinates of the track center point in the search window, and generating a track center point coordinate set H, from which the center line is drawn; H is calculated according to the following formula:
Figure BDA0003443347590000141
wherein $|\Omega_i|$ represents the number of track points in the set;
converting the longitude and latitude coordinates of the center line into a floating-point type with a fixed number of bits;
and calculating the center point coordinate of the center line, querying the geographical position code and the Chinese address of the center point, and storing them in a database table.
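As a non-limiting illustration, the sliding-window extraction may be sketched in Python as follows. The formula for the center point set H is available only as an image in this text, so the sketch assumes that each center point is the arithmetic mean of the points in the window, which is consistent with the count $|\Omega_i|$ appearing in the formula but is not confirmed by it. Rounding to a fixed number of decimal places is likewise only an assumed reading of the fixed-size floating-point conversion, and all names are hypothetical. The geocode and address lookup is omitted because it depends on an external service.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) as used in the window definition


def extract_center_line(points: List[Point], window: float, step: float,
                        ndigits: int = 6) -> List[Point]:
    """Slide a window along x and return one center point per non-empty window."""
    if not points or window <= 0 or step <= 0:
        return []
    xs = [x for x, _ in points]
    s, x_max = min(xs), max(xs)
    centers: List[Point] = []
    while s <= x_max:
        e = s + window
        omega = [(x, y) for x, y in points if s <= x <= e]  # the set Omega_i
        if omega:
            n = len(omega)                                  # |Omega_i|
            cx = sum(x for x, _ in omega) / n               # assumed: mean of x
            cy = sum(y for _, y in omega) / n               # assumed: mean of y
            centers.append((round(cx, ndigits), round(cy, ndigits)))
        s += step
    return centers
```

With `step` smaller than `window` the windows overlap, which smooths the resulting line at the cost of more center points.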
Optionally, the apparatus further comprises:
the detection module is used for detecting the position information of the central line according to a preset period;
and the second acquisition module is used for acquiring, when it is detected that the position information of the center line has been updated, the updated position information of the center point so as to update the position information of the center line.
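As a non-limiting illustration, the periodic detection and update may be sketched in Python as follows. The 3-day default matches the example period mentioned in the embodiment; the in-memory dictionary used as the store, the channel identifier, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def is_check_due(last_check: datetime, now: datetime, period_days: int = 3) -> bool:
    """True once at least one full preset period has elapsed since the last check."""
    return now - last_check >= timedelta(days=period_days)


def refresh_center_line(store: Dict[str, List[Point]], channel_id: str,
                        latest: List[Point]) -> bool:
    """Overwrite the stored center line if it changed; return True on update."""
    if store.get(channel_id) == latest:
        return False
    store[channel_id] = list(latest)
    return True
```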
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A border channel generation method based on track clustering is characterized by comprising the following steps:
acquiring historical track data and border line longitude and latitude data of a target user;
preprocessing the historical track data to obtain candidate track data;
calculating based on the candidate track data and the border line longitude and latitude data to obtain target track data;
clustering the target track data to obtain candidate track clusters;
screening the candidate track clusters based on a preset strategy to obtain target track clusters;
and extracting the central line of the target track cluster, and storing the position information of the central line according to a preset format.
2. The method for generating the border channel based on the track clustering as claimed in claim 1, wherein the preprocessing the historical track data to obtain candidate track data comprises:
arranging the historical track data according to a time sequence to obtain a track sequence corresponding to the target user; the track sequence consists of a series of track points, and each track point comprises a user identifier, longitude, latitude and time;
deleting track points with zero or null latitude and repeated track points in the track sequence to obtain candidate track data; wherein a track point of the target user whose distance from the previous adjacent track point, within a preset time interval, is smaller than a preset distance threshold is regarded as the repeated track point.
3. The method of claim 1, wherein the calculating based on the candidate trajectory data and the border line longitude and latitude data to obtain target trajectory data comprises:
carrying out longitude and latitude great circle distance calculation on the candidate track data and the border line longitude and latitude data to obtain a great circle distance;
performing histogram statistics on the candidate track data to obtain a weight corresponding to each track point in the candidate track data;
and obtaining track points with the great circle distance smaller than or equal to a preset distance threshold and the weight larger than or equal to a preset weight threshold from the candidate track data as the target track data.
4. The border channel generation method based on trajectory clustering as claimed in claim 3, wherein the latitude and longitude great circle distance calculation formula is:
$\mathrm{Haversine}(X, Y) = R \times \arccos(\cos(x_1 - y_1)\cos x_2 \cos y_2 + \sin x_2 \sin y_2)$;
wherein R is the radius of the earth, $x_1$ and $y_1$ denote the longitudes of the two points, and $x_2$ and $y_2$ denote the latitudes of the two points.
5. The method for generating the border channel based on the trajectory clustering according to claim 1, wherein the clustering the target trajectory data to obtain candidate trajectory clusters comprises:
randomly selecting one track point from the target track data as a starting track point, acquiring all track points within a preset distance range from the starting track point, if the number of all track points is larger than or equal to a preset number threshold value, forming a candidate track cluster by the starting track point and all track points, and marking the starting track point as visited;
repeating the previous step of operation to process all track points which are not marked as visited in the candidate track cluster;
if the number of all track points is smaller than the preset number threshold, marking the starting track point as a noise point;
if all track points in the candidate track cluster are marked as visited, the above operation is continuously repeated to process points which are not visited until all track points in the target track data are visited, and the candidate track cluster is obtained.
6. The method of claim 1, wherein the preset strategy comprises:
screening the candidate track clusters by one or more of the following conditions: the number of users forming the track sequences is larger than a preset number threshold; the longest great circle distance of the track sequences is not smaller than a preset distance threshold; and the time span of the track sequences is not less than a preset number-of-days threshold.
7. The method of claim 1, wherein the extracting the center line of the target track cluster comprises:
setting a search window with a preset size and translating it along the latitude coordinate of the target track cluster, recording the starting latitude of the search window as s and the ending latitude as e; the track point data set $\Omega_i$ collected during the movement is calculated according to the following formula: $\Omega_i = \{(x, y) \mid s_i \le x \le e_i,\ y \in \mathbb{R}\}$, wherein $\Omega_i$ represents the set of longitude and latitude coordinates in the window, and the window has starting abscissa $s_i$ and ending abscissa $e_i$;
sliding the search window, sequentially extracting the coordinates of the track center point in the search window, and generating a track center point coordinate set H, from which the center line is drawn; H is calculated according to the following formula:
Figure FDA0003443347580000031
wherein $|\Omega_i|$ represents the number of track points in the set.
8. The method for generating the border channel based on the track clustering as claimed in claim 1, wherein the storing the position information of the center line according to the preset format comprises:
converting the longitude and latitude coordinates of the center line into a floating-point type with a fixed number of bits;
and calculating the center point coordinate of the center line, querying the geographical position code and the Chinese address of the center point, and storing them in a database table.
9. The method of claim 1, further comprising:
detecting the position information of the center line according to a preset period;
and when it is detected that the position information of the center line has been updated, acquiring the updated position information of the center point so as to update the position information of the center line.
10. A border channel generation device based on trajectory clustering, comprising:
the first acquisition module is used for acquiring historical track data and border line longitude and latitude data of a target user;
the first processing module is used for preprocessing the historical track data to obtain candidate track data;
the computing module is used for computing based on the candidate track data and the border line longitude and latitude data to obtain target track data;
the second processing module is used for clustering the target track data to obtain candidate track clusters;
the screening module is used for screening the candidate track cluster based on a preset strategy to obtain a target track cluster;
and the extraction module is used for extracting the central line of the target track cluster and storing the position information of the central line according to a preset format.
CN202111640835.XA 2021-12-29 2021-12-29 Boundary channel generation method and device based on track clustering Pending CN114492590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111640835.XA CN114492590A (en) 2021-12-29 2021-12-29 Boundary channel generation method and device based on track clustering

Publications (1)

Publication Number Publication Date
CN114492590A true CN114492590A (en) 2022-05-13

Family

ID=81508660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111640835.XA Pending CN114492590A (en) 2021-12-29 2021-12-29 Boundary channel generation method and device based on track clustering

Country Status (1)

Country Link
CN (1) CN114492590A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115907159A (en) * 2022-11-22 2023-04-04 应急管理部国家减灾中心 Method, device, equipment and medium for determining similar path typhoon
CN115907159B (en) * 2022-11-22 2023-08-29 应急管理部国家减灾中心 Method, device, equipment and medium for determining typhoons in similar paths
CN116136416A (en) * 2023-02-07 2023-05-19 北京甲板智慧科技有限公司 Real-time track optimization method and device based on multi-feature fusion filtering
CN116136416B (en) * 2023-02-07 2023-11-17 北京甲板智慧科技有限公司 Real-time track optimization method and device based on multi-feature fusion filtering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination