CN114419357B

CN114419357B - Data processing method, data processing device, computer and readable storage medium

Info

Publication number: CN114419357B
Application number: CN202210230988.5A
Authority: CN
Inventors: 侯立洋; 刘雨亭; 孟繁荣
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-03-10
Filing date: 2022-03-10
Publication date: 2022-06-21
Anticipated expiration: 2042-03-10
Also published as: CN114419357A

Abstract

The embodiments of the application disclose a data processing method, a data processing device, a computer and a readable storage medium, which can be applied to the field of maps. The method comprises the following steps: acquiring V path nodes, and performing wandering traversal on the V path nodes to obtain M path sequences consisting of the V path nodes; acquiring node pairs from the M path sequences, predicting, based on the first path node in a node pair, node association probabilities between each of the V path nodes and the first path node, and determining node association characteristics corresponding to each of the V path nodes from those node association probabilities and the second path node in the node pair, where the first path node and the second path node are in the same path sequence; and clustering the V path nodes based on the node association characteristics to obtain node sets for carrying out region division on the road network. The method and the device can improve the accuracy of road network region division.

Description

Data processing method, data processing device, computer and readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, a computer, and a readable storage medium.

Background

With the development of cities and traffic, road networks cover ever larger areas and the traffic routes they contain become more and more complex. When analyzing, for example, route tracks from one traffic route to another, the analysis coverage of the traffic routes may be low because the routes are numerous and their distribution is complex. The road network may therefore be divided into regions. Generally, the region to which a traffic route belongs is determined from the geographical area in which the route is located, and the road network is divided into regions on that basis.

Disclosure of Invention

The embodiment of the application provides a data processing method, a data processing device, a computer and a readable storage medium, which can improve the accuracy of road network region division.

An embodiment of the present application provides a data processing method, including:

acquiring V path nodes, and performing wandering traversal on the V path nodes to obtain M path sequences consisting of the V path nodes; V is a positive integer; M is a positive integer;

acquiring node pairs from the M path sequences, predicting node association probabilities between the V path nodes and the first path nodes respectively based on the first path nodes in the node pairs, and determining node association characteristics corresponding to the V path nodes respectively through the node association probabilities corresponding to the V path nodes respectively and second path nodes in the node pairs; the first path node and the second path node are in the same path sequence;

and clustering the V path nodes based on the node association characteristics to obtain a node set for carrying out region division on the road network.

An embodiment of the present application provides a data processing apparatus, where the apparatus includes:

the node acquisition module is used for acquiring V path nodes;

the sequence acquisition module is used for performing wandering traversal on the V path nodes to obtain M path sequences consisting of the V path nodes; V is a positive integer; M is a positive integer;

the node pair obtaining module is used for obtaining node pairs from the M path sequences;

the characteristic determining module is used for predicting node association probabilities between the V path nodes and the first path nodes respectively based on the first path nodes in the node pairs, and determining node association characteristics corresponding to the V path nodes respectively through the node association probabilities corresponding to the V path nodes respectively and second path nodes in the node pairs; the first path node and the second path node are in the same path sequence;

and the node clustering module is used for clustering the V path nodes based on the node association characteristics to obtain a node set for carrying out region division on the road network.

Wherein the V path nodes comprise path node i; the M path sequences comprise path sequences corresponding to path nodes i; i is a positive integer less than or equal to V;

the sequence acquisition module comprises:

the first walking unit is used for selecting a second sequence node corresponding to the sequence starting point from adjacent nodes of the path node i by taking the path node i as the sequence starting point, and selecting a third sequence node corresponding to the sequence starting point from the adjacent nodes of the second sequence node until a jth sequence node corresponding to the sequence starting point is obtained; j is a positive integer;

and a first sequence determining unit, configured to determine, if a j-th sequence node corresponding to the sequence starting point does not have an adjacent node or j is a sequence length threshold, the j-th sequence node corresponding to the sequence starting point as a sequence end point corresponding to the sequence starting point, and determine, based on the sequence starting point to the sequence end point, a path sequence corresponding to the path node i.
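The first walking unit and the first sequence determining unit can be sketched as follows. This is a minimal illustration, not the implementation of the application: it assumes the road network is held as an adjacency dictionary and that the next sequence node is chosen uniformly at random from the adjacent nodes. The names `random_walk`, `adjacency` and `max_len` (standing in for the sequence length threshold) are hypothetical.

```python
import random

def random_walk(adjacency, start, max_len, rng=None):
    """Walk from `start`, picking one adjacent node at each step.

    Stops when the current node has no adjacent node or when the
    sequence holds `max_len` nodes (the sequence length threshold).
    """
    rng = rng or random.Random()
    sequence = [start]  # the path node i is the sequence starting point
    while len(sequence) < max_len:
        neighbors = adjacency.get(sequence[-1], [])
        if not neighbors:  # no adjacent node: this node is the sequence end point
            break
        # Assumption: uniform random choice among adjacent nodes.
        sequence.append(rng.choice(neighbors))
    return sequence

# Hypothetical toy road network: keys are path nodes, values are adjacent path nodes.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
walk = random_walk(adjacency, "a", max_len=5, rng=random.Random(0))
```

In this toy network every walk from "a" ends at "d" after three nodes, because "d" has no adjacent node.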

Alternatively, the sequence acquisition module comprises:

the second walking unit is used for determining adjacent nodes of the path node i as a first subsequence related to the sequence starting point by taking the path node i as the sequence starting point, and determining the adjacent nodes of the path node included in the first subsequence as a second subsequence related to the sequence starting point until a d-th subsequence related to the sequence starting point is obtained; d is a positive integer;

a second sequence determining unit, configured to determine, if there is no adjacent node in a path node included in the d-th subsequence, or the total number of path nodes included in the sequence starting point and the first to d-th subsequences associated with the sequence starting point is greater than or equal to the sequence length threshold, a path sequence corresponding to the path node i according to the sequence starting point and the path nodes included in the first to d-th subsequences associated with the sequence starting point.
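The layer-by-layer expansion performed by the second walking unit can be sketched in the same spirit. The sketch assumes an adjacency dictionary and, as a further assumption not stated in the text, collects each path node at most once so that cycles in the road network do not repeat nodes. `layered_walk` and `max_len` are hypothetical names.

```python
def layered_walk(adjacency, start, max_len):
    """Collect the start node, then its neighbors, then their neighbors, and so on.

    Stops when a layer adds no new node or when the total number of
    collected nodes reaches `max_len` (the sequence length threshold).
    """
    sequence = [start]
    seen = {start}      # assumption: each path node is collected at most once
    frontier = [start]
    while frontier and len(sequence) < max_len:
        layer = []      # the next subsequence related to the starting point
        for node in frontier:
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    layer.append(nxt)
        sequence.extend(layer)
        frontier = layer
    return sequence

# Hypothetical toy road network.
adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
seq = layered_walk(adjacency, "a", max_len=4)
```

Because the threshold is checked after a whole subsequence is added, the result may be longer than `max_len`, which matches the "greater than or equal to" condition in the text.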

Wherein, this sequence acquisition module includes:

the complete traversal unit is used for respectively taking the V path nodes as first sequence starting points and respectively performing wandering traversal in the V path nodes by using the V first sequence starting points to obtain first traversal sequences respectively corresponding to the V path nodes;

the random traversal unit is used for randomly selecting a second sequence starting point from the V path nodes, and performing wandering traversal in the V path nodes by using the second sequence starting point to obtain a second traversal sequence corresponding to the second sequence starting point;

and the third sequence determining unit is used for determining the M path sequences according to the first traversal sequences corresponding to the V path nodes respectively and the second traversal sequences corresponding to the starting points of the second sequences.
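A possible way to combine the complete traversal and the random traversal is sketched below. The number of randomly started walks is not fixed by the text, so `num_random` is an assumed parameter; `build_path_sequences` and `random_walk` are hypothetical helper names, and the walk itself picks adjacent nodes uniformly at random.

```python
import random

def random_walk(adjacency, start, max_len, rng):
    """Random walk that stops at a dead end or at `max_len` nodes."""
    sequence = [start]
    while len(sequence) < max_len:
        neighbors = adjacency.get(sequence[-1], [])
        if not neighbors:
            break
        sequence.append(rng.choice(neighbors))
    return sequence

def build_path_sequences(adjacency, num_random, max_len, seed=0):
    """Return M = V + num_random path sequences."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    # Complete traversal: every path node serves once as a first sequence starting point.
    first = [random_walk(adjacency, node, max_len, rng) for node in nodes]
    # Random traversal: extra walks from randomly selected second sequence starting points.
    second = [random_walk(adjacency, rng.choice(nodes), max_len, rng)
              for _ in range(num_random)]
    return first + second

# Hypothetical toy road network in which every node has an adjacent node.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
sequences = build_path_sequences(adjacency, num_random=2, max_len=4)
```

The complete traversal guarantees that every path node appears in at least one sequence, while the random walks add further co-occurrence samples.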

Wherein, the node pair obtaining module includes:

a sequence selection unit, configured to determine, as a target path sequence, a path sequence including an ith path node of the V path nodes among the M path sequences; i is a positive integer less than or equal to V;

a co-occurrence obtaining unit, configured to obtain a size of a node co-occurrence window, and in a target path sequence, obtain a co-occurrence path node whose sequence distance from an ith path node is smaller than or equal to the size of the node co-occurrence window;

the node pair forming unit is used for forming a node pair corresponding to the ith path node by the co-occurrence path node associated with the ith path node and the ith path node; the ith path node is a first path node in a node pair corresponding to the ith path node; the co-occurrence path node associated with the ith path node is the second path node in the node pair corresponding to the ith path node.
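The window-based construction of node pairs can be illustrated with a short sketch. It assumes that the sequence distance is the difference of positions in the path sequence and that the window extends on both sides of the ith path node; `node_pairs` and `window` are hypothetical names.

```python
def node_pairs(sequence, window):
    """Return (first path node, second path node) pairs from one path sequence.

    A node is a co-occurrence node of sequence[i] when its position differs
    from i by at most `window` (the node co-occurrence window size).
    """
    pairs = []
    for i, first in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((first, sequence[j]))
    return pairs

pairs = node_pairs(["a", "b", "c", "d"], window=1)
# pairs == [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("c", "d"), ("d", "c")]
```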

Wherein, the node pair obtaining module includes:

the frequency screening unit is used for acquiring occurrence frequencies corresponding to the V path nodes from the M path sequences respectively and acquiring a first path node from the V path nodes based on the occurrence frequencies;

the node pair forming unit is further configured to obtain, from the M path sequences, a second path node in the same path sequence as the first path node, and form the node pair from the first path node and the second path node.

Wherein, the characteristic determining module comprises:

the characteristic identification unit is used for acquiring a characteristic identification model and acquiring a first node characteristic of a first path node in a node pair; the characteristic identification model comprises a first initial parameter matrix and a second initial parameter matrix;

the characteristic conversion unit is used for inputting the first node characteristics into the characteristic identification model and performing characteristic conversion on the first node characteristics by adopting a first initial parameter matrix in the characteristic identification model to obtain hidden characteristics;

the probability prediction unit is used for performing feature prediction on the hidden features by adopting a second initial parameter matrix in the feature recognition model to obtain node association probabilities between the V path nodes and the first path nodes respectively;

the parameter adjusting unit is used for adjusting parameters of the first initial parameter matrix and the second initial parameter matrix according to the node association probabilities respectively corresponding to the V path nodes and the second path node in the node pair;

the matrix generating unit is used for obtaining a first parameter matrix corresponding to the first initial parameter matrix and a second parameter matrix corresponding to the second initial parameter matrix; the first parameter matrix and the second parameter matrix are used for predicting node association probability matched with the node pairs;

and the characteristic determining unit is used for determining the node association characteristics corresponding to the V path nodes from the first parameter matrix or the second parameter matrix.

Wherein, the probability prediction unit comprises:

the characteristic prediction subunit is used for performing characteristic prediction on the hidden characteristic by adopting a second initial parameter matrix in the characteristic identification model to obtain path characteristics corresponding to the V path nodes respectively;

and the normalization processing subunit is used for performing feature normalization processing on the path features respectively corresponding to the V path nodes to obtain node association probabilities between the V path nodes and the first path node respectively.
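The feature conversion, feature prediction and normalization steps resemble the forward pass of a skip-gram style model. The sketch below assumes that the first node feature is a one-hot vector (so multiplying by the first parameter matrix selects one row) and that the feature normalization is a softmax; neither is stated explicitly in the text. `forward`, `w1` and `w2` are hypothetical names, and the matrix values are toy numbers.

```python
import math

def forward(first_index, w1, w2):
    """Return (hidden feature, node association probabilities).

    w1: V x N list of lists (first initial parameter matrix)
    w2: N x V list of lists (second initial parameter matrix)
    """
    hidden = w1[first_index]  # one-hot input times w1 selects a single row
    num_nodes = len(w2[0])
    # Path feature (score) of every path node with respect to the first path node.
    scores = [sum(hidden[n] * w2[n][v] for n in range(len(hidden)))
              for v in range(num_nodes)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return hidden, [e / total for e in exps]

w1 = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]   # V = 3 path nodes, N = 2 hidden dimensions
w2 = [[0.2, -0.1, 0.0], [0.1, 0.4, -0.3]]
hidden, probs = forward(0, w1, w2)
```

The returned probabilities sum to 1 over the V path nodes, which is what allows them to be compared against the second path node of the pair.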

Wherein, this parameter adjustment unit includes:

a probability obtaining subunit, configured to obtain, from the node association probabilities corresponding to the V path nodes, a target association probability of a second path node in the node pair;

the parameter adjusting subunit is used for generating a loss function according to the target association probability and the node association probability corresponding to the residual path nodes, and performing parameter adjustment on the first initial parameter matrix and the second initial parameter matrix based on the loss function; the remaining path node refers to a path node other than the first path node and the second path node in the node pair among the V path nodes.
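One way to realize the parameter adjustment is a gradient step on the negative log of the target association probability. This choice of loss is an assumption: with a softmax output it raises the probability of the second path node and lowers the probabilities of the remaining path nodes, which is consistent with the description but is not confirmed by it. `train_step`, `softmax` and the learning rate `lr` are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(first, second, w1, w2, lr=0.1):
    """One gradient step on the node pair (first, second); returns the loss before the update."""
    hidden_size, num_nodes = len(w2), len(w2[0])
    hidden = list(w1[first])
    probs = softmax([sum(hidden[n] * w2[n][v] for n in range(hidden_size))
                     for v in range(num_nodes)])
    loss = -math.log(probs[second])  # assumed loss: negative log of the target probability
    # Gradient of the loss with respect to each score: probability minus one-hot target.
    err = [p - (1.0 if v == second else 0.0) for v, p in enumerate(probs)]
    # Gradient for the hidden feature, computed before w2 is modified.
    grad_hidden = [sum(err[v] * w2[n][v] for v in range(num_nodes))
                   for n in range(hidden_size)]
    for n in range(hidden_size):
        for v in range(num_nodes):
            w2[n][v] -= lr * hidden[n] * err[v]
        w1[first][n] -= lr * grad_hidden[n]
    return loss

w1 = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]   # toy V x N matrix
w2 = [[0.2, -0.1, 0.0], [0.1, 0.4, -0.3]]    # toy N x V matrix
losses = [train_step(0, 2, w1, w2) for _ in range(50)]
```

After training over all node pairs, row i of `w1` (or column i of `w2`) could serve as the node association feature of the ith path node.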

Wherein, this node clustering module includes:

the attribute acquisition unit is used for acquiring node attribute types corresponding to the V path nodes and acquiring node attribute characteristics corresponding to the V path nodes in the node attribute types respectively;

the feature fusion unit is used for performing feature fusion on the node association features respectively corresponding to the V path nodes and the node attribute features respectively corresponding to the V path nodes to obtain node fusion features respectively corresponding to the V path nodes;

and the fusion clustering unit is used for clustering the V path nodes based on the node fusion characteristics respectively corresponding to the V path nodes to obtain a node set for carrying out region division on the road network.
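The text does not fix how the feature fusion is performed. A minimal sketch, assuming simple concatenation of the two feature vectors, is shown below; `fuse_features` and the toy feature values are hypothetical.

```python
def fuse_features(association, attributes):
    """Concatenate each node's association feature with its attribute feature."""
    return {node: list(association[node]) + list(attributes[node])
            for node in association}

# Hypothetical toy data: two path nodes, 2-dimensional association features,
# and one numeric attribute per node (for example an encoded road class).
association = {"a": [0.2, -0.1], "b": [0.4, 0.3]}
attributes = {"a": [1.0], "b": [0.0]}
fused = fuse_features(association, attributes)
```

Other fusion choices, such as weighted sums or scaling attributes before concatenation, would fit the description equally well.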

Wherein, this node clustering module includes:

the distance acquisition unit is used for acquiring node distances between the ith path node and the V path nodes respectively based on the node association characteristics;

a distance determining unit, configured to determine the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determine the minimum node distance as the reachable distance of the ith path node, until the minimum adjacent nodes and reachable distances corresponding to the V path nodes respectively are obtained; i is a positive integer less than or equal to V;

the tree construction unit is used for constructing a tree edge between the V path nodes and the minimum adjacent nodes corresponding to the V path nodes respectively by taking the V path nodes as tree nodes, so as to construct a path node tree;

the tree splitting unit is used for splitting the path node tree based on the reachable distances respectively corresponding to the V path nodes to obtain k sub-path node trees; k is a positive integer;

and the set composition unit is used for composing the path nodes included in each sub-path node tree into a node set for carrying out region division on the road network.

Wherein, the distance acquisition unit includes:

the distance obtaining subunit is configured to obtain the node association features of the ith path node, the feature distances between the node association features corresponding to the V path nodes, and obtain the first node distance of the ith path node from the V feature distances until the first node distances corresponding to the V path nodes are obtained;

the distance determining subunit is used for determining the node distance between the ith path node and the pth path node from the first node distance of the ith path node, the first node distance of the pth path node and the characteristic distance between the node association characteristic of the ith path node and the node association characteristic of the pth path node until the node distances between the ith path node and the V path nodes are obtained; p is a positive integer less than or equal to V.
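The text does not give the formula that combines the two first node distances with the feature distance. The sketch below assumes an HDBSCAN-style construction: the first node distance is taken as the distance to the k-th nearest other node (a "core distance"), and the node distance is the maximum of the two core distances and the feature distance. The function names and the parameter `k` are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def core_distances(features, k):
    """Assumed 'first node distance': distance from each node to its k-th nearest other node."""
    cores = []
    for i, f in enumerate(features):
        dists = sorted(euclidean(f, g) for j, g in enumerate(features) if j != i)
        cores.append(dists[k - 1])
    return cores

def node_distance(i, p, features, cores):
    """Assumed combination: maximum of both first node distances and the feature distance."""
    return max(cores[i], cores[p], euclidean(features[i], features[p]))

# Toy node association features for four path nodes.
features = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (4.0, 3.0)]
cores = core_distances(features, k=1)
d_12 = node_distance(1, 2, features, cores)
```

Taking the maximum pushes nodes in sparse areas further away from everything else, which makes the later tree construction less sensitive to isolated nodes.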

Wherein, the tree splitting unit is specifically used for:

sorting the reachable distances corresponding to the V path nodes respectively, and splitting, in that order, the tree edges in the path node tree that correspond to the sorted reachable distances, until the number of path nodes included in a split subtree is smaller than or equal to the minimum cluster size; the subtrees obtained when the number of path nodes is smaller than or equal to the minimum cluster size are determined as the k sub-path node trees.

Wherein, this node clustering module includes:

the distance acquiring unit is used for acquiring node distances between the ith path node and the V path nodes respectively based on the node association characteristics;

the distance determining unit is used for determining the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determining the minimum node distance as the reachable distance of the ith path node until the minimum adjacent node and the reachable distance corresponding to the V path nodes are obtained; i is a positive integer less than or equal to V;

the edge connection unit is used for taking the V path nodes as sub-tree nodes, sequencing the reachable distances corresponding to the V path nodes respectively from small to large, and sequentially constructing sub-tree edges between the path nodes corresponding to the sequenced reachable distances and the minimum adjacent node until k sub-path node trees consisting of the V path nodes are obtained; k is a positive integer, and the number of path nodes included in each sub-path node tree is greater than or equal to the minimum cluster size;

the set composing unit is used for composing the path nodes included in each sub-path node tree into a node set for carrying out region division on the road network.
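One reading of the edge connection unit is sketched below: nodes are visited in ascending order of reachable distance, each is joined to its minimum adjacent node, and joining stops once every subtree holds at least the minimum cluster size. This stopping rule is an interpretation, and the sketch uses the plain Euclidean feature distance as the node distance for brevity. `nearest_neighbor_forest` is a hypothetical name; `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbor_forest(features, min_cluster_size):
    """Group nodes by connecting each node to its nearest node, smallest distances first."""
    n = len(features)

    def dist(a, b):
        return math.dist(features[a], features[b])

    # Minimum adjacent node and reachable distance of every node.
    nearest = [min((j for j in range(n) if j != i), key=lambda j: dist(i, j))
               for i in range(n)]
    reach = [dist(i, nearest[i]) for i in range(n)]

    parent = list(range(n))   # union-find forest over the sub-tree nodes
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in sorted(range(n), key=lambda i: reach[i]):
        # Assumed stopping rule: every subtree already meets the minimum cluster size.
        if all(size[find(x)] >= min_cluster_size for x in range(n)):
            break
        a, b = find(i), find(nearest[i])
        if a != b:            # build the sub-tree edge between i and its nearest node
            parent[a] = b
            size[b] += size[a]

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())

features = [(0, 0), (0, 1), (5, 5), (5, 6), (5, 7.5)]
node_sets = nearest_neighbor_forest(features, min_cluster_size=2)
# node_sets == [[0, 1], [2, 3, 4]]
```

If the loop runs to completion, every node has been joined to its nearest node, so each subtree holds at least two nodes; larger minimum cluster sizes are not guaranteed by this sketch.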

Wherein, this node clustering module includes:

the data acquisition unit is used for acquiring k initial clustering centers and acquiring initial clustering distances from V path nodes to the k initial clustering centers respectively based on the node association characteristics;

the initial clustering unit is used for dividing the V path nodes into initial sets corresponding to the k initial clustering centers based on the initial clustering distances from the V path nodes to the k initial clustering centers respectively;

the updating clustering unit is used for acquiring updating clustering centers corresponding to the k initial sets respectively, and dividing the V path nodes into the updating sets corresponding to the k updating clustering centers on the basis of the updating clustering distances from the V path nodes to the updating clustering centers corresponding to the k initial sets respectively;

the clustering iteration unit is used for determining the k updating sets as k initial sets if the k updating sets do not meet the node clustering condition, and returning and executing a process of obtaining updating clustering centers respectively corresponding to the k initial sets through the updating clustering unit;

and the set determining unit is used for determining the k updating sets as node sets for carrying out region division on the road network if the k updating sets meet the node clustering condition.
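The iterative procedure above follows the pattern of k-means clustering. The sketch below assumes Euclidean distance on the node association features and takes "the assignment no longer changes" as the node clustering condition, which the text leaves open. `kmeans` and `max_iter` are hypothetical names; `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(features, initial_centers, max_iter=100):
    """Return (assignment, centers) after iterating assignment and center updates."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_iter):
        # Divide every path node into the set of its closest clustering center.
        new_assignment = [min(range(len(centers)), key=lambda c: math.dist(f, centers[c]))
                          for f in features]
        if new_assignment == assignment:   # assumed node clustering condition
            break
        assignment = new_assignment
        # Update each clustering center to the mean of its current set.
        for c in range(len(centers)):
            members = [f for f, a in zip(features, assignment) if a == c]
            if members:  # keep the previous center if a set is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centers

features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
assignment, centers = kmeans(features, [(0.0, 0.0), (10.0, 10.0)])
# assignment == [0, 0, 1, 1]; centers == [(0.0, 0.5), (10.0, 10.5)]
```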

Wherein, this node clustering module includes:

a to-be-processed acquiring unit, configured to acquire path nodes to be processed from the V path nodes; the path node to be processed refers to a path node which is not subjected to node clustering processing;

the quantity obtaining unit is used for obtaining the adjacent quantity of the path nodes positioned in the node neighborhood of the path node to be processed; the path node located in the node neighborhood of the path node to be processed refers to the path node of which the characteristic distance with the path node to be processed is smaller than or equal to the neighborhood radius; the characteristic distance refers to the distance between the node association characteristic of the corresponding path node and the node association characteristic of the path node to be processed;

the node expansion unit is used for expanding nodes based on the path nodes to be processed and the path nodes located in the node neighborhoods of the path nodes to be processed to obtain density reachable nodes corresponding to the path nodes to be processed if the adjacent quantity is greater than or equal to the minimum set node number, forming a node set for carrying out regional division on the road network by the path nodes to be processed and the density reachable nodes, and returning and executing the process of acquiring the path nodes to be processed from the V path nodes through the acquisition unit to be processed until the path nodes to be processed do not exist in the V path nodes;

and the node processing unit is used for returning and executing the process of acquiring the path nodes to be processed from the V path nodes through the acquisition unit to be processed if the adjacent quantity is less than the minimum set node number.
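This neighborhood-based procedure follows the pattern of density-based clustering (DBSCAN). The sketch below assumes Euclidean feature distance and does not count a node as its own neighbor; the text does not settle either point. `dbscan`, `radius` (the neighborhood radius) and `min_nodes` (the minimum set node number) are hypothetical names; `math.dist` requires Python 3.8 or later.

```python
import math

def dbscan(features, radius, min_nodes):
    """Return one label per node: a set index, or -1 for nodes that join no set."""
    n = len(features)
    labels = [None] * n   # None: not yet processed

    def neighborhood(i):
        return [j for j in range(n)
                if j != i and math.dist(features[i], features[j]) <= radius]

    cluster = 0
    for i in range(n):            # next path node to be processed
        if labels[i] is not None:
            continue
        neighbors = neighborhood(i)
        if len(neighbors) < min_nodes:
            labels[i] = -1        # too few neighbors: move on to the next node
            continue
        labels[i] = cluster
        queue = list(neighbors)   # node expansion towards density reachable nodes
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # previously skipped node on the border of this set
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighborhood(j)
            if len(more) >= min_nodes:
                queue.extend(more)
        cluster += 1
    return labels

features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(features, radius=1.5, min_nodes=2)
# labels == [0, 0, 0, 1, 1, 1, -1]
```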

Wherein, the apparatus further includes:

the path query module is used for responding to a path query request aiming at a starting path node to a terminating path node and acquiring a target node set where the terminating path node is located; the starting path node belongs to V path nodes, and the ending path node belongs to V path nodes;

the track acquisition module is used for acquiring the path nodes included by the target node set, acquiring a first track path from the initial path node to the path nodes included by the target node set, and acquiring a second track path from the path nodes included by the target node set to the termination path node;

and the track determining module is used for determining a target track path from the starting path node to the ending path node according to the first track path and the second track path.
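How the first and second track paths are computed and combined is not specified. As one possible illustration, the sketch below uses breadth-first search for both track paths and keeps the shortest concatenation over the path nodes of the target node set. `bfs_path`, `query_path` and the toy data are hypothetical.

```python
from collections import deque

def bfs_path(adjacency, source, target):
    """Shortest path by hop count, or None if `target` cannot be reached."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

def query_path(adjacency, node_sets, start, end):
    """Route from `start` to `end` through a node of the set that contains `end`."""
    target_set = next(s for s in node_sets if end in s)
    best = None
    for waypoint in sorted(target_set):
        first = bfs_path(adjacency, start, waypoint)    # first track path
        second = bfs_path(adjacency, waypoint, end)     # second track path
        if first is None or second is None:
            continue
        candidate = first + second[1:]                  # do not repeat the waypoint
        if best is None or len(candidate) < len(best):
            best = candidate
    return best

adjacency = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
node_sets = [{"a", "b"}, {"c", "d"}]
path = query_path(adjacency, node_sets, "a", "d")
# path == ["a", "b", "c", "d"]
```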

One aspect of the embodiments of the present application provides a computer device, including a processor, a memory, and an input/output interface;

the processor is respectively connected with the memory and the input/output interface, wherein the input/output interface is used for receiving data and outputting data, the memory is used for storing a computer program, and the processor is used for calling the computer program so as to enable the computer device comprising the processor to execute the data processing method in one aspect of the embodiment of the application.

In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, where the computer program is adapted to be loaded and executed by a processor, so that a computer device having the processor executes a data processing method in one aspect of the embodiments of the present application.

An aspect of an embodiment of the present application provides a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternatives in one aspect of the embodiments of the application. In other words, the computer instructions, when executed by a processor, implement the methods provided in the various alternatives in one aspect of the embodiments of the present application.

The embodiment of the application has the following beneficial effects:

In the embodiment of the application, V path nodes are obtained, and M path sequences consisting of the V path nodes are obtained by performing wandering traversal on the V path nodes; node pairs are acquired from the M path sequences, node association probabilities between each of the V path nodes and the first path node are predicted based on the first path node in a node pair, and node association characteristics corresponding to each of the V path nodes are determined from those node association probabilities and the second path node in the node pair, where the first path node and the second path node are in the same path sequence; and the V path nodes are clustered based on the node association characteristics to obtain node sets for carrying out region division on the road network. The wandering traversal of the V path nodes yields path sequences that reflect the association relations among the path nodes; that is, a path sequence represents the structural features of the path nodes it includes, namely the connection relations between path nodes. Moreover, because the sequences come from wandering traversal, adjacent path nodes in a path sequence tend to be similar to each other, so the homogeneity between path nodes is preserved to some extent. Node pairs constructed from the path sequences therefore have a high co-occurrence probability between the first path node and the second path node, and the node association characteristics extracted through the node pairs reflect the structure among the path nodes. The clustering result thus retains this structure, which improves the accuracy of road network region division.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic diagram of a data processing scenario provided in an embodiment of the present application;

Fig. 2 is a flowchart of a data processing method according to an embodiment of the present disclosure;

Fig. 3 is a schematic diagram of a road network topology provided in an embodiment of the present application;

Fig. 4 is a schematic diagram of a path sequence provided in an embodiment of the present application;

Fig. 5 is a schematic diagram of a model structure provided in an embodiment of the present application;

Fig. 6 is a schematic diagram of a node clustering scene provided in an embodiment of the present application;

Fig. 7 is a schematic diagram of a tree splitting scene according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a tree construction scenario provided in an embodiment of the present application;

Fig. 9 is a schematic diagram of a road network region division process provided in an embodiment of the present application;

Fig. 10 is a schematic diagram of an associative clustering scenario provided in an embodiment of the present application;

Fig. 11 is a flowchart of a sequence generation method provided in an embodiment of the present application;

Fig. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.

If data of an object (such as a user) needs to be collected in the application, a prompt interface or popup window is displayed before and during collection to inform the user that XXXX data is currently being collected. The data collection steps start only after the user's confirmation operation on the prompt interface or popup window is obtained; otherwise, the data collection ends. The collected user data is used in reasonable and legitimate scenarios, applications, and the like. Optionally, in scenarios that need to use user data that the user has not yet authorized, authorization may also be requested from the user, and the user data is used after the authorization is granted.

In the embodiment of the present application, please refer to Fig. 1, which is a schematic diagram of a data processing scenario provided in the embodiment of the present application. As shown in Fig. 1, a computer device may obtain V path nodes 101, where V is a positive integer. The computer device may perform wandering traversal on the V path nodes 101 to obtain M path sequences composed of the V path nodes, where M is a positive integer, for example M is 2; the M path sequences may include a path sequence 1021, a path sequence 1022, and the like. Further, the computer device may obtain node pairs 103 from the M path sequences, where a node pair 103 includes a first path node and a second path node; optionally, the number of node pairs 103 may be one or at least two, which is not limited herein. Taking one node pair as an example, the computer device may predict, based on the first path node in the node pair 103, the node association probabilities between each of the V path nodes 101 and the first path node. The greater the co-occurrence probability between the path nodes in the same node pair, the greater the node association probability that should be predicted for one path node of the pair from the other. Therefore, the node association characteristics corresponding to the V path nodes 101 may be determined from the node association probabilities corresponding to the V path nodes 101 and the second path node in the node pair 103. Further, the V path nodes 101 may be clustered based on the node association characteristics to obtain node sets 104 for performing region division on the road network, where the number of node sets 104 may be one or at least two; taking two node sets as an example, the node sets 104 may include a node set 1041 and a node set 1042 shown in Fig. 1.

Through the above process, node pairs are obtained from the path sequences, so the co-occurrence probability of the first path node and the second path node in a node pair is high. The first path node is used to predict the node association probabilities of the other path nodes, including the node association probability between the first path node and the second path node. Because the first path node and the second path node are in the same node pair and have a high co-occurrence probability, the node association characteristics corresponding to the V path nodes can be obtained from the node pairs and the predicted node association probabilities between the V path nodes and the first path node. The resulting node association characteristics reflect the co-occurrence relationships between path nodes, that is, they retain the structural features among the path nodes, so clustering the V path nodes on these characteristics can improve the accuracy and reliability of the clustering result.

The road network refers to the network structure on the map side that is built from the connection relationships among traffic routes, with the traffic routes as nodes. A traffic route (link) refers to each road segment obtained by cutting, and may be considered the minimum unit in the entire map network. Optionally, the road network may include, but is not limited to, an actual road network, a virtual road network, and the like. The actual road network corresponds to an actual map, such as a road network formed by real traffic routes, and the virtual road network corresponds to a virtual map, such as a road network formed by traffic routes in a game; this is not limited herein.

It is understood that the computer device mentioned in the embodiments of the present application includes, but is not limited to, a terminal device or a server. In other words, the computer device may be a server or a terminal device, or may be a system of a server and a terminal device. The above-mentioned terminal device may be an electronic device, including but not limited to a mobile phone, a tablet computer, a desktop computer, a notebook computer, a palm computer, a vehicle-mounted device, an Augmented Reality/Virtual Reality (AR/VR) device, a helmet display, a smart television, a wearable device, a smart speaker, a digital camera, a camera, and other Mobile Internet Devices (MID) with network access capability, or a terminal device in a scene such as a train, a ship, or a flight, and the like. The above-mentioned server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, Network service, cloud communication, middleware service, domain name service, security service, vehicle-road cooperation, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.

Optionally, the data related to the embodiment of the present application may be stored in a computer device, or the data may be stored based on a cloud storage technology or a blockchain network, which is not limited herein.

Further, please refer to fig. 2, fig. 2 is a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 2, in the embodiment of the method described in fig. 2, the data processing procedure includes the following steps:

Step S201, acquiring V path nodes.

In this embodiment, the computer device may obtain V path nodes, where V is a positive integer. The computer device may obtain a road network topology, and obtain V path nodes from the road network topology, where the path node may be considered as one traffic route (link) in the road network topology, and optionally, the road network topology may be a topology of an actual road network (e.g., a real road network, etc.) or a topology of a virtual road network (e.g., a road network in a game, etc.), that is, the road network topology corresponds to a road network, and the road network includes V path nodes. Optionally, the computer device may obtain traffic roads included in the road network, obtain a route size, and cut the traffic roads included in the road network based on the route size and a route intersection to obtain V path nodes forming the road network, where the route intersection may be considered as a road inflection point or a road intersection, and the like. For example, in a game, a game map included in the game is acquired, a traffic road in the game map is acquired, and the traffic road in the game map is cut based on a route size and a route intersection point to obtain V path nodes forming the game map; alternatively, a road network topology structure corresponding to the game map is acquired, and V path nodes constituting the game map (i.e., V path nodes constituting the road network corresponding to the game map) are acquired from the road network topology structure. The game may be any game having a game map. In other words, the present application may be applied to any scene with a map, and the computer device may obtain V path nodes constituting a road network corresponding to the map.

For example, please refer to fig. 3, which is a schematic diagram of a road network topology provided in the embodiment of the present application. As shown in fig. 3, the road network topology includes V path nodes, such as path node L1, path node L2, …, path node L8, and path node L9. The circles in fig. 3 represent path nodes (links) in the road network, and a connecting line indicates that a topological relationship (also called an adjacency relationship) exists between two links. In general, a network structure (such as a road network topology) exhibits structural similarity and homogeneity. Structural similarity means that the adjacency relations of path nodes are similar; for example, the path node L1 and the path node L7 both have 4 adjacent nodes. Homogeneity means that the characteristics of a path node are similar to those of its adjacent nodes, for example the characteristics of the path node L1 and the characteristics of its adjacent nodes (such as the path node L2, the path node L3, or the path node L4). Structure may also represent the structural relationship of each path node in the road network, such as the adjacency relationship, and homogeneity may represent attribute data of the path nodes, and so on.

Step S202, obtaining M path sequences composed of V path nodes by performing wandering traversal on the V path nodes.

In this embodiment of the present application, the computer device may use any one of the V path nodes as a sequence starting point and perform walk traversal on the V path nodes to obtain a path sequence; through this process, M path sequences composed of the V path nodes may be obtained, where M is a positive integer. For example, the computer device may obtain sequence starting points from the V path nodes and perform walk traversal on the V path nodes based on the sequence starting points to obtain the M path sequences. The number of sequence starting points may be one or at least two; when it is at least two, two or more of the M path sequences may still share the same sequence starting point. For example, taking M as 3, assume that the M path sequences include path sequence 1 "link1 -> link3 -> link8", path sequence 2 "link1 -> link2 -> link5 -> link6", and path sequence 3 "link2 -> link4 -> link7". The sequence starting point of path sequence 1 is the same as that of path sequence 2, and the sequence starting point of path sequence 3 differs from both. Here a link represents a path node; for example, link1 represents path node 1, link2 represents path node 2, and the like.

Taking one sequence starting point as an example, in one path sequence generation mode, the V path nodes include a path node i, and the M path sequences include the path sequence corresponding to the path node i; i is a positive integer less than or equal to V, that is, the path node i may be any one of the V path nodes. Specifically, the computer device may take the path node i as the sequence starting point, select a second sequence node corresponding to the sequence starting point from the adjacent nodes of the path node i, and select a third sequence node corresponding to the sequence starting point from the adjacent nodes of the second sequence node, until a jth sequence node corresponding to the sequence starting point is obtained; j is a positive integer. When selecting the second sequence node, the computer device may randomly select one path node from the adjacent nodes of the path node i and determine the selected path node as the second sequence node. Alternatively, if a historical track route is acquired, the computer device may obtain from it the historical adjacency frequency with which each adjacent node of the path node i appears at the position adjacent to the path node i, and determine the adjacent node with the largest historical adjacency frequency as the second sequence node. For example, if "path node i -> path node 2" appears 3 times in the historical track route, the historical adjacency frequency of path node 2 with respect to the path node i is 3. Similarly, a third sequence node may be selected from the adjacent nodes of the second sequence node, and so on, until the jth sequence node corresponding to the sequence starting point is obtained.

Further, if the jth sequence node corresponding to the sequence starting point has no adjacent node, or j equals the sequence length threshold, the jth sequence node is determined as the sequence end point corresponding to the sequence starting point, and the path sequence corresponding to the path node i is determined from the sequence starting point to the sequence end point. That is, the end condition may be only that the jth sequence node has no adjacent node, only that j equals the sequence length threshold, or that either of these two conditions is satisfied. Since the number of adjacent nodes of each path node may be zero, one, or at least two, when a path node has at least two adjacent nodes there are several possible choices for its next sequence node; therefore, using one path node as a sequence starting point may yield one or at least two different path sequences. Similarly, M path sequences composed of the V path nodes can be obtained, and the M path sequences may include path sequences with the same sequence starting point.
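The walk described above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function names `pick_next` and `walk_from`, the `history` dictionary keyed by (current node, adjacent node), and the toy adjacency table are all assumptions introduced here. The walk stops when the current node has no adjacent node or when the sequence reaches the sequence length threshold `max_len`.

```python
import random

def pick_next(current, neighbours, rng, history=None):
    """Choose the next sequence node among the adjacent nodes.

    Without `history`, pick uniformly at random. With `history` (a
    hypothetical dict of historical adjacency frequencies), pick the
    adjacent node that followed `current` most often.
    """
    if history is None:
        return rng.choice(neighbours)
    return max(neighbours, key=lambda n: history.get((current, n), 0))

def walk_from(start, adjacency, max_len, rng, history=None):
    """Build one path sequence starting at `start`."""
    sequence = [start]
    while len(sequence) < max_len:           # j reached the length threshold
        neighbours = adjacency.get(sequence[-1], [])
        if not neighbours:                   # no adjacent node: sequence end point
            break
        sequence.append(pick_next(sequence[-1], neighbours, rng, history))
    return sequence

# Hypothetical toy topology, not taken from fig. 3.
adjacency = {
    "L1": ["L2", "L3"],
    "L2": ["L4"],
    "L3": ["L5", "L6"],
    "L4": [],
    "L5": ["L6"],
    "L6": [],
}
rng = random.Random(0)
random_seq = walk_from("L1", adjacency, max_len=4, rng=rng)

# Hypothetical historical adjacency frequencies.
history = {("L1", "L2"): 1, ("L1", "L3"): 3}
history_seq = walk_from("L1", adjacency, max_len=4, rng=rng, history=history)
```

Because the random choice can differ between runs with different seeds, the same sequence starting point may produce different path sequences, which matches the observation above.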

In a path sequence generation mode, the V path nodes include a path node i; the M path sequences comprise path sequences corresponding to path nodes i; i is a positive integer less than or equal to V. Determining adjacent nodes of the path node i as a first subsequence related to the sequence starting point by taking the path node i as the sequence starting point, and determining the adjacent nodes of the path node included in the first subsequence as a second subsequence related to the sequence starting point until a d-th subsequence related to the sequence starting point is obtained; d is a positive integer. And if no adjacent node exists in the path node included in the d-th subsequence, or the total number of the path nodes included in the sequence starting point and the first subsequence to the d-th subsequence associated with the sequence starting point is greater than or equal to the sequence length threshold, determining the path sequence corresponding to the path node i according to the sequence starting point and the path nodes included in the first subsequence to the d-th subsequence associated with the sequence starting point.

Optionally, the computer device may take the path node i as the sequence starting point, add the sequence starting point to an initial sequence, take the sequence starting point as the traversal pointer node, obtain the adjacent nodes of the traversal pointer node, and add them to the initial sequence. If the number of path nodes in the current initial sequence is smaller than the sequence length threshold, the next path node after the traversal pointer node in the current initial sequence is determined as the traversal pointer node, and the process of obtaining the adjacent nodes of the traversal pointer node is executed again; if the number of path nodes in the current initial sequence is greater than or equal to the sequence length threshold, the current initial sequence is determined as the path sequence corresponding to the path node i. For example, assume that the path node i is the path node L1 and the sequence length threshold is 10. The sequence starting point (i.e., path node L1) is used as the traversal pointer node; its adjacent nodes, path node L2 and path node L3, are obtained and added to the initial sequence, and the initial sequence at this time is "path node L1 -> path node L2 -> path node L3", which contains 3 path nodes, fewer than the sequence length threshold. The next path node (i.e., path node L2) after the traversal pointer node (i.e., path node L1) in the current initial sequence is then determined as the traversal pointer node; the adjacent nodes of the traversal pointer node (i.e., path node L2), such as path node L4, are obtained and added to the initial sequence, and the initial sequence at this time is "path node L1 -> path node L2 -> path node L3 -> path node L4", which contains 4 path nodes, fewer than the sequence length threshold. The next path node (i.e., path node L3) after the traversal pointer node (i.e., path node L2) in the current initial sequence is then determined as the traversal pointer node; the adjacent nodes of the traversal pointer node (i.e., path node L3), such as path node L5 and path node L6, are obtained and added to the initial sequence, and the initial sequence at this time is "path node L1 -> path node L2 -> path node L3 -> path node L4 -> path node L5 -> path node L6", which contains 6 path nodes, fewer than the sequence length threshold. This continues until an initial sequence containing a number of path nodes greater than or equal to the sequence length threshold is obtained, and that initial sequence is determined as the path sequence corresponding to the path node L1.
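The traversal-pointer procedure in the example above can be sketched as below. The function name `pointer_sequence` is hypothetical, and two details are assumptions made here rather than statements of the embodiment: nodes already in the sequence are skipped, and the loop also stops when the pointer runs past the end of the sequence (no further adjacent nodes to add).

```python
def pointer_sequence(start, adjacency, max_len):
    """Build a path sequence by moving a traversal pointer through the sequence.

    The adjacent nodes of the pointer node are appended, then the pointer
    advances to the next node in the sequence, until the sequence holds at
    least `max_len` nodes or no node is left to expand.
    """
    sequence = [start]
    pointer = 0  # index of the traversal pointer node inside `sequence`
    while pointer < len(sequence) and len(sequence) < max_len:
        for neighbour in adjacency.get(sequence[pointer], []):
            if neighbour not in sequence:  # assumption: do not add a node twice
                sequence.append(neighbour)
        pointer += 1
    return sequence

# Adjacency matching the worked example: L1 -> {L2, L3}, L2 -> {L4}, L3 -> {L5, L6}.
adjacency = {
    "L1": ["L2", "L3"],
    "L2": ["L4"],
    "L3": ["L5", "L6"],
    "L4": [],
    "L5": [],
    "L6": [],
}
full = pointer_sequence("L1", adjacency, max_len=10)
short = pointer_sequence("L1", adjacency, max_len=4)
```

With a threshold of 10 this toy graph is exhausted after 6 nodes; with a threshold of 4 the sequence stops as soon as it contains 4 nodes.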

Optionally, the computer device may use each of the V path nodes as a first sequence starting point and perform walk traversal in the V path nodes from each of the V first sequence starting points, to obtain first traversal sequences corresponding to the V path nodes respectively. For the walk traversal from the V first sequence starting points, refer to the path sequence generation manners described above; details are not repeated here. Performing a full walk over the V path nodes, that is, performing walk traversal with every path node as a sequence starting point, samples all path nodes in the road network and thereby improves the sampling coverage of the path nodes. Optionally, because the next sequence node of a path node can be selected differently, several different path sequences may be obtained from the same sequence starting point. Optionally, the computer device may repeat the process of using the V path nodes as first sequence starting points and performing walk traversal in the V path nodes from the V first sequence starting points to obtain the first traversal sequences, so as to widen the range of structural relationships between path nodes that is retained, improve the co-occurrence between nodes retained by the subsequent node association characteristics, and further improve the accuracy of node clustering.

Further, the computer device randomly selects a second sequence starting point from the V path nodes and performs walk traversal in the V path nodes from the second sequence starting point to obtain a second traversal sequence corresponding to the second sequence starting point. For this walk traversal, refer to the path sequence generation manners described above; details are not repeated here. In this process, the path nodes serving as sequence starting points are selected randomly, which improves the randomness of the samples. Optionally, the computer device may repeat the process of randomly selecting a second sequence starting point from the V path nodes and performing walk traversal from it to obtain a second traversal sequence, so as to increase the number of random samples and improve the randomness of the samples while the coverage of the path nodes is ensured. The samples are used for node association characteristic extraction.

Furthermore, M path sequences are determined according to the first traversal sequence corresponding to the V path nodes respectively and the second traversal sequence corresponding to the second sequence starting point.
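A minimal sketch of combining the two kinds of traversal sequences is given below. The names `random_walk` and `build_path_sequences`, the parameter `extra_walks` (the number of randomly started walks), and the toy adjacency table are assumptions for illustration; the embodiment does not fix how many second traversal sequences are drawn.

```python
import random

def random_walk(start, adjacency, max_len, rng):
    """Random walk that stops at a dead end or at `max_len` nodes."""
    seq = [start]
    while len(seq) < max_len and adjacency.get(seq[-1]):
        seq.append(rng.choice(adjacency[seq[-1]]))
    return seq

def build_path_sequences(adjacency, max_len, extra_walks, seed=0):
    """Return first traversal sequences (one per node) plus random-start ones."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    # Full walk: every path node serves once as a first sequence starting point.
    first = [random_walk(node, adjacency, max_len, rng) for node in nodes]
    # Additional walks from randomly selected second sequence starting points.
    second = [
        random_walk(rng.choice(nodes), adjacency, max_len, rng)
        for _ in range(extra_walks)
    ]
    return first + second

# Hypothetical toy topology.
adjacency = {"L1": ["L2", "L3"], "L2": ["L1", "L3"], "L3": ["L1"], "L4": []}
path_sequences = build_path_sequences(adjacency, max_len=5, extra_walks=3)
```

Here M equals V plus `extra_walks`; the first V sequences guarantee that every path node is sampled, and the remaining ones add random samples.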

For example, please refer to fig. 4, fig. 4 is a schematic diagram of a path sequence according to an embodiment of the present disclosure. As shown in fig. 4, in the road network 401, the traffic road is composed of V path nodes, and the traffic road is shown by double lines in fig. 4, and the double lines in fig. 4 are cut to obtain V path nodes, wherein the V path nodes are traversed by taking the path node 402 as a sequence starting point to obtain a path sequence 403 (shown by a black thick solid line).

Step S203, acquiring node pairs from the M path sequences.

In an embodiment of the present application, a computer device may obtain a node pair from M path sequences, where the node pair includes a first path node and a second path node, and the first path node and the second path node are in the same path sequence.

Specifically, the computer device may determine, as the target path sequence, a path sequence including an ith path node among the V path nodes among the M path sequences; i is a positive integer less than or equal to V. For example, assuming that the ith path node is link3, a path sequence including the ith path node is determined as a target path sequence, and the number of the target path sequence may be one or at least two, for example, the target path sequence is [ link1, link2, link3, link4, link5 ]. And in the target path sequence, determining the path node in the same path sequence with the ith path node as the co-occurrence path node of the ith path node. Optionally, the computer device may limit a distance between the co-occurrence path node and the ith path node, that is, a size of a node co-occurrence window, when acquiring the co-occurrence path node of the ith path node, specifically, the computer device may acquire the size of the node co-occurrence window, and in the target path sequence, acquire the co-occurrence path node whose sequence distance from the ith path node is less than or equal to the size of the node co-occurrence window. It is considered that the probability of co-occurrence between the path nodes in the same path sequence is greater than the probability of co-occurrence between the path nodes in different path sequences, and the probability of co-occurrence between two path nodes closer to each other in the same path sequence is greater, for example, in the above-mentioned target path sequence, the probability of co-occurrence between link3 and link2 is greater than the probability of co-occurrence between link3 and link 1. Therefore, the co-occurrence path node of the ith path node can be obtained in the target path sequence where the ith path node is located. 
Optionally, in order to reduce the amount of data that needs to be processed, the number of co-occurrence path nodes of the ith path node may be limited, that is, a path node with a higher co-occurrence probability with the ith path node is selected, so that a node co-occurrence window size may be obtained, in the target path sequence, a co-occurrence path node whose sequence distance from the ith path node is smaller than or equal to the node co-occurrence window size is obtained, that is, the distance between two path nodes included in a node pair obtained based on the node co-occurrence window size in the corresponding path sequence does not exceed the node co-occurrence window size, the node co-occurrence window size is equivalent to a distance range, and when obtaining the co-occurrence path node of the ith path node, the node is obtained within the distance range limited by the node co-occurrence window size. For example, in the target path sequence, the size of the node co-occurrence window is 2, and the co-occurrence path nodes of the ith path node (i.e., link 3) include link1, link2, link4 and link 5. Further, the co-occurrence path node associated with the ith path node and the ith path node are combined into a node pair corresponding to the ith path node, for example, including node pair [ link3, link1], node pair [ link3, link2], node pair [ link3, link4] and node pair [ link3, link5 ]. The ith path node is a first path node in a node pair corresponding to the ith path node; the co-occurrence path node associated with the ith path node is the second path node in the node pair corresponding to the ith path node. By acquiring the node pairs in the manner, all path nodes with frequency of occurrence can be reserved, and the frequency of occurrence can be used for representing the frequency of occurrence of the path nodes in the M path sequences, so that the node coverage rate is improved, and the accuracy of feature extraction is further improved.
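The window-based node pair extraction can be sketched as follows, using the target path sequence and the window size of 2 from the example above. The function name `node_pairs` is hypothetical; each pair is written as (first path node, second path node).

```python
def node_pairs(sequence, window):
    """Pair every node with the nodes at most `window` positions away."""
    pairs = []
    for i, first in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a node is not paired with itself
                pairs.append((first, sequence[j]))
    return pairs

target = ["link1", "link2", "link3", "link4", "link5"]
pairs = node_pairs(target, window=2)
link3_pairs = [pair for pair in pairs if pair[0] == "link3"]
```

For link3, the pairs obtained are the four listed in the example; nodes near the ends of the sequence have fewer co-occurrence path nodes because the window is clipped at the sequence boundary.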

Optionally, the occurrence frequency corresponding to each of the V path nodes is obtained from the M path sequences, and the first path node is obtained from the V path nodes based on the occurrence frequency. The path nodes are screened through the frequency of occurrence, and the path nodes with higher influence are reserved, so that the data volume needing to be processed is reduced, and the efficiency of feature extraction is improved. Further, a second path node in the same path sequence as the first path node may be obtained from the M path sequences, and the first path node and the second path node may be formed into a node pair. Or, a first path sequence where a first path node is located may be obtained from the M path sequences, and a node co-occurrence window size is obtained, in the first path sequence, a second path node whose sequence distance from the first path node is smaller than or equal to the node co-occurrence window size is obtained, and the first path node and the second path node form a node pair corresponding to the first path node.
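The frequency-based screening of first path nodes can be sketched as below. The embodiment does not state how the occurrence frequency is applied, so the cut-off `min_count` and the function name `frequent_first_nodes` are assumptions made for this illustration; the three path sequences are those of the earlier example with M as 3.

```python
from collections import Counter

def frequent_first_nodes(path_sequences, min_count):
    """Count node occurrences over all sequences and keep the frequent ones."""
    counts = Counter(node for seq in path_sequences for node in seq)
    kept = {node for node, count in counts.items() if count >= min_count}
    return kept, counts

path_sequences = [
    ["link1", "link3", "link8"],
    ["link1", "link2", "link5", "link6"],
    ["link2", "link4", "link7"],
]
first_nodes, counts = frequent_first_nodes(path_sequences, min_count=2)
```

Only nodes kept by the screening would then be used as first path nodes when node pairs are formed, which reduces the number of pairs to process.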

Step S204, based on the first path node in the node pair, predicting node association probabilities between the V path nodes and the first path node respectively, and determining node association characteristics corresponding to the V path nodes respectively through the node association probabilities corresponding to the V path nodes respectively and the second path node in the node pair.

In the embodiment of the application, the computer device may obtain a feature recognition model and obtain a first node feature of the first path node in the node pair; the feature recognition model includes a first initial parameter matrix and a second initial parameter matrix. The first node feature is input into the feature recognition model, and feature conversion is performed on the first node feature by using the first initial parameter matrix in the feature recognition model to obtain a hidden feature; feature prediction is then performed on the hidden feature by using the second initial parameter matrix in the feature recognition model to obtain the node association probabilities between the V path nodes and the first path node respectively. For example, please refer to fig. 5, which is a schematic diagram of a model structure provided in an embodiment of the present application. As shown in fig. 5, a computer device may input the first node feature 501 (x₁, x₂, x₃, x₄, …, x_V) into the feature recognition model and perform feature conversion on the first node feature 501 by using the first initial parameter matrix W in the feature recognition model to obtain the hidden feature 502 (h₁, h₂, h₃, …, h_N), where the dimension of the first initial parameter matrix W may be considered V × N. Optionally, one possible feature transformation way is

$h = W^{T} x$

Wherein h is used for representing the hidden feature, W is used for representing the first initial parameter matrix, and x is used for representing the first node feature. The first node feature 501 may be obtained based on any one of a word vector conversion (word2vec) technology, a node vector conversion (node2vec) technology, or a pre-trained language representation model (BERT), which is not limited herein. Further, feature prediction is performed on the hidden feature 502 by using the second initial parameter matrix W' in the feature recognition model, so as to obtain the node association probabilities 503 between the V path nodes and the first path node respectively, where the node association probabilities 503 may be regarded as a V-dimensional vector, shown as (y₁, y₂, y₃, y₄, …, y_V) in fig. 5. For example, y₁ is used for representing the node association probability between the path node L1 and the first path node, y₂ is used for representing the node association probability between the path node L2 and the first path node, and the like. Optionally, one possible feature prediction method is

$y = W'^{T} h$

Wherein h is used for representing the hidden feature, W' is used for representing the second initial parameter matrix, and y is used for representing the node association probabilities between the V path nodes and the first path node respectively.

Further, parameter adjustment can be performed on the first initial parameter matrix and the second initial parameter matrix through the node association probabilities corresponding to the V path nodes respectively and the second path node in the node pair until a first parameter matrix corresponding to the first initial parameter matrix and a second parameter matrix corresponding to the second initial parameter matrix are obtained; the first parameter matrix and the second parameter matrix are used to predict node association probabilities that match node pairs. That is, under the condition of the first parameter matrix and the second parameter matrix, the node association probability between each of the V path nodes and the first path node is predicted to match the node pair. For example, in the node pair (link 1, link 2), the node characteristics of the link1 are input under the first parameter matrix and the second parameter matrix, and the obtained node association probability corresponding to the link2 is large.

Optionally, the number of node pairs is one or at least two, and the computer device may adjust the first initial parameter matrix and the second initial parameter matrix based on each node pair in turn until the first parameter matrix and the second parameter matrix are obtained. Alternatively, the computer device may obtain, based on the node pairs, a second path node set for the first path node, where the second path node set includes one or at least two second path nodes and each second path node forms a node pair with the first path node, and may adjust the first initial parameter matrix and the second initial parameter matrix based on the second path node set to obtain the first parameter matrix and the second parameter matrix. During this parameter adjustment, the parameters are adjusted so that the comprehensive probability of the node association probabilities of the second path nodes included in the second path node set becomes larger. Optionally, the comprehensive probability may be the mean, the mean square error, the maximum, or the like of the node association probabilities of the second path nodes included in the second path node set, which is not limited herein. Adjusting the parameters with the node pairs integrated in this way reduces the amount of node pair data to be trained and further improves the efficiency of feature extraction. 
When the first initial parameter matrix and the second initial parameter matrix are subjected to parameter adjustment, information in the feature recognition model is fused into the first initial parameter matrix and the second initial parameter matrix, so that the first parameter matrix and the second parameter matrix are obtained, the co-occurrence relation between path nodes in a node pair can be embodied by the first parameter matrix and the second parameter matrix, and the node feature of V path nodes can be represented by the first parameter matrix and the second parameter matrix. Further, node association characteristics corresponding to the V path nodes may be determined from the first parameter matrix or the second parameter matrix.

Optionally, when feature prediction is performed on the hidden feature by using the second initial parameter matrix in the feature recognition model to obtain the node association probabilities between the V path nodes and the first path node, the computer device may first perform feature prediction on the hidden feature by using the second initial parameter matrix to obtain path features corresponding to the V path nodes respectively, denoted (u₁, u₂, u₃, u₄, …, u_V), and then perform feature normalization processing on the path features corresponding to the V path nodes respectively to obtain the node association probabilities y between the V path nodes and the first path node respectively. The feature normalization processing can be realized by formula ①:

①

As shown in formula ①, y_f is used for representing the node association probability between the f-th path node and the first path node, and u_f is used for representing the path feature of the f-th path node.

Alternatively, the feature normalization processing may be implemented by formula ②:

②

As shown in formula ②, y_f is used for representing the node association probability between the f-th path node and the first path node, u_f is used for representing the path feature of the f-th path node, and exp() is used for representing the exponential function.

Optionally, the feature normalization processing method is not limited to the above method, and other normalization methods may also be used, such as a logistic regression (softmax) method.
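The prediction pipeline described above (feature conversion, feature prediction, feature normalization) can be sketched in plain Python as follows. Several points are assumptions made here and are not confirmed by the text: the second matrix is taken to have shape N × V, the first node feature is taken to be a one-hot vector, and the normalization shown is an exponential normalization chosen only because an exponential function is mentioned for one of the formulas. The function names are hypothetical.

```python
import math

def forward(x, W, W_prime):
    """Compute the hidden feature h and the path features u.

    x: first node feature, length V
    W: first parameter matrix, V x N
    W_prime: second parameter matrix, N x V (assumed shape)
    """
    V, N = len(W), len(W[0])
    # Feature conversion: h_n = sum over v of W[v][n] * x[v]
    h = [sum(W[v][n] * x[v] for v in range(V)) for n in range(N)]
    # Feature prediction: u_f = sum over n of W_prime[n][f] * h[n]
    u = [sum(W_prime[n][f] * h[n] for n in range(N)) for f in range(V)]
    return h, u

def normalize_exp(u):
    """Exponential normalization (assumed form) so the outputs sum to 1."""
    shift = max(u)  # subtracting the maximum avoids overflow in exp()
    e = [math.exp(value - shift) for value in u]
    total = sum(e)
    return [value / total for value in e]

# Toy values with V = 3 path nodes and N = 2 hidden dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_prime = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]]
x = [1.0, 0.0, 0.0]  # assumed one-hot feature of the first path node
h, u = forward(x, W, W_prime)
y = normalize_exp(u)
```

With a one-hot input, the hidden feature is simply the row of W belonging to the first path node, which is why rows of the trained matrix can serve as per-node features.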

Optionally, when the parameter of the first initial parameter matrix and the second initial parameter matrix is adjusted through the node association probabilities corresponding to the V path nodes respectively and the second path node in the node pair, the computer device may obtain the target association probability of the second path node in the node pair from the node association probabilities corresponding to the V path nodes respectively; generating a loss function according to the target association probability and the node association probability corresponding to the residual path nodes, and performing parameter adjustment on the first initial parameter matrix and the second initial parameter matrix based on the loss function; the remaining path node refers to a path node other than the first path node and the second path node in the node pair among the V path nodes.
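One way to picture the parameter adjustment is a gradient step that raises the association probability of the second path node and lowers that of the other nodes. The sketch below is an illustration under stated assumptions, not the loss of the embodiment: it uses a cross-entropy loss over all V outputs (so the first path node is not excluded from the competing nodes), a one-hot first node feature, an N × V second matrix, and a hypothetical learning rate `lr`. The function names are also hypothetical.

```python
import math
import random

def softmax(u):
    shift = max(u)
    e = [math.exp(value - shift) for value in u]
    total = sum(e)
    return [value / total for value in e]

def train_pair(W, W_prime, first, second, lr=0.1):
    """One gradient step on a (first, second) node-index pair; returns the loss."""
    V, N = len(W), len(W[0])
    h = W[first][:]  # hidden feature for a one-hot input is a row of W
    u = [sum(W_prime[n][f] * h[n] for n in range(N)) for f in range(V)]
    y = softmax(u)
    loss = -math.log(y[second])  # assumed cross-entropy on the target probability
    # Gradient of the loss with respect to u: predicted minus target.
    err = [y[f] - (1.0 if f == second else 0.0) for f in range(V)]
    grad_h = [sum(W_prime[n][f] * err[f] for f in range(V)) for n in range(N)]
    for n in range(N):
        for f in range(V):
            W_prime[n][f] -= lr * h[n] * err[f]
        W[first][n] -= lr * grad_h[n]
    return loss

rng = random.Random(0)
V, N = 4, 2
W = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_prime = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]
losses = [train_pair(W, W_prime, first=0, second=1) for _ in range(300)]
node_features = W  # row f is read as the node association feature of node f
```

After the adjustment, the rows of the first matrix (or, alternatively, the columns of the second matrix) can be read out as the per-node features used for clustering.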

Step S205, clustering the V path nodes based on the node association characteristics to obtain a node set for carrying out region division on the road network.

In this embodiment, the computer device may perform clustering processing on the V path nodes based on the node association features to obtain node sets for performing region division on the road network.

The computer device may obtain node distances between the ith path node and the V path nodes respectively based on the node association features, determine the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determine that minimum node distance as the reachable distance of the ith path node, until the minimum adjacent nodes and reachable distances corresponding to the V path nodes respectively are obtained; i is a positive integer less than or equal to V. The computer device may then construct k sub-path node trees based on the minimum adjacent nodes and reachable distances corresponding to the V path nodes respectively, and determine the path nodes included in each sub-path node tree as a node set for performing region division on the road network. In an actual road network, the topological density distribution may vary greatly: hot areas have rich link connection relationships, while cold areas have few. Determining the reachable distances of the path nodes takes the density distribution of different areas into account, so that path nodes with high connectivity are clustered together and the clustering effect is improved.

Optionally, in one node set generation manner, the computer device may obtain node distances between the ith path node and the V path nodes respectively based on the node association features, determine the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determine that minimum node distance as the reachable distance of the ith path node, until the minimum adjacent nodes and reachable distances corresponding to the V path nodes respectively are obtained; i is a positive integer less than or equal to V. Referring to fig. 6, fig. 6 is a schematic diagram of a node clustering scene provided in the embodiment of the present application. As shown in fig. 6, the reachable distance and the minimum adjacent node corresponding to path node 1, the reachable distance and the minimum adjacent node corresponding to path node 2, …, and the reachable distance and the minimum adjacent node corresponding to path node V are obtained. Further, the V path nodes are used as tree nodes, and a tree edge is constructed between each of the V path nodes and its corresponding minimum adjacent node, so as to construct a path node tree, as shown by the path node tree 601 in fig. 6. The path node tree is split based on the reachable distances corresponding to the V path nodes respectively to obtain k sub-path node trees; k is a positive integer. The path nodes included in each sub-path node tree form a node set for performing region division on the road network, such as the node set 602 in fig. 6, where a solid line in fig. 6 indicates a reachable distance, and a dashed line indicates the number of nodes of the corresponding path node at a certain reachable distance.

When the node distances between the ith path node and the V path nodes are obtained based on the node association features, the computer device may obtain the feature distances between the node association feature of the ith path node and the node association features corresponding to the V path nodes respectively, and obtain the first node distance of the ith path node from the V feature distances, until the first node distances corresponding to the V path nodes respectively are obtained. Optionally, taking the ith path node as an example, the computer device may obtain a minimum sample number, sort the V feature distances, obtain the t-th feature distance from the sorted V feature distances, and determine the t-th feature distance as the first node distance of the ith path node, where t is the minimum sample number; optionally, the first node distance of the ith path node may be denoted as core(Li). Further, the node distance between the ith path node and the pth path node is determined from the first node distance of the ith path node, the first node distance of the pth path node, and the feature distance between the node association feature of the ith path node and the node association feature of the pth path node, until the node distances between the ith path node and the V path nodes respectively are obtained; p is a positive integer less than or equal to V. The node distance between the ith path node and the pth path node may be the maximum of these three values, such as max{core(Li), core(Lp), dis(Li, Lp)}, or may be their mean, such as ave{core(Li), core(Lp), dis(Li, Lp)}, and the like, which is not limited herein. Here, core(Li) represents the first node distance of the ith path node, core(Lp) represents the first node distance of the pth path node, and dis(Li, Lp) represents the feature distance between the node association feature of the ith path node and the node association feature of the pth path node.
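
The distance computation above can be sketched as follows, using the max{core(Li), core(Lp), dis(Li, Lp)} variant. This is a minimal sketch with several assumptions that the text does not fix: the feature distance is taken to be Euclidean, the V feature distances are sorted in ascending order, and a path node is not allowed to be its own minimum adjacent node. The function names and the one-dimensional sample features are hypothetical.

```python
import math

def euclid(a, b):
    """Assumed feature distance dis(Li, Lp): Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def node_distances(features, t):
    """Return the V x V matrix of max{core(Li), core(Lp), dis(Li, Lp)}."""
    V = len(features)
    dis = [[euclid(features[i], features[p]) for p in range(V)] for i in range(V)]
    # core(Li): the t-th of the V sorted feature distances (ascending assumed).
    core = [sorted(row)[t - 1] for row in dis]
    return [[max(core[i], core[p], dis[i][p]) for p in range(V)] for i in range(V)]

def nearest_neighbors(dist):
    """Return (node, minimum adjacent node, reachable distance) per node."""
    result = []
    for i, row in enumerate(dist):
        # Assumption: a node cannot be its own minimum adjacent node.
        p = min((q for q in range(len(row)) if q != i), key=lambda q: row[q])
        result.append((i, p, row[p]))
    return result

# Hypothetical one-dimensional node association features; t = 2.
features = [[0.0], [1.0], [2.0], [10.0]]
triples = nearest_neighbors(node_distances(features, t=2))
```

In this sample the isolated node at 10.0 gets a reachable distance of 8.0, while the three nodes in the dense group each get 1.0, which illustrates how the reachable distance reflects local density.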

Further, when the path node tree is split based on the reachable distances corresponding to the V path nodes respectively to obtain k sub-path node trees, the computer device may sort the reachable distances corresponding to the V path nodes respectively, and sequentially split, in the path node tree, the tree edges corresponding to the sorted reachable distances until the number of path nodes included in each resulting subtree is less than or equal to the minimum cluster size, and determine the subtrees obtained at that point as the k sub-path node trees. Alternatively, splitting may continue until the difference between the number of path nodes included in each resulting subtree and the minimum cluster size is less than or equal to the node number threshold, and the subtrees obtained at that point are determined as the k sub-path node trees, and the like.

For example, referring to fig. 7, fig. 7 is a schematic diagram of a tree splitting scene according to an embodiment of the present application. As shown in fig. 7, in the path node tree 701, each tree node represents a path node, each tree edge connects a path node to its minimum adjacent node, and the weight of a tree edge represents the reachable distance of the corresponding path node. The computer device may sort the reachable distances corresponding to the V path nodes and sequentially split, in the path node tree, the tree edges corresponding to the sorted reachable distances. Assuming that the smallest reachable distance corresponds to the tree edge 7011, the tree edge 7011 is split to obtain a subtree 7021 and a subtree 7022; assuming that the second smallest reachable distance corresponds to the tree edge 7012, which is located in the subtree 7022, the tree edge 7012 of the subtree 7022 is split to obtain a subtree 7031 and a subtree 7032; …; until k subtrees are obtained. Taking k greater than or equal to 4 as an example, these may be a subtree 7041, a subtree 7042, a subtree 7043, a subtree 704k, and the like, where the number of path nodes included in each subtree satisfies the minimum cluster size condition, for example, the number is less than or equal to the minimum cluster size, or its difference from the minimum cluster size is less than or equal to the node number threshold. The k subtrees are determined as the k sub-path node trees.

Optionally, in one node set generation manner, the computer device may obtain node distances between the ith path node and the V path nodes respectively based on the node association features, determine the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determine that minimum node distance as the reachable distance of the ith path node, until the minimum adjacent nodes and reachable distances corresponding to the V path nodes respectively are obtained; i is a positive integer less than or equal to V. The node distances, reachable distances, minimum adjacent nodes, and the like may be obtained as described in the preceding node set generation manner. Further, the V path nodes can be used as subtree nodes, the reachable distances corresponding to the V path nodes respectively are sorted from small to large, and subtree edges between the path nodes corresponding to the sorted reachable distances and their minimum adjacent nodes are constructed in sequence until k sub-path node trees composed of the V path nodes are obtained; k is a positive integer, and the number of path nodes included in each sub-path node tree is greater than or equal to the minimum cluster size. The path nodes included in each sub-path node tree form a node set for performing region division on the road network.

For example, referring to fig. 8, fig. 8 is a schematic diagram of a tree-building scene provided in an embodiment of the present application. As shown in fig. 8, assume that the V path nodes include path node 8011, path node 8012, path node 8013, path node 8014, path node 8015, and path node 8016. The reachable distances corresponding to the V path nodes are sorted from small to large, and it is assumed that { (path node 8015, path node 8016, 3), (path node 8014, path node 8013, 5), (path node 8011, path node 8012, 6), (path node 8016, path node 8011, 9) … } is obtained, where (L1, L2, dismin) indicates that the minimum adjacent node of path node L1 is path node L2 and that the reachable distance of path node L1 is dismin; for example, (path node 8011, path node 8012, 6) indicates that the minimum adjacent node of path node 8011 is path node 8012 and that the reachable distance is 6. Subtree edges between the path nodes corresponding to the sorted reachable distances and their minimum adjacent nodes are constructed in sequence, for example, a subtree edge between path node 8015 and path node 8016 corresponding to the reachable distance "3", a subtree edge between path node 8014 and path node 8013 corresponding to the reachable distance "5", …, until k sub-path node trees composed of the V path nodes are obtained. Assuming that the minimum cluster size is 2, a sub-path node tree 8021 and a sub-path node tree 8022 are obtained. The path nodes included in each sub-path node tree form a node set for performing region division on the road network; for example, the node set corresponding to the sub-path node tree 8021 includes path node 8011, path node 8012, path node 8015 and path node 8016, and the node set corresponding to the sub-path node tree 8022 includes path node 8013 and path node 8014.
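
The tree-building example of fig. 8 can be reproduced with a small union-find sketch. This is an illustrative sketch only: the use of union-find is an assumption about one possible way to track which path nodes end up in the same sub-path node tree, only the four triples listed explicitly in the example are used, and the function name `build_subtrees` is hypothetical.

```python
def build_subtrees(triples):
    """Group path nodes by adding nearest-neighbor edges in ascending order.

    triples: (path node, minimum adjacent node, reachable distance).
    Returns the node sets as sorted lists, ordered by their smallest node.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Construct subtree edges from the smallest reachable distance upward.
    for node, neighbor, _dist in sorted(triples, key=lambda t: t[2]):
        root_a, root_b = find(node), find(neighbor)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

# The four triples listed explicitly in the fig. 8 example.
triples = [(8015, 8016, 3), (8014, 8013, 5), (8011, 8012, 6), (8016, 8011, 9)]
node_sets = build_subtrees(triples)
```

With these four triples, the sketch yields one node set containing path nodes 8011, 8012, 8015 and 8016 and another containing path nodes 8013 and 8014, matching the sub-path node trees 8021 and 8022 described above.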

Optionally, in a node set generation manner, the computer device may obtain k initial clustering centers, and obtain initial clustering distances from the V path nodes to the k initial clustering centers, respectively, based on the node association characteristics. And dividing the V path nodes into initial sets corresponding to the k initial clustering centers based on the initial clustering distances from the V path nodes to the k initial clustering centers respectively. And acquiring updated clustering centers corresponding to the k initial sets respectively, and dividing the V path nodes into the updated sets corresponding to the k updated clustering centers on the basis of the updated clustering distances from the V path nodes to the updated clustering centers corresponding to the k initial sets respectively. And if the k updating sets do not meet the node clustering condition, determining the k updating sets as k initial sets, and returning to execute the process of obtaining the updating clustering centers respectively corresponding to the k initial sets. And if the k update sets meet the node clustering condition, determining the k update sets as node sets for carrying out region division on the road network.
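
The iterative procedure above follows the familiar k-means pattern and can be sketched as below. The sketch makes assumptions the text leaves open: Euclidean distance as the clustering distance, the mean of each set as its updated clustering center, and "assignments no longer change" as the node clustering condition. The function name `kmeans` and the sample features are hypothetical.

```python
import math

def kmeans(features, initial_centers, max_iter=100):
    """Assign each path node to its nearest center, then update the centers."""
    centers = [list(c) for c in initial_centers]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    assign = None
    for _ in range(max_iter):
        new_assign = [
            min(range(len(centers)), key=lambda c: dist(f, centers[c]))
            for f in features
        ]
        if new_assign == assign:  # assumed node clustering condition
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:  # keep the old center if a set is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Hypothetical two-dimensional node association features, k = 2.
features = [[0, 0], [0, 1], [5, 5], [6, 5]]
assign, centers = kmeans(features, initial_centers=[[0, 0], [5, 5]])
```

Here the first two path nodes fall into one node set and the last two into the other, and the loop stops as soon as a pass leaves the assignment unchanged.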

Optionally, in a node set generation manner, the computer device may obtain path nodes to be processed from the V path nodes; the path node to be processed refers to a path node which is not subjected to node clustering processing. Acquiring the adjacency number of path nodes positioned in the node neighborhood of the path node to be processed; the path node located in the node neighborhood of the path node to be processed refers to the path node of which the characteristic distance with the path node to be processed is smaller than or equal to the neighborhood radius; the characteristic distance refers to a distance between the node association characteristic of the corresponding path node and the node association characteristic of the path node to be processed. And if the adjacent quantity is greater than or equal to the minimum set node number, performing node expansion based on the path node to be processed and the path nodes located in the node neighborhoods of the path nodes to be processed to obtain density reachable nodes corresponding to the path nodes to be processed, forming a node set for performing region division on the road network by the path nodes to be processed and the density reachable nodes, and returning to execute the process of acquiring the path nodes to be processed from the V path nodes until the path nodes to be processed do not exist in the V path nodes. And if the number of the adjacency is less than the minimum set node number, returning to execute the process of acquiring the path nodes to be processed from the V path nodes.
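
The density-based procedure above resembles DBSCAN and can be sketched as follows. This is a minimal sketch under assumptions: Euclidean feature distance, a node neighborhood that includes the node itself, and nodes with too few neighbors being left unassigned (label -1) unless a later expansion reaches them. The names `dbscan`, `radius` and `min_nodes` are hypothetical stand-ins for the neighborhood radius and the minimum set node number.

```python
import math

def dbscan(features, radius, min_nodes):
    """Return (node_sets, labels); label -1 means not assigned to any set."""
    V = len(features)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def neighborhood(i):
        # Assumption: the neighborhood includes node i itself.
        return [q for q in range(V) if dist(features[i], features[q]) <= radius]

    labels = [None] * V  # None: path node still to be processed
    node_sets = []
    for i in range(V):
        if labels[i] is not None:
            continue
        neighbors = neighborhood(i)
        if len(neighbors) < min_nodes:
            labels[i] = -1
            continue
        set_id = len(node_sets)
        labels[i] = set_id
        members = [i]
        queue = [q for q in neighbors if q != i]
        while queue:  # node expansion to density-reachable nodes
            q = queue.pop()
            if labels[q] == -1:  # previously unassigned: join as border node
                labels[q] = set_id
                members.append(q)
                continue
            if labels[q] is not None:
                continue
            labels[q] = set_id
            members.append(q)
            q_neighbors = neighborhood(q)
            if len(q_neighbors) >= min_nodes:
                queue.extend(q_neighbors)
        node_sets.append(sorted(members))
    return node_sets, labels

# Hypothetical two-dimensional node association features.
features = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [50, 50]]
node_sets, labels = dbscan(features, radius=1.5, min_nodes=2)
```

Unlike the k-center manner, the number of node sets is not fixed in advance; it follows from the neighborhood radius and the minimum set node number.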

The above is several optional node set generation modes.

Optionally, the computer device may obtain node attribute types corresponding to the V path nodes, and obtain node attribute features corresponding to the V path nodes respectively under the node attribute types. The node attribute types include, but are not limited to, a geographical location type, a path direction type, an area code type, and the like, where the area code type indicates the code of the area where the path node is located. Feature fusion is performed on the node association features corresponding to the V path nodes respectively and the node attribute features corresponding to the V path nodes respectively to obtain node fusion features corresponding to the V path nodes respectively. The V path nodes are then clustered based on the node fusion features corresponding to the V path nodes respectively to obtain node sets for performing region division on the road network. For the process of clustering the V path nodes based on the node fusion features, reference may be made to the process of clustering the V path nodes directly based on the node association features (i.e., the node set generation manners described above), with the node association features replaced by the node fusion features; details are not repeated here.

Optionally, the analysis scenario from the path node to the path node may be generalized to a scenario from the path node to the area, or from the area to the path node, or from the area to the area, so as to improve the coverage of the path node, increase the data size available for analysis to a certain extent, and improve the efficiency and accuracy of node analysis.

For example, the analysis scenario of path node to path node can be generalized to the analysis scenario of path node to area, which can be used for path query and traffic analysis. For example, in a possible scenario, in response to a path query request from a start path node to a stop path node, a target node set where the stop path node is located is obtained; the start path node and the stop path node both belong to the V path nodes. The path nodes included in the target node set are obtained, a first track path from the start path node to the path nodes included in the target node set is obtained, and a second track path from the path nodes included in the target node set to the stop path node is obtained. A target track path from the start path node to the stop path node is determined according to the first track path and the second track path. That is to say, the target track path includes not only a track path leading directly from the start path node to the stop path node, but also a track path that leads from the start path node to any transit path node in the area where the stop path node is located (i.e., the target node set) and then from the transit path node to the stop path node. This improves the generalization and fault tolerance of path planning: when a track path with poor traffic conditions (such as congestion) exists, a track path reaching the destination (i.e., the stop path node) can still be obtained, which improves the practicability of path planning.

For example, the analysis scenario of path node to path node can be generalized to the analysis scenario of area to area, which can be used for path query, traffic analysis, familiar road modeling, and the like. For example, in a possible scenario, in response to a familiar road modeling request from a first path node to be detected to a second path node to be detected, a first node set where the first path node to be detected is located is obtained, a second node set where the second path node to be detected is located is obtained, and a historical path trajectory from a first generalized path node in the first node set to a second generalized path node in the second node set is obtained, until the historical path trajectories from any path node in the first node set to any path node in the second node set are obtained, where the first generalized path node is any path node in the first node set, and the second generalized path node is any path node in the second node set. A familiar road from a first area corresponding to the first node set to a second area corresponding to the second node set is obtained according to the historical path trajectories from any path node in the first node set to any path node in the second node set. Specifically, every historical path trajectory from any path node in the first node set to any path node in the second node set may be determined as a familiar road from the first area to the second area. Alternatively, the historical path trajectories whose number of occurrences is greater than or equal to a familiar road frequency threshold may be determined as the familiar roads from the first area to the second area. Alternatively, the historical path trajectories may be sorted by their number of occurrences, and the familiar roads from the first area to the second area are obtained from the sorted historical path trajectories.
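
The area-to-area generalization above can be sketched as a frequency count over historical path trajectories. This is a sketch under assumptions, not the patented implementation: a trajectory is represented as a list of path nodes, it is attributed to a pair of node sets by its first and last path nodes, and the names `familiar_roads`, `node_to_set` and `min_count` (standing in for the familiar road frequency threshold) are hypothetical.

```python
from collections import Counter

def familiar_roads(trajectories, node_to_set, first_set, second_set, min_count=1):
    """Return (trajectory, occurrences) from first_set to second_set.

    Results are sorted by number of occurrences, most frequent first, and
    trajectories seen fewer than min_count times are dropped.
    """
    counts = Counter(
        tuple(t) for t in trajectories
        if node_to_set[t[0]] == first_set and node_to_set[t[-1]] == second_set
    )
    return [(list(path), n) for path, n in counts.most_common() if n >= min_count]

# Hypothetical clustering result and historical path trajectories.
node_to_set = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
trajectories = [
    ["a", "e", "c"], ["b", "c"], ["a", "e", "c"], ["a", "e"], ["b", "d"],
]
roads = familiar_roads(trajectories, node_to_set, first_set=0, second_set=1, min_count=2)
all_roads = familiar_roads(trajectories, node_to_set, first_set=0, second_set=1)
```

With `min_count=1` every matching trajectory is kept, which corresponds to the first option above; a higher threshold or taking only the top entries of the sorted result corresponds to the other two options.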

For example, in one possible scenario, in response to a selection request for a downstream path node, a first request path node and a second request path node corresponding to the selection request are obtained, a third node set where the second request path node is located is obtained, and the historical path trajectories from the first request path node to the path nodes included in the third node set are obtained. The historical selection frequencies of the adjacent nodes of the first request path node in these historical path trajectories are obtained, and the downstream path node of the first request path node is selected from the adjacent nodes of the first request path node based on the historical selection frequencies corresponding to those adjacent nodes.
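
The downstream path node selection above can be sketched as follows. The sketch rests on assumptions: a historical path trajectory is a list of path nodes, it counts toward the third node set when its last path node belongs to that set, and the adjacent node with the highest historical selection frequency is chosen. The names `pick_downstream`, `node_to_set` and `neighbors` are hypothetical.

```python
from collections import Counter

def pick_downstream(trajectories, node_to_set, first_request, second_request, neighbors):
    """Pick the adjacent node most often taken after first_request."""
    target_set = node_to_set[second_request]  # the third node set
    freq = Counter()
    for t in trajectories:
        if first_request not in t:
            continue
        idx = t.index(first_request)
        ends_in_target = node_to_set[t[-1]] == target_set
        if idx + 1 < len(t) and ends_in_target and t[idx + 1] in neighbors:
            freq[t[idx + 1]] += 1  # historical selection frequency
    return freq.most_common(1)[0][0] if freq else None

# Hypothetical data: "s" has adjacent nodes "x" and "y".
node_to_set = {"s": 0, "x": 0, "y": 0, "c": 1, "d": 1}
trajectories = [["s", "x", "c"], ["s", "y", "d"], ["s", "x", "d"], ["s", "y", "s"]]
downstream = pick_downstream(trajectories, node_to_set, "s", "c", neighbors={"x", "y"})
```

Because trajectories ending anywhere in the third node set are counted, the selection can draw on more history than trajectories ending exactly at the second request path node.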

The above are exemplary scenarios to which the present application can be applied; the present application is not limited to these scenarios and may be applied to any scenario in which path nodes need to be generalized.

In the embodiment of the present application, path sequences capable of representing the association relationships between path nodes can be obtained by performing walking traversal on the V path nodes. That is, a path sequence represents the structural features between the path nodes it includes, namely the connection relationships between path nodes. Because the path sequences are produced by walking traversal on the path nodes, adjacent path nodes in a path sequence tend to be similar to each other, and the homogeneity between path nodes is maintained to a certain extent. Therefore, when node pairs are constructed from the path sequences, the co-occurrence probability between the first path node and the second path node in a node pair is relatively high, so that the node association features extracted through the node pairs can represent the structure between path nodes. This structure is in turn preserved in the clustering result, which improves the accuracy of road network region division.

Optionally, referring to fig. 9, fig. 9 is a schematic diagram of a road network region division flow provided in the embodiment of the present application. As shown in fig. 9, the process includes the steps of:

Step S901, performing a trajectory path walk.

In this embodiment of the application, the computer device may obtain V path nodes included in a road network, and perform trajectory path walking traversal on the V path nodes to obtain M path sequences composed of the V path nodes. For details, reference may be made to the description of steps S201 to S202 in fig. 2.

Step S902, train node association features of the path nodes.

In this embodiment, the computer device may obtain node pairs from the M path sequences, predict node association probabilities between the V path nodes and the first path node respectively based on the first path node in a node pair, and determine node association features corresponding to the V path nodes respectively through the node association probabilities corresponding to the V path nodes respectively and the second path node in the node pair. For details, reference may be made to the description of steps S203 to S204 in fig. 2. For example, if node clustering is performed at this point, a clustering result as shown in fig. 10 may be generated; fig. 10 is a schematic diagram of an associated clustering scenario provided in this embodiment of the present application. When node clustering processing is performed on the road network 1001, a path node in the area 1002 and a path node in the area 1003 may, with a very small probability, be clustered into the same node set, and step S903 may therefore be executed. That is, performing node clustering directly based on the node association features reduces the amount of data to be processed and improves the efficiency of feature processing; further optionally, step S903 may be executed to improve the accuracy of node clustering.

Step S903, fusing geographic location information into the node association features.

In this embodiment of the application, the computer device may obtain node attribute types corresponding to the V path nodes. Taking the geographical location attribute as an example, the computer device may obtain geographical location information of the V path nodes, obtain node attribute features corresponding to the V path nodes respectively based on the V pieces of geographical location information, and perform feature fusion on the node association features corresponding to the V path nodes respectively and the node attribute features corresponding to the V path nodes respectively to obtain node fusion features corresponding to the V path nodes respectively. For example, the geographical location information includes location longitude information and location latitude information, and the computer device may perform feature concatenation on the node association feature and the node attribute feature of the ith path node to obtain the node fusion feature of the ith path node. For example, if the dimension of the node association feature is N and the node attribute feature is a longitude and latitude feature, the node fusion feature (node association feature, longitude, latitude) is obtained, and the dimension of the node fusion feature is (N + 2). Alternatively, the attribute dimensions in the node association feature may be obtained, and the node attribute feature may be added to the attribute dimensions of the node association feature to obtain the node fusion feature. For example, a longitude feature corresponding to the location longitude information may be added to a first dimension of the node association feature, and a latitude feature corresponding to the location latitude information may be added to a second dimension of the node association feature, so as to obtain the node fusion feature, where the first dimension is an odd-numbered dimension of the node association feature and the second dimension is an even-numbered dimension of the node association feature, or the first dimension is an even-numbered dimension and the second dimension is an odd-numbered dimension.
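
The feature concatenation option can be sketched in a few lines. This is a minimal sketch: the function name `fuse_by_concatenation` and the sample values are hypothetical, and any scaling of longitude and latitude relative to the node association feature is left out because the text does not specify it.

```python
def fuse_by_concatenation(association_feature, longitude, latitude):
    """Append longitude and latitude to an N-dimensional association feature."""
    return list(association_feature) + [longitude, latitude]

# Hypothetical 4-dimensional node association feature and location.
association_feature = [0.12, -0.40, 0.33, 0.05]
fusion_feature = fuse_by_concatenation(association_feature, 114.06, 22.54)
```

For an N-dimensional node association feature the result has N + 2 dimensions, and it can be passed to any of the node set generation manners in place of the node association feature.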

Step S904, clustering the V path nodes.

In this embodiment, the computer device may cluster the V path nodes to obtain a node set for performing region division on the road network. Specific reference may be made to the specific description shown in step S205 in fig. 2.

For the trajectory path walk, reference may be made to fig. 11, which is a flowchart of a sequence generation method provided in the embodiment of the present application. As shown in fig. 11, the process may include the following steps:

Step S1101, traversing the path nodes.

In the embodiment of the present application, the computer device may traverse the path nodes, specifically, may traverse the road network, to obtain V path nodes constituting the road network.

Step S1102, acquiring the adjacent nodes of the path nodes.

In this embodiment, the computer device may obtain adjacent nodes corresponding to the V path nodes, respectively.

Step S1103, performing random walk traversal on the nodes.

In an embodiment of the present application, the computer device may perform a random walk traversal on the V path nodes.

Step S1104, generating the path sequences.

In an embodiment of the present application, the computer device may generate M path sequences composed of V path nodes based on the random walk traversal result.

For steps S1101 to S1104, reference may be made to the description of steps S201 to S202 in fig. 2.
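
Steps S1101 to S1104 can be sketched as a random walk over an adjacency map. This is a sketch under assumptions: the road network is given as a dictionary from each path node to its adjacent nodes, the next node is chosen uniformly at random, every path node serves once as a sequence starting point, and a walk stops when the current node has no adjacent node or the sequence length threshold is reached. The names `random_walk`, `generate_sequences` and `max_len` are hypothetical.

```python
import random

def random_walk(adjacency, start, max_len, rng):
    """Walk from start until a dead end or the sequence length threshold."""
    sequence = [start]
    while len(sequence) < max_len:
        neighbors = adjacency.get(sequence[-1], [])
        if not neighbors:  # no adjacent node: the sequence ends here
            break
        sequence.append(rng.choice(neighbors))  # assumed: uniform choice
    return sequence

def generate_sequences(adjacency, max_len, seed=0):
    """Use every path node once as a sequence starting point."""
    rng = random.Random(seed)
    return [random_walk(adjacency, node, max_len, rng) for node in adjacency]

# Hypothetical road network: path node -> adjacent (downstream) nodes.
adjacency = {1: [2, 3], 2: [4], 3: [4], 4: []}
sequences = generate_sequences(adjacency, max_len=3)
```

Each resulting sequence follows edges of the road network, so path nodes that appear close together in a sequence are also close in the network topology.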

Further, referring to fig. 12, fig. 12 is a schematic diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code and the like) running on a computer device; for example, the data processing apparatus may be application software. The apparatus may be used to perform the corresponding steps in the methods provided by the embodiments of the present application. As shown in fig. 12, the data processing apparatus 1200 may be used in the computer device in the embodiment corresponding to fig. 2, and specifically, the apparatus may include: a node obtaining module 11, a sequence obtaining module 12, a node pair obtaining module 13, a feature determining module 14 and a node clustering module 15.

A node obtaining module 11, configured to obtain V path nodes;

the sequence obtaining module 12 is configured to perform wandering traversal on the V path nodes to obtain M path sequences composed of the V path nodes; v is a positive integer; m is a positive integer;

a node pair obtaining module 13, configured to obtain node pairs from the M path sequences;

a feature determining module 14, configured to predict, based on a first path node in a node pair, node association probabilities between V path nodes and the first path node, and determine, through the node association probabilities corresponding to the V path nodes respectively and a second path node in the node pair, node association features corresponding to the V path nodes respectively; the first path node and the second path node are in the same path sequence;

and the node clustering module 15 is configured to perform clustering processing on the V path nodes based on the node association characteristics to obtain a node set for performing area division on the road network.

Wherein the V path nodes include path node i; the M path sequences comprise path sequences corresponding to path nodes i; i is a positive integer less than or equal to V;

the sequence acquisition module 12 includes:

the first walking unit 12a is configured to select, with the path node i as a sequence starting point, a second sequence node corresponding to the sequence starting point from adjacent nodes of the path node i, and select a third sequence node corresponding to the sequence starting point from adjacent nodes of the second sequence node until a jth sequence node corresponding to the sequence starting point is obtained; j is a positive integer;

the first sequence determining unit 12b is configured to determine, if there is no adjacent node in the jth sequence node corresponding to the sequence starting point or j is a sequence length threshold, the jth sequence node corresponding to the sequence starting point as a sequence end point corresponding to the sequence starting point, and determine the path sequence corresponding to the path node i based on the sequence starting point to the sequence end point.

The sequence acquisition module 12 includes:

a second walking unit 12c, configured to determine, with the path node i as a sequence starting point, an adjacent node of the path node i as a first subsequence associated with the sequence starting point, and determine, as a second subsequence associated with the sequence starting point, an adjacent node of the path node included in the first subsequence until a d-th subsequence associated with the sequence starting point is obtained; d is a positive integer;

the second sequence determining unit 12d is configured to determine, according to the start point of the sequence and the path nodes included in the first to the d-th subsequences associated with the start point of the sequence, a path sequence corresponding to the path node i if no adjacent node exists in the path nodes included in the d-th subsequence, or if the start point of the sequence and the total number of the path nodes included in the first to the d-th subsequences associated with the start point of the sequence are greater than or equal to a sequence length threshold.

Wherein, the sequence acquiring module 12 includes:

a complete traversal unit 12e, configured to take the V path nodes as first sequence starting points, and perform wandering traversal in the V path nodes by using the V first sequence starting points, respectively, to obtain first traversal sequences corresponding to the V path nodes, respectively;

the random traversal unit 12f is configured to randomly select a second sequence starting point from the V path nodes, and perform walking traversal in the V path nodes using the second sequence starting point to obtain a second traversal sequence corresponding to the second sequence starting point;

the third sequence determining unit 12g is configured to determine M path sequences according to the first traversal sequences corresponding to the V path nodes respectively and the second traversal sequence corresponding to the start point of the second sequence.

The node pair obtaining module 13 includes:

a sequence selecting unit 13a, configured to determine, as a target path sequence, a path sequence including an ith path node of the V path nodes among the M path sequences; i is a positive integer less than or equal to V;

a co-occurrence obtaining unit 13b, configured to obtain a size of a node co-occurrence window, and in the target path sequence, obtain a co-occurrence path node whose sequence distance from the ith path node is smaller than or equal to the size of the node co-occurrence window;

a node pair forming unit 13c, configured to form a node pair corresponding to the ith path node from the co-occurrence path node associated with the ith path node and the ith path node; the ith path node is a first path node in a node pair corresponding to the ith path node; the co-occurrence path node associated with the ith path node is the second path node in the node pair corresponding to the ith path node.
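The window-based construction of node pairs by units 13a to 13c can be sketched as follows. The function name `window_pairs` and the representation of a node pair as a `(first, second)` tuple are illustrative assumptions.

```python
from typing import List, Tuple


def window_pairs(sequences: List[List[str]], window: int) -> List[Tuple[str, str]]:
    """Pair every node with the nodes at sequence distance <= `window` from it."""
    pairs = []
    for seq in sequences:
        for idx, first in enumerate(seq):
            low = max(0, idx - window)
            high = min(len(seq), idx + window + 1)
            for j in range(low, high):
                if j != idx:
                    # (first path node, co-occurring second path node)
                    pairs.append((first, seq[j]))
    return pairs
```

For example, with the single sequence `["a", "b", "c"]` and a window size of 1, node `b` is paired with both `a` and `c`, while `a` and `c` are not paired with each other.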

The node pair obtaining module 13 includes:

a frequency screening unit 13d, configured to obtain occurrence frequencies corresponding to the V path nodes respectively from the M path sequences, and obtain a first path node from the V path nodes based on the occurrence frequencies;

the node pair forming unit 13c is further configured to obtain, from the M path sequences, a second path node in the same path sequence as the first path node, and form a node pair from the first path node and the second path node.
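The embodiment does not state how the occurrence frequencies are used to select the first path node. The sketch below assumes one possible rule, namely keeping nodes that occur at least `min_count` times; both that threshold and the function name are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def frequency_screened_pairs(sequences: List[List[str]],
                             min_count: int) -> List[Tuple[str, str]]:
    # Occurrence frequency of every path node across the M path sequences.
    freq = Counter(node for seq in sequences for node in seq)
    # Assumed screening rule: keep nodes that occur at least `min_count` times.
    first_nodes = {node for node, count in freq.items() if count >= min_count}
    pairs = []
    for seq in sequences:
        for i, first in enumerate(seq):
            if first not in first_nodes:
                continue
            for j, second in enumerate(seq):
                if i != j:   # second path node lies in the same path sequence
                    pairs.append((first, second))
    return pairs
```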

Wherein, the feature determining module 14 includes:

the feature recognition unit 14a is configured to obtain a feature recognition model, and obtain a first node feature of a first path node in the node pair; the characteristic identification model comprises a first initial parameter matrix and a second initial parameter matrix;

the feature conversion unit 14b is configured to input the first node feature into the feature recognition model, and perform feature conversion on the first node feature by using a first initial parameter matrix in the feature recognition model to obtain a hidden feature;

the probability prediction unit 14c is configured to perform feature prediction on the hidden feature by using a second initial parameter matrix in the feature recognition model to obtain node association probabilities between the V path nodes and the first path nodes, respectively;

a parameter adjusting unit 14d, configured to perform parameter adjustment on the first initial parameter matrix and the second initial parameter matrix according to the node association probabilities respectively corresponding to the V path nodes and a second path node in the node pair;

the matrix generating unit 14e is configured to obtain a first parameter matrix corresponding to the first initial parameter matrix and a second parameter matrix corresponding to the second initial parameter matrix; the first parameter matrix and the second parameter matrix are used for predicting node association probability matched with the node pairs;

the feature determining unit 14f is configured to determine node association features corresponding to the V path nodes from the first parameter matrix or the second parameter matrix.

The probability prediction unit 14c includes:

the feature prediction subunit 141c is configured to perform feature prediction on the hidden feature by using the second initial parameter matrix in the feature recognition model to obtain path features corresponding to the V path nodes, respectively;

the normalization processing subunit 142c is configured to perform feature normalization processing on the path features corresponding to the V path nodes, respectively, to obtain node association probabilities between the V path nodes and the first path node, respectively.
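The prediction performed by units 14b and 14c can be sketched in plain Python as a two-matrix forward pass. It is assumed here that the first node feature is a one-hot encoding, so that multiplying it by the first parameter matrix selects one row, and that the feature normalization is a softmax; the names `forward`, `w_in` and `w_out` are illustrative.

```python
import math
from typing import List


def forward(first_index: int, w_in: List[List[float]],
            w_out: List[List[float]]) -> List[float]:
    """Return association probabilities between all V nodes and the first node.

    w_in  is V x N (first parameter matrix), w_out is N x V (second one).
    """
    # One-hot input times w_in reduces to selecting a row: the hidden feature.
    hidden = w_in[first_index]
    num_nodes = len(w_out[0])
    # Path features: one unnormalized score per path node.
    scores = [sum(hidden[n] * w_out[n][v] for n in range(len(hidden)))
              for v in range(num_nodes)]
    # Softmax normalization (shifted by the maximum for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```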

Wherein, the parameter adjusting unit 14d includes:

a probability obtaining subunit 141d, configured to obtain, from the node association probabilities corresponding to the V path nodes, a target association probability of a second path node in the node pair;

the parameter adjusting subunit 142d is configured to generate a loss function according to the target association probability and the node association probability corresponding to the remaining path node, and perform parameter adjustment on the first initial parameter matrix and the second initial parameter matrix based on the loss function; the remaining path node refers to a path node other than the first path node and the second path node in the node pair among the V path nodes.
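One common way to realize such a parameter adjustment is a cross-entropy loss over softmax probabilities followed by a gradient step, sketched below. The exact loss used by the embodiment is not given in this passage, so the loss form, the learning rate `lr`, the one-hot input, and the name `train_step` are assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(first: int, second: int, w_in: List[List[float]],
               w_out: List[List[float]], lr: float = 0.1) -> float:
    """One gradient step on the node pair (first, second); returns the loss."""
    hidden = list(w_in[first])               # one-hot input selects a row
    dim, num_nodes = len(hidden), len(w_out[0])
    probs = softmax([sum(hidden[n] * w_out[n][v] for n in range(dim))
                     for v in range(num_nodes)])
    loss = -math.log(probs[second])          # target association probability
    # Error per node: raises the target probability, lowers the remaining ones.
    err = [probs[v] - (1.0 if v == second else 0.0) for v in range(num_nodes)]
    grad_hidden = [sum(w_out[n][v] * err[v] for v in range(num_nodes))
                   for n in range(dim)]
    for n in range(dim):
        for v in range(num_nodes):
            w_out[n][v] -= lr * hidden[n] * err[v]
        w_in[first][n] -= lr * grad_hidden[n]
    return loss
```

After training over all node pairs, row `i` of `w_in` could be read out as the node association feature of the ith path node.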

Wherein, the node clustering module 15 includes:

the attribute obtaining unit 15a is configured to obtain node attribute types corresponding to the V path nodes, and obtain node attribute features corresponding to the V path nodes in the node attribute types, respectively;

the feature fusion unit 15b is configured to perform feature fusion on the node association features respectively corresponding to the V path nodes and the node attribute features respectively corresponding to the V path nodes to obtain node fusion features respectively corresponding to the V path nodes;

and the fusion clustering unit 15c is configured to perform clustering processing on the V path nodes based on node fusion characteristics corresponding to the V path nodes, respectively, to obtain a node set for performing area division on the road network.

Wherein, the node clustering module 15 includes:

a distance obtaining unit 15d, configured to obtain node distances between the ith path node and the V path nodes, respectively, based on the node association characteristics;

a distance determining unit 15e, configured to determine a path node with a minimum node distance as a minimum adjacent node of the ith path node, and determine the minimum node distance as a reachable distance of the ith path node until obtaining minimum adjacent nodes and reachable distances corresponding to the V path nodes, respectively; i is a positive integer less than or equal to V;

a tree construction unit 15f, configured to construct a tree edge between minimum adjacent nodes corresponding to the V path nodes and the V path nodes, with the V path nodes as tree nodes, and construct a path node tree;

the tree splitting unit 15g is configured to split the path node tree based on the reachable distances corresponding to the V path nodes, respectively, to obtain k sub-path node trees; k is a positive integer;

the set composing unit 15h is configured to compose a node set for performing region division on the road network by using the path nodes included in each sub-path node tree.

The distance obtaining unit 15d includes:

a distance obtaining subunit 151d, configured to obtain feature distances between the node association feature of the ith path node and the node association features respectively corresponding to the V path nodes, and obtain a first node distance of the ith path node from the V feature distances, until first node distances respectively corresponding to the V path nodes are obtained;

a distance determining subunit 152d, configured to determine, from a first node distance of an ith path node, a first node distance of a pth path node, and a feature distance between a node association feature of the ith path node and a node association feature of the pth path node, a node distance between the ith path node and the pth path node until obtaining node distances between the ith path node and the V path nodes, respectively; p is a positive integer less than or equal to V.
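The two-stage distance computation of subunits 151d and 152d resembles the core distance and mutual reachability distance used in density-based hierarchical clustering. The sketch below assumes that the first node distance is the k-th smallest feature distance to another node and that the node distance is the maximum of the three quantities; neither rule is stated in this passage, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_node_distances(features: List[Sequence[float]], k: int) -> List[float]:
    """Assumed rule: the k-th smallest feature distance to any other node."""
    result = []
    for i, feat_i in enumerate(features):
        dists = sorted(euclidean(feat_i, feat_p)
                       for p, feat_p in enumerate(features) if p != i)
        result.append(dists[k - 1])
    return result


def node_distance(features: List[Sequence[float]], first: List[float],
                  i: int, p: int) -> float:
    """Assumed rule: maximum of both first node distances and the feature distance."""
    return max(first[i], first[p], euclidean(features[i], features[p]))
```

Taking the maximum pushes nodes in sparse regions further away from everything else, so that isolated path nodes are less likely to bridge two otherwise separate regions.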

Wherein, this tree segmentation unit 15g specifically is used for:

Wherein, the node clustering module 15 includes:

the distance obtaining unit 15d is configured to obtain node distances between the ith path node and the V path nodes, respectively, based on the node association characteristics;

the distance determining unit 15e is configured to determine a path node with a minimum node distance as a minimum adjacent node of the ith path node, and determine the minimum node distance as a reachable distance of the ith path node until obtaining minimum adjacent nodes and reachable distances corresponding to the V path nodes, respectively; i is a positive integer less than or equal to V;

the edge connection unit 15i is configured to use the V path nodes as sub-tree nodes, sort the reachable distances respectively corresponding to the V path nodes from small to large, and sequentially construct sub-tree edges between the path nodes corresponding to the sorted reachable distances and the minimum adjacent node until k sub-path node trees composed of the V path nodes are obtained; k is a positive integer, and the number of path nodes included in each sub-path node tree is greater than or equal to the minimum cluster size;

the set composing unit 15h is configured to compose path nodes included in each sub-path node tree into a node set for performing region division on the road network.
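The edge connection performed by unit 15i can be sketched with a union-find structure: nodes are visited in ascending order of reachable distance and joined to their minimum adjacent node until k sub-trees remain. The inputs `nearest` and `reach`, the function name, and the omission of the minimum cluster size check are simplifying assumptions.

```python
from typing import List


def grow_subtrees(nearest: List[int], reach: List[float], k: int) -> List[List[int]]:
    """Join nodes to their minimum adjacent node, smallest reachable distance first."""
    parent = list(range(len(nearest)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = len(nearest)
    for i in sorted(range(len(nearest)), key=lambda n: reach[n]):
        if components <= k:                 # k sub-path node trees obtained
            break
        a, b = find(i), find(nearest[i])
        if a != b:                          # add a sub-tree edge
            parent[a] = b
            components -= 1
    groups = {}
    for node in range(len(nearest)):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())
```

If the minimum-adjacent-node edges alone cannot merge the nodes down to k sub-trees, this sketch returns more than k groups; a fuller implementation would also enforce the minimum cluster size.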

Wherein, the node clustering module 15 includes:

the data acquisition unit 15j is configured to acquire k initial clustering centers, and acquire initial clustering distances from the V path nodes to the k initial clustering centers, respectively, based on the node association characteristics;

the initial clustering unit 15k is configured to divide the V path nodes into initial sets corresponding to the k initial clustering centers based on initial clustering distances from the V path nodes to the k initial clustering centers, respectively;

an update clustering unit 15l, configured to obtain update cluster centers respectively corresponding to the k initial sets, and divide the V path nodes into k update sets based on the update cluster centers;

a clustering iteration unit 15m, configured to determine the k update sets as the k initial sets if the k update sets do not satisfy the node clustering condition, and return to execute, through the update clustering unit 15l, the process of obtaining the update cluster centers respectively corresponding to the k initial sets;

and a set determining unit 15n, configured to determine the k update sets as node sets for performing region division on the road network if the k update sets satisfy the node clustering condition.
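The iteration carried out by units 15j to 15n follows the pattern of k-means clustering, sketched below in plain Python. It is assumed that the node clustering condition is "the assignment no longer changes" and that the update cluster center is the mean of the features in a set; the function name and `max_iter` are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(features: List[Sequence[float]], centers: List[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    centers = [list(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assign every node to the nearest cluster center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(f, centers[c]))
            for f in features
        ]
        if new_assignment == assignment:    # assumed node clustering condition
            break
        assignment = new_assignment
        # Update each center to the mean of the features assigned to it.
        for c in range(len(centers)):
            members = [f for f, a in zip(features, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers
```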

Wherein, the node clustering module 15 includes:

a to-be-processed acquiring unit 15o configured to acquire a to-be-processed path node from the V path nodes; the path node to be processed refers to a path node which is not subjected to node clustering processing;

a number obtaining unit 15p configured to obtain the number of neighbors of the path node located in the node neighborhood of the path node to be processed; the path node located in the node neighborhood of the path node to be processed refers to a path node of which the characteristic distance with the path node to be processed is smaller than or equal to the radius of the neighborhood; the characteristic distance refers to the distance between the node association characteristic of the corresponding path node and the node association characteristic of the path node to be processed;

a node expansion unit 15q, configured to, if the number of neighbors is greater than or equal to the minimum number of set nodes, perform node expansion based on the path node to be processed and the path nodes located in the node neighborhood of the path node to be processed to obtain density reachable nodes corresponding to the path node to be processed, form a node set for performing region division on the road network from the path node to be processed and the density reachable nodes, and return to execute, through the to-be-processed acquiring unit 15o, the process of acquiring a path node to be processed from the V path nodes, until no path node to be processed exists in the V path nodes;

the node processing unit 15r is configured to return to execute, through the to-be-processed acquiring unit 15o, the process of acquiring a path node to be processed from the V path nodes if the number of neighbors is smaller than the minimum number of set nodes.
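The density-based procedure of units 15o to 15r corresponds to the familiar DBSCAN scheme, sketched below. Counting a node as its own neighbor, labelling sparse nodes with -1, and the names `dbscan`, `radius` and `min_nodes` are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def dbscan(features: List[Sequence[float]], radius: float,
           min_nodes: int) -> List[Optional[int]]:
    """Return a set index per node; -1 marks nodes that belong to no set."""
    labels: List[Optional[int]] = [None] * len(features)   # None = not processed

    def neighbors(i: int) -> List[int]:
        # Nodes whose feature distance to node i is within the neighborhood radius.
        return [j for j in range(len(features))
                if math.dist(features[i], features[j]) <= radius]

    cluster = 0
    for i in range(len(features)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_nodes:      # too few neighbors: move to the next node
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                    # node expansion over density reachable nodes
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border node joins the set, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_nodes:
                queue.extend(reach)
        cluster += 1
    return labels
```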

Wherein, the apparatus 1200 further comprises:

a path query module 16, configured to respond to a path query request from a start path node to an end path node, and obtain a target node set where the end path node is located; the start path node belongs to the V path nodes, and the end path node belongs to the V path nodes;

the track acquisition module 17 is configured to acquire a path node included in the target node set, acquire a first track path from the start path node to the path node included in the target node set, and acquire a second track path from the path node included in the target node set to the end path node;

and a track determining module 18, configured to determine a target track path from the start path node to the end path node according to the first track path and the second track path.
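A minimal sketch of how modules 16 to 18 might answer a path query is given below. It assumes that track paths are found by breadth-first search over a hypothetical adjacency dictionary `adj` and that, among the nodes of the target node set, the one yielding the shortest combined path is chosen; the embodiment does not fix either choice.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def bfs_path(adj: Dict[str, List[str]], src: str, dst: str) -> Optional[List[str]]:
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:     # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in adj.get(node, []):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None


def query_path(adj: Dict[str, List[str]], node_sets: List[Set[str]],
               start: str, end: str) -> Optional[List[str]]:
    # Target node set: the set that contains the end path node.
    target_set = next(s for s in node_sets if end in s)
    best = None
    for via in sorted(target_set):
        first = bfs_path(adj, start, via)    # first track path
        second = bfs_path(adj, via, end)     # second track path
        if first is None or second is None:
            continue
        candidate = first + second[1:]       # join without repeating `via`
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```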

The embodiment of the application provides a data processing device, which can obtain V path nodes and perform wandering traversal on the V path nodes to obtain M path sequences consisting of the V path nodes; acquire node pairs from the M path sequences, predict node association probabilities between the V path nodes and the first path node respectively based on the first path node in the node pairs, and determine node association characteristics respectively corresponding to the V path nodes through the node association probabilities respectively corresponding to the V path nodes and the second path node in the node pairs, the first path node and the second path node being in the same path sequence; and cluster the V path nodes based on the node association characteristics to obtain a node set for carrying out region division on the road network. Performing wandering traversal on the V path nodes yields path sequences that reflect the association relationships among the path nodes; that is, a path sequence represents the structural features among the path nodes it includes, namely the connection relationships between path nodes. Moreover, because of the wandering traversal, adjacent path nodes in a path sequence are likely to be similar, so homogeneity between path nodes is preserved to some extent. Node pairs constructed from the path sequences therefore have a higher co-occurrence probability between the first path node and the second path node, so the node association characteristics extracted through the node pairs can embody the structure among the path nodes. This structure is in turn kept in the clustering result, which improves the accuracy of road network region division.

Referring to fig. 13, fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 13, the computer device in the embodiment of the present application may include: one or more processors 1301, memory 1302, and input-output interfaces 1303. The processor 1301, the memory 1302, and the input/output interface 1303 are connected by a bus 1304. The memory 1302 is used for storing a computer program including program instructions, and the input/output interface 1303 is used for receiving data and outputting data; processor 1301 is configured to execute program instructions stored by memory 1302.

The processor 1301 may perform the following operations:

In some possible implementations, the processor 1301 may be a Central Processing Unit (CPU), or another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 1302 may include both read-only memory and random-access memory, and provides instructions and data to the processor 1301 and the input/output interface 1303. A portion of the memory 1302 may also include non-volatile random access memory. For example, memory 1302 may also store information of the device type.

In a specific implementation, the computer device may execute the implementation manners provided in the steps in fig. 2 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps in fig. 2, and are not described herein again.

The embodiment of the present application provides a computer device, including a processor, an input/output interface and a memory, wherein the processor acquires a computer program in the memory, executes each step of the method shown in fig. 2, and performs data processing operations. In the embodiment of the present application, V path nodes are obtained, and M path sequences consisting of the V path nodes are obtained by performing wandering traversal on the V path nodes; node pairs are acquired from the M path sequences, node association probabilities between the V path nodes and the first path node are predicted respectively based on the first path node in the node pairs, and node association characteristics respectively corresponding to the V path nodes are determined through the node association probabilities respectively corresponding to the V path nodes and the second path node in the node pairs, the first path node and the second path node being in the same path sequence; and the V path nodes are clustered based on the node association characteristics to obtain a node set for carrying out region division on the road network. Performing wandering traversal on the V path nodes yields path sequences that reflect the association relationships among the path nodes; that is, a path sequence represents the structural features among the path nodes it includes, namely the connection relationships between path nodes. Moreover, because of the wandering traversal, adjacent path nodes in a path sequence are likely to be similar, so homogeneity between path nodes is preserved to some extent. Node pairs constructed from the path sequences therefore have a higher co-occurrence probability between the first path node and the second path node, so the node association characteristics extracted through the node pairs can embody the structure among the path nodes. This structure is in turn kept in the clustering result, which improves the accuracy of road network region division.

An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, where the computer program is suitable for being loaded by the processor and executing the data processing method provided in each step in fig. 2, and for details, reference may be made to implementation manners provided in each step in fig. 2, and details are not described here again. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of embodiments of the method of the present application. By way of example, a computer program can be deployed to be executed on one computer device or on multiple computer devices at one site or distributed across multiple sites and interconnected by a communication network.

The computer-readable storage medium may be an internal storage unit of the data processing apparatus provided in any of the foregoing embodiments or of the computer device, such as a hard disk or a memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, and the like, provided on the computer device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.

Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method provided in the various optional manners in fig. 2. Wandering traversal is thereby performed on the V path nodes to obtain path sequences that reflect the association relationships among the path nodes; that is, a path sequence represents the structural features among the path nodes it includes, namely the connection relationships between path nodes. Moreover, because of the wandering traversal, adjacent path nodes in a path sequence are likely to be similar, so homogeneity between path nodes is preserved to some extent. Node pairs constructed from the path sequences therefore have a higher co-occurrence probability between the first path node and the second path node, so the node association characteristics extracted through the node pairs can embody the structure among the path nodes. This structure is in turn kept in the clustering result, which improves the accuracy of road network region division.

The terms "first," "second," and the like in the description, claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the term "comprises" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, apparatus, product, or device that comprises a list of steps or modules is not limited to the listed steps or modules, but may alternatively include other steps or modules not listed or inherent to such process, method, apparatus, product, or device.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The method and the related apparatus provided by the embodiments of the present application are described with reference to the flowchart and/or the structural diagram of the method provided by the embodiments of the present application, and each flow and/or block of the flowchart and/or the structural diagram of the method, and the combination of the flow and/or block in the flowchart and/or the block diagram can be specifically implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block or blocks of the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block or blocks.

The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.

The modules in the device can be combined, divided and deleted according to actual needs.

The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and should not be taken as limiting the scope of the present application; therefore, equivalent changes made according to the claims of the present application still fall within the scope of the present application.

Claims

1. A method of data processing, the method comprising:

acquiring V path nodes, acquiring a sequence starting point from the V path nodes, and performing wandering traversal on the V path nodes based on the sequence starting point to obtain M path sequences consisting of the V path nodes; v is a positive integer; m is a positive integer; the number of the sequence starting points is one or at least two;

clustering the V path nodes based on the node association characteristics to obtain a node set for carrying out region division on a road network; the road network comprises the V path nodes.

2. The method of claim 1, wherein the V path nodes comprise path node i; the M path sequences comprise path sequences corresponding to the path nodes i; i is a positive integer less than or equal to V;

the obtaining of the sequence starting point from the V path nodes, and performing wandering traversal on the V path nodes based on the sequence starting point to obtain M path sequences composed of the V path nodes, includes:

selecting a second sequence node corresponding to the sequence starting point from adjacent nodes of the path node i by taking the path node i as the sequence starting point, and selecting a third sequence node corresponding to the sequence starting point from adjacent nodes of the second sequence node until a jth sequence node corresponding to the sequence starting point is obtained; j is a positive integer;

and if the jth sequence node corresponding to the sequence starting point does not have an adjacent node or j is a sequence length threshold, determining the jth sequence node corresponding to the sequence starting point as a sequence end point corresponding to the sequence starting point, and determining the path sequence corresponding to the path node i based on the sequence starting point to the sequence end point.

3. The method of claim 1, wherein the V path nodes comprise path node i; the M path sequences comprise path sequences corresponding to the path nodes i; i is a positive integer less than or equal to V;

the obtaining a sequence starting point from the V path nodes, and performing wandering traversal on the V path nodes based on the sequence starting point to obtain M path sequences composed of the V path nodes, includes:

determining adjacent nodes of the path node i as a first subsequence associated with the sequence starting point by taking the path node i as the sequence starting point, and determining adjacent nodes of the path node included in the first subsequence as a second subsequence associated with the sequence starting point until a d-th subsequence associated with the sequence starting point is obtained; d is a positive integer;

if no adjacent node exists in the path node included in the d-th subsequence, or the total number of the path nodes included in the sequence starting point and the first to d-th subsequences associated with the sequence starting point is greater than or equal to a sequence length threshold value, determining the path sequence corresponding to the path node i according to the sequence starting point and the path nodes included in the first to d-th subsequences associated with the sequence starting point.

4. The method of claim 1, wherein the obtaining a sequence starting point from the V path nodes, and performing wandering traversal on the V path nodes based on the sequence starting point to obtain M path sequences consisting of the V path nodes, comprises:

respectively taking the V path nodes as first sequence starting points, and respectively performing wandering traversal in the V path nodes by using the V first sequence starting points to obtain first traversal sequences respectively corresponding to the V path nodes;

randomly selecting a second sequence starting point from the V path nodes, and performing wandering traversal in the V path nodes by using the second sequence starting point to obtain a second traversal sequence corresponding to the second sequence starting point;

and determining M path sequences according to the first traversal sequences corresponding to the V path nodes respectively and the second traversal sequence corresponding to the starting point of the second sequence.

5. The method of claim 1, wherein the acquiring node pairs from the M path sequences comprises:

determining a path sequence containing the ith path node in the V path nodes as a target path sequence in the M path sequences; i is a positive integer less than or equal to V;

acquiring a node co-occurrence window size, and acquiring co-occurrence path nodes of which the sequence distance from the ith path node is smaller than or equal to the node co-occurrence window size in the target path sequence;

combining the co-occurrence path node associated with the ith path node and the ith path node into a node pair corresponding to the ith path node; the ith path node is the first path node in the node pair corresponding to the ith path node; the co-occurrence path node associated with the ith path node is the second path node in the node pair corresponding to the ith path node.

6. The method of claim 1, wherein the acquiring node pairs from the M path sequences comprises:

acquiring occurrence frequencies respectively corresponding to the V path nodes from the M path sequences, and acquiring the first path node from the V path nodes based on the occurrence frequencies;

and acquiring the second path nodes in the same path sequence as the first path nodes from the M path sequences, and forming node pairs by the first path nodes and the second path nodes.

7. The method of claim 1, wherein predicting node association probabilities between the V path nodes and the first path node, respectively, based on the first path node in the node pair, and determining node association characteristics corresponding to the V path nodes, respectively, through the node association probabilities corresponding to the V path nodes, respectively, and the second path node in the node pair, comprises:

acquiring a feature recognition model, and acquiring a first node feature of a first path node in the node pair; the feature recognition model comprises a first initial parameter matrix and a second initial parameter matrix;

inputting the first node characteristics into the characteristic recognition model, and performing characteristic conversion on the first node characteristics by adopting the first initial parameter matrix in the characteristic recognition model to obtain hidden characteristics;

performing feature prediction on the hidden features by adopting the second initial parameter matrix in the feature recognition model to obtain node association probabilities between the V path nodes and the first path nodes respectively;

performing parameter adjustment on the first initial parameter matrix and the second initial parameter matrix through the node association probabilities corresponding to the V path nodes respectively and the second path node in the node pair until a first parameter matrix corresponding to the first initial parameter matrix and a second parameter matrix corresponding to the second initial parameter matrix are obtained; the first parameter matrix and the second parameter matrix are used for predicting node association probability matched with the node pair;

and determining node association characteristics corresponding to the V path nodes respectively from the first parameter matrix or the second parameter matrix.

8. The method of claim 7, wherein the performing feature prediction on the hidden feature by using the second initial parameter matrix in the feature recognition model to obtain node association probabilities between the V path nodes and the first path node respectively comprises:

performing feature prediction on the hidden features by using the second initial parameter matrix in the feature recognition model to obtain path features corresponding to the V path nodes respectively;

and carrying out feature normalization processing on the path features respectively corresponding to the V path nodes to obtain node association probabilities between the V path nodes and the first path nodes respectively.
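
The prediction steps of claims 7 and 8 resemble the forward pass of a skip-gram style model. The sketch below assumes the first node feature is a one-hot vector, so that multiplying it by the first parameter matrix selects one row, and assumes softmax as the feature normalization; neither choice is confirmed by the claims, and the names `w_in` and `w_out` are illustrative.

```python
import math


def softmax(scores):
    """Normalize raw path features into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_association(first_index, w_in, w_out):
    """Return association probabilities between all V nodes and the first node."""
    # Assumption: a one-hot first node feature times w_in selects one row.
    hidden = w_in[first_index]
    n_nodes = len(w_out[0])
    scores = [
        sum(h * w_out[d][v] for d, h in enumerate(hidden))
        for v in range(n_nodes)
    ]
    return softmax(scores)


w_in = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]   # V = 3 nodes, 2 hidden dimensions
w_out = [[0.2, 0.0, -0.1], [0.1, 0.3, 0.0]]    # 2 hidden dimensions, V = 3 nodes
probs = predict_association(0, w_in, w_out)
```

The returned list has one entry per path node, each strictly positive, and the entries sum to 1.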

9. The method of claim 7, wherein the performing parameter adjustment on the first initial parameter matrix and the second initial parameter matrix through the node association probabilities respectively corresponding to the V path nodes and the second path node in the node pair comprises:

acquiring a target association probability of a second path node in the node pair from the node association probabilities respectively corresponding to the V path nodes;

generating a loss function according to the target association probability and the node association probabilities corresponding to the remaining path nodes, and performing parameter adjustment on the first initial parameter matrix and the second initial parameter matrix based on the loss function; the remaining path nodes refer to the path nodes, among the V path nodes, other than the first path node and the second path node in the node pair.
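
One way to realize the parameter adjustment of claim 9 is a softmax cross-entropy loss with a gradient descent update, sketched below. This is an assumption: the claim does not name the loss, and the standard softmax used here normalizes over all V nodes rather than excluding the first path node. The names `train_step` and `lr` are hypothetical.

```python
import math


def train_step(first, second, w_in, w_out, lr=0.1):
    """One gradient step on the loss -log p(second | first); returns the loss."""
    hidden = list(w_in[first])
    dims, n_nodes = len(hidden), len(w_out[0])
    scores = [sum(hidden[d] * w_out[d][v] for d in range(dims)) for v in range(n_nodes)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Target association probability is probs[second]; the other entries
    # enter the loss through the softmax normalization.
    loss = -math.log(probs[second])
    errors = [p - (1.0 if v == second else 0.0) for v, p in enumerate(probs)]
    # Gradient with respect to the hidden feature, using the old w_out values.
    grad_hidden = [sum(errors[v] * w_out[d][v] for v in range(n_nodes)) for d in range(dims)]
    for d in range(dims):
        for v in range(n_nodes):
            w_out[d][v] -= lr * hidden[d] * errors[v]
        w_in[first][d] -= lr * grad_hidden[d]
    return loss


w_in = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1]]
w_out = [[0.2, 0.0, -0.1], [0.1, 0.3, 0.0]]
losses = [train_step(0, 2, w_in, w_out) for _ in range(50)]
```

Repeating the step on the same node pair lowers the loss, which corresponds to the model assigning a higher association probability to the second path node. After training, the rows of `w_in` could serve as node association features, in line with the last step of claim 7.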

10. The method of claim 1, wherein the clustering the V path nodes based on the node association features to obtain a node set for carrying out region division on the road network comprises:

acquiring node attribute types corresponding to the V path nodes, and acquiring node attribute characteristics respectively corresponding to the V path nodes in the node attribute types;

performing feature fusion on the node association features corresponding to the V path nodes respectively and the node attribute features corresponding to the V path nodes respectively to obtain node fusion features corresponding to the V path nodes respectively;

and clustering the V path nodes based on the node fusion characteristics corresponding to the V path nodes respectively to obtain a node set for carrying out region division on the road network.
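
The feature fusion of claim 10 could be as simple as concatenating the two feature vectors of each node. Concatenation is an assumption, as the claim leaves the fusion operation open, and the attribute values below are made-up placeholders.

```python
def fuse_features(association_features, attribute_features):
    """Concatenate association and attribute features per path node."""
    # Assumption: fusion is plain vector concatenation.
    return {
        node: list(association_features[node]) + list(attribute_features[node])
        for node in association_features
    }


association = {"n1": [0.2, 0.7], "n2": [0.9, 0.1]}
attributes = {"n1": [1.0], "n2": [0.0]}  # hypothetical encoded node attribute
fused = fuse_features(association, attributes)
```

The fused vectors can then be passed to any of the clustering procedures in place of the node association features alone.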

11. The method of claim 1, wherein the clustering the V path nodes based on the node association features to obtain a node set for carrying out region division on the road network comprises:

based on the node association characteristics, acquiring node distances between the ith path node and the V path nodes respectively, determining the path node with the minimum node distance as the minimum adjacent node of the ith path node, and determining the minimum node distance as the reachable distance of the ith path node until the minimum adjacent node and the reachable distance corresponding to the V path nodes respectively are obtained; i is a positive integer less than or equal to V;

taking the V path nodes as tree nodes, and constructing tree edges between the V path nodes and the minimum adjacent nodes corresponding to the V path nodes respectively to construct a path node tree;

based on the reachable distances corresponding to the V path nodes respectively, the path node tree is segmented to obtain k sub-path node trees; k is a positive integer;

and forming a node set for carrying out region division on the road network by using the path nodes included in each sub-path node tree.
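
The tree construction and splitting of claim 11 can be sketched as follows, assuming a precomputed matrix of node distances. The sketch links each node to its nearest other node and keeps the edge only when the reachable distance does not exceed a fixed `cut_distance`; that threshold is a hypothetical simplification of the splitting rule, and the nearest-neighbour links may in general form several trees rather than one.

```python
def split_node_tree(distance, cut_distance):
    """Group nodes by nearest-neighbour edges no longer than cut_distance."""
    n = len(distance)
    parent = list(range(n))  # union-find forest over the path nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        # Minimum adjacent node: the closest node other than i itself.
        nearest = min((j for j in range(n) if j != i), key=lambda j: distance[i][j])
        reachable = distance[i][nearest]
        if reachable <= cut_distance:  # keep this tree edge
            parent[find(i)] = find(nearest)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


distance = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 1],
    [6, 5, 1, 0],
]
node_sets = split_node_tree(distance, cut_distance=2)
```

Each returned list is one node set, here two sets of two nodes each.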

12. The method according to claim 11, wherein the obtaining node distances between the ith path node and the V path nodes respectively based on the node association characteristics comprises:

acquiring characteristic distances between the node association characteristic of the ith path node and the node association characteristics respectively corresponding to the V path nodes, and acquiring a first node distance of the ith path node from the V characteristic distances, until the first node distances respectively corresponding to the V path nodes are obtained;

determining the node distance between the ith path node and the p-th path node from the first node distance of the ith path node, the first node distance of the p-th path node and the characteristic distance between the node association characteristic of the ith path node and the node association characteristic of the p-th path node until the node distances between the ith path node and the V path nodes are obtained; p is a positive integer less than or equal to V.
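
Claim 12 resembles the mutual reachability distance used in density-based hierarchical clustering. The sketch below makes three assumptions not stated in the claim: the characteristic distance is Euclidean, the first node distance is the distance to the k-th nearest other node, and the node distance is the maximum of the three quantities.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def first_node_distances(features, k):
    """Assumed first node distance: distance to the k-th nearest other node."""
    result = []
    for i, fi in enumerate(features):
        dists = sorted(euclidean(fi, fj) for j, fj in enumerate(features) if j != i)
        result.append(dists[k - 1])
    return result


def node_distance(i, p, features, first_dists):
    """Assumed node distance: max of both first node distances and the
    characteristic distance between nodes i and p."""
    return max(first_dists[i], first_dists[p], euclidean(features[i], features[p]))


features = [[0.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
first_dists = first_node_distances(features, k=1)
d_01 = node_distance(0, 1, features, first_dists)
d_12 = node_distance(1, 2, features, first_dists)
```

Under these assumptions, a node in a sparse neighbourhood (a large first node distance) is pushed away from every other node, which makes the later tree splitting less sensitive to isolated nodes.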

13. The method according to claim 11, wherein the splitting the path node tree based on the reachable distances respectively corresponding to the V path nodes to obtain k sub-path node trees comprises:

sorting the reachable distances respectively corresponding to the V path nodes, sequentially splitting tree edges corresponding to the sorted reachable distances in the path node tree until the number of path nodes included in each split subtree is smaller than or equal to the minimum cluster size, and determining the subtrees obtained when the number of path nodes is smaller than or equal to the minimum cluster size as the k sub-path node trees.

14. The method of claim 1, wherein the clustering the V path nodes based on the node association features to obtain a node set for carrying out region division on the road network comprises:

sorting the reachable distances respectively corresponding to the V path nodes from small to large by taking the V path nodes as subtree nodes, and sequentially constructing subtree edges between the path nodes corresponding to the sorted reachable distances and the minimum adjacent nodes until k sub-path node trees consisting of the V path nodes are obtained; k is a positive integer, and the number of path nodes included in each sub-path node tree is greater than or equal to the minimum cluster size;

15. The method of claim 1, wherein the clustering the V path nodes based on the node association features to obtain a node set for carrying out region division on the road network comprises:

acquiring k initial clustering centers, and acquiring initial clustering distances from the V path nodes to the k initial clustering centers respectively based on the node association characteristics;

dividing the V path nodes into initial sets corresponding to the k initial clustering centers on the basis of initial clustering distances from the V path nodes to the k initial clustering centers respectively;

acquiring updated clustering centers respectively corresponding to the k initial sets, and dividing the V path nodes into updated sets corresponding to the k updated clustering centers on the basis of updated clustering distances from the V path nodes to the updated clustering centers respectively corresponding to the k initial sets;

if the k updating sets do not meet the node clustering condition, determining the k updating sets as the k initial sets, and returning to execute the process of obtaining the updating clustering centers corresponding to the k initial sets respectively;

and if the k updating sets meet the node clustering condition, determining the k updating sets as node sets for carrying out region division on the road network.
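
The iteration in claim 15 follows the familiar k-means pattern. The sketch below assumes squared Euclidean distance as the clustering distance and treats "assignments no longer change" as the node clustering condition; the claim leaves both open.

```python
def k_means(features, initial_centers, max_rounds=100):
    """Assign nodes to the nearest center and update centers until stable."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    assignment = None
    for _ in range(max_rounds):
        new_assignment = [
            min(
                range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(f, centers[c])),
            )
            for f in features
        ]
        # Assumed node clustering condition: the sets stopped changing.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(len(centers)):
            members = [f for f, a in zip(features, assignment) if a == c]
            if members:  # keep the old center if a set is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


features = [[0.0], [1.0], [10.0], [11.0]]
assignment, centers = k_means(features, initial_centers=[[0.0], [10.0]])
```

Here `assignment[i]` is the index of the node set that the ith path node belongs to.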

16. The method of claim 1, wherein the clustering the V path nodes based on the node association features to obtain a node set for carrying out region division on the road network comprises:

acquiring path nodes to be processed from the V path nodes; the path node to be processed refers to a path node which is not subjected to node clustering processing;

acquiring the adjacency number of the path nodes positioned in the node neighborhood of the path node to be processed; the path node located in the node neighborhood of the path node to be processed refers to a path node of which the characteristic distance with the path node to be processed is smaller than or equal to the neighborhood radius; the characteristic distance refers to the distance between the node association characteristic of the corresponding path node and the node association characteristic of the path node to be processed;

if the adjacent quantity is larger than or equal to the minimum set node number, performing node expansion based on the path nodes to be processed and the path nodes located in the node neighborhood of the path nodes to be processed to obtain density reachable nodes corresponding to the path nodes to be processed, forming a node set for performing region division on the road network by the path nodes to be processed and the density reachable nodes, and returning to execute the process of acquiring the path nodes to be processed from the V path nodes until the path nodes to be processed do not exist in the V path nodes;

and if the adjacency number is less than the minimum set node number, returning to execute the process of acquiring the path nodes to be processed from the V path nodes.
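
The procedure in claim 16 matches the general shape of density-based clustering such as DBSCAN. The sketch below assumes Euclidean characteristic distance and counts the path node to be processed as part of its own neighbourhood; `radius` and `min_nodes` stand in for the neighborhood radius and the minimum set node number.

```python
import math


def density_cluster(features, radius, min_nodes):
    """Return a cluster label per node; -1 marks nodes left unclustered."""

    def neighbors(i):
        # Assumption: a node is inside its own neighbourhood (distance 0).
        return [j for j in range(len(features))
                if math.dist(features[i], features[j]) <= radius]

    labels = [None] * len(features)  # None = still to be processed
    cluster = -1
    for i in range(len(features)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_nodes:
            labels[i] = -1  # too sparse; may still join a cluster later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:  # node expansion to the density reachable nodes
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border node reached from a dense node
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_nodes:
                queue.extend(more)
    return labels


features = [[0.0], [0.5], [1.0], [5.0]]
labels = density_cluster(features, radius=0.6, min_nodes=2)
```

Nodes sharing a non-negative label form one node set; the isolated fourth node stays outside every set.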

17. The method of claim 1, wherein the method further comprises:

responding to a path query request aiming at a starting path node to a terminating path node, and acquiring a target node set where the terminating path node is located; the starting path node belongs to the V path nodes, and the terminating path node belongs to the V path nodes;

acquiring path nodes included in the target node set, acquiring a first track path from the starting path node to the path nodes included in the target node set, and acquiring a second track path from the path nodes included in the target node set to the terminating path node;

and determining a target track path from the starting path node to the ending path node according to the first track path and the second track path.
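
The path query of claim 17 can be illustrated under strong simplifying assumptions: track paths between node pairs are already stored in a lookup table (`known_tracks`, a hypothetical structure), and among the candidate intermediate nodes the shortest combined path by node count is chosen. The claim does not specify either point.

```python
def plan_track(start, end, target_set, known_tracks):
    """Join a first track (start -> via) and a second track (via -> end)
    for each via node in the target node set; return the shortest join."""
    best = None
    for via in target_set:
        first = known_tracks.get((start, via))
        second = known_tracks.get((via, end))
        if first is None or second is None:
            continue
        candidate = first + second[1:]  # drop the duplicated via node
        # Assumption: prefer the join with the fewest path nodes.
        if best is None or len(candidate) < len(best):
            best = candidate
    return best


known_tracks = {
    ("s", "a"): ["s", "x", "a"],
    ("a", "t"): ["a", "t"],
    ("s", "b"): ["s", "b"],
    ("b", "t"): ["b", "y", "z", "t"],
}
route = plan_track("s", "t", ["a", "b"], known_tracks)
```

The function returns `None` when no intermediate node in the target node set has both track paths available.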

18. A data processing apparatus, characterized in that the apparatus comprises:

the node acquisition module is used for acquiring V path nodes;

a sequence obtaining module, configured to obtain a sequence starting point from the V path nodes, and perform wandering traversal on the V path nodes based on the sequence starting point to obtain M path sequences composed of the V path nodes; V is a positive integer; M is a positive integer; the number of the sequence starting points is one or at least two;

a node pair obtaining module, configured to obtain node pairs from the M path sequences;

a feature determining module, configured to predict, based on a first path node in the node pair, node association probabilities between the V path nodes and the first path node, and determine, through the node association probabilities corresponding to the V path nodes respectively and a second path node in the node pair, node association features corresponding to the V path nodes respectively; the first path node and the second path node are in the same path sequence;

and the node clustering module is used for clustering the V path nodes based on the node association characteristics to obtain a node set used for carrying out region division on the road network.

19. A computer device comprising a processor, a memory, and an input/output interface;

the processor is connected to the memory and the input/output interface, respectively, wherein the input/output interface is configured to receive data and output data, the memory is configured to store a computer program, and the processor is configured to call the computer program to enable the computer device to perform the method according to any one of claims 1 to 17.

20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded and executed by a processor, to cause a computer device having the processor to perform the method of any of claims 1-17.