CN111598114A - Method for determining hidden state sequence and method for determining functional type of block - Google Patents

Method for determining hidden state sequence and method for determining functional type of block

Info

Publication number
CN111598114A
CN111598114A
Authority
CN
China
Prior art keywords
hidden
probability
state
time slice
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910127322.5A
Other languages
Chinese (zh)
Other versions
CN111598114B (en)
Inventor
李勇
夏彤
金德鹏
孙福宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Tencent Dadi Tongtu Beijing Technology Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Tencent Dadi Tongtu Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd, Tencent Dadi Tongtu Beijing Technology Co Ltd filed Critical Tsinghua University
Priority to CN201910127322.5A priority Critical patent/CN111598114B/en
Publication of CN111598114A publication Critical patent/CN111598114A/en
Application granted granted Critical
Publication of CN111598114B publication Critical patent/CN111598114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0204 Market segmentation
    • G06Q30/0205 Location or geographical consideration
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Complex Calculations (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method for determining a hidden state sequence. The method includes: obtaining an observation sequence corresponding to a target block; determining, based on the observation sequence, the initial state probability and the state transition probability corresponding to the target block in a hidden Markov model, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model, the local probabilities that the target block is respectively in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and determining the back pointers respectively corresponding to the local probabilities; determining the hidden state of the target block in the last time slice covered by the observation sequence based on the maximum local probability among the local probabilities of the target block being in each hidden state in the last time slice; and performing optimal path backtracking based on the hidden state of the target block in the last time slice and the back pointers to obtain the hidden state sequence, so that the state transition situation of the block can be determined.

Description

Method for determining hidden state sequence and method for determining functional type of block
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a hidden state sequence, a method and an apparatus for determining a functional type of a street block, a computer-readable storage medium, and a computer device.
Background
With the development of computer technology, urban areas are increasingly modeled based on observation data corresponding to the urban area (such as behavior data of population activities in the urban area), so as to evaluate the population flow characteristics of the urban area.
Traditionally, the co-occurrence relationship among time, place, and population activities is modeled through representation learning (such as cross-modal representation learning); for example, the activity of "eating" usually occurs at restaurant-type places during lunch or dinner hours. As shown in fig. 1, after the model learns the co-occurrence relationship among time, place, and population activities, any one of the three items can be used to infer the possible situations of the other two; for example, the model can be queried at different times so as to recover the pattern of change of population activities at different places. However, the conventional method does not support determining the state transition situation of a block, and therefore has certain limitations.
Disclosure of Invention
Based on this, it is necessary to provide a method and an apparatus for determining a hidden state sequence, a method and an apparatus for determining a function type of a neighborhood, a computer-readable storage medium, and a computer device, for solving the technical problem that the conventional technology does not support determining a state transition situation of the neighborhood.
A method of determining a sequence of hidden states, comprising:
acquiring an observation sequence corresponding to a target block;
determining local probabilities that the target block is respectively in each hidden state of the hidden Markov model in each time slice covered by the observation sequence based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value corresponding to each candidate block related to the hidden Markov model, and the Gaussian distribution variance corresponding to each candidate block, and determining reverse pointers corresponding to each local probability;
determining the hidden state of the target block in the last time slice covered by the observation sequence based on the maximum local probability of the local probabilities of the target block in the hidden states in the last time slice;
and performing optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.
A method for determining a functional type of a neighborhood, comprising:
acquiring observation sequences corresponding to all candidate blocks related to the hidden Markov model;
respectively determining local probabilities that each candidate block is in each hidden state of the hidden Markov model in each time slice covered by the corresponding observation sequence, based on the initial state probabilities respectively corresponding to the candidate blocks in the hidden Markov model, the state transition probabilities respectively corresponding to the candidate blocks, the Gaussian distribution mean commonly corresponding to the candidate blocks, and the Gaussian distribution variance commonly corresponding to the candidate blocks, and respectively determining, based on the local probabilities, the reverse pointers respectively corresponding to the local probabilities;
respectively determining the hidden state of each candidate block in the last time slice covered by the observation sequence based on the maximum local probability of the local probabilities of each candidate block in the hidden state in the last time slice;
performing optimal path backtracking based on the hidden state of each candidate block in the last time slice and each back pointer to obtain a hidden state sequence corresponding to each candidate block;
and clustering is carried out on the basis of the hidden state sequences respectively corresponding to the candidate blocks, and the function types of the candidate blocks are respectively determined from the candidate function types on the basis of the clustering result.
An apparatus for determining a sequence of hidden states, comprising:
the first observation sequence acquisition module is used for acquiring an observation sequence corresponding to a target block;
a first intermediate parameter determining module, configured to determine, based on the observation sequence, an initial state probability corresponding to the target block in a hidden markov model, a state transition probability corresponding to the target block, a gaussian distribution mean value corresponding to each candidate block related to the hidden markov model in common, and a gaussian distribution variance corresponding to each candidate block in common, local probabilities that the target block is in each hidden state of the hidden markov model in each time slice covered by the observation sequence, respectively, and determine reverse pointers corresponding to each local probability, respectively;
a first end hidden state determining module, configured to determine, based on a maximum local probability of local probabilities that the target block is in each hidden state in a last time slice covered by the observation sequence, a hidden state in which the target block is located in the last time slice;
and the first hidden state sequence determining module is used for performing optimal path backtracking on the basis of the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.
An apparatus for determining a functional type of a neighborhood, comprising:
the second observation sequence acquisition module is used for acquiring observation sequences corresponding to the candidate blocks related to the hidden Markov model;
a second intermediate parameter determining module, configured to determine, based on an initial state probability corresponding to each candidate block in the hidden markov model, a state transition probability corresponding to each candidate block, a gaussian distribution mean value corresponding to each candidate block, and a gaussian distribution variance corresponding to each candidate block, local probabilities that each candidate block is in each hidden state of the hidden markov model in each time slice covered by the observation sequence, respectively, and determine, based on each local probability, reverse pointers corresponding to each local probability, respectively;
a second end hidden state determining module, configured to determine, based on a maximum local probability of local probabilities of the candidate blocks being in the hidden states in a last time slice covered by the observation sequence, a hidden state of the candidate blocks in the last time slice;
a second hidden state sequence determining module, configured to perform optimal path backtracking based on the hidden state of each candidate block in the last time slice and each backward pointer, so as to obtain a hidden state sequence corresponding to each candidate block;
and the function type determining module is used for clustering based on the hidden state sequences respectively corresponding to the candidate blocks and respectively determining the function type of each candidate block from the candidate function types based on the clustering result.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method as described above.
Based on the scheme, the local probabilities of the hidden states of the hidden markov model of the target block in the time slices covered by the observation sequence are determined based on the observation sequence corresponding to the target block, the initial state probability corresponding to the target block in the hidden markov model, the state transition probability corresponding to the target block, the mean gaussian distribution commonly corresponding to the candidate blocks related to the hidden markov model and the variance of the gaussian distribution commonly corresponding to the candidate blocks, and the corresponding hidden state sequence is obtained according to the local probabilities. Therefore, the hidden state sequence corresponding to the block is obtained through the hidden Markov model, the state transition condition of the block can be determined, and the limitation in the traditional mode is broken.
Drawings
FIG. 1 is a modeling result based on characterization learning in the conventional art;
FIG. 2 is a diagram of an application environment for determination of a hidden state sequence in one embodiment;
FIG. 3 is a flow diagram illustrating the determination of a hidden state sequence in one embodiment;
FIG. 4 is a diagram illustrating street blocks within a central city of Beijing city as determined from road networks in one embodiment;
FIG. 5 is a diagram illustrating a mapping between hidden states and active behavior features in one embodiment;
FIG. 6 is a diagram illustrating an observation sequence corresponding to a target block in one embodiment;
FIG. 7 is a schematic flow chart diagram illustrating a method for training a hidden Markov model in one embodiment;
FIG. 8 is a flowchart illustrating a method for determining a functional type of a neighborhood according to an embodiment;
FIG. 9 is a diagram illustrating hidden states corresponding to a street block under actual test according to an embodiment;
FIG. 10 is a diagram illustrating the distribution of functions in a neighborhood of a city under actual test in one embodiment;
FIG. 11 is a graphical illustration of the predicted outcome of population mobility behavior in actual testing in one embodiment;
FIG. 12 is a block diagram showing the structure of a hidden state sequence determining apparatus according to an embodiment;
FIG. 13 is a block diagram showing the configuration of a function type determining apparatus of a neighborhood in one embodiment;
FIG. 14 is a block diagram showing a configuration of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In this document, with respect to descriptions of numerical ranges, terms such as "or more" are understood to be inclusive of the stated number; for example, "two or more" means equal to or greater than two.
The method for determining the hidden state sequence provided by the embodiments of the present application can be applied to the application environment shown in fig. 2. The application environment may relate to the terminal 210 and the server 220, and the terminal 210 and the server 220 may be connected through a network.
Specifically, the terminal 210 acquires an observation sequence corresponding to the target block and sends the observation sequence to the server 220. The server 220 receives an observation sequence corresponding to a target block; further, based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value corresponding to all candidate blocks related to the hidden Markov model, and the Gaussian distribution variance corresponding to all candidate blocks, the local probability that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence is determined, and a reverse pointer corresponding to each local probability is determined; determining the hidden state of the target block in the last time slice covered by the observation sequence based on the maximum local probability of the local probabilities of the target block in the hidden states in the last time slice; and then, performing optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.
In other embodiments, the terminal 210 may also independently perform the above-mentioned series of steps from obtaining the observation sequence corresponding to the target block to obtaining the hidden state sequence, without the participation of the server 220.
The terminal 210 may specifically include at least one of a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, and the like, but is not limited thereto. Server 220 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in FIG. 3, a method of determining a sequence of hidden states is provided. The method is applied to a computer device (such as the terminal 210 or the server 220 in fig. 2) for example. The method may include the following steps S302 to S308.
S302, acquiring an observation sequence corresponding to the target block.
A block is a polygonal geographic area bounded by the geographic boundaries of streets. Specifically, the geographic boundaries of streets may be extracted from the road network of a city, so that the blocks within the city are determined based on the extracted geographic boundaries. FIG. 4 shows the 665 blocks in the central urban area of Beijing determined from the secondary road network of Beijing. It can be understood that the road network is a natural division of the basic geographic units of human activities in a city; the blocks determined by the road network therefore tend to be more homogeneous in function, and people living in the same block tend to have similar life patterns.
Correspondingly, the target block is a block which needs to determine a hidden state sequence corresponding to the observation sequence based on the corresponding observation sequence through the hidden markov model. In this embodiment, a hidden markov model may be trained in advance based on each observation sequence corresponding to each of two or more blocks, the two or more blocks may be candidate blocks related to the hidden markov model, and the target block may be selected from each candidate block related to the hidden markov model. For example, a hidden markov model is trained based on the observation sequences corresponding to 665 blocks in the central city of beijing city shown in fig. 4, the 665 blocks are candidate blocks related to the hidden markov model, and the target block can be selected from the 665 blocks.
The observation sequence corresponding to the target block may include population activity data of the target block in two or more time slices. The population activity data may relate to activity behavior features corresponding to the population activities of the block, such as population flow numbers and access frequencies for predetermined types of Points of Interest (POI). Specifically, the population flow numbers may include the number of people moving in, the number of people staying, and the number of people moving out; the predetermined types of points of interest may include at least one of the 9 types of restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence, for example the 4 types of restaurant, education, attraction, and residence, or all 9 types of restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence.
Assuming that the target block is the r-th candidate block related to the hidden Markov model, the observation sequence corresponding to the target block can be represented as O_r = {O_{r,1}, O_{r,2}, O_{r,3}, ..., O_{r,N}}, where N represents the total number of time slices covered by the observation sequence corresponding to the r-th candidate block, and O_{r,n} represents the population activity data of the r-th candidate block in the n-th time slice, n = 1, 2, 3, ..., N. For example, for April 2018, a total of 30 days, each day is divided into 24 time slices at intervals of 1 hour, giving 720 time slices in total; if the population activity data of the r-th block in these 720 time slices form an observation sequence, then N equals 720 and the observation sequence can be represented as O_r = {O_{r,1}, O_{r,2}, O_{r,3}, ..., O_{r,720}}.
Further, O_{r,n} = {O_{r,n,1}, O_{r,n,2}, O_{r,n,3}, ..., O_{r,n,M}}, where M represents the total number of activity behavior features covered by the population activity data in the observation sequence corresponding to the r-th block, and O_{r,n,m} represents the m-th activity behavior feature in the population activity data of the r-th candidate block in the n-th time slice, m = 1, 2, 3, ..., M. For example, if the activity behavior features related to the population activity data in the observation sequence corresponding to the r-th block are the number of people moving in, the number of people staying, the number of people moving out, and the access frequencies for the restaurant, education, attraction, and residence types of points of interest, a total of 7 activity behavior features are involved and M equals 7. For another example, if the activity behavior features are the number of people moving in, the number of people staying, the number of people moving out, and the access frequencies for the 9 types of points of interest of restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence, a total of 12 activity behavior features are involved and M equals 12.
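By way of a non-limiting illustration, the observation sequence described above can be held as a two-dimensional array; the sketch below assumes Python with NumPy, the 7-feature example given above, and placeholder random values, none of which are prescribed by this description.

```python
import numpy as np

# Illustrative sketch: observation sequence O_r for one block, assuming
# N = 720 hourly time slices (30 days x 24 hours) and M = 7 activity
# behavior features (move-in, stay, move-out, plus 4 POI visit frequencies).
N_TIME_SLICES = 720
FEATURES = ["move_in", "stay", "move_out",
            "restaurant", "education", "attraction", "residence"]
M_FEATURES = len(FEATURES)

# O_r[n, m] is the m-th activity behavior feature of the block in the
# n-th time slice (filled here with random placeholder values).
rng = np.random.default_rng(0)
O_r = rng.random((N_TIME_SLICES, M_FEATURES))

print(O_r.shape)  # (720, 7)
```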
S304, based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value corresponding to all candidate blocks related to the hidden Markov model, and the Gaussian distribution variance corresponding to all candidate blocks, determining the local probability that the target block is respectively in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and determining the reverse pointer corresponding to each local probability.
In this embodiment, the model parameters of the hidden Markov model may include the initial state probability π_r corresponding to each candidate block related to the hidden Markov model, the state transition probability A_r corresponding to each candidate block, the Gaussian distribution mean μ commonly corresponding to the candidate blocks, and the Gaussian distribution variance σ commonly corresponding to the candidate blocks. That is, the model parameters of the hidden Markov model can be expressed as θ = {π_r, A_r, μ, σ}, r = 1, 2, 3, ..., R, where R is the total number of candidate blocks involved in the hidden Markov model.
For example, a hidden Markov model is trained in advance based on the observation sequences corresponding to 3 blocks, where the 3 blocks are the candidate blocks related to the hidden Markov model. The model parameters of the hidden Markov model may then include: the initial state probability π_1 and the state transition probability A_1 corresponding to the 1st candidate block, the initial state probability π_2 and the state transition probability A_2 corresponding to the 2nd candidate block, the initial state probability π_3 and the state transition probability A_3 corresponding to the 3rd candidate block, the Gaussian distribution mean μ commonly corresponding to the 3 candidate blocks, and the Gaussian distribution variance σ commonly corresponding to the 3 candidate blocks.
Suppose the r-th candidate block related to the hidden Markov model is determined as the target block. The initial state probability corresponding to the target block is then the initial state probability π_r corresponding to the r-th candidate block, which may include: the probabilities that the r-th candidate block is respectively in each hidden state of the hidden Markov model in the 1st time slice covered by the observation sequence. Specifically, π_r can be represented by a 1 × K matrix, i.e., π_r = [π_{r,1} π_{r,2} π_{r,3} ... π_{r,K}], where K represents the total number of hidden states of the hidden Markov model and π_{r,k} represents the probability that the r-th candidate block is in the k-th hidden state of the hidden Markov model in the 1st time slice, k = 1, 2, 3, ..., K.
The state transition probability corresponding to the target block is the state transition probability A_r corresponding to the r-th candidate block, which may include: the probabilities that the r-th candidate block transitions between every two hidden states of the hidden Markov model. Specifically, A_r can be represented as a K × K matrix, namely:

$$A_r = \begin{bmatrix} A_{r,1,1} & A_{r,1,2} & \cdots & A_{r,1,K} \\ A_{r,2,1} & A_{r,2,2} & \cdots & A_{r,2,K} \\ \vdots & \vdots & \ddots & \vdots \\ A_{r,K,1} & A_{r,K,2} & \cdots & A_{r,K,K} \end{bmatrix}$$

where A_{r,j,k} represents the probability of the r-th candidate block transitioning from the j-th hidden state to the k-th hidden state, k = 1, 2, 3, ..., K, j = 1, 2, 3, ..., K, and K represents the total number of hidden states of the hidden Markov model.
The Gaussian distribution mean μ commonly corresponding to the candidate blocks involved in the hidden Markov model may include: the means of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence under the condition that a candidate block is in each hidden state of the hidden Markov model. Specifically, μ can be represented as a K × M matrix, i.e.:

$$\mu = \begin{bmatrix} \mu_{1,1} & \mu_{1,2} & \cdots & \mu_{1,M} \\ \mu_{2,1} & \mu_{2,2} & \cdots & \mu_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{K,1} & \mu_{K,2} & \cdots & \mu_{K,M} \end{bmatrix}$$

where μ_{k,m} represents the mean of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition that a candidate block is in the k-th hidden state of the hidden Markov model, k = 1, 2, 3, ..., K, m = 1, 2, 3, ..., M, K is the total number of hidden states of the hidden Markov model, and M is the total number of activity behavior features related to the population activity data in the observation sequence.
The Gaussian distribution variance σ commonly corresponding to the candidate blocks involved in the hidden Markov model includes: the variances of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence under the condition that a candidate block is in each hidden state of the hidden Markov model. Like μ, σ can be represented as a K × M matrix, i.e.:

$$\sigma = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,M} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{K,1} & \sigma_{K,2} & \cdots & \sigma_{K,M} \end{bmatrix}$$

where σ_{k,m} represents the variance of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition that a candidate block is in the k-th hidden state of the hidden Markov model, k = 1, 2, 3, ..., K, m = 1, 2, 3, ..., M, K is the total number of hidden states of the hidden Markov model, and M is the total number of activity behavior features related to the population activity data in the observation sequence.
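By way of a non-limiting illustration, the parameter set θ = {π_r, A_r, μ, σ} described above can be grouped as follows; the container and field names are assumptions of this sketch, while the array shapes follow the 1 × K, K × K, and K × M matrices defined above.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMMParams:
    """Sketch of the parameter set theta = {pi_r, A_r, mu, sigma}."""
    pi: np.ndarray     # shape (R, K): initial state probability per candidate block
    A: np.ndarray      # shape (R, K, K): state transition probability per candidate block
    mu: np.ndarray     # shape (K, M): Gaussian means shared by all candidate blocks
    sigma: np.ndarray  # shape (K, M): Gaussian variances shared by all candidate blocks

R, K, M = 665, 100, 12  # e.g. 665 blocks, 100 hidden states, 12 activity features
params = HMMParams(
    pi=np.full((R, K), 1.0 / K),      # uniform placeholder initialization
    A=np.full((R, K, K), 1.0 / K),
    mu=np.zeros((K, M)),
    sigma=np.ones((K, M)),
)
```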
Hidden states are parameters that can be used to characterize the population activity characteristics of a block. In particular, the hidden states may be used to characterize the population density, population flow, and population activity type of a block. The population density and population flow can be represented by the population flow numbers of the block, such as the number of people moving in, the number of people moving out, and the number of people staying, and the population activity type can be represented by the access frequencies for predetermined types of points of interest. The hidden states of the hidden Markov model can be preset based on actual requirements, for example as the 100 hidden states (State_1 to State_100) shown in fig. 5. Any one of the hidden states State_1 to State_100 has 12 characteristic parameters, which are respectively used for representing the number of people moving into the block, the number of people moving out (Leaving), the number of people staying, and the access frequencies for points of interest of the restaurant (Restaurant), company (Company), institution (Agency), shopping (Shopping), service (Service), attraction, entertainment, education (Education), and residence (Residence) types.
For the hidden Markov model, when the observation sequence and the model parameters of the hidden Markov model are known, decoding can be performed by the Viterbi algorithm, so as to determine the hidden state sequence corresponding to the observation sequence.
The decoding process is a recursive calculation process, that is, for each time slice covered by the observation sequence corresponding to the target block, based on the emission probability of the population activity data in the time slice in the observation sequence corresponding to the target block under the condition that the target block is in the target hidden state of the hidden markov model in the time slice, the state transition probability corresponding to the target block in the hidden markov model, and the local probabilities that the target block is in the hidden states of the hidden markov model in the previous time slice adjacent to the time slice, the local probability that the target block is in the target hidden state in the time slice is determined. It can be understood that, the hidden states of the hidden markov model are sequentially used as target hidden states, so that the local probabilities that the target block is respectively in the hidden states of the hidden markov model in the time slices covered by the corresponding observation sequences can be determined.
Specifically, the local probability that the target block is in the k-th hidden state of the hidden Markov model in the n-th time slice may refer to the maximum value among the probabilities corresponding to all state transition paths that end with the target block being in the k-th hidden state in the n-th time slice, and may be denoted δ_n(k).

The local probability δ_n(k) that the target block is in the k-th hidden state of the hidden Markov model in the n-th time slice can be calculated by the following formula:

$$\delta_n(k) = \left[\max_{1 \le j \le K} \delta_{n-1}(j)\, A_{r,j,k}\right] \cdot b_k(O_{r,n})$$

where δ_{n-1}(j) represents the local probability that the target block is in the j-th hidden state of the hidden Markov model in the (n-1)-th time slice; A_{r,j,k} represents the probability of transitioning from the j-th hidden state to the k-th hidden state; max_{1≤j≤K} δ_{n-1}(j) A_{r,j,k} represents the maximum value of δ_{n-1}(1)A_{r,1,k}, δ_{n-1}(2)A_{r,2,k}, δ_{n-1}(3)A_{r,3,k}, ..., and δ_{n-1}(K)A_{r,K,k}; b_k(O_{r,n}) represents the emission probability of generating the population activity data in the n-th time slice in the observation sequence corresponding to the target block under the condition that the target block is in the k-th hidden state in the n-th time slice; and K represents the total number of hidden states of the hidden Markov model.
It can be understood that, for the 1st time slice, there is no preceding time slice, so the local probability δ_1(k) that the target block is in the k-th hidden state of the hidden Markov model in the 1st time slice can be obtained by initialization according to the following formula:

$$\delta_1(k) = \pi_{r,k} \cdot b_k(O_{r,1})$$

where π_{r,k} represents the probability that the target block is in the k-th hidden state of the hidden Markov model in the 1st time slice, and b_k(O_{r,1}) represents the emission probability of generating the population activity data in the 1st time slice in the observation sequence corresponding to the target block under the condition that the target block is in the k-th hidden state in the 1st time slice.

After δ_1(k) is obtained by initialization, δ_2(k), δ_3(k), ..., and δ_N(k) can be obtained by recursion through the formula above, where N represents the total number of time slices covered by the observation sequence corresponding to the target block.
After the local probabilities that the target block is respectively in each hidden state of the hidden Markov model in each time slice covered by the observation sequence are determined, the corresponding back pointers can be determined based on the local probabilities. The local probabilities and the back pointers may have a one-to-one correspondence. The back pointer corresponding to the local probability δ_n(k) of being in the k-th hidden state of the hidden Markov model in the n-th time slice may refer to the hidden state of the (n-1)-th node in the state transition path that maximizes the probability of the target block being in the k-th hidden state in the n-th time slice, and can be written as ψ_n(k).

Specifically, ψ_n(k) can be calculated by the following formula:

$$\psi_n(k) = \arg\max_{1 \le j \le K} \left[\delta_{n-1}(j)\, A_{r,j,k}\right]$$

where the argmax operator is used to determine the index j that maximizes the expression in brackets (i.e., δ_{n-1}(j) A_{r,j,k}). The parameters δ_{n-1}(j), A_{r,j,k}, and K used here have the same definitions as above and are not described again.

It should be noted that the back pointer ψ_1(k) corresponding to the local probability δ_1(k) of the k-th hidden state of the hidden Markov model in the 1st time slice has no practical significance, so ψ_1(k) can be set to 0, i.e., ψ_1(1), ψ_1(2), ψ_1(3), ..., and ψ_1(K) can all be set to 0.
In addition, the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, together with the back pointers corresponding to the local probabilities, can be represented by a matrix D of K rows and N columns, where the element in the k-th row and n-th column records the local probability δ_n(k) and its corresponding back pointer ψ_n(k), K represents the total number of hidden states of the hidden Markov model, and N represents the total number of time slices covered by the observation sequence corresponding to the target block.
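To make the recursion for the local probabilities and back pointers concrete, the following non-limiting sketch computes δ_n(k) and ψ_n(k) for one target block. It works in log space as a common numerical safeguard (an implementation choice, not required by this description) and assumes a precomputed N × K matrix of log emission probabilities log_b; a sketch of computing that matrix follows the emission probability formula later in this description.

```python
import numpy as np

def viterbi_forward(log_pi_r, log_A_r, log_b):
    """Compute log local probabilities (delta) and back pointers (psi).

    log_pi_r: shape (K,)   log initial state probabilities of the target block
    log_A_r:  shape (K, K) log state transition probabilities of the target block
    log_b:    shape (N, K) log emission probabilities, log_b[n, k] = log b_k(O_{r,n+1})
    """
    N, K = log_b.shape
    log_delta = np.empty((N, K))
    psi = np.zeros((N, K), dtype=int)           # psi_1(k) carries no meaning, kept as 0

    log_delta[0] = log_pi_r + log_b[0]          # initialization: delta_1(k)
    for n in range(1, N):
        # scores[j, k] = log(delta_{n-1}(j)) + log(A_{r,j,k})
        scores = log_delta[n - 1][:, None] + log_A_r
        psi[n] = np.argmax(scores, axis=0)      # back pointer psi_n(k)
        log_delta[n] = scores[psi[n], np.arange(K)] + log_b[n]
    return log_delta, psi
```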
S306, based on the maximum local probability of the local probabilities of the target block respectively in the hidden states in the last time slice covered by the observation sequence, determining the hidden state of the target block in the last time slice.
The local probabilities of being in each hidden state in the last time slice covered by the observation sequence corresponding to the target block are the local probabilities of being in each hidden state in the N-th time slice, namely δ_N(1), δ_N(2), δ_N(3), ..., and δ_N(K).

The maximum local probability is the local probability with the largest value among the local probabilities of being in each hidden state in the last time slice covered by the observation sequence corresponding to the target block. Suppose that δ_N(3) is the largest of δ_N(1), δ_N(2), δ_N(3), ..., and δ_N(K); then δ_N(3) is the maximum local probability.

In this embodiment, the hidden state corresponding to the maximum local probability may be determined as the hidden state of the target block in the last time slice. For example, if δ_N(3) is determined from δ_N(1), δ_N(2), δ_N(3), ..., and δ_N(K) to be the maximum local probability corresponding to the target block, the 3rd hidden state is the hidden state of the target block in the last time slice. For another example, if δ_N(K) is determined from δ_N(1), δ_N(2), δ_N(3), ..., and δ_N(K) to be the maximum local probability corresponding to the target block, the K-th hidden state is the hidden state of the target block in the last time slice.
S308, optimal path backtracking is carried out on the basis of the hidden state of the target block in the last time slice and each back pointer, and a hidden state sequence is obtained.
Specifically, in the process of performing optimal path backtracking, the hidden state q*_n in the n-th time slice in the hidden state sequence may be determined based on the following formula:

$$q^*_n = \psi_{n+1}(q^*_{n+1})$$

where q*_{n+1} represents the hidden state in the (n+1)-th time slice in the hidden state sequence corresponding to the target block.

It will be appreciated that, after the hidden state q*_N in the last time slice (i.e., the N-th time slice) is determined, the hidden state q*_{N-1} in the (N-1)-th time slice can be determined through the formula q*_{N-1} = ψ_N(q*_N), the hidden state in the (N-2)-th time slice can then be determined through q*_{N-2} = ψ_{N-1}(q*_{N-1}), and so on, until the hidden state in the 1st time slice is finally determined through q*_1 = ψ_2(q*_2). Thereby, the hidden state sequence {q*_1, q*_2, q*_3, ..., q*_N} corresponding to the target block is obtained.
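The optimal path backtracking described above can be sketched as follows, consuming the log_delta and psi arrays produced by the forward-pass sketch above; returning 0-based state indices is an implementation convenience of this illustration.

```python
import numpy as np

def viterbi_backtrack(log_delta, psi):
    """Recover the hidden state sequence by optimal path backtracking."""
    N, _ = log_delta.shape
    states = np.empty(N, dtype=int)
    states[-1] = int(np.argmax(log_delta[-1]))   # hidden state in the last time slice
    for n in range(N - 2, -1, -1):
        states[n] = psi[n + 1, states[n + 1]]    # q*_n = psi_{n+1}(q*_{n+1})
    return states
```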
In addition, the hidden state sequence corresponding to the target block can be used for characterizing the population activity characteristics of the target block in different time slices.
The method for determining the hidden state sequence determines the local probability that the target block is respectively in each hidden state of the hidden Markov model in each time slice covered by the observation sequence based on the observation sequence corresponding to the target block, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value commonly corresponding to each candidate block related to the hidden Markov model and the Gaussian distribution variance commonly corresponding to each candidate block, and then obtains the corresponding hidden state sequence. Therefore, the hidden state sequence corresponding to the block is obtained through the hidden Markov model, the state transition condition of the block can be determined, and the limitation in the traditional mode is broken.
It should be noted that the conventional technology provides a solution that models the co-occurrence relationship among time, place, and population activities through representation learning (such as cross-modal representation learning). Besides being unable to determine the state transition situation of a block, that scheme cannot distinguish the states of different blocks and does not support parallel processing of data.
In contrast, the model parameters of the hidden markov model in the present application include initial state probabilities corresponding to the candidate blocks related to the hidden markov model, state transition probabilities corresponding to the candidate blocks, gaussian distribution means corresponding to the candidate blocks, and gaussian distribution variances corresponding to the candidate blocks, and the candidate blocks have the initial state probabilities and the state transition probabilities corresponding to each other, so that differences in state transition between the candidate blocks due to different function types of the candidate blocks, that is, states of different blocks can be differentiated.
In another embodiment, a hidden markov model corresponding to each block may be learned using an observation sequence corresponding to each block, and model parameters of the hidden markov model include an initial state probability corresponding to each block, a state transition probability corresponding to each block, and an observation probability corresponding to each block. However, this scheme also cannot reflect the difference in state transition between blocks due to the difference in the types of functions to which the blocks belong.
Alternatively, for each block, a hidden markov model corresponding to the block may be learned using the observation sequence corresponding to the block. However, learning a hidden markov model for each block respectively faces the problem of insufficient learning of the model due to sparse training data, and the hidden markov models are independent, so that the association between the blocks cannot be established.
However, in the present application, a plurality of blocks correspond to the same hidden markov model, and the model parameters of the hidden markov model include initial state probabilities corresponding to the blocks related to the hidden markov model, state transition probabilities corresponding to the blocks, a gaussian distribution mean corresponding to the blocks, and a gaussian distribution variance corresponding to the blocks. On one hand, each block has corresponding initial state probability and state transition probability, so that the difference of state transition between blocks due to different function types can be reflected; on the other hand, the observation sequences corresponding to the blocks are used for learning a hidden Markov model together, but the observation sequences corresponding to the blocks are not used for learning the hidden Markov models respectively, so that the problems that the training data are sparse, the model learning is insufficient, and the association between the blocks cannot be established are effectively solved.
In one embodiment, the step of obtaining the observation sequence corresponding to the target block, i.e. step S302, may include the following steps: acquiring an original observation sequence corresponding to a target block; the original observation sequence comprises original population activity data of a target block in more than two time slices, and activity behavior characteristics related to each original population activity data comprise population floating number and access frequency aiming at predetermined types of interest points; and carrying out maximum value normalization on the population flowing number in each original population activity data and the TF-IDF parameters corresponding to the access frequency aiming at the interest points of the preset type in each original population activity data to obtain an observation sequence corresponding to the target block.
Assuming that the target block is the r-th candidate block, the original observation sequence corresponding to the r-th candidate block can be represented as X_r = {X_{r,1}, X_{r,2}, X_{r,3}, ..., X_{r,N}}, where N represents the total number of time slices covered by the observation sequence corresponding to the r-th candidate block, and X_{r,n} represents the original population activity data of the r-th candidate block in the n-th time slice, r = 1, 2, 3, ..., R, n = 1, 2, 3, ..., N, with R representing the total number of candidate blocks involved in the hidden Markov model.

Further, X_{r,n} = {X_{r,n,1}, X_{r,n,2}, X_{r,n,3}, ..., X_{r,n,M}}, where M is the total number of activity behavior features the original population activity data relates to, and X_{r,n,m} represents the m-th activity behavior feature in the original population activity data of the r-th candidate block in the n-th time slice, m = 1, 2, 3, ..., M.
Suppose the m-th activity behavior feature X_{r,n,m} in the original population activity data of the r-th candidate block in the n-th time slice belongs to a population flow number (e.g., X_{r,n,m} is the number of people moving in, the number of people staying, or the number of people moving out). Then X_{r,n,m} can be maximum-value normalized through the following formula to obtain the normalization result O_{r,n,m}:

$$O_{r,n,m} = \frac{X_{r,n,m}}{\max\limits_{1 \le n' \le N} X_{r,n',m}}$$

where max_{1≤n'≤N} X_{r,n',m} represents the maximum value of X_{r,1,m}, X_{r,2,m}, X_{r,3,m}, ..., and X_{r,N,m}.
Suppose the m-th activity behavior feature X_{r,n,m} in the original population activity data of the r-th candidate block in the n-th time slice belongs to an access frequency for a predetermined type of point of interest (e.g., X_{r,n,m} is the access frequency for a point of interest of the restaurant, company, institution, shopping, service, attraction, entertainment, education, or residence type). Then the TF-IDF parameter Y_{r,n,m} corresponding to X_{r,n,m} may first be calculated by applying a TF-IDF (term frequency-inverse document frequency) weighting to the access frequency, where F represents the total number of access-frequency features for predetermined types of points of interest. For example, if each piece of original population activity data in the original observation sequence corresponding to the r-th candidate block relates to access frequencies for the 9 types of points of interest of restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence, then F equals 9. For another example, if each piece of original population activity data relates to access frequencies for the 4 types of points of interest of restaurant, education, attraction, and residence, then F equals 4.
Furthermore, the TF-IDF parameter Y_{r,n,m} is maximum-value normalized through the following formula to obtain the normalization result O_{r,n,m}:

$$O_{r,n,m} = \frac{Y_{r,n,m}}{\max\limits_{1 \le n' \le N} Y_{r,n',m}}$$

where max_{1≤n'≤N} Y_{r,n',m} represents the maximum value of Y_{r,1,m}, Y_{r,2,m}, Y_{r,3,m}, ..., and Y_{r,N,m}.
It should be noted that, for the original observation sequence X_r = {X_{r,1}, X_{r,2}, X_{r,3}, ..., X_{r,N}} corresponding to the r-th candidate block, with X_{r,n} = {X_{r,n,1}, X_{r,n,2}, X_{r,n,3}, ..., X_{r,n,M}}, r = 1, 2, 3, ..., R, n = 1, 2, 3, ..., N, suppose {X_{r,n,1}, X_{r,n,2}, X_{r,n,3}} belong to population flow numbers and {X_{r,n,4}, X_{r,n,5}, X_{r,n,6}, ..., X_{r,n,M}} belong to access frequencies for predetermined types of points of interest. Then X_{r,n,1}, X_{r,n,2}, and X_{r,n,3} are each maximum-value normalized to obtain the normalization results O_{r,n,1}, O_{r,n,2}, and O_{r,n,3}. Further, the TF-IDF parameters Y_{r,n,4}, Y_{r,n,5}, Y_{r,n,6}, ..., and Y_{r,n,M} respectively corresponding to X_{r,n,4}, X_{r,n,5}, X_{r,n,6}, ..., and X_{r,n,M} are calculated, and then Y_{r,n,4}, Y_{r,n,5}, Y_{r,n,6}, ..., and Y_{r,n,M} are each maximum-value normalized to obtain the normalization results O_{r,n,4}, O_{r,n,5}, O_{r,n,6}, ..., and O_{r,n,M}. Thereby, the population activity data O_{r,n} = {O_{r,n,1}, O_{r,n,2}, O_{r,n,3}, ..., O_{r,n,M}} of the r-th candidate block in the n-th time slice is obtained, and then the observation sequence O_r = {O_{r,1}, O_{r,2}, O_{r,3}, ..., O_{r,N}} corresponding to the r-th candidate block is obtained.
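A non-limiting sketch of the preprocessing described above follows. The split of the first three feature columns as population flow numbers mirrors the example above; the tfidf_weight function is only a generic placeholder, since the exact TF-IDF formula is not reproduced in this text.

```python
import numpy as np

def max_normalize(column):
    """Maximum-value normalization of one feature over all time slices."""
    peak = column.max()
    return column / peak if peak > 0 else column

def tfidf_weight(poi_counts):
    """Placeholder TF-IDF weighting for POI access frequencies.

    poi_counts: shape (N, F) raw access frequencies for F POI types.
    The exact TF-IDF formula of the original filing is not reproduced here;
    this is only one common variant, shown for illustration.
    """
    tf = poi_counts / np.maximum(poi_counts.sum(axis=1, keepdims=True), 1e-12)
    idf = np.log((1 + poi_counts.shape[0]) / (1 + (poi_counts > 0).sum(axis=0)))
    return tf * idf

def build_observation_sequence(X_r):
    """X_r: shape (N, M); columns 0..2 are population flow numbers,
    columns 3..M-1 are POI access frequencies (per the example above)."""
    O_r = np.empty_like(X_r, dtype=float)
    for m in range(3):                           # flow numbers: normalize directly
        O_r[:, m] = max_normalize(X_r[:, m])
    Y = tfidf_weight(X_r[:, 3:])                 # POI frequencies: TF-IDF first
    for m in range(Y.shape[1]):
        O_r[:, 3 + m] = max_normalize(Y[:, m])
    return O_r
```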
In addition, to describe with reference to an actual example: maximum-value normalization is performed on the population flow numbers (the number of people moving in, the number of people staying, and the number of people moving out) related to the population activity data in the original observation sequence corresponding to the Qinghua Garden block in Beijing, the corresponding TF-IDF parameters are calculated for the access frequencies for the predetermined types of points of interest that the data relates to, and maximum-value normalization is then performed on the calculated TF-IDF parameters; the observation sequence corresponding to the Qinghua Garden block as shown in fig. 6 can thus be obtained.
In one embodiment, the manner of determining the local probability that the target block is in any hidden state of the hidden markov model within any time slice covered by the observation sequence may comprise the following steps: determining the emission probability of population activity data in the time slice in the observation sequence generated under the condition that the target block is in the hidden state in the time slice based on the population activity data in the time slice in the observation sequence, the Gaussian distribution mean value which is jointly corresponding to each candidate block related to the hidden Markov model and the Gaussian distribution variance which is jointly corresponding to each candidate block; and determining the local probability of the target block in the hidden state in the time slice based on the local probability of each hidden state of the target block in the hidden Markov model in the previous time slice adjacent to the time slice, the state transition probability corresponding to the target block in the hidden Markov model and the emission probability.
It should be noted that, the local probability that the target block is in the hidden state in the first time slice covered by the observation sequence is determined based on the probability corresponding to the hidden state in the initial state probability corresponding to the target block and the emission probability of the demographic activity data generated in the first time slice in the observation sequence under the condition that the target block is in the hidden state in the first time slice.
In one embodiment, the step of determining the emission probability of generating the population activity data in the time slice in the observation sequence under the condition that the target block is in the hidden state in the time slice, based on the population activity data in the time slice in the observation sequence, the Gaussian distribution mean commonly corresponding to the candidate blocks related to the hidden Markov model, and the Gaussian distribution variance commonly corresponding to the candidate blocks, may include the following step: determining the emission probability of the target block generating the population activity data under the condition that the target block is in the hidden state of the hidden Markov model in the time slice, based on the means of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence under the condition of being in the hidden state, the variances of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature under the condition of being in the hidden state, and the population activity data of the target block in the time slice.
Specifically, the emission probability b_k(O_{r,n}) of the target block generating the population activity data in the n-th time slice in the observation sequence, under the condition that the target block is in the k-th hidden state in the n-th time slice, can be calculated by the following formula:

$$b_k(O_{r,n}) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{k,m}}}\exp\!\left(-\frac{(O_{r,n,m}-\mu_{k,m})^2}{2\sigma_{k,m}}\right)$$

where O_{r,n,m} represents the m-th activity behavior feature related to the population activity data of the target block in the n-th time slice; μ_{k,m} represents the mean of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state; σ_{k,m} represents the variance of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state; and M represents the total number of activity behavior features related to each piece of population activity data in the observation sequence.
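A non-limiting sketch of computing the (log) emission probabilities consumed by the Viterbi sketch earlier is given below; it evaluates the per-feature Gaussian densities with σ_{k,m} treated as a variance, in line with the formula above, and sums their logarithms (equivalent to the product in probability space).

```python
import numpy as np

def log_emission_matrix(O_r, mu, sigma, eps=1e-12):
    """log_b[n, k] = log b_k(O_{r,n+1}) under a diagonal Gaussian model.

    O_r:   shape (N, M) observation sequence of the target block
    mu:    shape (K, M) Gaussian means (shared by all candidate blocks)
    sigma: shape (K, M) Gaussian variances (shared by all candidate blocks)
    """
    var = np.maximum(sigma, eps)                 # guard against zero variance
    # Broadcast to shape (N, K, M): per-feature log Gaussian density.
    diff = O_r[:, None, :] - mu[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * var)[None, :, :] + diff ** 2 / var[None, :, :])
    return log_pdf.sum(axis=2)                   # product over features -> sum of logs
```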
In one embodiment, after the step of determining the hidden state of the target block in the last time slice, the method further comprises the following steps: and predicting the hidden state of the target block in the time slice next to the last time slice based on the hidden state of the target block in the last time slice covered by the observation sequence and the state transition probability corresponding to the target block in the hidden Markov model.
It can be understood that the model parameters of the hidden markov model include the state transition probability corresponding to the target block, that is, the state transition probability corresponding to the target block is determined given the hidden markov model. As described above, the state transition probability corresponding to the target block includes the probability that the target block transitions between every two hidden states of the hidden markov model, and thus the hidden state of the target block in the last time slice covered by the observation sequence corresponding to the target block is determined
Figure RE-GDA0002012891050000126
Thereafter, the slave hidden state can be determined based on the state transition probability corresponding to the target block
Figure RE-GDA0002012891050000127
Starting to carry out the maximum transition probability of the state transition, thereby taking the hidden state corresponding to the maximum transition probability as the hidden state of the target block in the time slice next to the last time slice covered by the observation sequence.
For example, suppose the state transition probability A_r corresponding to the target block is expressed as the K × K matrix A_r = [A_{r,j,k}], j = 1,2,…,K, k = 1,2,…,K, and suppose the hidden state s*_{r,N} of the target block in the last time slice covered by its corresponding observation sequence is the 3rd hidden state. Then the maximum transition probability among the state transitions starting from s*_{r,N} is determined from A_{r,3,1}, A_{r,3,2}, A_{r,3,3}, …, and A_{r,3,K}. For instance, if A_{r,3,2} is the largest value among A_{r,3,1}, A_{r,3,2}, A_{r,3,3}, …, and A_{r,3,K}, the 2nd hidden state can be regarded as the hidden state of the target block in the time slice next to the last time slice covered by the observation sequence corresponding to the target block.
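A minimal sketch of this next-state prediction, assuming the state transition probability of the target block is available as a K × K NumPy array A_r whose row j holds the probabilities of transitioning from the j-th hidden state; all names and values here are illustrative assumptions.

```python
import numpy as np

def predict_next_hidden_state(A_r, last_state):
    """Pick the hidden state reachable from `last_state` with maximum transition probability.

    A_r        : (K, K) transition matrix of the target block, A_r[j, k] = P(j -> k)
    last_state : index of the hidden state in the last time slice (0-based)
    """
    return int(np.argmax(A_r[last_state]))  # column index of the largest entry in that row

# Hypothetical example with K = 4 hidden states; the block ends in state 2 (the "3rd" state, 1-based).
A_r = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.3, 0.3, 0.2, 0.2],
])
print(predict_next_hidden_state(A_r, last_state=2))  # -> 1, i.e. the "2nd" hidden state (1-based)
```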
In one embodiment, after the step of predicting the hidden state of the target block in the time slice next to the last time slice, the method may further include the following step: predicting the population activity data of the target block in the time slice next to the last time slice, based on the hidden state of the target block in that time slice and the Gaussian distribution mean in the hidden Markov model.
It can be understood that the hidden Markov model includes the following model parameter: the Gaussian distribution mean commonly corresponding to the candidate blocks related to the hidden Markov model, which is determined once the hidden Markov model is given. As described above, the Gaussian distribution mean may include the mean of the Gaussian distribution obeyed by the probability of generating each activity behavior feature related to the population activity data in the observation sequence, under the condition that the target block is in each hidden state of the hidden Markov model. Therefore, after the hidden state s*_{r,N+1} of the target block in the time slice next to the last time slice is determined, the means of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence corresponding to the target block, under the condition that the target block is in the hidden state s*_{r,N+1}, can be taken from the Gaussian distribution mean as the population activity data of the target block in the time slice next to the last time slice.
For example, suppose the Gaussian distribution mean μ commonly corresponding to the candidate blocks related to the hidden Markov model is expressed as the K × M matrix μ = [μ_{k,m}], k = 1,2,…,K, m = 1,2,…,M, and suppose the hidden state of the target block in the time slice next to the last time slice covered by its corresponding observation sequence is determined to be the 3rd hidden state. Then {μ_{3,1}, μ_{3,2}, μ_{3,3}, …, μ_{3,M}} can be taken as the population activity data of the target block in the time slice next to the last time slice, where M is the total number of activity behavior features related to the population activity data in the observation sequence corresponding to the candidate block.
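Continuing the previous sketch under the same assumptions, the predicted population activity data is simply the row of the shared Gaussian mean matrix selected by the predicted hidden state:

```python
import numpy as np

def predict_population_activity(mu, next_state):
    """Return the M Gaussian means of the predicted hidden state as the predicted
    population activity data for the next time slice.

    mu         : (K, M) shared Gaussian mean matrix, mu[k, m] = mu_{k,m}
    next_state : predicted hidden state index for the next time slice (0-based)
    """
    return mu[next_state]  # vector {mu_{k,1}, ..., mu_{k,M}} for k = next_state

# Hypothetical example with K = 4 hidden states and M = 3 activity behavior features.
mu = np.array([
    [0.2, 0.1, 0.3],
    [0.6, 0.4, 0.2],
    [0.8, 0.7, 0.1],
    [0.3, 0.5, 0.5],
])
print(predict_population_activity(mu, next_state=1))  # -> [0.6, 0.4, 0.2]
```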
In one embodiment, as shown in fig. 7, the hidden markov model training method may include the following steps S702 to S710.
S702, acquiring observation sequences corresponding to the candidate blocks respectively.
S704, in the current iteration, the current intermediate state probability and the current intermediate state transition probability corresponding to each candidate block are determined based on the observation sequence corresponding to each candidate block, the initial state probability and the state transition probability corresponding to each candidate block determined last time, the Gaussian distribution mean value corresponding to each candidate block together, and the Gaussian distribution variance corresponding to each candidate block together.
S706, determining the current initial state probability of each candidate block based on each current intermediate state probability, and determining the current state transition probability of each candidate block based on each current intermediate state transition probability.
S708, based on the current intermediate state probabilities, determining the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks.
S710, when the iteration termination condition is met, obtaining the hidden Markov model based on the initial state probability and the state transition probability respectively corresponding to each candidate block determined for the last time, and the Gaussian distribution mean and the Gaussian distribution variance commonly corresponding to the candidate blocks.
For hidden Markov models, when the observation sequence is known, the model parameters can be learned through the Baum-Welch algorithm, so that the hidden Markov model is determined.
The process of training the hidden markov model (i.e., the process of learning the model parameters of the hidden markov model) with a known sequence of observations is an iterative process. Specifically, in each iteration, for each candidate block, based on the observation sequence corresponding to the candidate block, the initial state probability and the state transition probability corresponding to the candidate block determined last time, and the gaussian distribution mean and the gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden markov model, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined, so as to determine the current intermediate state probabilities respectively corresponding to the candidate blocks and the current intermediate state transition probabilities respectively corresponding to the candidate blocks.
The current intermediate state probability corresponding to the r-th candidate block can be expressed as γ_r, which can be represented as the following N × K matrix:

γ_r = [γ_{r,n,k}], n = 1,2,3,…,N, k = 1,2,3,…,K

where γ_{r,n,k} represents the probability that the r-th candidate block is in the k-th hidden state in the n-th time slice, N is the total number of time slices covered by the observation sequence corresponding to the r-th candidate block, and K is the total number of hidden states of the hidden Markov model.
The current intermediate state transition probability corresponding to the r-th candidate block can be expressed as ξ_r, which can be represented as a P × K matrix with P = K(N - 1), i.e.:

ξ_r = [ξ_{r,n,j,k}], n = 2,3,4,…,N, j = 1,2,3,…,K, k = 1,2,3,…,K

where ξ_{r,n,j,k} represents the probability that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice and in the k-th hidden state in the n-th time slice, N is the total number of time slices covered by the observation sequence corresponding to the r-th candidate block, and K is the total number of hidden states of the hidden Markov model.
In addition, the probability γ_{r,n,k} that the r-th candidate block is in the k-th hidden state in the n-th time slice can be calculated by the following formula:

\gamma_{r,n,k} = \frac{\alpha_{r,n,k}\,\beta_{r,n,k}}{\sum_{k'=1}^{K}\alpha_{r,N,k'}}

where α_{r,n,k} represents the forward probability corresponding to the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice, that is, the probability that the r-th candidate block is in the k-th hidden state in the n-th time slice and generates the corresponding population activity data (i.e., {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,n}}) in the observation sequence corresponding to the r-th candidate block in the n-th time slice and each time slice before it; β_{r,n,k} represents the backward probability corresponding to the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice, that is, the probability of generating the corresponding population activity data (i.e., {O_{r,n+1}, O_{r,n+2}, O_{r,n+3}, …, O_{r,N}}) in each time slice after the n-th time slice under the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice, where N is the total number of time slices covered by the observation sequence corresponding to the r-th candidate block; and the denominator Σ_{k'=1}^{K} α_{r,N,k'} is the probability of generating the whole observation sequence corresponding to the r-th candidate block (i.e., {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,N}}), obtained from the forward probabilities in the N-th time slice.
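Assuming the forward probabilities and backward probabilities of one candidate block are already available as N × K NumPy arrays (minimal sketches of their computation are given further below), the formula above can be evaluated for all time slices at once as in the following sketch; the names alpha, beta and gamma_from_alpha_beta are assumptions of this illustration.

```python
import numpy as np

def gamma_from_alpha_beta(alpha, beta):
    """Per-time-slice state posteriors gamma_{r,n,k} for one candidate block.

    alpha : (N, K) forward probabilities alpha_{r,n,k}
    beta  : (N, K) backward probabilities beta_{r,n,k}
    """
    seq_prob = alpha[-1].sum()      # probability of the whole observation sequence
    return alpha * beta / seq_prob  # gamma[n, k] = alpha[n, k] * beta[n, k] / P(O_r)
```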
The probability ξ_{r,n,j,k} that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice and in the k-th hidden state in the n-th time slice can be calculated by the following formula:

\xi_{r,n,j,k} = \frac{\alpha_{r,n-1,j}\,A_{r,j,k}^{(t)}\,B_{r,n,k}\,\beta_{r,n,k}}{\sum_{k'=1}^{K}\alpha_{r,N,k'}}

where α_{r,n-1,j} represents the forward probability corresponding to the condition that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice, that is, the probability that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice and generates the corresponding observation data (i.e., {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,n-1}}) in the observation sequence corresponding to the r-th candidate block in the (n-1)-th time slice and each time slice before it; A_{r,j,k}^{(t)} represents the last determined probability that the r-th candidate block transitions from the j-th hidden state to the k-th hidden state; B_{r,n,k} represents the emission probability of generating the population activity data in the n-th time slice of the observation sequence corresponding to the r-th candidate block (i.e., {o_{r,n,1}, o_{r,n,2}, o_{r,n,3}, …, o_{r,n,M}}, where M is the total number of activity behavior features related to the population activity data in the observation sequence) under the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice; and the definitions of β_{r,n,k} and the denominator Σ_{k'=1}^{K} α_{r,N,k'} are the same as above and are not repeated here.
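Under the same assumptions, a corresponding sketch for the pairwise probabilities ξ, where A_prev denotes the last determined K × K transition matrix of the block and B the N × K emission probabilities; the names are again illustrative only.

```python
import numpy as np

def xi_from_alpha_beta(alpha, beta, A_prev, B):
    """Pairwise state posteriors xi_{r,n,j,k} for n = 2..N of one candidate block.

    alpha  : (N, K) forward probabilities
    beta   : (N, K) backward probabilities
    A_prev : (K, K) last determined transition matrix A_r^{(t)}
    B      : (N, K) emission probabilities B_{r,n,k}
    Returns an (N-1, K, K) array; entry [n-1, j, k] is the probability of being in
    state j in time slice n-1 and state k in time slice n.
    """
    seq_prob = alpha[-1].sum()
    # xi[n-1, j, k] = alpha[n-1, j] * A_prev[j, k] * B[n, k] * beta[n, k] / P(O_r)
    return (alpha[:-1, :, None] * A_prev[None, :, :] *
            B[1:, None, :] * beta[1:, None, :]) / seq_prob
```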
In addition, the emission probability B_{r,n,k} of generating the population activity data in the n-th time slice of the observation sequence corresponding to the r-th candidate block, under the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice, can be calculated by the following formula:

B_{r,n,k} = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\sigma_{k,m}}} \exp\left(-\frac{(o_{r,n,m}-\mu_{k,m})^{2}}{2\sigma_{k,m}}\right)

It should be noted that the definitions of the parameters σ_{k,m}, μ_{k,m}, o_{r,n,m} and M here are the same as above and are not repeated here.
In addition, the forward probability α_{r,n,k} corresponding to the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice can be calculated by the following recursive formula:

\alpha_{r,n,k} = \left(\sum_{j=1}^{K}\alpha_{r,n-1,j}\,A_{r,j,k}^{(t)}\right)B_{r,n,k}

where α_{r,n-1,j} represents the forward probability corresponding to the condition that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice, that is, the probability that the r-th candidate block is in the j-th hidden state in the (n-1)-th time slice and generates the corresponding population activity data (i.e., {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,n-1}}) in the observation sequence corresponding to the r-th candidate block in the (n-1)-th time slice and each time slice before it. The definitions of the parameters A_{r,j,k}^{(t)}, B_{r,n,k} and K here are the same as above and are not repeated here.
The forward probability corresponding to the condition that the r-th candidate block is in the k-th hidden state in the 1st time slice is

\alpha_{r,1,k} = \pi_{r,k}^{(t)}\,B_{r,1,k}

where π_{r,k}^{(t)} represents the last determined probability that the r-th candidate block is in the k-th hidden state in the 1st time slice, and B_{r,1,k} represents the emission probability of generating the population activity data in the 1st time slice of the observation sequence corresponding to the r-th candidate block (i.e., {o_{r,1,1}, o_{r,1,2}, o_{r,1,3}, …, o_{r,1,M}}) under the condition that the r-th candidate block is in the k-th hidden state in the 1st time slice.
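The forward recursion and its initialization can be sketched as follows, with pi_prev standing for the last determined initial state probability π_r^{(t)} of the block and B for the N × K emission probabilities computed with the formula above; the names are assumptions of this sketch. In practice the unscaled probabilities shrink rapidly as N grows, so a scaled or log-space variant would normally be preferred; the plain form is kept here only to mirror the formulas.

```python
import numpy as np

def forward_probabilities(pi_prev, A_prev, B):
    """Forward probabilities alpha_{r,n,k} for one candidate block.

    pi_prev : (K,)   last determined initial state probability pi_r^{(t)}
    A_prev  : (K, K) last determined transition matrix A_r^{(t)}
    B       : (N, K) emission probabilities B_{r,n,k}
    """
    N, K = B.shape
    alpha = np.zeros((N, K))
    alpha[0] = pi_prev * B[0]                      # initialization at the 1st time slice
    for n in range(1, N):
        alpha[n] = (alpha[n - 1] @ A_prev) * B[n]  # sum over states of the previous time slice
    return alpha
```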
In addition, the backward probability β_{r,n,k} corresponding to the condition that the r-th candidate block is in the k-th hidden state in the n-th time slice can be calculated by the following recursive formula:

\beta_{r,n,k} = \sum_{j=1}^{K}A_{r,k,j}^{(t)}\,B_{r,n+1,j}\,\beta_{r,n+1,j}

where β_{r,n+1,j} represents the backward probability corresponding to the condition that the r-th candidate block is in the j-th hidden state in the (n+1)-th time slice, that is, the probability of generating the corresponding population activity data (i.e., {O_{r,n+2}, O_{r,n+3}, O_{r,n+4}, …, O_{r,N}}) in each time slice after the (n+1)-th time slice under the condition that the r-th candidate block is in the j-th hidden state in the (n+1)-th time slice, where N is the total number of time slices covered by the observation sequence corresponding to the r-th candidate block; B_{r,n+1,j} represents the emission probability of generating the population activity data in the (n+1)-th time slice of the observation sequence corresponding to the r-th candidate block (i.e., {o_{r,n+1,1}, o_{r,n+1,2}, o_{r,n+1,3}, …, o_{r,n+1,M}}, where M is the total number of activity behavior features related to the population activity data in the observation sequence corresponding to the r-th candidate block) under the condition that the r-th candidate block is in the j-th hidden state in the (n+1)-th time slice; and A_{r,k,j}^{(t)} represents the last determined probability P(s_{r,n+1} = j | s_{r,n} = k) that the r-th candidate block, being in the k-th hidden state in the n-th time slice, is in the j-th hidden state in the (n+1)-th time slice.
The backward probability corresponding to the condition that the r-th candidate block is in the k-th hidden state in the last time slice (i.e., the N-th time slice) covered by its corresponding observation sequence is

\beta_{r,N,k} = 1, \quad k = 1,2,3,\ldots,K.
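A matching sketch for the backward recursion, including the termination condition β_{r,N,k} = 1; the same naming assumptions apply.

```python
import numpy as np

def backward_probabilities(A_prev, B):
    """Backward probabilities beta_{r,n,k} for one candidate block.

    A_prev : (K, K) last determined transition matrix A_r^{(t)}
    B      : (N, K) emission probabilities B_{r,n,k}
    """
    N, K = B.shape
    beta = np.zeros((N, K))
    beta[-1] = 1.0                                   # beta_{r,N,k} = 1 in the last time slice
    for n in range(N - 2, -1, -1):
        beta[n] = A_prev @ (B[n + 1] * beta[n + 1])  # sum over states of time slice n+1
    return beta
```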
It should be noted that, in each iteration, the current intermediate state probabilities and the current intermediate state transition probabilities corresponding to the candidate blocks need to be calculated based on the last determined model parameters. However, for the first determination of the model parameters in the first iteration, there are no previously determined model parameters, so before performing the first determination of the model parameters in the first iteration, the model parameters of the hidden Markov model may be initialized to obtain initial values of the model parameters, which may be represented as θ^{(0)} = {π_r^{(0)}, A_r^{(0)}, μ^{(0)}, σ^{(0)}}. Then, when the model parameters are determined for the first time in the first iteration, the current intermediate state probabilities and the current intermediate state transition probabilities corresponding to the candidate blocks are calculated based on these initial values of the model parameters.
In a current iteration, after determining current intermediate state probabilities and current intermediate state transition probabilities respectively corresponding to candidate blocks related to a hidden markov model once, for each candidate block, determining a current initial state probability of the candidate block based on the current intermediate state probability corresponding to the candidate block, and determining a current state transition probability of the candidate block based on the current intermediate state transition probability corresponding to the candidate block, thereby determining current initial state probabilities respectively corresponding to the candidate blocks once and current state transition probabilities respectively corresponding to the candidate blocks once.
Specifically, the current initial state probability π_r^{(t+1)} corresponding to the r-th candidate block may include: the current probabilities that the r-th candidate block is in each hidden state of the hidden Markov model in the 1st time slice, i.e., π_{r,k}^{(t+1)}, k = 1,2,3,…,K, where K is the total number of hidden states of the hidden Markov model.
The current probability π_{r,k}^{(t+1)} that the r-th candidate block is in the k-th hidden state of the hidden Markov model in the 1st time slice can be calculated by the following formula:

\pi_{r,k}^{(t+1)} = \gamma_{r,1,k}

where γ_{r,1,k} represents the probability, in the current intermediate state probability corresponding to the r-th candidate block, that the r-th candidate block is in the k-th hidden state in the 1st time slice.
The current state transition probability A_r^{(t+1)} corresponding to the r-th candidate block may include: the current probabilities that the r-th candidate block transitions between every two hidden states of the hidden Markov model, i.e., A_{r,j,k}^{(t+1)}, j = 1,2,3,…,K, k = 1,2,3,…,K, where K is the total number of hidden states of the hidden Markov model.
The current probability A_{r,j,k}^{(t+1)} that the r-th candidate block transitions from the j-th hidden state to the k-th hidden state can be calculated by the following formula:

A_{r,j,k}^{(t+1)} = \frac{\sum_{n=2}^{N}\xi_{r,n,j,k}}{\sum_{n=2}^{N}\sum_{k'=1}^{K}\xi_{r,n,j,k'}}

where the definitions of the parameters ξ_{r,n,j,k} and N are the same as those of the corresponding parameters above and are not repeated here.
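Given the γ and ξ of one candidate block, the two per-block updates above can be sketched as follows; gamma and xi are assumed to be the arrays produced by the earlier sketches.

```python
import numpy as np

def update_initial_and_transition(gamma, xi):
    """Per-block updates of the initial state and state transition probabilities.

    gamma : (N, K)      gamma_{r,n,k}
    xi    : (N-1, K, K) xi_{r,n,j,k}
    Returns (pi_new, A_new) with shapes (K,) and (K, K).
    """
    pi_new = gamma[0]                                # pi_{r,k}^{(t+1)} = gamma_{r,1,k}
    A_new = xi.sum(axis=0)                           # numerator: sum over n = 2..N
    A_new = A_new / A_new.sum(axis=1, keepdims=True) # normalize each source state j
    return pi_new, A_new
```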
In the current iteration, after the current initial state probability of each candidate block and the current state transition probability of each candidate block are determined once, the current gaussian distribution mean value commonly corresponding to each candidate block can be further determined based on the current intermediate state probability respectively corresponding to each candidate block, and the current gaussian distribution variance commonly corresponding to each candidate block is determined based on the current intermediate state probability respectively corresponding to each candidate block.
Specifically, the current gaussian distribution mean value commonly corresponding to each candidate block may be determined based on the current intermediate state probability respectively corresponding to each candidate block and the observation sequence respectively corresponding to each candidate block. In addition, the current gaussian distribution variance commonly corresponding to each candidate block can be determined based on the current intermediate state probability respectively corresponding to each candidate block, the observation sequence respectively corresponding to each candidate block, and the current gaussian distribution mean.
After the current model parameters of the hidden Markov model (i.e., the current initial state probability corresponding to each candidate block, the current state transition probability corresponding to each candidate block, and the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks) are determined in the current iteration, it can be determined whether the iteration termination condition is satisfied. The iteration termination condition is a condition for judging whether the current model parameters have converged. The iteration termination condition may be preset based on actual requirements; for example, it may include, but is not limited to, the number of iterations corresponding to the current round of iteration being greater than or equal to a predetermined threshold.
If the iteration termination condition is met, the last determined model parameters (the initial state probability and the state transition probability corresponding to each candidate block determined at the last time, the Gaussian distribution mean value corresponding to each candidate block together, and the Gaussian distribution variance corresponding to each candidate block together) are the final model parameters of the hidden Markov model. If the iteration termination condition is not met, the next iteration can be executed, and the model training is continued.
It can be understood that the trained hidden markov model can be used to determine hidden state sequences corresponding to candidate blocks involved in the hidden markov model.
In the model training method above, for each block, the current intermediate state probability and the current intermediate state transition probability corresponding to the block are determined based on the observation sequence corresponding to the block, the last determined initial state probability and state transition probability corresponding to the block, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the blocks related to the hidden Markov model; the current initial state probability of each block is then determined based on the current intermediate state probability corresponding to the block, and the current state transition probability of each block is determined based on the current intermediate state transition probability corresponding to the block. Therefore, in the iterative computation process, the operations for determining the current initial state probability and the current state transition probability of the individual blocks can be performed in parallel, which effectively reduces the time complexity, so that the method can be applied to large-scale data scenarios, that is, it supports long-term and fine-grained learning of the dynamics of places.
In addition, the conventional technology includes schemes that model the co-occurrence relationship among time, place and human activity by means of representation learning (for example, cross-modal representation learning). Besides the defect that parallel processing of data is difficult to support, such schemes cannot determine how the state of a block transitions over time and cannot distinguish the states of different blocks.
However, in the present application, the trained hidden markov model estimates the population flow characteristics of the blocks, and the model parameters of the trained hidden markov model include state transition probabilities corresponding to the blocks, so that the state transition situation of each block can be determined, and the states of different blocks can be distinguished.
In another approach, a single hidden Markov model could be learned jointly from the observation sequences corresponding to the blocks, with the model parameters of the hidden Markov model consisting of one initial state probability, one state transition probability and one observation probability shared by all blocks. However, this approach cannot reflect the differences in state transitions between blocks that arise because the blocks belong to different function types.
Alternatively, for each block, a hidden markov model corresponding to the block may be learned using the observation sequence corresponding to the block. However, learning a hidden markov model for each block respectively faces the problem of insufficient learning of the model due to sparse training data, and the hidden markov models are independent, so that the association between the blocks cannot be established.
In contrast, in the model training method provided by the present application, one hidden Markov model is learned jointly using the observation sequences corresponding to the respective blocks, and the model parameters of the hidden Markov model include the initial state probability corresponding to each block related to the hidden Markov model, the state transition probability corresponding to each block, the Gaussian distribution mean commonly corresponding to the blocks, and the Gaussian distribution variance commonly corresponding to the blocks. On one hand, each block has its own corresponding initial state probability and state transition probability, so the differences in state transitions between blocks caused by their different function types can be reflected; on the other hand, the observation sequences corresponding to the blocks are used to learn one hidden Markov model jointly, rather than to learn separate hidden Markov models, which effectively avoids the problems that sparse training data leads to insufficient model learning and that the association between the blocks cannot be established.
In an embodiment, in the current iteration, the step of determining the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on the observation sequence respectively corresponding to each candidate block, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, the Gaussian distribution mean commonly corresponding to the candidate blocks, and the Gaussian distribution variance commonly corresponding to the candidate blocks, that is, step S704, may include the following steps: in the current iteration, determining the current target sequence segment corresponding to each candidate block, where the current target sequence segment corresponding to a candidate block is a sequence segment, among the sequence segments contained in the observation sequence corresponding to the candidate block, that has not yet been used as a target sequence segment in the current iteration; and determining the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on each current target sequence segment, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, the Gaussian distribution mean commonly corresponding to the candidate blocks, and the Gaussian distribution variance commonly corresponding to the candidate blocks.
Accordingly, after the step of determining the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks based on the current intermediate state probabilities respectively corresponding to the candidate blocks, that is, after step S708, the following step may further be included: returning to the step of determining the current target sequence segment corresponding to each candidate block until all the sequence segments contained in the observation sequences corresponding to the candidate blocks have been used as target sequence segments in the current iteration, and then judging whether the iteration termination condition is met.
In this embodiment, for each candidate block, the observation sequence corresponding to the candidate block may be split into two or more sequence segments. Accordingly, in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block can be calculated once based on each sequence segment included in the observation sequence corresponding to the candidate block. It can be understood that, each time the current intermediate state probability and the current intermediate state transition probability corresponding to each candidate block involved in the hidden markov model are calculated, the current model parameters of the hidden markov model should be determined once (that is, the current initial state probability and the current state transition probability corresponding to each candidate block, respectively, and the current gaussian distribution mean and the current gaussian distribution variance corresponding to each candidate block involved in the hidden markov model are determined once).
In connection with the foregoing example, suppose the observation sequence corresponding to each candidate block related to the hidden Markov model includes the population activity data within 720 time slices; then, for the observation sequence corresponding to each candidate block, the observation sequence may be split into 15 sequence segments, each covering 48 time slices. For example, the observation sequence O_r = {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,720}} corresponding to the r-th candidate block is split into 15 sequence segments: the 1st sequence segment is {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,48}}, the 2nd sequence segment is {O_{r,49}, O_{r,50}, O_{r,51}, …, O_{r,96}}, and so on, until the 15th sequence segment, which is {O_{r,673}, O_{r,674}, O_{r,675}, …, O_{r,720}}.
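The segment split described here is a plain slicing of the observation sequence; a minimal sketch, assuming the sequence is held as an N × M NumPy array, is given below.

```python
import numpy as np

def split_observation_sequence(obs, segment_len=48):
    """Split an (N, M) observation sequence into consecutive segments of `segment_len` time slices."""
    N = obs.shape[0]
    return [obs[start:start + segment_len] for start in range(0, N, segment_len)]

# Hypothetical example: 720 time slices, M = 3 features -> 15 segments of 48 slices each.
obs = np.zeros((720, 3))
segments = split_observation_sequence(obs)
print(len(segments), segments[0].shape)  # 15 (48, 3)
```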
Accordingly, in the (t+1)-th iteration, the process may be as follows. For each candidate block, the 1st sequence segment in the observation sequence corresponding to the candidate block (for example, the 1st sequence segment of the r-th candidate block may be {O_{r,1}, O_{r,2}, O_{r,3}, …, O_{r,48}}) is first determined as the current target sequence segment corresponding to the candidate block; the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined based on the 1st sequence segment corresponding to the candidate block and the last determined model parameters (namely, the initial state probability and the state transition probability corresponding to the candidate block obtained in the t-th iteration, and the Gaussian distribution mean and the Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model); the current initial state probability of the candidate block is then determined based on the current intermediate state probability corresponding to the candidate block, and the current state transition probability of the candidate block is determined based on the current intermediate state transition probability corresponding to the candidate block. Further, the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks are determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.
Then, for each candidate block, the 2nd sequence segment in the observation sequence corresponding to the candidate block (for example, the 2nd sequence segment of the r-th candidate block may be {O_{r,49}, O_{r,50}, O_{r,51}, …, O_{r,96}}) is determined as the current target sequence segment corresponding to the candidate block; the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined based on the 2nd sequence segment corresponding to the candidate block and the last determined model parameters (namely, the initial state probability and the state transition probability corresponding to the candidate block obtained from the 1st sequence segment in the (t+1)-th iteration, and the Gaussian distribution mean and the Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model); the current initial state probability of the candidate block is determined based on the current intermediate state probability corresponding to the candidate block, and the current state transition probability of the candidate block is determined based on the current intermediate state transition probability corresponding to the candidate block. Further, the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks are determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.
This continues in turn until the 15th sequence segment in the observation sequence corresponding to each candidate block is determined as the current target sequence segment corresponding to the candidate block, and the similar steps are performed based on the 15th sequence segment corresponding to each candidate block: the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined, the current initial state probability and the current state transition probability of the candidate block are determined, and the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks are determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.
At this point, the (t+1)-th iteration is completed, and whether the iteration termination condition is met can be judged. If so, the trained hidden Markov model is obtained based on the last determined model parameters (namely, in the (t+1)-th iteration, the initial state probability and the state transition probability respectively corresponding to each candidate block obtained based on the 15th sequence segments, and the Gaussian distribution mean and the Gaussian distribution variance commonly corresponding to the candidate blocks); if not, the next iteration (i.e., the (t+2)-th iteration) is executed, and the processing procedure in the (t+2)-th iteration is similar to that in the (t+1)-th iteration, which is not repeated here.
In one embodiment, for each candidate block, the current intermediate state probability corresponding to the candidate block includes the current probabilities that the candidate block is in each hidden state in each target time slice covered by the corresponding observation sequence. In addition, the current Gaussian distribution mean μ^{(t+1)} includes the current means of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence under each hidden state of the hidden Markov model, i.e., μ_{k,m}^{(t+1)}, k = 1,2,3,…,K, m = 1,2,3,…,M.
Accordingly, the manner of determining the current mean of the Gaussian distribution obeyed by the probability of generating any activity behavior feature related to the population activity data in the observation sequence under any hidden state of the hidden Markov model may include the following step: determining the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state, based on the current probabilities that the candidate blocks are in the hidden state in each target time slice and the activity behavior feature related to the population activity data of the candidate blocks in each target time slice.
Specifically, the current mean μ_{k,m}^{(t+1)} of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state can be calculated by the following formula:

\mu_{k,m}^{(t+1)} = \frac{\sum_{r=1}^{R}\sum_{n=1}^{N1}\gamma_{r,n,k}\,o_{r,n,m}}{\sum_{r=1}^{R}\sum_{n=1}^{N1}\gamma_{r,n,k}}

where γ_{r,n,k} represents the current probability that the r-th candidate block is in the k-th hidden state in the n-th target time slice; o_{r,n,m} represents the m-th activity behavior feature related to the population activity data of the r-th candidate block in the n-th target time slice; R represents the total number of candidate blocks related to the hidden Markov model; and N1 represents the total number of target time slices covered by the observation sequence corresponding to the r-th candidate block.
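Pooling γ over all R candidate blocks, the shared mean update above can be sketched as follows; gammas and observations are assumed to be lists holding one array per candidate block, restricted to the target time slices.

```python
import numpy as np

def update_shared_means(gammas, observations):
    """Shared Gaussian mean update mu_{k,m}^{(t+1)}.

    gammas       : list of R arrays, each (N1, K), gamma_{r,n,k} over the target time slices
    observations : list of R arrays, each (N1, M), features o_{r,n,m} over the same time slices
    Returns a (K, M) array of updated means.
    """
    num = sum(g.T @ o for g, o in zip(gammas, observations))  # (K, M) weighted feature sums
    den = sum(g.sum(axis=0) for g in gammas)                  # (K,)  total state responsibilities
    return num / den[:, None]
```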
It should be noted that, if the observation sequence corresponding to each candidate block is split into two or more sequence segments, then in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are calculated once based on each sequence segment included in the observation sequence corresponding to the candidate block, and the current mean μ_{k,m}^{(t+1)} of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state is then calculated. In this case, the target time slices covered by the observation sequence corresponding to the r-th candidate block described above are the time slices covered by the current target sequence segment, that is, N1 may be equal to the total number of time slices covered by the current target sequence segment. For the example of splitting an observation sequence containing population activity data in 720 time slices into 15 sequence segments of 48 time slices each, N1 would then be equal to 48.
In addition, if the observation sequence corresponding to each candidate block is not split into two or more sequence segments, then in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are calculated once based on the complete observation sequence corresponding to the candidate block, and the current mean μ_{k,m}^{(t+1)} of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state is then calculated. In this case, the target time slices covered by the observation sequence corresponding to the r-th candidate block are the time slices covered by the complete observation sequence, that is, N1 may be equal to the total number of time slices covered by the complete observation sequence corresponding to the r-th candidate block. For the example of an observation sequence containing population activity data in 720 time slices, N1 would then be equal to 720.
In one embodiment, the current Gaussian distribution variance σ^{(t+1)} includes the current variances of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature related to the population activity data in the observation sequence under each hidden state of the hidden Markov model, i.e., σ_{k,m}^{(t+1)}, k = 1,2,3,…,K, m = 1,2,3,…,M.
Accordingly, the manner of determining the current variance of the Gaussian distribution obeyed by the probability of generating any activity behavior feature related to the population activity data in the observation sequence under any hidden state of the hidden Markov model may include the following step: determining the current variance of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state, based on the current probabilities that the candidate blocks are in the hidden state in each target time slice, the activity behavior feature related to the population activity data of the candidate blocks in each target time slice, and the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state.
Specifically, the current variance σ_{k,m}^{(t+1)} of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of being in the k-th hidden state can be calculated by the following formula:

\sigma_{k,m}^{(t+1)} = \frac{\sum_{r=1}^{R}\sum_{n=1}^{N1}\gamma_{r,n,k}\,(o_{r,n,m}-\mu_{k,m}^{(t+1)})^{2}}{\sum_{r=1}^{R}\sum_{n=1}^{N1}\gamma_{r,n,k}}

where the definitions of the parameters γ_{r,n,k}, o_{r,n,m}, μ_{k,m}^{(t+1)}, R and N1 are the same as those described above and are not repeated here.
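The shared variance update can be sketched in the same way, reusing the means just updated; the same naming assumptions apply.

```python
import numpy as np

def update_shared_variances(gammas, observations, mu_new):
    """Shared Gaussian variance update sigma_{k,m}^{(t+1)}.

    gammas       : list of R arrays, each (N1, K)
    observations : list of R arrays, each (N1, M)
    mu_new       : (K, M) means mu_{k,m}^{(t+1)} from the current iteration
    Returns a (K, M) array of updated variances.
    """
    K, M = mu_new.shape
    num = np.zeros((K, M))
    den = np.zeros(K)
    for g, o in zip(gammas, observations):
        diff_sq = (o[:, None, :] - mu_new[None, :, :]) ** 2  # (N1, K, M) squared deviations
        num += (g[:, :, None] * diff_sq).sum(axis=0)         # weight by gamma, sum over time
        den += g.sum(axis=0)
    return num / den[:, None]
```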
In one embodiment, as shown in FIG. 8, a method for determining the functional type of a neighborhood is provided. The method is applied to a computer device (such as the terminal 210 or the server 220 in fig. 2) for example. The method may include the following steps S802 to S810.
S802, acquiring observation sequences corresponding to the candidate blocks related to the hidden Markov model.
S804, based on the initial state probability corresponding to each candidate block in the hidden Markov model, the state transition probability corresponding to each candidate block, the Gaussian distribution mean value corresponding to each candidate block together and the Gaussian distribution variance corresponding to each candidate block together, respectively determining the local probability of each candidate block in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and based on each local probability, determining the reverse pointer corresponding to each local probability respectively.
S806, based on the maximum local probability of the local probabilities of the candidate blocks in the hidden states in the last time slice covered by the observation sequence, the hidden state of each candidate block in the last time slice is determined.
S808, performing optimal path backtracking based on the hidden state of each candidate block in the last time slice and each backward pointer, so as to obtain the hidden state sequence corresponding to each candidate block.
S810, clustering based on the hidden state sequences respectively corresponding to the candidate blocks, and determining the function type to which each candidate block belongs from the candidate function types based on the clustering result.
The function type can be used to represent the functions undertaken by a block. The candidate function types may be preset based on actual needs, for example, tourist attractions, residential areas, general areas, business areas, schools, composite areas, companies, and others.
In this embodiment, the hidden state sequences corresponding to the candidate blocks related to the hidden markov model are determined by the hidden state sequence determining method provided in any embodiment of the present application, and then clustering is performed based on the hidden state sequences corresponding to the candidate blocks, and the function types to which the candidate blocks belong are determined from the candidate function types based on the clustering result.
Specifically, after the hidden state sequence S*_r corresponding to each candidate block is determined, where r = 1,2,3,…,R and R represents the total number of candidate blocks related to the hidden Markov model, the sequence distance between every two hidden state sequences is determined, that is, the sequence distances between the hidden state sequence corresponding to the 1st candidate block and the hidden state sequences corresponding to the 2nd candidate block, the 3rd candidate block, …, and the R-th candidate block, respectively, the sequence distances between the hidden state sequence corresponding to the 2nd candidate block and the hidden state sequences corresponding to the 3rd candidate block, the 4th candidate block, …, and the R-th candidate block, respectively, and so on. Then, the candidate blocks are clustered based on the sequence distances between every two hidden state sequences through a K-means clustering algorithm, thereby determining a plurality of clusters, where each cluster corresponds to a candidate function type. For each candidate block, the cluster to which the candidate block belongs is determined, so that the function type to which the candidate block belongs is determined.
In addition, the manner of determining the sequence distance between two hidden state sequences may specifically be as follows: calculate the state distance between the hidden states corresponding to the same time slice in the two hidden state sequences by means of the Euclidean distance, and then determine the sequence distance between the two hidden state sequences based on the state distances within the time slices. Specifically, the ratio of the sum of the state distances between the hidden states within the time slices to the total number of time slices may be taken as the sequence distance between the two hidden state sequences.
For example, take the hidden state sequence S*_1 = {s*_{1,1}, s*_{1,2}, s*_{1,3}, …, s*_{1,N}} corresponding to the 1st candidate block and the hidden state sequence S*_2 = {s*_{2,1}, s*_{2,2}, s*_{2,3}, …, s*_{2,N}} corresponding to the 2nd candidate block. The state distance d1 between s*_{1,1} and s*_{2,1}, the state distance d2 between s*_{1,2} and s*_{2,2}, the state distance d3 between s*_{1,3} and s*_{2,3}, …, and the state distance dN between s*_{1,N} and s*_{2,N} are calculated by means of the Euclidean distance, and the sequence distance between the hidden state sequence S*_1 and the hidden state sequence S*_2 is then determined based on d1, d2, d3, …, and dN. Specifically, the sequence distance between the hidden state sequence S*_1 and the hidden state sequence S*_2 may be (d1 + d2 + d3 + … + dN) / N.
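For illustration only, the state distance, the sequence distance and a pairwise distance matrix over all candidate blocks can be sketched as below. Taking the state distance as the Euclidean distance between the Gaussian mean vectors of the two hidden states is an assumption of this sketch (one possible reading of the embodiment), as is every name used; the K-means-style clustering performed on top of such distances is not shown.

```python
import numpy as np

def state_distance(k1, k2, mu):
    """Euclidean distance between two hidden states, taken here (as an assumption)
    as the distance between their Gaussian mean vectors mu[k1] and mu[k2]."""
    return float(np.linalg.norm(mu[k1] - mu[k2]))

def sequence_distance(seq1, seq2, mu):
    """Average per-time-slice state distance between two hidden state sequences of equal length N."""
    dists = [state_distance(k1, k2, mu) for k1, k2 in zip(seq1, seq2)]
    return sum(dists) / len(dists)

def pairwise_sequence_distances(sequences, mu):
    """Symmetric R x R matrix of sequence distances between the blocks' hidden state sequences."""
    R = len(sequences)
    D = np.zeros((R, R))
    for i in range(R):
        for j in range(i + 1, R):
            D[i, j] = D[j, i] = sequence_distance(sequences[i], sequences[j], mu)
    return D
```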
It should be noted that, determining the function type to which the candidate block belongs can provide reference for city planning and city infrastructure, and can also directly guide the introduction of new points of interest and the location of shops.
It should be understood that, although the steps in the flowcharts referred to in the foregoing embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated otherwise, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in each flowchart may include multiple sub-steps or multiple stages, which are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The functional characteristics of the technical scheme provided by the present application are explained below in combination with actual tests. The technical scheme provided by the present application was tested on the population activity data of 2,000,000 users in 665 central blocks of Beijing in April 2018. The population activity data in this one-month period was divided into a training set and a test set: the population activity data of the first three weeks formed the training set, which was used as the existing observation sequences to learn the model parameters of the hidden Markov model, and the test set was used to verify the performance of the trained hidden Markov model.
First, in the actual tests, 100 hidden states as shown in fig. 5 were learned on the 665 central blocks of Beijing. To demonstrate the ability of the technical scheme in the present application to discover hidden states and to reveal the dynamics of blocks within a city, a series of specific examples and detailed explanations are given below.
Fig. 9 shows the mean values corresponding to the hidden states that frequently appear in the hidden state sequences corresponding to the blocks, and shows the transition situations of the hidden states on workdays and non-workdays, respectively. It can be understood that the normal weekends in April and the three-day Tomb-Sweeping Day (Qingming Festival) holiday on April 5 (Thursday), April 6 (Friday) and April 7 (Saturday) are non-workdays, while the normal workdays and April 8 (Sunday) are workdays.
First, consider the discovered hidden states. Each hidden state shown in fig. 9 has semantics in two aspects: (1) population density and population flow, for example, hidden state 32 indicates high population density and high population flow, state 21 indicates low population flow and high population density, and state 17 indicates low population density and low population flow; (2) the access frequency of different types of points of interest, for example, hidden state 31 indicates that the most frequently visited type of point of interest is the education type, and state 21 indicates that the most frequently visited type of point of interest is the attraction type. As shown in fig. 10 (a), hidden state 79 appears in the block during the day because Tsinghua University occupies most of the area of the block, while hidden state 99 appears in the block where Peking University is located.
Next, consider the dynamics represented by the state transition processes. It is apparent from fig. 10 that the dynamics of blocks within a city are periodic, since the states in the same time period on different days are generally the same. It is also noted that the dynamics of some regions, as shown in fig. 9 (f), differ greatly between workdays and non-workdays, while those of other regions, as shown in fig. 9 (c), are very similar on workdays and non-workdays.
Taking the dynamics of the block where Tsinghua University is located as an example, as shown in fig. 9 (a), there are fewer people at night than in the daytime, because the mean values of hidden state 70 and hidden state 31 are smaller than that of hidden state 79. Furthermore, on workdays, sudden crowd movements occur, because hidden state 32 appears at 8:00-9:00 and 17:00-19:00. The transitions from hidden state 70 to hidden state 32 and from hidden state 32 to hidden state 79 on workdays reveal a dynamic feature: only students live in the area at night, and more teachers enter the school in the morning, so the population is denser than at night. In comparison with (a) and (b) in fig. 9, fig. 9 (c) shows that the population density is consistently high and the most frequently visited type of point of interest is the attraction type, whether on workdays or non-workdays.
In addition, the performance of the technical scheme provided by the present application in determining function types within a city was further evaluated in the actual tests. Fig. 10 shows the clustering results for the blocks and the geographical distribution of the corresponding areas. The technical scheme obtains 8 function types on this data set, namely tourist attractions, residential areas, general areas, business areas, schools, composite areas, companies, and others, and the functions of some blocks (including the blocks shown in fig. 9) were verified on the map by manual labeling, which shows that the technical scheme provided by the present application can effectively determine the function types within a city. Furthermore, this result was compared in the actual tests with a state-of-the-art function type determination method, namely an LDA model (Latent Dirichlet Allocation model) using points of interest and mobility; the result of the actual tests is similar to the processing result of the LDA model, with a Normalized Mutual Information (NMI) of 0.25 (range from -0.5 to 1). In conclusion, blocks with more common states and similar state transition processes are more likely to have the same functions, which proves that the technical scheme of the present application can infer the distribution of functional blocks in the whole city.
Meanwhile, the performance of the technical scheme in predicting population mobility behavior was evaluated in the actual tests. The prediction results are shown in fig. 11; (a) in fig. 11 illustrates the difference between the predicted value and the actual value of the number of persons staying in the block where Tsinghua University is located from April 22 to April 30, 2018.
In order to further demonstrate the superiority of the technical scheme of the present application, it was compared with a common hidden Markov model in the actual tests. The comparison results of the indexes are shown in (b) in fig. 11: the average RMSE (Root Mean Square Error) of the population flow prediction is 0.195, and the Top-3 accuracy when predicting the most frequently visited point of interest is 41.4%, so the technical scheme in the present application is obviously superior to the common hidden Markov model. In summary, the technical scheme in the present application can be effectively applied to people-flow prediction and frequently-visited point-of-interest prediction for blocks within a city.
In one embodiment, as shown in fig. 12, a determination apparatus 1200 for a hidden state sequence is provided. The apparatus may include the following modules 1202 to 1208.
A first observation sequence obtaining module 1202, configured to obtain an observation sequence corresponding to a target block.
A first intermediate parameter determining module 1204, configured to determine, based on the observation sequence, an initial state probability corresponding to the target block in the hidden markov model, a state transition probability corresponding to the target block, a gaussian distribution mean value corresponding to each candidate block related to the hidden markov model, and a gaussian distribution variance corresponding to each candidate block, local probabilities that the target block is in each hidden state of the hidden markov model within each time slice covered by the observation sequence, respectively, and determine reverse pointers corresponding to each local probability, respectively.
A first end hidden state determining module 1206, configured to determine, based on a maximum local probability of local probabilities that the target block is in each hidden state in a last time slice covered by the observation sequence, a hidden state in which the target block is located in the last time slice.
The first hidden state sequence determining module 1208 is configured to perform optimal path backtracking based on the hidden state of the target block in the last time slice and each reverse pointer, so as to obtain a hidden state sequence.
In one embodiment, as shown in fig. 13, a determination apparatus 1300 of the functional type of the neighborhood is provided. The apparatus may include the following modules 1302 to 1310.
A second observation sequence obtaining module 1302, configured to obtain observation sequences corresponding to candidate blocks related to the hidden markov model, respectively;
a second intermediate parameter determining module 1304, configured to determine local probabilities of the candidate blocks in hidden states of the hidden markov model within time slices covered by the observation sequence based on initial state probabilities corresponding to the candidate blocks, state transition probabilities corresponding to the candidate blocks, gaussian distribution mean values corresponding to the candidate blocks, and gaussian distribution variances corresponding to the candidate blocks, respectively, and determine reverse pointers corresponding to the local probabilities based on the local probabilities;
a second end hidden state determining module 1306, configured to determine hidden states of the candidate blocks in a last time slice covered by the observation sequence based on a maximum local probability of local probabilities of the candidate blocks in the hidden states in the last time slice;
a second hidden state sequence determining module 1308, configured to perform optimal path backtracking based on the hidden state of each candidate block in the last time slice and each backward pointer, so as to obtain a hidden state sequence corresponding to each candidate block;
the function type determining module 1310 is configured to perform clustering based on the hidden state sequences corresponding to the candidate blocks, and determine the function type to which each candidate block belongs from the candidate function types based on the clustering result.
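As an illustration of the clustering step performed by the function type determining module 1310, the following is a minimal sketch that summarizes each block's hidden state sequence as a state-occupancy histogram and groups the blocks with k-means; the helper name, the histogram summary, and the choice of one cluster per candidate function type are assumptions of the sketch, not requirements of the disclosure.

```python
import numpy as np
from sklearn.cluster import KMeans

def function_types_from_state_sequences(state_seqs, n_states, n_types):
    """Cluster candidate blocks by their hidden-state usage.

    state_seqs: list of per-block hidden state sequences (ints in [0, n_states))
    n_states:   number of hidden states in the hidden Markov model
    n_types:    number of candidate function types (one cluster per type here)
    """
    # Summarize each sequence as the fraction of time spent in each hidden state.
    histograms = np.array([np.bincount(np.asarray(seq), minlength=n_states) / len(seq)
                           for seq in state_seqs])
    labels = KMeans(n_clusters=n_types, n_init=10, random_state=0).fit_predict(histograms)
    return labels  # cluster index per block; each cluster is then assigned a function type
```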
It should be noted that, for the specific definition of the technical features in the hidden state sequence determination apparatus 1200, reference may be made to the definition of the hidden state sequence determination method above, and for the specific definition of the technical features in the block function type determination apparatus 1300, reference may be made to the definition of the block function type determination method above; details are not repeated here. The modules in the above apparatuses may be implemented in whole or in part by software, by hardware, or by a combination of the two. The modules may be embedded, in hardware form, in or independent of a processor of the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke them and execute the operations corresponding to the modules.
In an embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the above method of determining a hidden state sequence and/or the above method of determining a functional type of a block.
An internal block diagram of a computer device in one embodiment is shown in fig. 14. The computer device may specifically be the server 220 in fig. 2. As shown in fig. 14, the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor is configured to provide computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory: the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements the above method of determining a hidden state sequence and/or the above method of determining a functional type of a block.
Those skilled in the art will appreciate that the structure shown in fig. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, taking the hidden state sequence determining apparatus 1200 provided in the present application as an example, the apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in fig. 14. The memory of the computer device may store the program modules constituting the hidden state sequence determining apparatus 1200, such as the first observation sequence obtaining module 1202, the first intermediate parameter determining module 1204, the first end hidden state determining module 1206, and the first hidden state sequence determining module 1208 shown in fig. 12. The computer program constituted by these program modules causes the processor to execute the steps of the hidden state sequence determination method of the embodiments of the present application described in this specification.
For example, the computer device shown in fig. 14 may execute step S302 through the first observation sequence obtaining module 1202 in the hidden state sequence determining apparatus 1200 shown in fig. 12, execute step S304 through the first intermediate parameter determining module 1204, and so on.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Accordingly, in an embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the above method of determining a hidden state sequence and/or the above method of determining a functional type of a block.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art may make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method of determining a sequence of hidden states, comprising:
acquiring an observation sequence corresponding to a target block;
determining, based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, and the Gaussian distribution mean and Gaussian distribution variance jointly corresponding to the candidate blocks related to the hidden Markov model, local probabilities that the target block is in each hidden state of the hidden Markov model within each time slice covered by the observation sequence, and determining a back pointer corresponding to each local probability;
determining the hidden state of the target block in the last time slice covered by the observation sequence based on the maximum local probability of the local probabilities of the target block in the hidden states in the last time slice;
and performing optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.
2. The method of claim 1, wherein the acquiring of the observation sequence corresponding to the target block comprises:
acquiring an original observation sequence corresponding to the target block, wherein the original observation sequence comprises original population activity data of the target block in more than two time slices, and the activity behavior features involved in each piece of original population activity data comprise a population flow number and a visit frequency for points of interest of a preset type;
and performing maximum-value normalization on the population flow number in each piece of original population activity data and on the TF-IDF parameter corresponding to the visit frequency for the points of interest of the preset type in each piece of original population activity data, to obtain the observation sequence corresponding to the target block.
3. The method of claim 1, wherein determining the local probability that the target block is in any hidden state of the hidden Markov model within any time slice covered by the observation sequence comprises:
determining, based on the population activity data in the time slice in the observation sequence, the Gaussian distribution mean jointly corresponding to the candidate blocks related to the hidden Markov model, and the Gaussian distribution variance jointly corresponding to the candidate blocks, an emission probability of generating the population activity data in the time slice in the observation sequence under the condition that the target block is in the hidden state in the time slice;
and determining the local probability of the target block being in the hidden state in the time slice based on the local probabilities that the target block is in each hidden state of the hidden Markov model in the previous time slice adjacent to the time slice, the state transition probability corresponding to the target block in the hidden Markov model, and the emission probability;
wherein the local probability of the target block being in the hidden state in the first time slice covered by the observation sequence is determined based on the probability corresponding to the hidden state in the initial state probability corresponding to the target block and the emission probability of generating the population activity data in the first time slice in the observation sequence under the condition that the target block is in the hidden state in the first time slice.
4. The method according to claim 3, wherein the Gaussian distribution mean comprises a mean of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence under the condition of each hidden state of the hidden Markov model, and the Gaussian distribution variance comprises a variance of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence under the condition of each hidden state of the hidden Markov model;
the determining, based on the population activity data in the time slice in the observation sequence, the Gaussian distribution mean jointly corresponding to the candidate blocks related to the hidden Markov model, and the Gaussian distribution variance jointly corresponding to the candidate blocks, of the emission probability of generating the population activity data in the time slice in the observation sequence under the condition that the target block is in the hidden state in the time slice comprises:
determining the emission probability of the target block generating the population activity data in the time slice under the condition of being in the hidden state of the hidden Markov model, based on the mean and the variance of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence under the condition of being in the hidden state, and on the population activity data of the target block in the time slice.
5. The method of claim 1, wherein the training of the hidden Markov model comprises:
acquiring observation sequences corresponding to the candidate blocks respectively;
in the current round of iteration, determining the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on the observation sequence respectively corresponding to each candidate block, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, the Gaussian distribution mean value commonly corresponding to each candidate block and the Gaussian distribution variance commonly corresponding to each candidate block;
determining the current initial state probability of each candidate block based on each current intermediate state probability, and determining the current state transition probability of each candidate block based on each current intermediate state transition probability;
determining, based on the current intermediate state probabilities, a current Gaussian distribution mean and a current Gaussian distribution variance jointly corresponding to the candidate blocks;
and when an iteration termination condition is met, obtaining the hidden Markov model based on the finally determined initial state probability and state transition probability respectively corresponding to each candidate block and the finally determined Gaussian distribution mean and Gaussian distribution variance jointly corresponding to the candidate blocks.
6. The method of claim 5, wherein the current intermediate state probability corresponding to a candidate block comprises a current probability that the candidate block is in each hidden state in each target time slice among the time slices covered by the corresponding observation sequence, and the current Gaussian distribution mean comprises a current mean of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence under the condition of each hidden state of the hidden Markov model;
a manner of determining the current mean of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data of the observation sequence under the condition of any hidden state of the hidden Markov model comprises:
determining the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of being in the hidden state, based on the current probabilities that the candidate blocks are in the hidden state in the target time slices and the activity behavior feature involved in the population activity data of the candidate blocks in the target time slices.
7. The method according to claim 6, wherein the current Gaussian distribution variance comprises a current variance of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence under the condition of each hidden state of the hidden Markov model;
a manner of determining the current variance of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data of the observation sequence under the condition of any hidden state of the hidden Markov model comprises:
determining the current variance of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of being in the hidden state, based on the current probabilities that the candidate blocks are in the hidden state in the target time slices, the activity behavior feature involved in the population activity data of the candidate blocks in the target time slices, and the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of being in the hidden state.
8. The method according to any one of claims 5 to 7, wherein the determining, in the current iteration, of the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on the observation sequence respectively corresponding to each candidate block, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, the Gaussian distribution mean jointly corresponding to the candidate blocks, and the Gaussian distribution variance jointly corresponding to the candidate blocks comprises:
in the current iteration, determining a current target sequence segment respectively corresponding to each candidate block, wherein the current target sequence segment corresponding to a candidate block is a sequence segment that has not yet been used as a target sequence segment in the current iteration, among the sequence segments included in the observation sequence corresponding to the candidate block;
determining the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on each current target sequence segment, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, and the Gaussian distribution mean and the Gaussian distribution variance jointly corresponding to the candidate blocks;
after the determining, based on the current intermediate state probabilities respectively corresponding to the candidate blocks, of the current Gaussian distribution mean and the current Gaussian distribution variance jointly corresponding to the candidate blocks, the method further comprises:
returning to the step of determining the current target sequence segments respectively corresponding to the candidate blocks, until all sequence segments included in the observation sequences corresponding to the candidate blocks have been used as target sequence segments in the current iteration, and then judging whether an iteration termination condition is met.
9. The method of claim 1, further comprising, after determining the hidden state of the target block within the last time slice:
and predicting the hidden state of the target block in the time slice next to the last time slice based on the hidden state of the target block in the last time slice covered by the observation sequence and the state transition probability corresponding to the target block in the hidden Markov model.
10. The method of claim 9, further comprising, after predicting the hidden state of the target block in a time slice next to the last time slice:
and predicting the population activity data of the target block in the time slice next to the last time slice based on the predicted hidden state of the target block in that time slice and the Gaussian distribution mean in the hidden Markov model.
11. A method for determining a functional type of a block, comprising:
acquiring observation sequences corresponding to all candidate blocks related to the hidden Markov model;
determining, based on the initial state probabilities respectively corresponding to the candidate blocks in the hidden Markov model, the state transition probabilities respectively corresponding to the candidate blocks, and the Gaussian distribution mean and Gaussian distribution variance jointly corresponding to the candidate blocks, local probabilities that each candidate block is in each hidden state of the hidden Markov model within each time slice covered by the corresponding observation sequence, and determining, based on the local probabilities, a back pointer corresponding to each local probability;
determining the hidden state of each candidate block in the last time slice covered by the corresponding observation sequence based on the maximum local probability among the local probabilities that the candidate block is in each hidden state in the last time slice;
performing optimal path backtracking based on the hidden state of each candidate block in the last time slice and each back pointer to obtain a hidden state sequence corresponding to each candidate block;
and performing clustering based on the hidden state sequences respectively corresponding to the candidate blocks, and determining, from the candidate function types and based on a clustering result, the function type to which each candidate block belongs.
12. An apparatus for determining a sequence of hidden states, comprising:
the first observation sequence acquisition module is used for acquiring an observation sequence corresponding to a target block;
a first intermediate parameter determining module, configured to determine, based on the observation sequence, an initial state probability corresponding to the target block in a hidden Markov model, a state transition probability corresponding to the target block, and a Gaussian distribution mean and a Gaussian distribution variance jointly corresponding to candidate blocks related to the hidden Markov model, local probabilities that the target block is in each hidden state of the hidden Markov model within each time slice covered by the observation sequence, and to determine a back pointer corresponding to each local probability;
a first end hidden state determining module, configured to determine, based on a maximum local probability of local probabilities that the target block is in each hidden state in a last time slice covered by the observation sequence, a hidden state in which the target block is located in the last time slice;
and the first hidden state sequence determining module is used for performing optimal path backtracking on the basis of the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.
13. An apparatus for determining a functional type of a block, comprising:
the second observation sequence acquisition module is used for acquiring observation sequences corresponding to the candidate blocks related to the hidden Markov model;
a second intermediate parameter determining module, configured to determine, based on an initial state probability corresponding to each candidate block in the hidden Markov model, a state transition probability corresponding to each candidate block, and a Gaussian distribution mean and a Gaussian distribution variance jointly corresponding to the candidate blocks, local probabilities that each candidate block is in each hidden state of the hidden Markov model within each time slice covered by the corresponding observation sequence, and to determine, based on the local probabilities, a back pointer corresponding to each local probability;
a second end hidden state determining module, configured to determine, based on a maximum local probability of local probabilities of the candidate blocks being in the hidden states in a last time slice covered by the observation sequence, a hidden state of the candidate blocks in the last time slice;
a second hidden state sequence determining module, configured to perform optimal path backtracking based on the hidden state of each candidate block in the last time slice and each backward pointer, so as to obtain a hidden state sequence corresponding to each candidate block;
and the function type determining module is used for clustering based on the hidden state sequences respectively corresponding to the candidate blocks and respectively determining the function type of each candidate block from the candidate function types based on the clustering result.
14. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 11.
15. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 11.
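The following sketches are illustrative only and do not limit the claims. First, a minimal reading of the observation construction in claim 2, assuming the raw data per time slice consist of a population flow number and visit counts per preset point-of-interest type; the particular TF-IDF weighting and the function names shown here are assumptions of the sketch.

```python
import numpy as np

def build_observation_sequence(flow_counts, visit_counts):
    """flow_counts:  (T,)   raw population flow number per time slice for one block
    visit_counts: (T, P) visit counts per preset POI type and time slice
    Returns a (T, 1 + P) observation sequence with max-value-normalized features."""
    flow = np.asarray(flow_counts, float)
    visits = np.asarray(visit_counts, float)
    # TF: share of visits per POI type within a time slice;
    # IDF: down-weight POI types visited in almost every time slice.
    tf = visits / np.maximum(visits.sum(axis=1, keepdims=True), 1.0)
    df = (visits > 0).sum(axis=0)
    idf = np.log((len(visits) + 1) / (df + 1)) + 1.0
    tfidf = tf * idf
    # Maximum-value normalization of both feature groups, as in claim 2.
    flow_n = flow / max(flow.max(), 1e-12)
    tfidf_n = tfidf / max(tfidf.max(), 1e-12)
    return np.column_stack([flow_n, tfidf_n])
```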
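For reference, the emission probability and the local-probability recursion described in claims 3 and 4 can be written compactly as below, assuming the D activity behavior features are conditionally independent given the hidden state; the symbols pi_j, a_ij, mu_{j,d}, sigma^2_{j,d}, delta_t(j), and psi_t(j) are our notation, not the claims'.

```latex
b_j(o_t) \;=\; \prod_{d=1}^{D}
  \frac{1}{\sqrt{2\pi\sigma_{j,d}^{2}}}
  \exp\!\Bigl(-\frac{(o_{t,d}-\mu_{j,d})^{2}}{2\sigma_{j,d}^{2}}\Bigr),
\qquad
\delta_{1}(j) = \pi_{j}\, b_j(o_1),
\qquad
\delta_{t}(j) = \Bigl[\max_{i}\ \delta_{t-1}(i)\, a_{ij}\Bigr]\, b_j(o_t),
\qquad
\psi_{t}(j) = \arg\max_{i}\ \delta_{t-1}(i)\, a_{ij}.
```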
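Next, a minimal sketch of the pooled Gaussian re-estimation in claims 6 and 7, assuming gammas[b][t, j] holds the current probability that candidate block b is in hidden state j in target time slice t; the gamma-weighted mean and variance below are the standard tied-Gaussian updates and represent one reading of the claims, with illustrative names.

```python
import numpy as np

def update_shared_gaussians(gammas, observations, eps=1e-6):
    """Re-estimate the Gaussian mean/variance jointly corresponding to all candidate blocks.

    gammas:       list of (T_b, N) arrays; gammas[b][t, j] = current probability that
                  block b is in hidden state j in target time slice t
    observations: list of (T_b, D) arrays of population activity features per block
    Returns (means, variances), each of shape (N, D).
    """
    N = gammas[0].shape[1]
    D = observations[0].shape[1]
    weight_sum = np.zeros(N)
    weighted_obs = np.zeros((N, D))
    for g, o in zip(gammas, observations):
        weight_sum += g.sum(axis=0)
        weighted_obs += g.T @ o
    means = weighted_obs / (weight_sum[:, None] + eps)
    weighted_sq = np.zeros((N, D))
    for g, o in zip(gammas, observations):
        diff = o[:, None, :] - means[None, :, :]          # (T_b, N, D)
        weighted_sq += (g[:, :, None] * diff ** 2).sum(axis=0)
    variances = weighted_sq / (weight_sum[:, None] + eps)
    return means, variances
```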
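Finally, a minimal sketch of the forecasting steps in claims 9 and 10, assuming the predicted hidden state is the most probable successor under the target block's transition row and the predicted population activity data is the Gaussian mean of that state; variable names are illustrative.

```python
import numpy as np

def predict_next(last_state, A, means):
    """Predict the hidden state and population activity data for the time slice
    following the last time slice covered by the observation sequence.

    last_state: hidden state of the target block in the last time slice
    A:          (N, N) state transition probabilities for the target block
    means:      (N, D) Gaussian distribution means in the hidden Markov model
    """
    next_state = int(np.argmax(A[last_state]))   # most probable successor state
    predicted_activity = means[next_state]       # expected activity features in that state
    return next_state, predicted_activity
```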