
CN114792110A - Method and device for generating point of interest data

Info

Publication number: CN114792110A
Application number: CN202110098911.2A
Authority: CN
Inventors: 陈文冬; 史超; 孟平
Original assignee: Nanjing Yibo Software Technology Co ltd
Current assignee: Nanjing Yibo Software Technology Co ltd
Priority date: 2021-01-25
Filing date: 2021-01-25
Publication date: 2022-07-26
Anticipated expiration: 2041-01-25
Also published as: CN114792110B

Abstract

The embodiment of the application provides a method and a device for generating point of interest data, which address the technical problem that traditional clustering algorithms cluster on a single basis. The method comprises the following steps: determining sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object comprises the coordinate position of that monitoring object and a comprehensive score of the user behavior corresponding to that monitoring object; calculating the pairwise distances between the coordinate positions of the plurality of monitoring objects to obtain a distance matrix; and determining the point of interest data corresponding to the monitoring objects according to the comprehensive score and the distance matrix.

Description

Method and device for generating point of interest data

Technical Field

The present application relates to the field of data processing, and in particular, to a method and an apparatus for generating point of interest data.

Background

Any meaningful point on the map is generally referred to as a point of interest (Poi). A Poi may be tangible, such as a store, bar, gas station, hospital, or station, or it may be an intangible point with coordinate attributes.

When Poi is an intangible point with coordinate attributes, it can be used to identify areas of interest to the user. For example, when making a hotel purchase, a company's buyer needs to know the distribution of hotels of interest to the company's employees to target the purchase area for the hotel purchase. In this case, the distribution of hotels of interest to the employees of the company may be characterized by Poi. Specifically, the area on the map where Poi exists may be considered to be the area where the hotel in which the employee is interested is located.

Currently, traditional clustering algorithms such as DBSCAN and Kmeans are usually used to cluster sample data, and the cluster center is taken as the Poi. However, conventional clustering algorithms rely only on the spatial distance between data samples. That is, they assume that the closer two data samples are, the more similar they are and the more they belong in the same cluster. In the hotel purchasing scenario above, if a hotel is located in a remote suburban area, then no matter how high its order volume is, the hotels around it are sparsely distributed, so a traditional clustering algorithm cannot group the hotel and its neighbors into one cluster. No point of interest is generated near the hotel, which leads the company's buyer to conclude that the vicinity of the hotel is not an area in which employees are interested.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating point of interest data, which are used for solving the technical problem that the clustering basis of the traditional clustering algorithm is single, and in order to achieve the purpose, the embodiment of the application adopts the following technical scheme:

In a first aspect, a method for generating point of interest data is provided, including: determining sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object in the plurality of monitoring objects comprises a coordinate position of each monitoring object and a comprehensive score of a user behavior corresponding to each monitoring object; calculating the pairwise distances between the coordinate positions of the plurality of monitoring objects to obtain a distance matrix; and determining the interest point data corresponding to the monitoring object according to the comprehensive score and the distance matrix. The generating device of the interest point data determines the interest point data according to both the comprehensive score of the user behavior corresponding to each monitoring object and the distance matrix calculated from the coordinate positions of the monitoring objects. In this respect the clustering process that generates the interest point data differs from traditional clustering algorithms such as DBSCAN and Kmeans, which cluster on spatial distance alone.

With reference to the first aspect, in a possible implementation manner, the determining the point of interest data corresponding to the monitoring object according to the composite score and the distance matrix includes: determining a first parameter corresponding to the sample data of the multiple monitored objects, wherein the first parameter is a reference value required when the point of interest data corresponding to the monitored objects is determined; determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, wherein the second parameter is used for representing the attention degree of a user to the region to which the coordinate position of each monitoring object belongs; and determining the point of interest data corresponding to the monitoring object according to the distance matrix, the comprehensive score, the first parameter and the second parameter. Because the attention degrees of the user to different regions are greatly different, the interest point data can be generated in different regions in a targeted manner by determining different second parameter values for different regions, so that the accuracy of positioning the region in which the user is interested is improved.

With reference to the first aspect, in a possible implementation manner, the first parameter includes a preset radius distance and a cumulative probability distribution threshold; the determining the point of interest data corresponding to the monitored object according to the distance matrix, the comprehensive score, the first parameter and the second parameter includes: processing the sample data of any one of the multiple monitored objects in the following manner, described here for the sample data of a first monitored object: generating neighborhood samples of the sample data of the first monitored object according to the distance matrix and the preset radius distance, wherein the neighborhood samples are the sample data of those other monitored objects, among the multiple monitored objects, whose coordinate positions are less than the preset radius distance from the coordinate position of the first monitored object; determining a cumulative probability distribution of the neighborhood samples; determining the quantity threshold of the neighborhood samples according to the comprehensive score of the user behavior corresponding to the first monitored object and the second parameter; if the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold and the quantity of the neighborhood samples is not less than the quantity threshold of the neighborhood samples, marking the sample data of the first monitored object with a cluster label and performing the same operation on the sample data of the other monitored objects in the neighborhood samples; otherwise, marking the sample data of the first monitored object as noise; after the sample data of each of the multiple monitored objects has been marked, determining the sample data of the monitored objects belonging to the same cluster as the point of interest data corresponding to the monitored object. In the method provided by the application, both the spatial distance between data samples and the significance of the user behavior are taken into account: the spatial distance is controlled by the quantity threshold of the neighborhood samples, and the significance of the user behavior is controlled by the cumulative probability distribution threshold.
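The marking procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: all samples are treated as belonging to one region (so one mean and one standard deviation are used), the quantity threshold is supplied as a hypothetical callable `count_threshold` because the second formula is not reproduced in this text, and neighbors that fail the test are still absorbed into the cluster as border points, as in DBSCAN. The function names are illustrative.

```python
import math
from statistics import fmean, pstdev


def gaussian_cdf(value, mu, sigma):
    """Cumulative probability of a Gaussian(mu, sigma) evaluated at value."""
    return 0.5 * (1.0 + math.erf((value - mu) / (sigma * math.sqrt(2.0))))


def cluster_pois(dist, scores, radius, cdf_threshold, count_threshold):
    """Label each sample with a cluster id (0, 1, ...) or -1 for noise.

    dist            -- precomputed pairwise distance matrix
    scores          -- composite score of each sample
    radius          -- preset radius distance
    cdf_threshold   -- cumulative probability distribution threshold
    count_threshold -- hypothetical callable standing in for the second
                       formula: composite score -> minimum neighbor count
    """
    n = len(scores)
    mu = fmean(scores)                # assumption: a single region
    sigma = pstdev(scores) or 1e-9    # guard against zero variance
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if j != i and dist[i][j] < radius]

    def passes(i, neigh):
        score_sum = sum(scores[j] for j in neigh)
        significant = gaussian_cdf(score_sum, mu, sigma) >= cdf_threshold
        dense = len(neigh) >= count_threshold(scores[i])
        return significant and dense

    next_cluster = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        neigh = neighbours(start)
        if not passes(start, neigh):
            labels[start] = -1        # noise (may be absorbed later)
            continue
        labels[start] = next_cluster
        pending = list(neigh)
        while pending:
            j = pending.pop()
            if labels[j] not in (None, -1):
                continue              # already assigned to a cluster
            labels[j] = next_cluster
            nj = neighbours(j)
            if passes(j, nj):         # only passing samples keep expanding
                pending.extend(nj)
        next_cluster += 1
    return labels
```

A small usage example with four samples on a line, the last one isolated:

```python
positions = [0.0, 1.0, 2.0, 100.0]
dist = [[abs(a - b) for b in positions] for a in positions]
labels = cluster_pois(dist, [8.0, 6.0, 7.0, 1.0], radius=1.5,
                      cdf_threshold=0.5, count_threshold=lambda s: 1)
```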

With reference to the foregoing first aspect, in a possible implementation manner, the determining a second parameter corresponding to an area to which a coordinate position of each monitored object belongs includes: and determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs according to the quantity threshold of the neighborhood samples. The purpose of this step is to calculate suitable second parameters for different regions.

With reference to the first aspect, in a possible implementation manner, the cumulative probability distribution of the neighborhood samples satisfies the following first formula:

$$F(\mathrm{scoresum}) = \int_{-\infty}^{\mathrm{scoresum}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) dx$$

wherein the random variable x represents the composite score, and the distribution of x is assumed to obey a Gaussian distribution; the parameter μ of the Gaussian distribution is the mean of the composite scores of all sample data in the region to which the coordinate position of the first monitored object belongs, and the parameter σ² of the Gaussian distribution is the variance of the composite scores of all sample data in that region; the upper limit of integration scoresum represents the sum of the composite scores of the neighborhood samples of the sample data of the first monitored object. The cumulative probability distribution of the neighborhood samples reflects how significant the user behavior is within the neighborhood.
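As a worked illustration of the cumulative probability described above, the Gaussian integral has a standard closed form in terms of the error function. The symbol F and the numeric values below are assumptions chosen only to make the arithmetic easy to follow:

$$F(\mathrm{scoresum}) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\mathrm{scoresum}-\mu}{\sigma\sqrt{2}}\right)\right]$$

For example, with μ = 5.5, σ = 2 and scoresum = 7.5, the argument is (7.5 − 5.5)/(2√2) ≈ 0.707, erf(0.707) ≈ 0.683, and F ≈ 0.841. With a cumulative probability distribution threshold of, say, 0.8, this neighborhood would pass the significance test.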

With reference to the foregoing first aspect, in a possible implementation manner, the threshold of the number of neighborhood samples satisfies the following second formula:

wherein round (m,1) represents that one decimal is reserved for m, C represents a preset height factor, γ represents a preset offset factor, k represents the second parameter, and x represents a comprehensive score of the user behavior corresponding to the first monitored object. For the same value of x, the value of y corresponding to the region with lower user attention (which may correspond to the larger value of the second parameter k) is smaller, and the value of y corresponding to the region with higher user attention (which may correspond to the smaller value of the second parameter k) is larger, that is, for the same user behavior, in the region with lower user attention, only a small number of neighborhood samples need to be gathered to generate Poi, and in the region with higher user attention, more neighborhood samples need to be gathered to help generate Poi. Therefore, the scheme can ensure that Poi can be clustered in the areas with low user attention degree, and can effectively avoid the situation of Poi flooding in the areas with high user attention degree.

With reference to the first aspect, in a possible implementation manner, the composite score of the user behavior corresponding to each monitoring object satisfies the following third formula:

$$\mathrm{score} = \sum_{i=1}^{n} w_i \cdot \mathrm{score}_i$$

wherein $\mathrm{score}_i$ represents the score of the ith user behavior corresponding to each monitored object, $w_i$ represents the weight corresponding to the score of the ith user behavior, and n represents the total number of user behaviors corresponding to each monitored object. The composite score of the user behavior corresponding to each monitored object is the weighted sum of the scores of the user behaviors corresponding to that monitored object, so the composite score reflects the significance of the user behavior more comprehensively and accurately.

With reference to the foregoing first aspect, in a possible implementation manner, the score of the ith user behavior corresponding to each monitored object satisfies a fourth formula as follows:

$$\mathrm{score}_i = 10 \times \frac{s_i - \min(s)}{\max(s) - \min(s)}$$

wherein $s_i$ represents the frequency of occurrence of the ith user behavior corresponding to each monitored object, s represents the frequencies of occurrence of the ith user behavior corresponding to the multiple monitored objects, min(s) represents the minimum of those frequencies, and max(s) represents the maximum of those frequencies. Since different user behaviors differ in unit and magnitude, the fourth formula normalizes the score of each user behavior corresponding to each monitored object to the interval [0, 10], which helps unify the measurement standard across user behaviors.
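The scoring described above, min-max normalization of each behavior's frequency to [0, 10] followed by a weighted sum, can be sketched as follows. The behavior names, the weights, and the handling of the degenerate case where all frequencies are equal are assumptions of this sketch; the text does not specify them.

```python
def normalize_to_0_10(frequencies):
    """Min-max normalize one behavior's frequencies to the interval [0, 10]."""
    lo, hi = min(frequencies), max(frequencies)
    if hi == lo:
        # Assumption: when every object has the same frequency, score 0.
        return [0.0 for _ in frequencies]
    return [10.0 * (s - lo) / (hi - lo) for s in frequencies]


def composite_scores(behaviour_frequencies, weights):
    """Weighted sum of normalized behavior scores for each monitored object.

    behaviour_frequencies -- dict: behavior name -> list of frequencies,
                             one entry per monitored object
    weights               -- dict: behavior name -> weight (hypothetical)
    """
    normalized = {name: normalize_to_0_10(freqs)
                  for name, freqs in behaviour_frequencies.items()}
    n_objects = len(next(iter(normalized.values())))
    return [sum(weights[name] * normalized[name][obj] for name in normalized)
            for obj in range(n_objects)]


# Three hotels with search and booking counts (illustrative data only).
frequencies = {"search": [10, 30, 50], "booking": [2, 12, 7]}
weights = {"search": 0.4, "booking": 0.6}
scores = composite_scores(frequencies, weights)
```

In this example the second hotel has the most bookings and the third the most searches, so the weighting decides which of them ends up with the higher composite score.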

With reference to the foregoing first aspect, in a possible implementation manner, the point of interest data corresponding to the monitoring object includes multiple point of interest data; after the point of interest data corresponding to the monitoring object is determined, the method further includes: determining the distance between the coordinate positions corresponding to any two interest point data in the plurality of interest point data; and if the distance between the coordinate positions corresponding to any two interest point data in the interest point data is smaller than a first threshold value, deleting the interest point data with lower comprehensive score in the interest point data. This step may eliminate redundant data that is not important, thereby achieving the technical effect of optimizing the multiple points of interest data that has been generated.
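The pruning step above can be sketched as follows. The text does not say in which order pairs are examined, so this sketch assumes a greedy pass from the highest composite score downward, keeping a point only if no already-kept point lies within the first threshold; it also assumes planar coordinates and Euclidean distance by default. The function name `prune_close_pois` is illustrative.

```python
import math


def prune_close_pois(pois, first_threshold, distance=math.dist):
    """Drop the lower-scored point of any pair closer than first_threshold.

    pois -- list of (coordinate, composite_score) tuples
    """
    kept = []
    # Visit points from highest to lowest score so the better one survives.
    for coord, score in sorted(pois, key=lambda p: p[1], reverse=True):
        if all(distance(coord, other) >= first_threshold for other, _ in kept):
            kept.append((coord, score))
    return kept


pois = [((0.0, 0.0), 5.0), ((0.5, 0.0), 9.0), ((10.0, 10.0), 3.0)]
remaining = prune_close_pois(pois, first_threshold=1.0)
```

Here the first two points are 0.5 apart, so the one scored 5.0 is removed and the one scored 9.0 is kept; the distant third point is unaffected.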

In a second aspect, a device for generating point of interest data is provided to implement the above method. The device for generating the point of interest data includes modules, units, or means (means) corresponding to the implementation of the method, and the modules, units, or means may be implemented by hardware, software, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the above functions.

With reference to the second aspect, in a possible implementation manner, the apparatus for generating point of interest data includes: a processing module; the processing module is used for determining sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object in the plurality of monitoring objects comprises a coordinate position of each monitoring object and a comprehensive score of a user behavior corresponding to each monitoring object; the processing module is also used for calculating the pairwise distance between the coordinate positions of every two monitoring objects in the plurality of monitoring objects to obtain a distance matrix; the processing module is further configured to determine the point of interest data corresponding to the monitoring object according to the comprehensive score and the distance matrix.

With reference to the second aspect, in a possible implementation manner, the processing module being further configured to determine the point of interest data corresponding to the monitored object according to the composite score and the distance matrix includes: the processing module being configured to determine a first parameter corresponding to the sample data of the multiple monitored objects, wherein the first parameter is a reference value required when the point of interest data corresponding to the monitored objects is determined; to determine a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, wherein the second parameter is used for representing the attention degree of a user to the region to which the coordinate position of each monitoring object belongs; and to determine the point of interest data corresponding to the monitoring object according to the distance matrix, the comprehensive score, the first parameter and the second parameter.

With reference to the second aspect, in a possible implementation manner, the first parameter includes a preset radius distance and a cumulative probability distribution threshold; the processing module being further configured to determine, according to the distance matrix, the composite score, the first parameter, and the second parameter, the point of interest data corresponding to the monitored object includes: the processing module being configured to process the sample data of any one of the plurality of monitored objects in the following manner, described here for the sample data of a first monitored object: generating neighborhood samples of the sample data of the first monitored object according to the distance matrix and the preset radius distance, wherein the neighborhood samples are the sample data of those other monitored objects, among the multiple monitored objects, whose coordinate positions are less than the preset radius distance from the coordinate position of the first monitored object; determining a cumulative probability distribution of the neighborhood samples; determining the quantity threshold of the neighborhood samples according to the comprehensive score of the user behavior corresponding to the first monitored object and the second parameter; if the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold and the number of the neighborhood samples is not less than the quantity threshold of the neighborhood samples, marking the sample data of the first monitored object with a cluster label and performing the same operation on the sample data of the other monitored objects in the neighborhood samples; otherwise, marking the sample data of the first monitored object as noise; after the sample data of each of the multiple monitored objects has been marked, determining the sample data of the monitored objects belonging to the same cluster as the point of interest data corresponding to the monitored object.

With reference to the second aspect, in a possible implementation manner, the processing module is further configured to determine a second parameter corresponding to an area to which the coordinate position of each monitoring object belongs, and the determining includes: and determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs according to the quantity threshold of the neighborhood samples.

With reference to the second aspect, in a possible implementation manner, the cumulative probability distribution of the neighborhood samples satisfies the following first formula:

$$F(\mathrm{scoresum}) = \int_{-\infty}^{\mathrm{scoresum}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) dx$$

wherein the random variable x represents the composite score, and the distribution of x is assumed to follow a Gaussian distribution; the parameter μ of the Gaussian distribution is the mean of the composite scores of all sample data in the region to which the coordinate position of the first monitored object belongs, and the parameter σ² of the Gaussian distribution is the variance of the composite scores of all sample data in that region; the upper limit of integration scoresum represents the sum of the composite scores of the neighborhood samples of the sample data of the first monitored object.

With reference to the second aspect, in a possible implementation manner, the threshold of the number of neighborhood samples satisfies the following second formula:

wherein round (m,1) indicates that one decimal is reserved for m, C indicates a preset height factor, γ indicates a preset offset factor, k indicates the second parameter, and x indicates a comprehensive score of the user behavior corresponding to the first monitored object.

With reference to the second aspect, in a possible implementation manner, the composite score of the user behavior corresponding to each monitoring object satisfies the following third formula:

$$\mathrm{score} = \sum_{i=1}^{n} w_i \cdot \mathrm{score}_i$$

wherein $\mathrm{score}_i$ represents the score of the ith user behavior corresponding to each monitored object, $w_i$ represents the weight corresponding to the score of the ith user behavior, and n represents the total number of user behaviors corresponding to each monitored object.

With reference to the second aspect, in a possible implementation manner, the score of the ith user behavior corresponding to each monitored object satisfies the following fourth formula:

$$\mathrm{score}_i = 10 \times \frac{s_i - \min(s)}{\max(s) - \min(s)}$$

wherein $s_i$ represents the frequency of occurrence of the ith user behavior corresponding to each monitored object, s represents the frequencies of occurrence of the ith user behavior corresponding to the multiple monitored objects, min(s) represents the minimum of those frequencies, and max(s) represents the maximum of those frequencies.

With reference to the second aspect, in a possible implementation manner, the point of interest data corresponding to the monitoring object includes multiple point of interest data; the processing module is further configured to, after determining the point of interest data corresponding to the monitored object: determine the distance between the coordinate positions corresponding to any two interest point data in the plurality of interest point data; and, if the distance between the coordinate positions corresponding to any two interest point data is smaller than a first threshold value, delete whichever of the two interest point data has the lower comprehensive score.

With reference to the second aspect, in a possible implementation manner, the processing module may be a processor.

In a third aspect, an apparatus for generating point of interest data is provided, including: a processor; the processor is configured to be coupled to the memory and to execute the method according to any one of the above aspects after reading the computer instructions stored in the memory.

With reference to the third aspect, in a possible implementation manner, the apparatus for generating point of interest data further includes a memory; the memory is for storing computer instructions.

With reference to the foregoing third aspect, in a possible implementation manner, the apparatus for generating point of interest data further includes a communication interface; the communication interface is used for the generating device of the point of interest data to communicate with other equipment. Illustratively, the communication interface may be an input/output interface, interface circuitry, output circuitry, input circuitry, pins or related circuitry, or the like.

With reference to the foregoing third aspect, in a possible implementation manner, the apparatus for generating point of interest data may be a chip or a chip system. When the generating device of the point of interest data is a chip system, the generating device of the point of interest data may be formed by a chip, and may also include a chip and other discrete devices.

With reference to the third aspect, in a possible implementation manner, when the apparatus for generating point of interest data is a chip or a chip system, the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, or a related circuit on the chip or the chip system. The processor may also be embodied as a processing circuit or a logic circuit.

In a fourth aspect, a computer-readable storage medium is provided, having stored therein instructions, which when run on a computer, cause the computer to perform the method of any of the above aspects.

In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of the above aspects.

For technical effects brought by any possible implementation manner of the second aspect to the fifth aspect, reference may be made to technical effects brought by different implementation manners of the first aspect, and details are not described here.

Drawings

Fig. 1 is a first schematic diagram of a data processing flow provided in an embodiment of the present application;

Fig. 2 is a first flowchart of a method for generating point of interest data according to an embodiment of the present application;

Fig. 3 is a second flowchart of a method for generating point of interest data according to an embodiment of the present application;

Fig. 4 is a third flowchart of a method for generating point of interest data according to an embodiment of the present application;

Fig. 5 is a fourth flowchart of a method for generating point of interest data according to an embodiment of the present application;

Fig. 6 is a graph of the function between the number threshold of neighborhood samples and the composite score of user behavior provided by an embodiment of the present application;

Fig. 7 is a graph of the number threshold of neighborhood samples as a function of the composite score of user behavior under different second parameters, as provided by an embodiment of the present application;

Fig. 8 is a second schematic diagram of a data processing flow according to an embodiment of the present application;

Fig. 9 shows the distribution in Beijing of the point of interest data obtained by the method for generating point of interest data provided in this embodiment;

Fig. 10 shows the distribution in Shenzhen of the point of interest data obtained by the method for generating point of interest data provided in this embodiment;

Fig. 11 shows the distribution in Hangzhou of the point of interest data obtained by the method for generating point of interest data provided in this embodiment;

Fig. 12 is a schematic structural diagram of a device for generating point of interest data according to this embodiment;

Fig. 13 is a schematic structural diagram of another device for generating point of interest data according to this embodiment.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the present application, unless otherwise stated, "/" indicates an "or" relationship between the associated objects; for example, A/B may indicate A or B. "And/or" merely describes an association relationship between associated objects and covers three cases; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone, wherein A and B can be singular or plural. Also, in the description of the present application, "a plurality" means two or more unless otherwise specified. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, and c may each be single or multiple. In addition, to describe the technical solutions clearly, the embodiments of the present application use words such as "first" and "second" to distinguish identical or similar items with substantially the same functions and actions. Those skilled in the art will appreciate that the terms "first," "second," and the like do not denote any order or importance. Also, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or explanation. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "for example" is intended to present relevant concepts in a concrete fashion for ease of understanding.

In addition, the service scenario described in the embodiment of the present application is for more clearly illustrating the technical solution in the embodiment of the present application, and does not form a limitation on the technical solution provided in the embodiment of the present application, and it can be known by a person skilled in the art that, with the occurrence of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems. A service scenario provided in the embodiment of the present application is described below.

When purchasing hotels, a company's buyer needs to know the distribution of the hotels in which the company's employees are interested in order to target the purchase area. Conventional hotel purchasing relies on the subjective judgment and experience of the buyer, that is, the buyer determines the purchase area subjectively. This approach leads to problems such as purchases that do not match demand, a deviated purchase area, untimely purchasing, and non-compliance, so that employees may find no hotel available to book or may get no price advantage. With the popularization of machine learning technology, traditional clustering algorithms such as DBSCAN and Kmeans can be used to cluster hotels according to their location coordinates, with the cluster center used as the Poi that identifies areas in which users are interested. However, a traditional clustering algorithm takes only the spatial distance between hotels as the basis for clustering. If a hotel is located in a remote suburban area, then no matter how high its order volume is, the hotels around it are sparsely distributed, so the traditional clustering algorithm cannot group the hotel and its neighbors into one cluster. No Poi is generated near the hotel, and the company's buyer concludes that the area near the hotel is not one in which employees are interested.

In the embodiment of the application, Poi data is generated by taking the actual user behavior of employees on the travel platform as the data source, and the data processing flow is shown in Fig. 1. The left-most column is the user behavior track of an employee on the travel platform, which comprises logging in to the travel platform, a hotel list page, city input, check-in and check-out dates, search keywords, a hotel list page, a hotel detail page, an order filling page, order booking, and the like. The data processing device can then extract real user behavior data, including search data, browsing data and booking data for hotels, from the user behavior track of the employee on the travel platform, and store each kind of data in the corresponding database: the search data in the search database, the browsing data in the browsing database, and the booking data in the booking database. Then, the generating device of the point of interest data may read the user behavior data from the databases and generate the point of interest data from it, where the point of interest data includes the location coordinates of the point of interest, the score of the point of interest, the distribution of hotels around the point of interest, and the like. The generating device of the point of interest data may trigger the generation operation automatically at regular intervals, for example at a set time every day, every week or every month; the generation operation may also be executed on demand, which is not particularly limited in the embodiment of the present application. The specific implementation of the method for generating the point of interest data is described in detail in the following method embodiments and is not repeated here. Finally, the generating device of the point of interest data can output the generated point of interest data to a background management interface, where it can be displayed and downloaded for the buyer's reference.

It should be noted that the present application is also applicable to industries such as e-commerce, the Internet, online travel agencies (OTA), and travel management companies (TMC). Wherever historical user behavior data exists, the technical solution of the embodiment of the present application may be adopted to reduce manual operation and improve user experience. The method for generating point of interest data provided in the embodiment of the present application is described in detail below.

As shown in fig. 2, a method for generating point of interest data provided in an embodiment of the present application includes the following steps:

S201, the generating device of the point of interest data determines sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object in the plurality of monitoring objects comprises a coordinate position of the monitoring object and a composite score of the user behavior corresponding to the monitoring object.

S202, the generating device of the interest point data calculates pairwise distances between the coordinate positions of every two monitoring objects in the multiple monitoring objects to obtain a distance matrix.

S203, the generating device of the interest point data determines the interest point data corresponding to the monitoring object according to the comprehensive score and the distance matrix.

For the above step S201:

In one possible implementation, the composite score of the user behavior corresponding to each monitoring object satisfies the following formula (1):

$$\text{score} = \sum_{i=1}^{n} w_i \times \text{score}_i \qquad (1)$$

where w_i represents the weight corresponding to the score of the i-th user behavior, n represents the total number of user behaviors corresponding to each monitoring object, and score_i represents the score of the i-th user behavior corresponding to each monitoring object. score_i satisfies the following formula (2):

where s_i represents the number of occurrences of the i-th user behavior corresponding to each monitoring object, s represents the numbers of occurrences of the i-th user behavior corresponding to the plurality of monitoring objects, min(s) represents the minimum of the numbers of occurrences of the i-th user behavior corresponding to the plurality of monitoring objects, and max(s) represents the maximum of those numbers of occurrences.

For example, in the hotel purchasing scenario described above, assume that the user behaviors of employees on the travel platform are searching, browsing, and booking on hotel-related pages. That is, the monitoring object is a hotel, the user behaviors corresponding to each hotel are search, browsing, and booking, and the total number of user behaviors corresponding to each hotel is 3. The composite score of the user behavior corresponding to each hotel is then w₁×score₁ + w₂×score₂ + w₃×score₃, where score₁ may represent the search score, score₂ the browsing score, and score₃ the booking score; w₁, w₂, and w₃ are the weights corresponding to the search score, the browsing score, and the booking score, respectively; and w₁×score₁ is the search weighted score, w₂×score₂ is the browsing weighted score, and w₃×score₃ is the booking weighted score.
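The weighted sum described above can be illustrated with a minimal Python sketch. The function name `composite_score` and the three per-behavior scores are hypothetical values chosen for illustration; only the weights 0.3, 0.6, and 0.1 are the example weights used with Table 1.

```python
from typing import Sequence


def composite_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-behavior scores: sum of w_i * score_i."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical hotel: search, browsing, and booking scores (made-up values),
# combined with the example weights w1 = 0.3, w2 = 0.6, w3 = 0.1.
example = composite_score(scores=[2.0, 5.0, 1.0], weights=[0.3, 0.6, 0.1])
print(round(example, 2))
```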

Optionally, the data source is the actual user behavior data of employees on the travel platform, that is, the raw data are the number of searches, the number of views, and the booked order volume corresponding to each hotel. Because the units and magnitudes of these raw data differ, the raw data may be converted to scores as a unified measurement standard. Taking the browsing behavior as an example, the browsing score may satisfy the following formula (3):

where s₂ denotes the number of times each hotel is viewed, s denotes the numbers of times the plurality of hotels are viewed, min(s) denotes the minimum of the numbers of times the plurality of hotels are viewed, and max(s) denotes the maximum of those numbers. The formula shows that the score of a user behavior has a value range of [0.01, 10]. A higher score indicates a more significant user behavior; that is, 0.01 indicates the slightest user behavior and 10 the most significant. In addition, the score of a user behavior may also be 0, indicating that the user behavior does not exist; for example, when no user performs a search operation on pages related to a hotel, the search score corresponding to that hotel is 0. Therefore, the value range of the score of a user behavior can be expanded to [0, 10].
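The following Python sketch shows one scoring rule that is consistent with the stated range. It is only an assumption: it supposes a linear min-max scaling of the raw counts onto [0.01, 10], with min(s) and max(s) taken over hotels where the behavior occurred at least once, and it scores a count of 0 as 0. The exact expression of formulas (2) and (3) may differ. The function name `behavior_scores` and the sample counts are hypothetical.

```python
from typing import List, Sequence

LOW, HIGH = 0.01, 10.0  # value range of a behavior score stated in the text


def behavior_scores(counts: Sequence[float]) -> List[float]:
    """Map raw counts of one behavior (one count per hotel) to scores.

    Assumption: linear min-max scaling onto [LOW, HIGH]; a count of 0
    (behavior absent) is scored 0, extending the range to [0, 10].
    """
    present = [c for c in counts if c > 0]
    if not present:
        return [0.0 for _ in counts]
    lo, hi = min(present), max(present)
    span = hi - lo
    scores = []
    for c in counts:
        if c <= 0:
            scores.append(0.0)
        elif span == 0:
            # Assumption: if every hotel has the same count, use the top score.
            scores.append(HIGH)
        else:
            scores.append(LOW + (c - lo) / span * (HIGH - LOW))
    return scores


# Hypothetical view counts for four hotels.
print(behavior_scores([0, 10, 110, 60]))
```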

For example, taking the values of w₁, w₂, and w₃ as 0.3, 0.6, and 0.1 respectively, sample data of a plurality of hotels may be as shown in Table 1, where each x in the city number and the hotel number represents any digit from 0 to 9, and each hotel corresponds to only one piece of sample data.

TABLE 1

In summary, different user behaviors differ in unit and magnitude; for example, browsing is measured in number of views, booking is measured in order volume, and the number of views of hotel-related pages is often much greater than the order volume. Normalizing the score of each user behavior corresponding to each monitoring object to the interval [0, 10] therefore helps unify the measurement standard across user behaviors. In addition, the composite score of the user behavior corresponding to each monitoring object is the weighted sum of the scores of the user behaviors corresponding to that monitoring object, so the composite score reflects the significance of the user behavior more comprehensively and accurately.

For the above step S202:

In one possible implementation, as shown in Fig. 3, step S202 includes steps S202a-S202c, as described below.

S202a, the generating device of the point of interest data initializes the distance matrix of the plurality of monitoring objects.

For example, taking the sample data of 9 hotels in table 1 as an example, the size of the distance matrix corresponding to the 9 hotels is 9 × 9, and at this time, the generation apparatus of the point of interest data performs an initialization operation on the distance matrix of 9 × 9.

S202b, the generating device of the point of interest data calls a distance algorithm to calculate the pairwise distance between the coordinate positions of every two monitoring objects.

Optionally, the distance algorithm in the embodiment of the present application may be a spherical distance algorithm, and is configured to calculate a spherical distance between coordinate positions of any two monitoring objects; alternatively, the distance algorithm in the embodiment of the present application may also be a linear distance algorithm or other distance algorithms, which is not limited in the embodiment of the present application.

S202c, the generating device of the interest point data fills the distance between the coordinate positions of every two monitoring objects into the corresponding rows and columns of the matrix to obtain a distance matrix.
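Steps S202a-S202c can be sketched in Python as follows, assuming the spherical distance option and coordinates given as (latitude, longitude) in degrees. The haversine formula and the mean Earth radius of 6371 km are choices made for this sketch, not values taken from the embodiment; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_matrix(coords: Sequence[Tuple[float, float]]) -> List[List[float]]:
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]            # S202a: initialize n x n matrix
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(coords[i], coords[j])  # S202b: pairwise distance
            dist[i][j] = dist[j][i] = d             # S202c: fill row i / column j
    return dist


# Three hypothetical coordinate positions.
matrix = distance_matrix([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)])
```

Because the distance is symmetric, only the upper triangle is computed and mirrored.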

For step S203 described above:

In one possible implementation, as shown in Fig. 4, step S203 includes steps S203a-S203c, as described below.

S203a, the generating device of the point of interest data determines first parameters corresponding to the sample data of the multiple monitoring objects. The first parameter is a reference value required when the point of interest data corresponding to the monitored object is determined.

For example, the first parameter includes at least one of: a preset radius distance, a cumulative probability distribution threshold, or the mean and the variance of the composite scores of the user behaviors corresponding to the plurality of monitoring objects.

S203b, the generating device of the point of interest data determines a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs. The second parameter is used for representing the attention degree of the user to the region to which the coordinate position of each monitoring object belongs.

Specifically, user attention differs greatly between regions. For example, user attention may be higher for a large city and lower for a small or medium-sized city. As another example, in the hotel purchasing scenario described above, employee attention may be higher for a city where the company is located or where the company's business is concentrated, and lower for a city where the company is not located or where the company has little business. Therefore, different values of the second parameter need to be determined for different regions. Optionally, the value of the second parameter is smaller for a region with higher user attention and larger for a region with lower user attention.

Optionally, in the embodiment of the present application, step S203b may be implemented by invoking the following step S203f to calculate suitable second parameters for different regions. The specific implementation details are described in step S203f and are not repeated here.

S203c, the generating device of the interest point data determines the interest point data corresponding to the monitoring object according to the distance matrix, the comprehensive score, the first parameter and the second parameter.

In one possible implementation, the sample data of each of the plurality of monitoring objects is processed in the same way as the sample data of a first monitoring object is processed in steps S203d-S203i shown in Fig. 5, and then step S203j is executed to generate the point of interest data corresponding to the monitoring objects.

S203d, the generating device of the point of interest data generates neighborhood samples of the sample data of the first monitoring object according to the distance matrix and the preset radius distance. The neighborhood samples are the sample data of those other monitoring objects, among the plurality of monitoring objects, whose coordinate positions are at a distance smaller than the preset radius distance from the coordinate position of the first monitoring object.
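As a minimal illustration of step S203d, the Python sketch below selects the neighborhood of one sample from a precomputed distance matrix. The function name `neighborhood` and the small matrix are hypothetical; the strict "smaller than" comparison follows the definition above.

```python
from typing import List, Sequence


def neighborhood(dist: Sequence[Sequence[float]], index: int, radius: float) -> List[int]:
    """Indices of the other samples closer than `radius` to sample `index`."""
    return [j for j, d in enumerate(dist[index]) if j != index and d < radius]


# Hypothetical 3 x 3 distance matrix (same unit as the radius).
dist = [
    [0.0, 1.0, 7.0],
    [1.0, 0.0, 3.0],
    [7.0, 3.0, 0.0],
]
print(neighborhood(dist, 1, 5.0))
```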

S203e, the generating device of the interest point data determines the cumulative probability distribution of the neighborhood samples.

In the embodiment of the present application, the cumulative probability distribution of the neighborhood samples satisfies the following formula (4):

$$F(\text{scoresum}) = \int_{-\infty}^{\text{scoresum}} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \qquad (4)$$

where the random variable x represents the composite score and is assumed to follow a Gaussian distribution; the Gaussian parameter μ is the mean of the composite scores of all sample data in the region to which the coordinate position of the first monitoring object belongs, and the Gaussian parameter σ² is the variance of the composite scores of all sample data in that region; the upper integration limit scoresum represents the sum of the composite scores of the neighborhood samples of the sample data of the first monitoring object.
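A Gaussian cumulative probability with upper limit scoresum can be evaluated in closed form with the error function. The Python sketch below assumes the integral runs from negative infinity, as in the standard Gaussian cumulative distribution function; the function name `cumulative_probability` is hypothetical.

```python
import math


def cumulative_probability(scoresum: float, mu: float, variance: float) -> float:
    """P(X <= scoresum) for X ~ N(mu, variance), evaluated via math.erf."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    sigma = math.sqrt(variance)
    return 0.5 * (1.0 + math.erf((scoresum - mu) / (sigma * math.sqrt(2.0))))


# When the neighborhood score sum equals the regional mean, the value is 0.5.
print(cumulative_probability(0.34, 0.34, 1.0))
```

A larger sum of neighborhood scores gives a value closer to 1, so comparing the result against the cumulative probability distribution threshold tests whether the neighborhood's user behavior is significant relative to the region.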

S203f, the generating device of the point of interest data determines the quantity threshold of the neighborhood samples according to the comprehensive score of the user behavior corresponding to the first monitoring object and the second parameter.

In the embodiment of the present application, the number threshold of the neighborhood samples characterizes how many neighborhood samples the first monitoring object, given its sample data, needs to have in its neighborhood in order to generate a Poi. The value of the number threshold of the neighborhood samples satisfies the following formula (5):

where round(m, 1) denotes rounding m to one decimal place, C represents a preset height factor, γ represents a preset offset factor, k represents the second parameter, and x represents the composite score of the user behavior corresponding to the first monitoring object.

Fig. 6 shows, by way of example, the function curve of formula (5) when C is 14, γ is 1, and k is 14.72. As can be seen, y decreases as x increases. Specifically, when x is large, y tends to 1, which means that when the composite score of the user behavior corresponding to the first monitoring object is high (i.e., the user behavior is significant), a Poi can be generated with only a small number of neighborhood samples in the neighborhood; when x is small, y increases, which means that when the composite score of the user behavior corresponding to the first monitoring object is low (i.e., the user behavior is slight), more neighborhood samples must be gathered in the neighborhood to generate a Poi.

Optionally, in this embodiment of the application, step S203b may include: and determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs according to the quantity threshold of the neighborhood samples.

For example, Fig. 7 shows the functional relationship between the number threshold y of the neighborhood samples and the composite score x of the user behavior corresponding to the first monitoring object when the value of the second parameter is 17.53, 14.72, and 8.99, respectively. When the value of the second parameter is large (that is, user attention in the corresponding region is low), the curve is steep; when the value of the second parameter is small (that is, user attention in the corresponding region is high), the curve is relatively gentle. In other words, for the same value of x, y is smaller in a region with lower user attention and larger in a region with higher user attention. That is, for the same user behavior, only a small number of neighborhood samples need to be gathered to generate a Poi in a region with lower user attention, whereas more neighborhood samples need to be gathered to generate a Poi in a region with higher user attention.

As described above, the technical solution provided by the embodiment of the present application ensures that Poi can be clustered in regions with low user attention and effectively avoids Poi flooding in regions with high user attention. A traditional clustering algorithm cannot adjust the value of the second parameter by region, so the clustered Poi are not accurate or reasonable enough: either no Poi can be clustered in a region with low user attention, or Poi flood a region with high user attention.

As described above, in the embodiment of the present application the value of the second parameter may vary spatially from region to region. In addition, since the value of the second parameter is determined from the number threshold of the neighborhood samples, and the number threshold is related to the composite score of the user behavior, the value of the second parameter also changes over time as user behavior changes. In summary, the value of the second parameter dynamically adapts to changes in the data samples in both space and time. Since a change in the second parameter causes a change in the number threshold of the neighborhood samples, the number threshold also dynamically adapts to changes in the data samples.

As described above, in the embodiment of the present application the number threshold of the neighborhood samples dynamically adapts to changes in the data samples, whereas in a traditional clustering algorithm such as DBSCAN the number threshold of the neighborhood samples is a preset fixed value that cannot be adjusted adaptively to regional differences and changes in user behavior, which ultimately makes the clustered Poi inaccurate and unreasonable.

S203g, the generating device of the point of interest data determines whether the following condition is satisfied: the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold, and the number of neighborhood samples is not less than the number threshold of the neighborhood samples. If so, the following step S203h is performed; if not, the following step S203i is performed.

In the embodiment of the application, the spatial distance between the data samples and the significance degree of the user behavior are simultaneously considered. The spatial distance aspect is controlled by the number threshold of the neighborhood samples, and the significance degree aspect of the user behavior is controlled by the cumulative probability distribution threshold.

S203h, the generating device of the interest point data marks the sample data of the first monitoring object with a cluster mark, and the previous steps S203d-S203g are repeatedly executed on the sample data of other monitoring objects in the neighborhood sample to expand the neighborhood.

In the embodiment of the application, multiple clusters can be formed automatically, whereas in a traditional clustering algorithm such as Kmeans the number of clusters must be preset manually. The technical solution provided by the application can therefore reduce manual operation.

S203i, the generating device of the point of interest data marks the sample data of the first monitoring object as noise.

S203j, after the sample data of each of the plurality of monitoring objects is marked, the device for generating point of interest data determines the sample data of the monitoring objects belonging to the same cluster as the point of interest data corresponding to the monitoring object.
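Steps S203d-S203j can be read as a density-style expansion loop with two conditions. The Python sketch below is one possible reading under stated assumptions, not a definitive implementation: the number threshold is passed in as a hypothetical callable `count_threshold` because formula (5) is not reproduced here; a neighbor that fails the condition in S203g is marked as noise, following a literal reading of S203i (standard DBSCAN would instead keep it as a border sample); and all names are hypothetical.

```python
import math
from collections import deque
from typing import Callable, List, Optional, Sequence

NOISE = -1


def _gaussian_cdf(x: float, mu: float, var: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def cluster(dist: Sequence[Sequence[float]], scores: Sequence[float],
            radius: float, cdf_threshold: float, mu: float, var: float,
            count_threshold: Callable[[float], float]) -> List[int]:
    n = len(scores)
    labels: List[Optional[int]] = [None] * n

    def neighbors(i: int) -> List[int]:                      # S203d
        return [j for j in range(n) if j != i and dist[i][j] < radius]

    def passes(i: int) -> bool:                              # S203e-S203g
        nbrs = neighbors(i)
        scoresum = sum(scores[j] for j in nbrs)
        return (_gaussian_cdf(scoresum, mu, var) >= cdf_threshold
                and len(nbrs) >= count_threshold(scores[i]))

    cluster_id = 0
    for start in range(n):
        if labels[start] is not None:
            continue
        if not passes(start):
            labels[start] = NOISE                            # S203i
            continue
        labels[start] = cluster_id                           # S203h
        queue = deque(neighbors(start))
        while queue:                                         # expand the neighborhood
            j = queue.popleft()
            if labels[j] is not None:
                continue
            if passes(j):
                labels[j] = cluster_id
                queue.extend(neighbors(j))
            else:
                labels[j] = NOISE
        cluster_id += 1
    return [NOISE if lab is None else lab for lab in labels]


# Hypothetical data: three nearby samples and one isolated sample.
dist = [[0, 1, 1, 100], [1, 0, 1, 100], [1, 1, 0, 100], [100, 100, 100, 0]]
labels = cluster(dist, [5.0, 5.0, 5.0, 5.0], radius=5.0, cdf_threshold=0.9,
                 mu=2.0, var=1.0, count_threshold=lambda score: 2)
print(labels)
```

Samples sharing a label form one cluster (S203j); the number of clusters is not preset.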

Optionally, the position coordinates of the point of interest may be the average of the position coordinates of the sample data of the monitoring objects belonging to the same cluster, and the score of the point of interest may be the average of the scores of that sample data. Alternatively, the score of the point of interest may be the highest of the scores of the sample data of the monitoring objects belonging to the same cluster, and the position coordinates of the point of interest may be the position coordinates of the sample data with that highest score. This is not limited in the embodiment of the present application. The score may be the score of any user behavior or the composite score, which is not specifically limited in the embodiment of the present application.
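The first option above (averaging) can be sketched in Python as follows. The function name `points_of_interest`, the dictionary layout, and the noise label -1 are assumptions of this sketch; plain arithmetic averaging of latitude and longitude is adequate only for clusters covering a small area.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def points_of_interest(coords: Sequence[Tuple[float, float]],
                       scores: Sequence[float],
                       labels: Sequence[int],
                       noise: int = -1) -> Dict[int, dict]:
    """One point of interest per cluster: mean coordinate and mean score."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        if label != noise:
            groups[label].append(i)
    pois = {}
    for label, members in groups.items():
        m = len(members)
        pois[label] = {
            "lat": sum(coords[i][0] for i in members) / m,
            "lon": sum(coords[i][1] for i in members) / m,
            "score": sum(scores[i] for i in members) / m,
            "members": members,  # hotels that formed this point of interest
        }
    return pois


# Hypothetical samples: two clustered hotels and one noise sample.
pois = points_of_interest([(0.0, 0.0), (2.0, 2.0), (10.0, 10.0)],
                          [1.0, 3.0, 5.0], [0, 0, -1])
print(pois)
```

The second option would instead take the member with the highest score and reuse its coordinates.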

After step S203, S203c, or S203j, if the point of interest data corresponding to the monitoring object includes multiple point of interest data, the method for generating point of interest data provided in this embodiment of the present application may further include the following step S204:

S204, the generating device of the point of interest data determines the distance between the coordinate positions corresponding to any two pieces of point of interest data among the plurality of point of interest data; if that distance is smaller than a first threshold, the piece of point of interest data with the lower composite score of the two is deleted.

In the embodiment of the application, for any two pieces of point of interest data whose corresponding coordinate positions are closer than the first threshold, the one with the lower composite score is deleted. This eliminates unimportant redundant data and thereby optimizes the generated point of interest data.
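Step S204 can be sketched in Python as a greedy filter. This is one possible reading, under the assumption that a point of interest that has already been deleted no longer causes other points to be deleted; the function name `filter_close_pois`, the dictionary keys, and the Euclidean distance used in the example are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Coord = Tuple[float, float]


def filter_close_pois(pois: List[dict],
                      distance: Callable[[Coord, Coord], float],
                      first_threshold: float) -> List[dict]:
    """Keep a point of interest only if no higher-scoring kept one is closer
    than `first_threshold`; points are visited from highest to lowest score."""
    kept: List[dict] = []
    for poi in sorted(pois, key=lambda p: p["score"], reverse=True):
        if all(distance(poi["coord"], k["coord"]) >= first_threshold for k in kept):
            kept.append(poi)
    return kept


# Hypothetical points of interest in a flat coordinate system.
candidates = [
    {"coord": (0.0, 0.0), "score": 5.0},
    {"coord": (0.0, 1.0), "score": 9.0},
    {"coord": (10.0, 10.0), "score": 1.0},
]
result = filter_close_pois(candidates, math.dist, first_threshold=2.0)
print([p["score"] for p in result])
```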

In summary, with reference to Fig. 1 and the method for generating point of interest data described in Fig. 2 to Fig. 5, as shown for example in Fig. 8, the generating device of the point of interest data in the embodiment of the present application may read user behavior data from the databases (including the search database, the browsing database, and the booking database) and then calculate a distance matrix according to a distance algorithm and the user behavior data (corresponding to step S202 above). After obtaining the distance matrix, the generating device may generate the point of interest data (corresponding to step S203 above) by taking as inputs the calculated second parameter (corresponding to step S203b above), the neighborhood samples and their cumulative probability distribution (corresponding to steps S203d and S203e above), the number threshold of the neighborhood samples (corresponding to step S203f above), and the expanded neighborhood (corresponding to step S203h above). The generating device may calculate the second parameter according to the number threshold of the neighborhood samples; that is, the output of step S203f may be used as the input of step S203b. After generating the point of interest data, the generating device may filter it. Finally, the generating device outputs the position coordinates of the points of interest, the scores of the points of interest, and the distribution of hotels around the points of interest.

In the embodiment of the application, the generation device of the point of interest data determines the point of interest data corresponding to the monitoring object according to the comprehensive score of the user behavior corresponding to each monitoring object and the distance matrix calculated by the coordinate position of each monitoring object. That is to say, in the process of clustering to generate the point-of-interest data, unlike traditional clustering algorithms such as DBSCAN and Kmeans, the method for generating the point-of-interest data provided in the embodiment of the present application not only considers the spatial distance between data samples, but also considers the significance of user behaviors, so that the region in which the user is interested can be more accurately located.

For example, Fig. 9, Fig. 10, and Fig. 11 show the distribution of the point of interest data obtained by the method for generating point of interest data provided in this embodiment in Beijing, Shenzhen, and Hangzhou, respectively.

Optionally, in this embodiment of the application, the coefficient Φ shown in the following formula (6) may be used as an evaluation index of a clustering algorithm to measure how well the data samples are clustered, where a higher value indicates a better clustering effect.

$$\Phi = \frac{\bar{\zeta} + \bar{\theta} + 1}{3}, \qquad \zeta(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad \theta(j) = I(c(j) > \mu) \qquad (6)$$

Here $\bar{\zeta}$ is the contour coefficient (silhouette coefficient) of the prior art, which measures how well the data samples are clustered in space. i denotes the number of each data sample. For data sample i, a(i) denotes the average of the distances between the position coordinate of data sample i and the position coordinates of the other data samples in the cluster to which it belongs, and b(i) denotes the minimum, over the clusters adjacent to the cluster to which data sample i belongs, of the average distance between the position coordinate of data sample i and the position coordinates of all data samples in that adjacent cluster. max{a(i), b(i)} denotes the larger of a(i) and b(i). $\bar{\zeta}$ denotes the average of the values of ζ(i) over all i. The value range of the contour coefficient is [-1, 1].

$\bar{\theta}$ is the coefficient provided by the embodiment of the application and measures how well the data samples are clustered with respect to the composite score. j denotes the number of each cluster, and c(j) denotes the average of the composite scores of all monitoring objects in cluster j; c(j) can be obtained by dividing the sum of the composite scores of all monitoring objects in cluster j by the number of monitoring objects forming cluster j. As mentioned above, c(j) can also be used as the composite score of the point of interest. μ denotes the average of the composite scores of all sample data in the region to which the coordinate position of cluster j belongs. If c(j) > μ is satisfied, then I(c(j) > μ) = 1; otherwise, I(c(j) > μ) = 0. $\bar{\theta}$ denotes the average of the values of θ(j) over all j.

The value range of $\bar{\theta}$ is [0, 1] and the value range of the contour coefficient is [-1, 1], so the value range of the coefficient Φ is [0, 1]. The calculation of $\bar{\theta}$ is described below, taking part of the point of interest data of one city in the hotel purchasing scenario, shown in Table 2, as an example.

TABLE 2

As shown in Table 2, taking cluster number 25 as an example, the sum of the composite scores of all hotels in the cluster is 10.32, which is obtained by adding the sum of the search weighted scores, the sum of the browsing weighted scores, and the sum of the booking weighted scores of all hotels in the cluster. The number of hotels forming the cluster is 5, so the average composite score per hotel is c(j) = 10.32/5 ≈ 2.06. Assuming that the average μ of the composite scores of all hotels in the city is 0.34, c(j) > μ is satisfied, and I(c(j) > μ) = 1. Further, taking cluster number 35 as an example, c(j) = 0.1 can be obtained in the same way; c(j) > μ is not satisfied, and I(c(j) > μ) = 0. Similarly, the value of I(c(j) > μ) is determined for each of the 24 clusters in Table 2. Finally, the average of θ(j) over all clusters = (number of clusters with I(c(j) > μ) = 1) / (total number of clusters) = 10/24 ≈ 0.42. Assuming that the contour coefficient is -0.5 at this time, the coefficient Φ of the city is (-0.5 + 0.42 + 1)/3 ≈ 0.31.
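The arithmetic of this worked example can be reproduced with a short Python sketch. The form Φ = (contour coefficient + average of θ(j) + 1)/3 is inferred from the calculation (-0.5 + 0.42 + 1)/3 above; the function names are hypothetical, and the list of cluster averages is made up except for the two values 2.06 and 0.1 and the 10-of-24 split.

```python
from typing import Sequence


def theta_mean(cluster_means: Sequence[float], mu: float) -> float:
    """Fraction of clusters whose average composite score c(j) exceeds mu."""
    return sum(1 for c in cluster_means if c > mu) / len(cluster_means)


def phi(contour_coefficient: float, theta_bar: float) -> float:
    """Combine the spatial and score-based measures into a value in [0, 1]."""
    return (contour_coefficient + theta_bar + 1.0) / 3.0


# Hypothetical city: 10 clusters above the regional mean 0.34, 14 below it.
cluster_means = [2.06] * 10 + [0.1] * 14
theta_bar = theta_mean(cluster_means, mu=0.34)
print(round(theta_bar, 2))             # 10/24, about 0.42
print(round(phi(-0.5, 0.42), 2))       # about 0.31
```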

From the above calculation process, the physical meaning of the average of θ(j) is that when the average composite score of all monitoring objects in a cluster is higher than the average composite score of all sample data in the region to which they belong, the generated point of interest is more significant and deserves the buyer's attention. However, traditional clustering algorithms such as Kmeans, GMM, and DBSCAN cannot reflect information related to the composite score of the monitoring objects. For example, for six randomly selected cities of different tiers, the values of the coefficient Φ obtained with the traditional clustering algorithms and with the method for generating point of interest data provided by the embodiment of the application are shown in Table 3 below. It can be seen that the clustering effect of the embodiment of the application is better than that of the Kmeans, GMM, and DBSCAN algorithms.

TABLE 3

City	Kmeans	GMM	DBSCAN	Embodiment of the present application
First-tier city 1	0.5708	0.5217	0.3814	0.6350
First-tier city 2	0.5794	0.5148	0.4001	0.7271
Second-tier city 1	0.6392	0.5815	0.4271	0.7579
Second-tier city 2	0.5828	0.5584	0.5237	0.7803
Third-tier city 1	0.6213	0.5916	0.4959	0.7611
Third-tier city 2	0.5503	0.4659	0.4986	0.7946

It is to be understood that, in the above embodiments, the method and/or the steps implemented by the apparatus for generating point of interest data may also be implemented by a component (e.g., a chip or a circuit) of the apparatus for generating point of interest data or a device containing the apparatus for generating point of interest data.

It is understood that, in order to realize the above functions, the generating device of the point of interest data includes hardware structures and/or software modules for executing the functions. Those skilled in the art will readily appreciate that the illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.

In the embodiment of the present application, the generating device of the point of interest data may be divided into functional modules according to the method embodiment, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, in the embodiment of the present application, the division of the module is schematic, and is only one logic function division, and another division manner may be available in actual implementation.

For example, fig. 12 shows a schematic structural diagram of the point-of-interest data generating apparatus 120. The generating apparatus 120 of the point of interest data includes a processing module 1201. Wherein:

The processing module 1201 is configured to determine sample data of a plurality of monitoring objects, where the sample data of each monitoring object in the plurality of monitoring objects includes a coordinate position of the monitoring object and a composite score of the user behavior corresponding to the monitoring object. The processing module 1201 is further configured to calculate the pairwise distance between the coordinate positions of every two monitoring objects in the plurality of monitoring objects, so as to obtain a distance matrix. The processing module 1201 is further configured to determine the point of interest data corresponding to the monitoring object according to the composite score and the distance matrix.

In a possible implementation, the processing module 1201 being configured to determine, according to the composite score and the distance matrix, the point of interest data corresponding to the monitoring object includes: determining a first parameter corresponding to the sample data of the plurality of monitoring objects, where the first parameter is a reference value required when the point of interest data corresponding to the monitoring object is determined; determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, where the second parameter is used for representing the degree of user attention to that region; and determining the point of interest data corresponding to the monitoring object according to the distance matrix, the composite score, the first parameter, and the second parameter.

In a possible implementation manner, the first parameter includes a preset radius distance and a cumulative probability distribution threshold. When determining the point of interest data corresponding to the monitoring object according to the distance matrix, the composite score, the first parameter, and the second parameter, the processing module 1201 is specifically configured to process the sample data of any one of the plurality of monitoring objects (referred to below as a first monitoring object) in the following manner. A neighborhood sample of the sample data of the first monitoring object is generated according to the distance matrix and the preset radius distance, where the neighborhood sample is the sample data of those other monitoring objects, among the plurality of monitoring objects, whose coordinate positions are at a distance smaller than the preset radius distance from the coordinate position of the first monitoring object. A cumulative probability distribution of the neighborhood samples is determined. A quantity threshold of the neighborhood samples is determined according to the composite score of the user behavior corresponding to the first monitoring object and the second parameter. If the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold and the number of the neighborhood samples is not less than the quantity threshold of the neighborhood samples, the processing module 1201 marks the sample data of the first monitoring object with a cluster label and performs the above operation on the sample data of the other monitoring objects in the neighborhood samples; otherwise, it marks the sample data of the first monitoring object as noise.
After the sample data of each monitoring object in the plurality of monitoring objects has been marked, the sample data of the monitoring objects belonging to the same cluster is determined as the point of interest data corresponding to the monitoring objects.
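The marking procedure described above can be sketched as follows. This is a literal reading under stated assumptions, not the applicant's implementation: all samples are treated as belonging to one region (so the Gaussian mean and variance are taken over all composite scores), the quantity threshold is supplied by the caller as a hypothetical callable `min_count` because its formula is not reproduced in this text, and a neighbor that fails either test is marked as noise. The names `cluster`, `gaussian_cdf`, and `NOISE` are illustrative.

```python
import math
from collections import deque

NOISE = -1  # label used for samples marked as noise


def gaussian_cdf(x, mu, var):
    """Cumulative distribution function of a Gaussian with mean mu and variance var."""
    if var == 0:
        return 1.0 if x >= mu else 0.0
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def cluster(dist, scores, eps, cdf_threshold, min_count):
    """Label each sample with a cluster index or NOISE.

    dist: pairwise distance matrix; scores: composite score per sample;
    eps: preset radius distance; cdf_threshold: cumulative probability threshold;
    min_count: hypothetical callable mapping a composite score to a quantity threshold.
    """
    n = len(scores)
    mu = sum(scores) / n
    var = sum((s - mu) ** 2 for s in scores) / n
    labels = [None] * n
    next_label = 0
    for seed in range(n):
        if labels[seed] is not None:
            continue
        queue = deque([seed])
        grew = False
        while queue:
            p = queue.popleft()
            if labels[p] is not None:
                continue
            # Neighborhood: other samples strictly closer than the preset radius.
            nbrs = [q for q in range(n) if q != p and dist[p][q] < eps]
            score_sum = sum(scores[q] for q in nbrs)
            if (gaussian_cdf(score_sum, mu, var) >= cdf_threshold
                    and len(nbrs) >= min_count(scores[p])):
                labels[p] = next_label
                grew = True
                # Repeat the same operation on the unmarked neighbors.
                queue.extend(q for q in nbrs if labels[q] is None)
            else:
                labels[p] = NOISE
        if grew:
            next_label += 1
    return labels
```

Passing the quantity threshold as a callable keeps the sketch independent of the exact threshold formula, which depends on the composite score and the second parameter.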

In a possible implementation manner, when determining the second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, the processing module 1201 is specifically configured to determine the second parameter according to the quantity threshold of the neighborhood samples.

In a possible implementation manner, the point of interest data corresponding to the monitoring object includes a plurality of pieces of point of interest data. After determining the point of interest data corresponding to the monitoring object, the processing module 1201 is further configured to determine the distance between the coordinate positions corresponding to any two pieces of point of interest data in the plurality of pieces of point of interest data and, if that distance is smaller than a first threshold, to delete whichever of the two pieces of point of interest data has the lower composite score.
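A minimal sketch of this pruning step follows, assuming each piece of point of interest data is a tuple `(x, y, score)` with planar coordinates and that pairs are resolved greedily from the highest composite score downward. The source does not specify the order in which pairs are examined, so the greedy order, the Euclidean metric, and the names `prune_close_pois` and `min_separation` are assumptions.

```python
import math


def prune_close_pois(pois, min_separation):
    """Keep the higher-scoring point of interest whenever two are too close.

    pois: list of (x, y, score) tuples; min_separation: the first threshold.
    """
    kept = []
    # Visiting candidates in descending score order means that any conflict
    # is resolved in favour of the point with the higher composite score.
    for poi in sorted(pois, key=lambda p: p[2], reverse=True):
        if all(math.dist(poi[:2], k[:2]) >= min_separation for k in kept):
            kept.append(poi)
    return kept
```

For example, `prune_close_pois([(0, 0, 0.9), (0.5, 0, 0.4), (10, 10, 0.7)], 1.0)` drops the point at `(0.5, 0)` because it lies within the threshold of a higher-scoring point.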

For all relevant content of the steps in the above method embodiment, reference may be made to the functional description of the corresponding functional module; details are not repeated here.

In the present embodiment, the generation means 120 of the point of interest data is presented in a form in which the respective functional modules are divided in an integrated manner. A "module" herein may refer to a particular ASIC, a circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other device that provides the described functionality.

For example, fig. 13 shows a schematic structural diagram of another point-of-interest data generating apparatus 130. The device 130 for generating the point of interest data includes one or more processors 131, a communication line 132, and at least one communication interface (fig. 13 is only exemplary and includes a communication interface 134 and a processor 131 for illustration), and optionally may further include a memory 133.

The processor 131 may be a CPU, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in accordance with the teachings of the present application.

The communication line 132 may include a path for connecting different components.

Communication interface 134, which may be a transceiver module, is used for communicating with other devices or communication networks, such as ethernet, RAN, Wireless Local Area Networks (WLAN), etc. For example, the transceiver module may be a transceiver, or the like. Optionally, the communication interface 134 may also be a transceiver circuit located in the processor 131, so as to realize signal input and signal output of the processor.

The memory 133 may be a device having a storage function, for example, but not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via the communication line 132. The memory may also be integrated with the processor.

The memory 133 is used for storing computer-executable instructions for implementing the solution of the present application, and execution of those instructions is controlled by the processor 131. The processor 131 is configured to execute the computer-executable instructions stored in the memory 133, so as to implement the method for generating point of interest data provided in the embodiment of the present application.

Alternatively, in this embodiment of the present application, the processor 131 may also execute a function related to processing in the method for generating point of interest data provided in the foregoing embodiment of the present application, and the communication interface 134 is responsible for communicating with other devices or a communication network, which is not specifically limited in this embodiment of the present application.

The computer-executable instructions in the embodiments of the present application may also be referred to as application program codes, which are not specifically limited in the embodiments of the present application.

In particular implementations, processor 131 may include one or more CPUs such as CPU0 and CPU1 in fig. 13 as one embodiment.

In particular implementations, generating means 130 for the point of interest data may comprise a plurality of processors, such as processor 131 and processor 137 in fig. 13, as an embodiment. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In a specific implementation, the apparatus for generating point of interest data 130 may further include an output device 135 and an input device 136 as an embodiment. The output device 135 is in communication with the processor 131 and may display information in a variety of ways.

The point of interest data generating device 130 may be a general device or a special device. For example, the point of interest data generating apparatus 130 may be a desktop computer, a portable computer, a network server, a Personal Digital Assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, a vehicle-mounted terminal device, an embedded device, or a device having a similar structure as in fig. 13. The embodiment of the present application does not limit the type of the device 130 for generating point of interest data.

In connection with the above method embodiments, the actions of the generating means of the point of interest data in steps S201 to S204 may be executed by the processor 131 or 137 in the generating means 130 of the point of interest data shown in fig. 13 calling the application program code stored in the memory 133 to instruct the generating means 130 of the point of interest data, which is not limited in any way by the embodiment.

In a simple embodiment, the generating means 120 of the point of interest data may take the form of the generating means 130 of the point of interest data shown in fig. 13, as will be appreciated by those skilled in the art.

For example, the processor 131 or 137 in the generation apparatus 130 of point-of-interest data shown in fig. 13 may cause the generation apparatus 130 of point-of-interest data to execute the generation method of point-of-interest data in the above-described method embodiment by calling a computer-executable instruction stored in the memory 133. Specifically, the function/implementation procedure of the processing module 1201 in fig. 12 may be implemented by the processor 131 or 137 in the point-of-interest data generating apparatus 130 shown in fig. 13 calling a computer-executable instruction stored in the memory 133.

Since the apparatus for generating point of interest data provided in this embodiment can execute the method for generating point of interest data, the method embodiment can be referred to for obtaining technical effects, and details are not repeated here.

It should be noted that one or more of the above modules or units may be implemented in software, hardware, or a combination of both. When any of the above modules or units is implemented in software, the software is present as computer program instructions stored in a memory, and a processor may be used to execute the program instructions and implement the above method flows. The processor may be built into a system on chip (SoC) or an ASIC, or may be a separate semiconductor chip. In addition to a core for executing software instructions to perform operations or processing, the processor may further include a necessary hardware accelerator, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit for implementing dedicated logic operations.

When the above modules or units are implemented in hardware, the hardware may be any one or any combination of a CPU, a microprocessor, a Digital Signal Processing (DSP) chip, a Micro Controller Unit (MCU), an artificial intelligence processor, an ASIC, an SoC, an FPGA, a PLD, a dedicated digital circuit, a hardware accelerator, or a discrete device that is not integrated, and may run necessary software or be independent of software to perform the above method flow.

Optionally, an embodiment of the present application further provides a chip system, including at least one processor and an interface, where the at least one processor is coupled with a memory through the interface, and when the at least one processor executes the computer program or instructions in the memory, the method of any of the above method embodiments is performed. In a possible implementation manner, the apparatus for generating point of interest data further includes a memory. Optionally, the chip system may consist of a chip, or may include the chip and other discrete devices, which is not specifically limited in this embodiment of the present application.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in accordance with the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.

While the present application has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed application, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Although the present application has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely illustrative of the present application as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the application. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A method for generating point of interest data, comprising:

determining sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object in the plurality of monitoring objects comprises a coordinate position of each monitoring object and a comprehensive score of a user behavior corresponding to each monitoring object;

calculating the pairwise distance between the coordinate positions of every two monitoring objects in the plurality of monitoring objects to obtain a distance matrix;

and determining the point of interest data corresponding to the monitoring object according to the comprehensive score and the distance matrix.

2. The method of claim 1, wherein determining the point of interest data corresponding to the monitored object according to the composite score and the distance matrix comprises:

determining a first parameter corresponding to sample data of the multiple monitored objects, wherein the first parameter is a reference value required when point of interest data corresponding to the monitored objects is determined;

determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, wherein the second parameter is used for representing the attention degree of a user to the region to which the coordinate position of each monitoring object belongs;

and determining the point of interest data corresponding to the monitoring object according to the distance matrix, the comprehensive score, the first parameter and the second parameter.

3. The method of claim 2, wherein the first parameter comprises a preset radius distance and a cumulative probability distribution threshold; the determining the point of interest data corresponding to the monitoring object according to the distance matrix, the comprehensive score, the first parameter and the second parameter comprises:

for the sample data of any one monitoring object (referred to as a first monitoring object) among the plurality of monitoring objects, processing the sample data of the first monitoring object in the following manner:

generating a neighborhood sample of the sample data of the first monitoring object according to the distance matrix and the preset radius distance, wherein the neighborhood sample is the sample data of other monitoring objects of which the distance between the coordinate position of the monitoring object in the plurality of monitoring objects and the coordinate position of the first monitoring object is smaller than the preset radius distance;

determining a cumulative probability distribution of the neighborhood samples;

determining a quantity threshold of the neighborhood samples according to the comprehensive score of the user behavior corresponding to the first monitoring object and the second parameter;

if the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold and the number of the neighborhood samples is not less than the number threshold of the neighborhood samples, marking a cluster on the sample data of the first monitored object, and executing the operation on the sample data of the other monitored objects in the neighborhood samples; otherwise, marking the sample data of the first monitoring object as noise;

after the sample data of each monitoring object in the multiple monitoring objects is marked, determining the sample data of the monitoring objects belonging to the same cluster as the point of interest data corresponding to the monitoring objects.

4. The method of claim 3, wherein the determining a second parameter corresponding to a region to which the coordinate position of each monitoring object belongs comprises:

and determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs according to the quantity threshold of the neighborhood samples.

5. The method of claim 3 or 4, wherein the cumulative probability distribution of the neighborhood samples satisfies a first formula:

wherein the random variable x represents the composite score, and the distribution of x is assumed to obey a Gaussian distribution; the parameter μ of the Gaussian distribution is the mean of the composite scores of all sample data in the region to which the coordinate position of the first monitoring object belongs, and the parameter σ² of the Gaussian distribution is the variance of the composite scores of all sample data in the region to which the coordinate position of the first monitoring object belongs; the upper integration limit scoresum represents the sum of the composite scores of the neighborhood samples of the sample data of the first monitoring object.
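The first formula itself is not reproduced in this text. One form consistent with the definitions given in this claim would be the cumulative distribution function of a Gaussian evaluated at scoresum. The lower integration limit of −∞ and the symbol F are assumptions made for this sketch:

```latex
F(\mathrm{scoresum}) = \int_{-\infty}^{\mathrm{scoresum}}
  \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)\,dx
```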

6. The method of claim 3 or 4, wherein the neighborhood samples number threshold satisfies the following second formula:

wherein round(m, 1) represents rounding m to one decimal place, C represents a preset height factor, γ represents a preset offset factor, k represents the second parameter, and x represents the composite score of the user behavior corresponding to the first monitoring object.

7. The method according to any one of claims 1 to 6, wherein the composite score of the user behavior corresponding to each monitoring object satisfies the following third formula:

wherein score_i represents the score of the i-th user behavior corresponding to each monitoring object, w_i represents the weight corresponding to the score of the i-th user behavior, and n represents the total number of user behaviors corresponding to each monitoring object.

8. The method according to claim 7, wherein the score of the ith user behavior corresponding to each monitoring object satisfies the following fourth formula:

wherein s_i represents the frequency of occurrence of the i-th user behavior corresponding to each monitoring object, s represents the frequencies of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects, min(s) represents the minimum value of the frequency of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects, and max(s) represents the maximum value of the frequency of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects.
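The third and fourth formulas are not reproduced in this text. Based only on the symbol definitions (s_i, min(s), max(s), score_i, w_i, n), one plausible reading is min-max normalization of behavior frequencies followed by a weighted sum. The sketch below illustrates that reading; the exact form of both formulas, the handling of the case max(s) = min(s), and the function names are assumptions.

```python
def min_max_scores(frequencies):
    """Scale the frequencies of one behavior across all monitoring objects to [0, 1]."""
    lo, hi = min(frequencies), max(frequencies)
    if hi == lo:
        # Assumed convention: identical frequencies carry no signal, so score 0.
        return [0.0 for _ in frequencies]
    return [(s - lo) / (hi - lo) for s in frequencies]


def composite_score(behavior_scores, weights):
    """Weighted sum of the n behavior scores of a single monitoring object."""
    if len(behavior_scores) != len(weights):
        raise ValueError("one weight is required per behavior score")
    return sum(w * s for w, s in zip(weights, behavior_scores))
```

For instance, `min_max_scores([2, 4, 6])` yields `[0.0, 0.5, 1.0]`, and `composite_score([0.5, 1.0], [0.5, 0.5])` yields `0.75`.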

9. The method according to any one of claims 1 to 8, wherein the point of interest data corresponding to the monitoring object comprises a plurality of point of interest data; after the point of interest data corresponding to the monitoring object is determined, the method further includes:

determining the distance between the coordinate positions corresponding to any two points of interest data in the plurality of points of interest data;

and if the distance between the coordinate positions corresponding to any two interest point data in the interest point data is smaller than a first threshold value, deleting the interest point data with lower comprehensive score in the interest point data.

10. An apparatus for generating point of interest data, comprising: a processing module;

the processing module is used for determining sample data of a plurality of monitoring objects, wherein the sample data of each monitoring object in the plurality of monitoring objects comprises a coordinate position of each monitoring object and a comprehensive score of a user behavior corresponding to each monitoring object;

the processing module is further configured to calculate pairwise distances between coordinate positions of every two monitoring objects in the multiple monitoring objects to obtain a distance matrix;

the processing module is further used for determining the point of interest data corresponding to the monitoring object according to the comprehensive score and the distance matrix.

11. The apparatus of claim 10, wherein the processing module is further configured to determine point of interest data corresponding to the monitored object according to the composite score and the distance matrix, and includes:

determining a first parameter corresponding to the sample data of the plurality of monitoring objects, wherein the first parameter is a reference value required when the point of interest data corresponding to the monitoring objects is determined; determining a second parameter corresponding to the region to which the coordinate position of each monitoring object belongs, wherein the second parameter is used for representing the degree of attention of a user to the region to which the coordinate position of each monitoring object belongs; and determining the point of interest data corresponding to the monitoring object according to the distance matrix, the composite score, the first parameter, and the second parameter.

12. The apparatus of claim 11, wherein the first parameter comprises a preset radius distance and a cumulative probability distribution threshold; the processing module is further configured to determine, according to the distance matrix, the composite score, the first parameter, and the second parameter, point of interest data corresponding to the monitored object, and includes:

for the sample data of any one monitoring object (referred to as a first monitoring object) among the plurality of monitoring objects, processing the sample data of the first monitoring object in the following manner: generating a neighborhood sample of the sample data of the first monitoring object according to the distance matrix and the preset radius distance, wherein the neighborhood sample is the sample data of those other monitoring objects, among the plurality of monitoring objects, whose coordinate positions are at a distance smaller than the preset radius distance from the coordinate position of the first monitoring object; determining a cumulative probability distribution of the neighborhood samples; determining a quantity threshold of the neighborhood samples according to the composite score of the user behavior corresponding to the first monitoring object and the second parameter; if the cumulative probability distribution of the neighborhood samples is not less than the cumulative probability distribution threshold and the number of the neighborhood samples is not less than the quantity threshold of the neighborhood samples, marking the sample data of the first monitoring object with a cluster label, and executing the above operation on the sample data of the other monitoring objects in the neighborhood samples; otherwise, marking the sample data of the first monitoring object as noise; and after the sample data of each monitoring object in the plurality of monitoring objects has been marked, determining the sample data of the monitoring objects belonging to the same cluster as the point of interest data corresponding to the monitoring object.

13. The apparatus of claim 12, wherein the processing module is further configured to determine a second parameter corresponding to an area to which the coordinate position of each of the monitoring objects belongs, and the second parameter includes:

14. The apparatus of claim 12 or 13, wherein the cumulative probability distribution of the neighborhood samples satisfies a first formula as follows:

wherein the random variable x represents the composite score, and the distribution of x is assumed to obey a Gaussian distribution; the parameter μ of the Gaussian distribution is the mean of the composite scores of all sample data in the region to which the coordinate position of the first monitoring object belongs, and the parameter σ² of the Gaussian distribution is the variance of the composite scores of all sample data in the region to which the coordinate position of the first monitoring object belongs; the upper integration limit scoresum represents the sum of the composite scores of the neighborhood samples of the sample data of the first monitoring object.

15. The apparatus of claim 12 or 13, wherein the neighborhood samples number threshold satisfies the following second formula:

16. The apparatus according to any one of claims 10-15, wherein the composite score of the user behavior corresponding to each monitoring object satisfies the following third formula:

17. The apparatus of claim 16, wherein the score of the ith user behavior corresponding to each monitored object satisfies the following fourth formula:

wherein s_i represents the frequency of occurrence of the i-th user behavior corresponding to each monitoring object, s represents the frequencies of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects, min(s) represents the minimum value of the frequency of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects, and max(s) represents the maximum value of the frequency of occurrence of the i-th user behavior corresponding to the plurality of monitoring objects.

18. The device according to any one of claims 10-17, wherein the point of interest data corresponding to the monitoring object comprises a plurality of point of interest data; the processing module is further configured to, after the determining of the point of interest data corresponding to the monitored object, further include:

determining the distance between the coordinate positions corresponding to any two pieces of point of interest data in the plurality of point of interest data; and if the distance between the coordinate positions corresponding to any two pieces of point of interest data in the plurality of point of interest data is smaller than a first threshold, deleting the point of interest data with the lower composite score of the two.