CN108108455B - Destination pushing method and device, storage medium and electronic equipment

Info

Publication number: CN108108455B
Application number: CN201711461519.XA
Authority: CN (China)
Prior art keywords: sample set, destination, sample, classification, node
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN108108455A
Inventors: 陈岩, 刘耀勇
Current/Original Assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, with priority to CN201711461519.XA
Publication of CN108108455A; application granted; publication of CN108108455B

Classifications

    • G06F 16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries (G06F 16/00 Information retrieval; G06F 16/95 Retrieval from the web; G06F 16/953 Querying, e.g. by the use of web search engines)
    • G06F 16/29 Geographical information databases (G06F 16/20 Information retrieval of structured data, e.g. relational data)
    • G06F 18/24323 Tree-organised classifiers (G06F 18/00 Pattern recognition; G06F 18/24 Classification techniques)
    • H04L 67/52 Network services specially adapted for the location of the user terminal (H04L 67/00 Network arrangements or protocols for supporting network services or applications)
    • H04L 67/55 Push-based network services


Abstract

The embodiments of the present application disclose a destination pushing method and device, a storage medium, and an electronic device. The destination pushing method includes: when it is detected that a user has determined a destination, collecting the multidimensional features corresponding to the destination as a sample and constructing a sample set corresponding to the destination; classifying the samples of the sample set according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, collecting the currently corresponding multidimensional features as a prediction sample; and predicting the corresponding destination according to the prediction sample and the decision tree model. Destinations are thereby pushed automatically, and the accuracy of destination pushing is improved.

Description

Destination pushing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a destination push method and apparatus, a storage medium, and an electronic device.
Background
At present, with the rapid development of terminal technology, smart phones have become ever more deeply embedded in people's lives, and users often install a large number of applications on them, such as chat applications, game applications, and map applications.
When a user opens a map application, the user often needs to manually enter or search for a destination, which wastes the user's time and makes the operation relatively cumbersome.
Disclosure of Invention
In view of this, embodiments of the present application provide a method and an apparatus for pushing a destination, a storage medium, and an electronic device, which can improve the accuracy of pushing the destination.
In a first aspect, an embodiment of the present application provides a destination push method, including:
when it is detected that a user has determined a destination, collecting the multidimensional features corresponding to the destination as a sample, and constructing a sample set corresponding to the destination;
classifying the samples of the sample set according to the information gain of each feature for sample classification, to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination;
when it is detected that the user opens a map application, collecting the currently corresponding multidimensional features as a prediction sample;
and predicting a corresponding destination according to the prediction sample and the decision tree model.
In a second aspect, an embodiment of the present application provides a destination push apparatus, including:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring multidimensional characteristics corresponding to a destination as samples and constructing a sample set corresponding to the destination when a user is detected to determine the destination;
the construction unit is used for carrying out sample classification on the sample set according to the information gain of the sample classification of the features so as to construct a decision tree model of the destination, and the output of the decision tree model is the corresponding destination;
the second acquisition unit is used for acquiring the current corresponding multi-dimensional characteristics as a prediction sample when the map application is detected to be opened by the user;
and the prediction unit is used for predicting a corresponding destination according to the prediction sample and the decision tree model.
In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when the computer program runs on a computer, it causes the computer to execute the destination pushing method provided in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device including a processor and a memory storing a computer program, where the processor is configured to execute the destination pushing method provided in any embodiment of the present application by calling the computer program.
In the embodiments of the present application, when it is detected that a user has determined a destination, the multidimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed; the samples of the sample set are classified according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, the currently corresponding multidimensional features are collected as a prediction sample; and the corresponding destination is predicted according to the prediction sample and the decision tree model. Destinations are thereby pushed automatically, and the accuracy of destination pushing is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a schematic application scenario diagram of a destination push method according to an embodiment of the present application.
Fig. 2 is a flowchart of a method for pushing a destination according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a decision tree according to an embodiment of the present application.
Fig. 4 is a schematic diagram of another decision tree provided in an embodiment of the present application.
Fig. 5 is another flowchart of a pushing method for a destination according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a pushing device of a destination according to an embodiment of the present application.
Fig. 7 is another schematic structural diagram of a pushing device of a destination according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 9 is another schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
In the description that follows, specific embodiments of the present application will be described with reference to steps and symbols executed by one or more computers, unless otherwise indicated. Accordingly, these steps and operations will be referred to, several times, as being performed by a computer, the computer performing operations involving a processing unit of the computer in electronic signals representing data in a structured form. This operation transforms the data or maintains it at locations in the computer's memory system, which may be reconfigured or otherwise altered in a manner well known to those skilled in the art. The data maintains a data structure that is a physical location of the memory that has particular characteristics defined by the data format. However, while the principles of the application have been described in language specific to above, it is not intended to be limited to the specific form set forth herein, and it will be recognized by those of ordinary skill in the art that various of the steps and operations described below may be implemented in hardware.
The term module, as used herein, may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as objects implemented on the computing system. The apparatus and method described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
The terms "first", "second", and "third", etc. in this application are used to distinguish between different objects and not to describe a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules listed, but rather, some embodiments may include other steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The embodiment of the application provides a destination push method, and an execution subject of the destination push method may be a destination push device provided in the embodiment of the application, or an electronic device integrated with the destination push device, where the destination push device may be implemented in a hardware or software manner. The electronic device may be a smart phone, a tablet computer, a palm computer, a notebook computer, or a desktop computer.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of the destination pushing method provided in an embodiment of the present application, taking the destination pushing device integrated in an electronic device as an example. When the electronic device detects that a user has determined a destination, it collects the multidimensional features corresponding to the destination as a sample and constructs a sample set corresponding to the destination; it classifies the samples of the sample set according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination; when it detects that the user opens a map application, it collects the currently corresponding multidimensional features as a prediction sample; and it predicts the corresponding destination according to the prediction sample and the decision tree model.
Specifically, as shown in fig. 1, during a historical time period, when the electronic device detects that the user has determined a destination, the multidimensional features corresponding to the destination (weather features, time period features, origin features, and the like) may be collected as samples, and a sample set corresponding to the destination constructed. The samples of the sample set are then classified according to the information gain of each of these features for sample classification, so as to construct a decision tree model of the destination. When it is detected that the user opens map application A, the currently corresponding multidimensional features (weather, time period, origin, and the like) are collected as a prediction sample, and the corresponding destination is predicted according to the prediction sample and the decision tree model.
Referring to fig. 2, fig. 2 is a flowchart illustrating a destination push method according to an embodiment of the present disclosure. The specific flow of the destination push method provided by the embodiment of the application may be as follows:
201. When it is detected that the user has determined a destination, the multidimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed.
The destination mentioned in this embodiment is a destination that the user inputs after opening a map application; specifically, it may be an XX building, an XX station, an XX supermarket, or the like. When the electronic device detects that the user has selected a destination, it collects the multidimensional features corresponding to the moment of selection as a sample.
A multidimensional feature has a certain number of dimensions, and the parameter in each dimension corresponds to one piece of feature information characterizing the destination; that is, the multidimensional feature is composed of a plurality of features. These features may include feature information relevant at the time the destination is selected, such as the current weather state, the current time period, and origin (starting point) information.
The sample set of a destination may include a plurality of samples, each sample including the multidimensional features corresponding to a destination. The sample set may include a plurality of samples collected over a historical time period, for example the past 7 days or 14 days. It is understood that the multidimensional feature data collected for one destination selection constitutes one sample, and a plurality of samples constitute a sample set.
After the sample set is constructed, each sample in the sample set may be labeled to obtain a sample label for each sample. Since this embodiment predicts the destination in the map application, the sample labels are the destinations themselves; that is, different sample categories correspond to different destinations.
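For illustration only, a labeled sample might be represented as a small record of feature values plus the destination label, for instance as a Python dictionary. The feature names and values below are assumptions made for this sketch, not fields mandated by the present application:

    # A hypothetical labeled sample: one destination-selection event.
    # Feature names and values are illustrative assumptions.
    sample = {
        "weather": "sunny",        # current weather state
        "time_period": "morning",  # current time period
        "origin": "home",          # origin (starting point) information
        "label": "XX building",    # sample label: the destination the user chose
    }

    # The sample set is simply the samples collected over the historical period.
    sample_set = [sample]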
202. The samples of the sample set are classified according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination.
In the embodiment of the present application, the sample set can be classified based on the information gain of each feature for sample classification, so as to construct a decision tree model of the destination. For example, the decision tree model may be constructed based on the ID3 algorithm.
A decision tree is a tree structure built through successive decisions. In machine learning, a decision tree is a predictive model representing a mapping between object attributes and object values: each internal node represents an attribute, each diverging path in the tree represents a possible attribute value, and each leaf node corresponds to the value of the objects represented by the path from the root node to that leaf node. A decision tree has only a single output; if multiple outputs are needed, separate decision trees can be built to handle the different outputs.
The ID3 (Iterative Dichotomiser 3) algorithm is one of the decision tree algorithms. It is based on the principle of Occam's razor, i.e. using as little as possible to do as much as possible. In information theory, the smaller the expected information, the greater the information gain and thus the higher the purity. The core idea of the ID3 algorithm is to measure attribute selection by information gain and to split on the attribute with the largest information gain after splitting. The algorithm traverses the space of possible decision trees using a top-down greedy search.
The information gain is defined with respect to a feature: for a feature t, it is the difference between the amount of information the system has with the feature and without it; this difference is the amount of information the feature brings to the system, i.e. the information gain.
The process of classifying the sample set based on the information gain will be described in detail below, for example, the classification process may include the following steps:
generating a corresponding root node, and taking the sample set as node information of the root node;
determining a sample set of the root node as a target sample set to be classified currently;
obtaining information gain of the features in the target sample set for sample set classification;
selecting current division characteristics from the characteristics according to the information gain;
dividing the sample set according to the dividing characteristics to obtain a plurality of sub-sample sets;
removing the division feature from the samples in each sub-sample set to obtain the removed sub-sample sets;
Generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes;
judging whether the child nodes meet preset classification termination conditions or not;
if not, updating the target sample set to the removed sub-sample set, and returning to the step of obtaining the information gain of the features in the target sample set for sample set classification;
and if so, taking the child nodes as leaf nodes, and setting the output of the leaf nodes according to the types of the samples in the removed child sample set, wherein the types of the samples are corresponding destinations.
The division feature is a feature selected from the features according to the information gain of each feature for sample set classification, and is used to divide the sample set. The division feature can be selected according to the information gain in various ways; for example, to improve the accuracy of sample classification, the feature corresponding to the maximum information gain may be selected as the division feature.
Wherein the sample categories are a plurality of corresponding destination categories.
When the child node meets the preset classification termination condition, the child node can be used as a leaf node, namely, the classification of the sample set of the child node is stopped, and the output of the leaf node can be set based on the class of the samples in the removed child sample set. There are various ways to set the output of the leaf nodes based on the class of the sample. For example, the category with the largest number of samples in the removed sample set may be used as the output of the leaf node.
The preset classification termination condition can be set according to actual requirements. When a child node meets the preset classification termination condition, the child node is used as a leaf node and classification of the sample set corresponding to that node stops; when a child node does not meet the condition, the sample set corresponding to the child node continues to be classified. The step of determining whether a child node satisfies the preset classification termination condition may include:
judging whether the number of sample categories in the removed sub-sample set corresponding to the child node equals a preset number;
if so, determining that the child node meets the preset classification termination condition;
if not, determining that the child node does not meet the preset classification termination condition.
For example, the preset classification termination condition may include: the number of the types of the samples in the removed sub-sample set corresponding to the sub-node is 1, that is, only one type of sample is in the sample set of the sub-node. At this time, if the child node satisfies the preset classification termination condition, the class of the sample in the child sample set is used as the output of the leaf node. If only samples of category "XX building" are in the removed subset, then "XX building" may be taken as the output of the leaf node.
In an embodiment, in order to improve the decision accuracy of the decision tree model, a gain threshold may further be set; the feature corresponding to the maximum information gain is selected as the division feature only when that gain is larger than the threshold. That is, the step of selecting the current division feature from the features according to the information gain may include:
selecting a maximum target information gain from the information gains;
judging whether the target information gain is larger than a preset threshold value or not;
and if so, selecting the characteristic corresponding to the target information gain as the current division characteristic.
In an embodiment, when the target information gain is not greater than the preset threshold, the current node may be used as a leaf node, and the sample class with the largest number of samples is selected as the output of the leaf node. Wherein the sample category is the corresponding destination.
The preset threshold may be set according to actual requirements, such as 0.9, 0.8, and the like.
For example, if the information gain of feature 1 for the sample classification is 0.9 and is the maximum information gain, and the preset threshold is 0.8, then since the maximum information gain is greater than the preset threshold, feature 1 may be taken as the division feature.
For another example, if the preset threshold is 1, the maximum information gain is smaller than the preset threshold; at this time the current node may be used as a leaf node. Analysis of the sample set may find that the number of samples of the category "XX station" is the largest and greater than the number of samples of the other category "XX building"; the category "XX station" may then be used as the output of the leaf node.
There are various ways of classifying and dividing the samples according to the dividing features, for example, the sample set may be divided based on the feature values of the dividing features. That is, the step of "dividing the sample set according to the dividing features" may include:
obtaining a characteristic value of a dividing characteristic in a sample set;
and dividing the sample set according to the characteristic values.
For example, samples having the same value of the division feature in the sample set may be divided into the same sub-sample set. If the feature values of the division feature include 0, 1, and 2, then the samples whose feature value is 0 can be grouped into one class, the samples whose feature value is 1 into another, and the samples whose feature value is 2 into a third.
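As a minimal sketch of this division step (assuming, as in the earlier sketch, that samples are dictionaries of discrete feature values), the sample set can be grouped by the value of the division feature, with that feature removed from each resulting sub-sample as the steps above require:

    from collections import defaultdict

    def split_by_feature(sample_set, feature):
        """Divide the sample set into sub-sample sets, one per value of the
        division feature, and remove that feature from each sub-sample."""
        subsets = defaultdict(list)
        for sample in sample_set:
            reduced = {k: v for k, v in sample.items() if k != feature}
            subsets[sample[feature]].append(reduced)
        return dict(subsets)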
For example, consider sample set a {sample 1, sample 2, …, sample i, …, sample n}, where sample 1 includes feature 1, feature 2, …, feature m; sample i includes feature 1, feature 2, …, feature m; and sample n includes feature 1, feature 2, …, feature m.
First, all samples in the sample set are initialized, and then, one root node a is generated, and the sample set is used as node information of the root node a, as described with reference to fig. 3.
The information gains g1, g2, …, gm of each feature, i.e. feature 1, feature 2, …, feature m, for the classification of the sample set are calculated, and the maximum information gain gmax, e.g. gi, is selected.
And when the maximum information gain gmax is smaller than a preset threshold epsilon, the current node is used as a leaf node, and the sample type with the largest number of samples is selected as the output of the leaf node.
When the maximum information gain gmax is greater than the preset threshold ε, the feature i corresponding to gmax may be selected as the division feature t, and the sample set a {sample 1, sample 2, …, sample i, …, sample n} is divided according to feature i, for example into two sub-sample sets a1 {sample 1, sample 2, …, sample k} and a2 {sample k+1, …, sample n}.
The division feature t in the sub-sample sets a1 and a2 is then removed; at this point the samples in a1 and a2 include {feature 1, feature 2, …, feature i-1, feature i+1, …, feature m}. Child nodes a1 and a2 of the root node a are generated with reference to fig. 3, and the sub-sample set a1 is taken as the node information of child node a1, and the sub-sample set a2 as the node information of child node a2.
Then, for each child node, taking the child node a1 as an example, determining whether the child node meets a preset classification termination condition, if so, taking the current child node a1 as a leaf node, and setting the leaf node output according to the type of the sample in the child sample set corresponding to the child node a 1.
When a child node does not meet the preset classification termination condition, the sub-sample set corresponding to that child node continues to be classified in the same information-gain-based manner. Taking child node a2 as an example: the information gain g of each feature in sample set a2 for sample classification can be calculated and the maximum information gain gmax selected; when gmax is greater than the preset threshold ε, the feature corresponding to gmax can be selected as the division feature t, and a2 divided into a plurality of sub-sample sets based on t, for example into a21, a22, and a23. The division feature t is then removed from a21, a22, and a23, the child nodes a21, a22, and a23 of the current node a2 are generated, and the sample sets a21, a22, and a23 with t removed are used as the node information of child nodes a21, a22, and a23, respectively.
By analogy, a decision tree as shown in fig. 4 can be constructed using the above information-gain-based classification; the outputs of the leaf nodes of the decision tree are the corresponding destinations, e.g. "XX destination".
In the embodiment of the application, the information gain of the feature for the sample set classification can be obtained based on the empirical entropy of the sample classification and the conditional entropy of the feature for the sample set classification result. That is, the step of obtaining the information gain of the features in the target sample set for the sample set classification may include:
acquiring experience entropy of sample classification;
acquiring conditional entropy of the characteristics on the sample set classification result;
and acquiring information gain of the features for sample set classification according to the conditional entropy and the empirical entropy.
The probability of each destination sample appearing in the sample set can be obtained, where a destination sample is a sample whose category is the corresponding destination, and the empirical entropy of the sample classification is obtained from these probabilities.
For example, for sample set Y {sample 1, sample 2, …, sample i, …, sample n}, suppose the number of samples of category "XX building" is j and the number of samples of category "XX station" is n - j. Then the probability p1 of an "XX building" sample appearing in sample set Y is j/n, and the probability p2 of an "XX station" sample appearing in Y is (n - j)/n. The empirical entropy H(Y) of the sample classification is then calculated by the following formula:
H(Y) = -∑i pi log2(pi)
where pi is the probability that a sample of the i-th category appears in sample set Y. In the decision tree classification problem, the information gain is the difference between the information before and after the decision tree selects an attribute and divides on it.
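As an illustrative sketch (not part of the original disclosure), the empirical entropy of a sample set, with samples represented as dictionaries carrying a "label" field as assumed in the earlier sketch, can be computed directly from the class frequencies:

    import math
    from collections import Counter

    def empirical_entropy(sample_set):
        """H(Y) = -sum_i pi * log2(pi), where pi is the share of samples
        whose label (destination class) is the i-th class."""
        n = len(sample_set)
        counts = Counter(s["label"] for s in sample_set)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

For instance, a sample set with j = 5 "XX building" samples and n - j = 5 "XX station" samples gives p1 = p2 = 0.5 and H(Y) = 1 bit.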
In an embodiment, a sample set may be divided into a plurality of sub-sample sets according to a feature t, then, an information entropy of each sub-sample set classification and a probability of each feature value of the feature t appearing in the sample set are obtained, and according to the information entropy and the probability, a divided information entropy, that is, a conditional entropy of the feature t on a sample set classification result, may be obtained.
For example, for a sample feature X, the conditional entropy of the sample feature X on the classification result of the sample set Y can be calculated by the following formula:
H(Y|X) = ∑i pi H(Y|X = xi)
where n is the number of values of feature X, i.e. the number of distinct feature-value types; pi is the probability that a sample whose X value is the i-th value appears in sample set Y; and xi is the i-th value of X. H(Y|X = xi) is the empirical entropy of the classification of sub-sample set Yi, where the X values of the samples in sub-sample set Yi are all the i-th value.
For example, taking the number of values of feature X as 3, namely x1, x2, and x3, the sample set Y {sample 1, sample 2, …, sample i, …, sample n} may be divided by feature X into three sub-sample sets: Y1 {sample 1, sample 2, …, sample d} with feature value x1, Y2 {sample d+1, …, sample e} with feature value x2, and Y3 {sample e+1, …, sample n} with feature value x3, where d and e are positive integers less than n.
At this time, the conditional entropy of the feature X on the classification result of the sample set Y is:
H(Y|X)=p1H(Y|x1)+p2H(Y|x2)+p3H(Y|x3);
where p1 = |Y1|/|Y|, p2 = |Y2|/|Y|, and p3 = |Y3|/|Y|;
H(Y|x1) is the information entropy, i.e. the empirical entropy, of the classification of sub-sample set Y1, and can be calculated by the above formula for the empirical entropy.
After the empirical entropy H (Y) of the sample classification and the conditional entropy H (Y | X) of the classification result of the feature X for the sample set Y are obtained, the information gain of the feature X for the sample set Y can be calculated, as calculated by the following formula:
g(Y,X)=H(Y)-H(Y|X)
that is, the information gain of the classification of the sample set Y by the feature X is: the difference between the empirical entropy H (Y) and the conditional entropy H (Y | X) of the feature X for the sample set Y classification result.
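Continuing the sketch, and reusing empirical_entropy from the sketch above, the conditional entropy H(Y|X) and the information gain g(Y, X) can be computed as follows (an illustration under the assumed sample representation, not the authoritative implementation):

    def conditional_entropy(sample_set, feature):
        """H(Y|X) = sum_i pi * H(Y|X = xi): the weighted empirical entropy of
        the sub-sample sets obtained by dividing on the values of feature X."""
        n = len(sample_set)
        subsets = {}
        for s in sample_set:
            subsets.setdefault(s[feature], []).append(s)
        return sum((len(sub) / n) * empirical_entropy(sub)
                   for sub in subsets.values())

    def information_gain(sample_set, feature):
        """g(Y, X) = H(Y) - H(Y|X)."""
        return empirical_entropy(sample_set) - conditional_entropy(sample_set, feature)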
203. When it is detected that the user opens the map application, the currently corresponding multidimensional features are collected as a prediction sample.
When it is detected that the user opens the map application, it indicates that the user needs to search for a destination. Accordingly, the currently corresponding multidimensional features are collected as a prediction sample.
It should be particularly noted that the multidimensional features acquired in steps 201 and 203 are the same features, for example: the current weather state, the current time period, origin information, and so on.
204. And predicting a corresponding destination according to the prediction sample and the decision tree model.
Specifically, a corresponding output result is obtained according to the prediction sample and the decision tree model, and the corresponding destination is determined according to the output result; the output results include the various destinations.
For example, the corresponding leaf node may be determined from the features of the prediction sample and the decision tree model, and the output of that leaf node taken as the predicted output. That is, the features of the prediction sample are matched against the branch conditions of the decision tree (the feature values of the division features) until a leaf node is reached, and the output of that leaf node is taken as the prediction result; the outputs of the leaf nodes collectively cover the destinations.
For example, after the current multidimensional feature is collected, the corresponding leaf node an1 can be found in the decision tree shown in fig. 4 according to the branch condition of the decision tree, and the output of the leaf node an1 is the destination 1, at this time, it is determined that the destination of the push is the destination 1.
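A prediction step along these lines might be sketched as follows, assuming each internal node of the built tree stores its division feature and one child per feature value, and each leaf stores its output destination (this node layout is an assumption of the sketch; a feature value never seen during construction would need separate handling):

    def predict(node, prediction_sample):
        """Walk from the root, at each internal node following the branch whose
        feature value matches the prediction sample, until a leaf is reached;
        the leaf's output is the destination to push."""
        while not node["is_leaf"]:
            node = node["children"][prediction_sample[node["feature"]]]
        return node["output"]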
As can be seen from the above, in the embodiment of the present application, when it is detected that the user has determined a destination, the multidimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed; the samples of the sample set are classified according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, the currently corresponding multidimensional features are collected as a prediction sample; and the corresponding destination is predicted according to the prediction sample and the decision tree model. Destinations are thereby pushed automatically, and the accuracy of destination pushing is improved.
Further, each sample of the sample set comprises a plurality of characteristic information reflecting the behavior habit of the user for selecting the destination in common, so that the pushing of the destination can be personalized and intelligent.
Furthermore, the pushing prediction of the destination is realized based on the decision tree prediction model, the accuracy of the pushing of the destination can be improved, and the use habits of users can be better fitted.
The classification method of the present application will be further described below on the basis of the methods described in the above embodiments. Referring to fig. 5, the push method of the destination may include:
301. When it is detected that the user has determined a destination, the multidimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed.
When it is detected that a user inputs a destination through a map application, e.g. the user inputs "XX building", the multidimensional features corresponding to the moment the destination is selected are collected as a sample.
The multidimensional feature information has a certain number of dimensions, and the parameter of each dimension corresponds to one piece of feature information characterizing the destination; that is, the multidimensional feature information is composed of a plurality of pieces of feature information. These may include feature information relevant when the destination is selected, such as the current weather state, the current time period, origin information, and date information.
The sample set of destinations may include a plurality of samples taken over a historical time period, for example the past 7 days or 14 days. It is understood that the multidimensional feature data collected for one destination selection forms one sample, and a plurality of samples form a sample set.
A specific sample may be as shown in Table 1 below and includes feature information of multiple dimensions. It should be noted that the feature information shown in Table 1 is merely an example; in practice, the number of pieces of feature information included in a sample may be greater or smaller than shown in Table 1, and the specific feature information may differ from Table 1, which is not limited here.
Dimension   Feature information
1           Current weather state
2           Current time period
3           Origin information
4           Date information
5           Current wireless network status, e.g. wifi connection status

TABLE 1
302. And marking the samples in the sample set to obtain a sample label of each sample.
Since this implementation is to predict the destination, the labeled sample labels are the destinations. The sample label of a sample characterizes the sample class of that sample; here a sample class may be "XX building", "XX station", or the like.
In one embodiment, after the labeling the samples in the sample set to obtain the sample label of each sample, the method further includes:
(1) Detecting whether the sample set contains samples whose multidimensional features have identical feature values but whose destination categories differ.
When the feature values corresponding to the multidimensional features of two samples in the sample set are completely consistent but their sample labels are not, samples with different destination categories exist. For example, if the current weather state, current time period, origin information, date information, and current wireless network state of sample 1 are completely consistent with those of sample 2, but the destination category of sample 1 is "XX building" while that of sample 2 is "XX station", it is determined that samples with the same feature values but different destination categories exist in the sample set, and step (2) is executed.
(2) The sample with the largest number of destination category samples is retained.
When samples with the same feature values for the multidimensional features but different destination categories are detected, the samples of the destination category with the largest number of samples are retained and the other samples are deleted. For example, if the number of samples with destination category "XX building" is larger than the number with destination category "XX station", all samples with destination category "XX building" are retained.
In one embodiment, if the numbers of samples of the different destination categories are equal, the samples of the destination category whose storage time is closest to the current time are retained, according to the times at which the samples were stored.
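As an illustrative sketch of steps (1) and (2), assuming each sample dictionary additionally carries a hypothetical "stored_at" timestamp for the tie-breaking rule:

    from collections import Counter, defaultdict

    def resolve_conflicts(sample_set):
        """Among samples whose feature values all match but whose destination
        labels differ, keep only the samples of the most frequent destination
        class; on a tie, keep the class of the most recently stored sample."""
        groups = defaultdict(list)
        for s in sample_set:
            key = tuple(sorted((k, v) for k, v in s.items()
                               if k not in ("label", "stored_at")))
            groups[key].append(s)
        cleaned = []
        for group in groups.values():
            counts = Counter(s["label"] for s in group)
            top = max(counts.values())
            candidates = [c for c, n in counts.items() if n == top]
            if len(candidates) > 1:
                # tie: keep the class of the tied sample stored closest to now
                recent = max((s for s in group if s["label"] in candidates),
                             key=lambda s: s["stored_at"])
                keep = recent["label"]
            else:
                keep = candidates[0]
            cleaned.extend(s for s in group if s["label"] == keep)
        return cleaned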
303. And generating a root node of the decision tree model, and taking the sample set as node information of the root node.
For example, referring to fig. 3, for sample set a {sample 1, sample 2, …, sample i, …, sample n}, a root node a of the decision tree may be generated first, and the sample set a is used as the node information of the root node a.
304. And determining the sample set as a target sample set to be classified currently.
Namely, determining the sample set of the root node as the target sample set to be classified currently.
305. And obtaining the information gain of each feature in the target sample set for the sample set classification, and determining the maximum information gain.
For example, for sample set a, the information gains g1, g2, …, gm of each feature, i.e. feature 1, feature 2, …, feature m, for sample set classification may be calculated, and the maximum information gain gmax chosen.
The information gain of the feature for the sample set classification can be obtained by adopting the following method:
acquiring experience entropy of sample classification; acquiring conditional entropy of the characteristics on the sample set classification result; and acquiring information gain of the features for sample set classification according to the conditional entropy and the empirical entropy.
The probability of each destination category appearing in the sample set may be obtained, where a destination sample is a sample whose class is the corresponding destination, and the empirical entropy of the sample classification is obtained from these probabilities.
For example, taking the case where the sample classes are only "XX building" and "XX station": for sample set Y {sample 1, sample 2, …, sample i, …, sample n}, if the number of samples of class "XX building" is j, then the number of samples of "XX station" is n - j. The probability p1 of "XX building" appearing in sample set Y is j/n, and the probability p2 of "XX station" appearing in Y is (n - j)/n. The empirical entropy H(Y) of the sample classification is then calculated by the following formula:
H(Y) = -∑i pi log2(pi)
in the decision tree classification problem, the information gain is the difference between the information before and after attribute selection and division of the decision tree.
In an embodiment, a sample set may be divided into a plurality of sub-sample sets according to a feature t, then, an information entropy of each sub-sample set classification and a probability of each feature value of the feature t appearing in the sample set are obtained, and according to the information entropy and the probability, a divided information entropy, that is, a conditional entropy of the feature t on a sample set classification result, may be obtained.
For example, for a sample feature X, the conditional entropy of the sample feature X on the classification result of the sample set Y can be calculated by the following formula:
H(Y|X) = ∑i pi H(Y|X = xi)
where n is the number of values of feature X, i.e. the number of distinct feature-value types; pi is the probability that a sample whose X value is the i-th value appears in sample set Y; and xi is the i-th value of X. H(Y|X = xi) is the empirical entropy of the classification of sub-sample set Yi, where the X values of the samples in sub-sample set Yi are all the i-th value.
For example, taking the number of values of feature X as 3, namely x1, x2, and x3, the sample set Y {sample 1, sample 2, …, sample i, …, sample n} may be divided by feature X into three sub-sample sets: Y1 {sample 1, sample 2, …, sample d} with feature value x1, Y2 {sample d+1, …, sample e} with feature value x2, and Y3 {sample e+1, …, sample n} with feature value x3, where d and e are positive integers less than n.
At this time, the conditional entropy of the feature X on the classification result of the sample set Y is:
H(Y|X)=p1H(Y|x1)+p2H(Y|x2)+p3H(Y|x3);
where p1 = |Y1|/|Y|, p2 = |Y2|/|Y|, and p3 = |Y3|/|Y|;
H(Y|x1) is the information entropy, i.e. the empirical entropy, of the classification of sub-sample set Y1, and can be calculated by the above formula for the empirical entropy.
After the empirical entropy H (Y) of the sample classification and the conditional entropy H (Y | X) of the classification result of the feature X for the sample set Y are obtained, the information gain of the feature X for the sample set Y can be calculated, as calculated by the following formula:
g(Y,X)=H(Y)-H(Y|X)
that is, the information gain of the classification of the sample set Y by the feature X is: the difference between the empirical entropy H (Y) and the conditional entropy H (Y | X) of the feature X for the sample set Y classification result.
306. And judging whether the maximum information gain is larger than a preset threshold value, if so, executing step 307, and if not, executing step 313.
For example, it may be determined whether the maximum information gain gmax is greater than a preset threshold value epsilon, which may be set according to actual requirements.
307. And selecting the features corresponding to the maximum information gain as the division features, and dividing the sample set according to the feature values of the division features to obtain a plurality of sub-sample sets.
For example, when the feature corresponding to the maximum information gain gmax is the feature i, the feature i may be selected as the division feature.
Specifically, the sample set may be divided into a plurality of sub-sample sets according to the feature values of the division feature, the number of sub-sample sets being the same as the number of feature values. For example, samples having the same value of the division feature may be divided into the same sub-sample set: if the feature values of the division feature include 0, 1, and 2, the samples whose feature value is 0 can be grouped into one class, those whose value is 1 into another, and those whose value is 2 into a third.
308. And removing the division characteristics of the samples in the sub-sample set to obtain the removed sub-sample set.
For example, when the division feature i takes two values, the sample set a may be divided into a1 {sample 1, sample 2, …, sample k} and a2 {sample k+1, …, sample n}. Then, the division feature i may be removed from the sub-sample sets a1 and a2.
309. And generating child nodes of the current node, and taking the removed child sample set as node information of the corresponding child nodes.
Wherein one subsample set corresponds to one child node. For example, referring to fig. 3, child nodes a1 and a2 of the root node a are generated, and the child sample set a1 is taken as the node information of the child node a1, and the child sample set a2 is taken as the node information of the child node a 2.
310. And judging whether the sub sample set of the child node meets a preset classification termination condition, if so, executing step 311, and if not, executing step 312.
The preset classification termination condition can be set according to actual requirements. When a child node meets the preset classification termination condition, the child node is used as a leaf node and classification of the sample set corresponding to that node stops; when a child node does not meet the condition, the sample set corresponding to the child node continues to be classified. For example, the preset classification termination condition may include: the number of sample categories in the removed sub-sample set of the child node equals a preset number.
For example, the preset classification termination condition may include: the number of the types of the samples in the removed sub-sample set corresponding to the sub-node is 1, that is, only one type of sample is in the sample set of the sub-node.
311. The target sample set is updated to the child sample set of the child node and the execution returns to step 305.
312. And taking the child node as a leaf node, and setting the output of the leaf node according to the sample category in the child sample set of the child node.
For example, the preset classification termination condition may include: the number of the types of the samples in the removed sub-sample set corresponding to the sub-node is 1, that is, only one type of sample is in the sample set of the sub-node.
At this time, if the child node satisfies the preset classification termination condition, the destination class of the samples in the sub-sample set is taken as the output of the leaf node. If the removed sub-sample set contains only samples with the destination category "XX station", then "XX station" can be used as the output of the leaf node.
313. And taking the current node as a leaf node, and selecting the sample type with the maximum sample number as the output of the leaf node.
Wherein the sample class includes each destination.
For example, when the sub-sample set a1 of the sub-node a1 is classified, if the maximum information gain is smaller than a preset threshold, the sample class with the largest number of samples in the sub-sample set a1 may be used as the output of the leaf node. If the number of samples for "XX building" is the greatest, then "XX building" may be taken as the output of leaf node a 1.
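Tying the above together, steps 303 through 313 might be sketched recursively as follows, reusing the information_gain and split_by_feature helpers from the earlier sketches (again a sketch under the assumed dictionary representation of samples and nodes, not the authoritative implementation):

    from collections import Counter

    def build_tree(sample_set, features, epsilon=0.1):
        """Recursive construction: make a leaf when only one destination class
        remains (or no features are left); otherwise split on the feature with
        the maximum information gain, unless that gain does not exceed the
        preset threshold epsilon, in which case output the majority class."""
        labels = [s["label"] for s in sample_set]
        majority = Counter(labels).most_common(1)[0][0]
        if len(set(labels)) == 1 or not features:
            return {"is_leaf": True, "output": majority}
        gains = {f: information_gain(sample_set, f) for f in features}
        best = max(gains, key=gains.get)
        if gains[best] <= epsilon:
            # maximum gain not greater than the threshold: leaf with majority class
            return {"is_leaf": True, "output": majority}
        children = {
            value: build_tree(subset, [f for f in features if f != best], epsilon)
            for value, subset in split_by_feature(sample_set, best).items()
        }
        return {"is_leaf": False, "feature": best, "children": children}

A tree built this way can be walked with the predict sketch given earlier.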
314. After the decision tree model is built, when it is detected that the user opens a map application, the currently corresponding multidimensional features are collected as a prediction sample.
When it is detected that the user opens the map application, it indicates that the user needs to search for a destination. Accordingly, the currently corresponding multidimensional features are collected as a prediction sample.
315. And predicting a corresponding destination according to the prediction sample and the decision tree model.
For example, the corresponding leaf node may be determined from the features of the prediction sample and the decision tree model, and the output of that leaf node taken as the predicted output. That is, the features of the prediction sample are matched against the branch conditions of the decision tree (the feature values of the division features) until a leaf node is reached, and the output of that leaf node is taken as the prediction result. Since the outputs of the leaf nodes cover the destinations, the destination to be pushed can be determined from the decision tree.
For example, after the current multidimensional feature is collected, the corresponding leaf node an2 can be found in the decision tree shown in fig. 4 according to the branch condition of the decision tree, and the output of the leaf node an2 is the destination 2, at this time, it is determined that the destination of the push is the destination 2.
As can be seen from the above, in the embodiment of the present application, when it is detected that the user has determined a destination, the multidimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed; the samples of the sample set are classified according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, the currently corresponding multidimensional features are collected as a prediction sample; and the corresponding destination is predicted according to the prediction sample and the decision tree model. Destinations are thereby pushed automatically, and the accuracy of destination pushing is improved.
Further, each sample of the sample set comprises a plurality of characteristic information reflecting the behavior habit of the user for selecting the destination in common, so that the pushing of the destination can be personalized and intelligent.
Furthermore, the pushing prediction of the destination is realized based on the decision tree prediction model, the accuracy of the pushing of the destination can be improved, and the use habits of users can be better fitted.
In one embodiment, a pushing device for a destination is also provided. Referring to fig. 6, fig. 6 is a schematic structural diagram of a destination pushing device according to an embodiment of the present application. The push device of the destination is applied to the electronic device, and includes a first acquisition unit 401, a construction unit 402, a second acquisition unit 403, and a prediction unit 404, as follows:
a first collecting unit 401, configured to, when it is detected that a destination is determined by a user, collect a multidimensional feature corresponding to the destination as a sample, and construct a sample set corresponding to the destination;
a constructing unit 402, configured to classify the sample set according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where an output of the decision tree model is the corresponding destination;
a second collecting unit 403, configured to, when it is detected that a user opens a map application, collect a currently corresponding multidimensional feature as a prediction sample;
a prediction unit 404, configured to predict a corresponding destination according to the prediction sample and the decision tree model.
In an embodiment, referring to fig. 7, the building unit 402 may include:
a first node generating subunit 4021, configured to generate a corresponding root node, and use the sample set as node information of the root node; determining the sample set of the root node as a target sample set to be classified currently;
a gain obtaining subunit 4022, configured to obtain information gains of the features in the target sample set for sample set classification;
a feature determination subunit 4023, configured to select a current partition feature from the features according to the information gain;
a classification subunit 4024, configured to divide the sample set according to the partition feature to obtain a plurality of sub-sample sets;
a second node generating subunit 4025, configured to remove the partition feature from the samples in each sub-sample set to obtain a removed sub-sample set, generate child nodes of the current node, and take the removed sub-sample sets as the node information of the child nodes;
a determining subunit 4026, configured to determine whether a child node meets a preset classification termination condition; if not, update the target sample set to the removed sub-sample set and trigger the gain obtaining subunit to perform the step of obtaining the information gain of each feature in the target sample set for sample set classification; and if so, take the child node as a leaf node and set the output of the leaf node according to the categories of the samples in the removed sub-sample set, where the category of a sample is the corresponding destination.
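Taken together, these subunits implement an ID3-style tree construction. The following is a minimal illustrative sketch of that procedure, not the patent's implementation: it assumes samples are (feature_dict, destination) pairs and reuses the nested-dict node layout from the earlier prediction sketch.

    import math
    from collections import Counter, defaultdict

    def entropy(samples):
        # Empirical entropy H(Y) of the destination categories in a non-empty sample set.
        n = len(samples)
        counts = Counter(label for _, label in samples)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def info_gain(samples, feature):
        # g(Y, X) = H(Y) - H(Y|X): entropy minus the weighted entropy of the
        # sub-sample sets obtained by splitting on the feature's values.
        n = len(samples)
        groups = defaultdict(list)
        for feats, label in samples:
            groups[feats[feature]].append((feats, label))
        conditional = sum(len(g) / n * entropy(g) for g in groups.values())
        return entropy(samples) - conditional

    def build_tree(samples, features, threshold=0.0):
        majority = Counter(label for _, label in samples).most_common(1)[0][0]
        # Termination: only one destination category left, or no features remain.
        if len({label for _, label in samples}) == 1 or not features:
            return {"output": majority}
        # Select the partition feature with the maximum information gain.
        gains = {f: info_gain(samples, f) for f in features}
        best = max(gains, key=gains.get)
        if gains[best] <= threshold:
            # Gain too small: stop splitting; output the majority category.
            return {"output": majority}
        # Partition by the feature's values, remove the partition feature
        # from each sub-sample set, and recurse to generate child nodes.
        node = {"feature": best, "children": {}}
        groups = defaultdict(list)
        for feats, label in samples:
            stripped = {k: v for k, v in feats.items() if k != best}
            groups[feats[best]].append((stripped, label))
        for value, subset in groups.items():
            node["children"][value] = build_tree(
                subset, [f for f in features if f != best], threshold)
        return node

Here the threshold parameter plays the role of the preset threshold checked by the feature determination subunit 4023 described above.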
The classification subunit 4024 may be configured to obtain the feature values of the partition feature in the sample set, and divide the sample set according to those feature values, where samples with the same feature value are divided into the same sub-sample set.
The feature determination subunit 4023 may be configured to:
selecting a maximum target information gain from the information gains;
judging whether the target information gain is larger than a preset threshold value or not;
and if so, selecting the characteristic corresponding to the target information gain as the current division characteristic.
In an embodiment, the gain obtaining subunit 4022 may be configured to:
acquiring the empirical entropy of sample classification;
acquiring conditional entropy of the features on sample set classification results;
and acquiring the information gain of the feature for the sample set classification according to the conditional entropy and the empirical entropy.
For example, the gain obtaining subunit 4022 may be configured to: obtain the probability of each destination sample appearing in the sample set, where a destination sample is a sample whose category is the corresponding destination, and obtain the empirical entropy of the sample classification according to each probability.
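As a short worked example with illustrative numbers: if a sample set contains 8 samples, of which 6 have the destination category "home" and 2 have "XX building", the probabilities are 6/8 and 2/8, so the empirical entropy is H(Y) = -(6/8)log2(6/8) - (2/8)log2(2/8) ≈ 0.811.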
In an embodiment, the determining subunit 4026 is configured to determine whether the number of categories of samples in the removed sub-sample set corresponding to the child node is a preset number;
if so, determining that the child node meets a preset classification termination condition.
In an embodiment, the feature determination subunit 4023 may be further configured to, when the target information gain is not greater than the preset threshold, take the current node as a leaf node and select the sample category with the largest number of samples as the output of the leaf node.
In an embodiment, referring to fig. 7, the pushing device of the destination further includes:
a detecting unit 405, configured to detect whether the sample set contains samples whose feature values of the multi-dimensional features are the same but whose destination categories are different.
A retaining unit 406, configured to, when it is detected that the sample set contains samples whose feature values of the multi-dimensional features are the same but whose destination categories are different, retain the samples of the destination category that has the largest number of samples.
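A minimal sketch of this conflict-resolution step, under the same assumed (feature_dict, destination) sample shape used in the earlier sketches, might be:

    from collections import Counter, defaultdict

    def resolve_conflicts(samples):
        # Group samples whose multi-dimensional feature values are identical.
        groups = defaultdict(list)
        for feats, label in samples:
            groups[tuple(sorted(feats.items()))].append(label)
        kept = []
        for key, labels in groups.items():
            # Within each group, keep only the destination category with the
            # largest number of samples; conflicting minority samples are dropped.
            winner = Counter(labels).most_common(1)[0][0]
            kept.extend((dict(key), label) for label in labels if label == winner)
        return kept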
The steps performed by each unit in the push apparatus of the destination may refer to the method steps described in the above method embodiments. The push device of the destination can be integrated in an electronic device, such as a mobile phone, a tablet computer, and the like.
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing embodiments, which are not described herein again.
As can be seen from the above, in the destination pushing device of this embodiment, when the first collecting unit 401 detects that the user determines a destination, it collects the multi-dimensional features corresponding to the destination as a sample and constructs a sample set corresponding to the destination; the constructing unit 402 classifies the sample set according to the information gain of each feature for sample classification to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when the second collecting unit 403 detects that the user opens a map application, it collects the currently corresponding multi-dimensional features as a prediction sample; and the prediction unit 404 predicts a corresponding destination according to the prediction sample and the decision tree model. In this way, automatic pushing of the destination is achieved, and the accuracy of destination pushing is improved.
The embodiment of the application also provides the electronic equipment. Referring to fig. 8, an electronic device 500 includes a processor 501 and a memory 502. The processor 501 is electrically connected to the memory 502.
The processor 501 is the control center of the electronic device 500. It connects various parts of the whole electronic device by using various interfaces and lines, and executes various functions of the electronic device 500 and processes data by running or loading a computer program stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the electronic device 500 as a whole.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by running the computer programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, a computer program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the electronic device, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
In this embodiment, the processor 501 in the electronic device 500 loads instructions corresponding to one or more processes of the computer program into the memory 502, and the processor 501 runs the computer program stored in the memory 502, so as to implement various functions as follows:
when a destination is determined by a user, collecting multidimensional characteristics corresponding to the destination as samples, and constructing a sample set corresponding to the destination;
classifying the sample set according to the information gain of each feature for sample classification, so as to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination;
when the map application is detected to be opened by a user, acquiring the current corresponding multi-dimensional features as prediction samples;
and predicting a corresponding destination according to the prediction sample and the decision tree model.
In some embodiments, when classifying the sample set according to the information gain of each feature for sample classification to construct the decision tree model of the destination, the processor 501 may specifically perform the following steps:
generating a corresponding root node, and taking the sample set as node information of the root node;
determining the sample set of the root node as a target sample set to be classified currently;
obtaining the information gain of the features in the target sample set for the sample set classification;
selecting a current division feature from the features according to the information gain;
dividing the sample set according to the dividing characteristics to obtain a plurality of sub-sample sets;
removing the dividing characteristics of the samples in the sub-sample set to obtain a removed sub-sample set;
generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes;
judging whether the child nodes meet preset classification termination conditions or not;
if not, updating the target sample set into the removed sub-sample set, and returning to execute the step of obtaining the information gain of the characteristics in the target sample set for sample set classification;
and if so, taking the child node as a leaf node, and setting the output of the leaf node according to the category of the removed sample in the child sample set, wherein the category of the sample is a corresponding destination.
In some embodiments, after acquiring the multidimensional feature corresponding to the destination as a sample and constructing a sample set corresponding to the destination, the processor 501 may further specifically perform the following steps:
detecting whether the sample set contains samples whose feature values of the multi-dimensional features are the same but whose destination categories are different;
when such samples exist in the sample set, retaining the samples of the destination category that has the largest number of samples.
In some embodiments, when selecting the current partition feature from the features according to the information gain, the processor 501 may specifically perform the following steps:
selecting a maximum target information gain from the information gains;
judging whether the target information gain is larger than a preset threshold value or not;
and if so, selecting the characteristic corresponding to the target information gain as the current division characteristic.
In some embodiments, the processor 501 may further specifically perform the following steps:
and when the target information gain is not greater than a preset threshold value, taking the current node as a leaf node, and selecting the sample type with the largest number of samples as the output of the leaf node.
In some embodiments, in obtaining the information gain of the feature in the target sample set for the sample set classification, the processor 501 may specifically perform the following steps:
acquiring the empirical entropy of sample classification;
acquiring conditional entropy of the features on sample set classification results;
and acquiring the information gain of the feature for the sample set classification according to the conditional entropy and the empirical entropy.
As can be seen from the above, in the electronic device according to the embodiment of the present application, when it is detected that the user determines a destination, the multi-dimensional features corresponding to the destination are collected as a sample, and a sample set corresponding to the destination is constructed; the sample set is classified according to the information gain of each feature for sample classification to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, the currently corresponding multi-dimensional features are collected as a prediction sample; and a corresponding destination is predicted according to the prediction sample and the decision tree model. In this way, automatic pushing of the destination is achieved, and the accuracy of destination pushing is improved.
Referring to fig. 9, in some embodiments, the electronic device 500 may further include: a display 503, radio frequency circuitry 504, audio circuitry 505, and a power supply 506. The display 503, the rf circuit 504, the audio circuit 505, and the power source 506 are electrically connected to the processor 501.
The display 503 may be used to display information entered by or provided to the user as well as various graphical user interfaces, which may be made up of graphics, text, icons, video, and any combination thereof. The display 503 may include a display panel, and in some embodiments, the display panel may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The radio frequency circuit 504 may be used to transmit and receive radio frequency signals so as to establish wireless communication with a network device or other electronic devices, and to exchange signals with the network device or the other electronic devices.
The audio circuit 505 may be used to provide an audio interface between the user and the electronic device through a speaker and a microphone.
The power source 506 may be used to power various components of the electronic device 500. In some embodiments, power supply 506 may be logically coupled to processor 501 through a power management system, such that functions of managing charging, discharging, and power consumption are performed through the power management system.
Although not shown in fig. 9, the electronic device 500 may further include a camera, a bluetooth module, and the like, which are not described in detail herein.
An embodiment of the present application further provides a storage medium that stores a computer program. When the computer program runs on a computer, the computer is caused to execute the destination pushing method in any one of the above embodiments, for example: when it is detected that a user determines a destination, collecting the multi-dimensional features corresponding to the destination as a sample, and constructing a sample set corresponding to the destination; classifying the sample set according to the information gain of each feature for sample classification to construct a decision tree model of the destination, where the output of the decision tree model is the corresponding destination; when it is detected that the user opens a map application, collecting the currently corresponding multi-dimensional features as a prediction sample; and predicting a corresponding destination according to the prediction sample and the decision tree model.
In the embodiment of the present application, the storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should be noted that, as can be understood by those skilled in the art, all or part of the flow of the destination pushing method in the embodiments of the present application may be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory of an electronic device, and executed by at least one processor in the electronic device, and the execution process may include the flow of the embodiments of the destination pushing method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
In the destination pushing device of the embodiments of the present application, the functional modules may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The method, device, storage medium, and electronic device for destination pushing provided in the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific implementation and application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (16)

1. A method for pushing a destination, comprising:
when a destination is determined by a user, collecting multidimensional characteristics corresponding to the destination as samples, and constructing a sample set corresponding to the destination, wherein the multidimensional characteristics reflect the behavior habit of the user for selecting the destination;
generating a corresponding root node, and taking the sample set as node information of the root node;
determining the sample set of the root node as a target sample set to be classified currently;
obtaining the information gain of the features in the target sample set for the sample set classification;
selecting a current division feature from the features according to the information gain;
dividing the sample set according to the dividing characteristics to obtain a plurality of sub-sample sets;
removing the dividing characteristics of the samples in the sub-sample set to obtain a removed sub-sample set;
generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes;
judging whether the child nodes meet preset classification termination conditions or not;
if not, updating the target sample set into the removed sub-sample set, and returning to execute the step of obtaining the information gain of the characteristics in the target sample set for sample set classification;
if so, taking the child nodes as leaf nodes, and setting the output of the leaf nodes according to the types of the samples in the removed child sample set, wherein the types of the samples are corresponding destinations so as to construct a decision tree model of the destinations, and the output of the decision tree model is the corresponding destination;
when the map application is detected to be opened by a user, acquiring the current corresponding multi-dimensional features as prediction samples;
and predicting a corresponding destination according to the prediction sample and the decision tree model, and pushing the predicted destination to a user.
2. The method for pushing a destination according to claim 1, wherein classifying the sample set according to the information gain of the feature for sample classification to construct the decision tree model of the destination comprises:
generating a corresponding root node, and taking the sample set as node information of the root node;
determining the sample set of the root node as a target sample set to be classified currently;
obtaining the information gain of the features in the target sample set for the sample set classification;
selecting a current division feature from the features according to the information gain;
dividing the sample set according to the dividing characteristics to obtain a plurality of sub-sample sets;
removing the dividing characteristics of the samples in the sub-sample set to obtain a removed sub-sample set;
generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes;
judging whether the child nodes meet preset classification termination conditions or not;
if not, updating the target sample set into the removed sub-sample set, and returning to execute the step of obtaining the information gain of the characteristics in the target sample set for sample set classification;
and if so, taking the child node as a leaf node, and setting the output of the leaf node according to the category of the removed sample in the child sample set, wherein the category of the sample is a corresponding destination.
3. The method as claimed in claim 2, wherein after acquiring the multidimensional feature corresponding to the destination as a sample and constructing a sample set corresponding to the destination, the method further comprises:
detecting whether samples with the same feature value and different destination categories corresponding to the multi-dimensional features exist in the sample set;
when the samples with the same feature value and different destination categories corresponding to the multi-dimensional features exist in the sample set, the sample with the largest number of destination category samples is reserved.
4. The push method of a destination of claim 2, wherein selecting a current partition characteristic from the characteristics according to the information gain comprises:
selecting a maximum target information gain from the information gains;
judging whether the target information gain is larger than a preset threshold value or not;
and if so, selecting the characteristic corresponding to the target information gain as the current division characteristic.
5. The push method of a destination according to claim 4, wherein the push method of a destination further comprises:
and when the target information gain is not greater than a preset threshold value, taking the current node as a leaf node, and selecting the sample type with the largest number of samples as the output of the leaf node.
6. The method as claimed in claim 2, wherein the step of determining whether the child node satisfies a predetermined classification termination condition includes:
judging whether the number of categories of samples in the removed sub-sample set corresponding to the child node is a preset number;
if so, determining that the child node meets a preset classification termination condition.
7. A method for pushing a destination according to any one of claims 2 to 6, wherein obtaining an information gain of said feature for sample set classification in a target sample set comprises:
acquiring the empirical entropy of sample classification;
acquiring conditional entropy of the features on sample set classification results;
and acquiring the information gain of the feature for the sample set classification according to the conditional entropy and the empirical entropy.
8. The method of pushing of a destination according to claim 7, wherein obtaining an information gain of the feature for the sample set classification based on the conditional entropy and the empirical entropy comprises:
g(Y,X)=H(Y)-H(Y|X)
wherein g(Y,X) is the information gain of the feature X for the classification of the sample set Y, H(Y) is the empirical entropy of the classification of the sample set Y, and H(Y|X) is the conditional entropy of the feature X on the classification result of the sample set Y.
9. The method of pushing of a destination of claim 8, wherein obtaining empirical entropy of sample classification comprises:
obtaining the probability of each destination sample appearing in the sample set, wherein a destination sample is a sample whose category is the corresponding destination;
and acquiring the empirical entropy of the sample according to the probability.
10. A pushing device for a destination, comprising:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring multidimensional characteristics corresponding to a destination as samples and constructing a sample set corresponding to the destination when a user determines the destination, and the multidimensional characteristics reflect the behavior habit of the user for selecting the destination;
the construction unit is used for generating a corresponding root node and taking the sample set as the node information of the root node; determining the sample set of the root node as a target sample set to be classified currently; obtaining the information gain of the features in the target sample set for the sample set classification; selecting a current division feature from the features according to the information gain; dividing the sample set according to the dividing characteristics to obtain a plurality of sub-sample sets; removing the dividing characteristics of the samples in the sub-sample set to obtain a removed sub-sample set; generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes; judging whether the child nodes meet preset classification termination conditions or not; if not, updating the target sample set into the removed sub-sample set, and returning to execute the step of obtaining the information gain of the characteristics in the target sample set for sample set classification; if so, taking the child nodes as leaf nodes, and setting the output of the leaf nodes according to the types of the samples in the removed child sample set, wherein the types of the samples are corresponding destinations so as to construct a decision tree model of the destinations, and the output of the decision tree model is the corresponding destination;
the second acquisition unit is used for acquiring the current corresponding multi-dimensional characteristics as a prediction sample when the map application is detected to be opened by the user;
and the prediction unit is used for predicting a corresponding destination according to the prediction sample and the decision tree model and pushing the predicted destination to a user.
11. The push device of a destination of claim 10, wherein the build unit comprises:
the first node generation subunit is used for generating a corresponding root node and taking the sample set as the node information of the root node; determining the sample set of the root node as a target sample set to be classified currently;
the gain acquisition subunit is used for acquiring the information gain of the feature in the target sample set for the sample set classification;
the characteristic determining subunit is used for selecting the current division characteristic from the characteristics according to the information gain;
the classification subunit is used for dividing the sample set according to the division characteristics to obtain a plurality of sub-sample sets;
the second node generation subunit is used for removing the division characteristics of the samples in the sub-sample set to obtain a removed sub-sample set; generating child nodes of the current node, and taking the removed child sample set as node information of the child nodes;
a judging subunit, configured to judge whether a child node meets a preset classification termination condition, if not, update the target sample set to the removed child sample set, and trigger the gain obtaining subunit to perform a step of obtaining information gain of the feature in the target sample set for sample set classification; and if so, taking the child node as a leaf node, and setting the output of the leaf node according to the category of the removed sample in the child sample set, wherein the category of the sample is a corresponding destination.
12. A pushing device for a destination according to claim 11, characterized in that the device further comprises:
the detection unit is used for detecting whether samples which have the same characteristic value and different destination categories and correspond to the multi-dimensional characteristics exist in the sample set;
and the reserving unit is used for reserving the sample with the largest number of destination category samples when detecting that the sample set has samples which have the same feature value and different destination categories and correspond to the multi-dimensional features.
13. The pushing device of a destination according to claim 11, wherein the characteristic determining subunit is configured to:
selecting a maximum target information gain from the information gains;
judging whether the target information gain is larger than a preset threshold value or not;
and if so, selecting the characteristic corresponding to the target information gain as the current division characteristic.
14. The push device of a destination of claim 11, wherein the gain acquisition subunit is configured to:
acquiring the empirical entropy of sample classification;
acquiring conditional entropy of the features on sample set classification results;
and acquiring the information gain of the feature for the sample set classification according to the conditional entropy and the empirical entropy.
15. A storage medium having stored thereon a computer program, characterized in that, when the computer program runs on a computer, it causes the computer to execute a push method of a destination according to any one of claims 1 to 9.
16. An electronic device comprising a processor and a memory, the memory having a computer program, wherein the processor is configured to execute a push method of a destination according to any one of claims 1 to 9 by calling the computer program.
CN201711461519.XA 2017-12-28 2017-12-28 Destination pushing method and device, storage medium and electronic equipment Active CN108108455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711461519.XA CN108108455B (en) 2017-12-28 2017-12-28 Destination pushing method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711461519.XA CN108108455B (en) 2017-12-28 2017-12-28 Destination pushing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN108108455A CN108108455A (en) 2018-06-01
CN108108455B true CN108108455B (en) 2020-06-16

Family

ID=62214189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711461519.XA Active CN108108455B (en) 2017-12-28 2017-12-28 Destination pushing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN108108455B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689158B (en) * 2018-07-05 2022-05-31 北京嘀嘀无限科技发展有限公司 Method, device and storage medium for predicting destination
CN109242012A (en) * 2018-08-27 2019-01-18 平安科技(深圳)有限公司 Grouping induction method and device, electronic device and computer readable storage medium
CN109635069B (en) * 2018-12-21 2021-08-10 北京航天泰坦科技股份有限公司 Geographic space data self-organizing method based on information entropy
CN110347760B (en) * 2019-05-30 2021-07-09 中国地质大学(武汉) Data analysis method for lost crowd space-time positioning service
CN110363304A (en) * 2019-06-18 2019-10-22 深圳壹账通智能科技有限公司 Location model construction method, device, computer equipment and storage medium
CN110647929B (en) * 2019-09-19 2021-05-04 北京京东智能城市大数据研究院 Method for predicting travel destination and method for training classifier
CN110986985B (en) * 2019-12-17 2022-07-12 广州小鹏汽车科技有限公司 Vehicle travel pushing method and device, medium, control terminal and automobile

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214213A (en) * 2011-05-31 2011-10-12 中国科学院计算技术研究所 Method and system for classifying data by adopting decision tree
CN105868298A (en) * 2016-03-23 2016-08-17 华南理工大学 Mobile phone game recommendation method based on binary decision tree
CN106407406A (en) * 2016-09-22 2017-02-15 国信优易数据有限公司 A text processing method and system
CN106557846A (en) * 2016-11-30 2017-04-05 成都寻道科技有限公司 Graduation whereabouts prediction method based on university students' in-school data
CN106934412A (en) * 2015-12-31 2017-07-07 中国科学院深圳先进技术研究院 User behavior classification method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102214213A (en) * 2011-05-31 2011-10-12 中国科学院计算技术研究所 Method and system for classifying data by adopting decision tree
CN106934412A (en) * 2015-12-31 2017-07-07 中国科学院深圳先进技术研究院 User behavior classification method and system
CN105868298A (en) * 2016-03-23 2016-08-17 华南理工大学 Mobile phone game recommendation method based on binary decision tree
CN106407406A (en) * 2016-09-22 2017-02-15 国信优易数据有限公司 A text processing method and system
CN106557846A (en) * 2016-11-30 2017-04-05 成都寻道科技有限公司 Graduation whereabouts prediction method based on university students' in-school data

Also Published As

Publication number Publication date
CN108108455A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108455B (en) Destination pushing method and device, storage medium and electronic equipment
CN107704070B (en) Application cleaning method and device, storage medium and electronic equipment
CN108197225B (en) Image classification method and device, storage medium and electronic equipment
CN107894827B (en) Application cleaning method and device, storage medium and electronic equipment
CN108280458B (en) Group relation type identification method and device
CN107678531B (en) Application cleaning method and device, storage medium and electronic equipment
CN106792003B (en) Intelligent advertisement insertion method and device and server
CN111813532B (en) Image management method and device based on multitask machine learning model
CN107870810B (en) Application cleaning method and device, storage medium and electronic equipment
WO2019128598A1 (en) Application processing method, electronic device, and computer readable storage medium
CN111222563B (en) Model training method, data acquisition method and related device
CN108595573B (en) Page display method and device, storage medium and electronic equipment
CN107943582B (en) Feature processing method, feature processing device, storage medium and electronic equipment
WO2019120007A1 (en) Method and apparatus for predicting user gender, and electronic device
CN111797870A (en) Optimization method and device of algorithm model, storage medium and electronic equipment
CN107943537B (en) Application cleaning method and device, storage medium and electronic equipment
CN114117056A (en) Training data processing method and device and storage medium
CN108234758B (en) Application display method and device, storage medium and electronic equipment
CN109961163A (en) Gender prediction's method, apparatus, storage medium and electronic equipment
CN107741867B (en) Application program management method and device, storage medium and electronic equipment
CN114647703A (en) Data processing method and device, electronic equipment and storage medium
CN112948763B (en) Piece quantity prediction method and device, electronic equipment and storage medium
CN111800535B (en) Terminal running state evaluation method and device, storage medium and electronic equipment
CN111667028A (en) Reliable negative sample determination method and related device
CN107797831B (en) Background application cleaning method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: No. 18 Haibin Road, Wusha, Chang'an Town, Dongguan, Guangdong 523860

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., Ltd.

Address before: No. 18 Haibin Road, Wusha, Chang'an Town, Dongguan, Guangdong 523860

Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS CORP., Ltd.

GR01 Patent grant
GR01 Patent grant