CN105389714B - Method for identifying user characteristics from behavior data - Google Patents
- Publication number
- CN105389714B (application number CN201510701305.XA)
- Authority
- CN
- China
- Prior art keywords
- user
- behavior
- distribution
- characteristic
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0255—Targeted advertisements based on user history
Landscapes
- Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for identifying user characteristics from behavior data, comprising the following steps: establishing a behavior characteristic database; calculating the distribution information of a given behavior feature appearing in the user's behavior data to obtain the personal distribution, classification distribution and global distribution corresponding to that feature, and combining them into a final distribution result for the feature; evaluating a likelihood evaluation value of the associated user characteristic, which completes the calculation of the shallow user characteristics; and calculating the final evaluation result of the deep tags possessed by the user. All tags obtained in this way are the user characteristics finally identified. The method has a simple model structure and simple parameters and low algorithmic complexity, and achieves good performance and identification results on experimental test data. It generalizes and adapts well, and its identification results are objective, reliable and comprehensive.
Description
Technical Field
The invention relates to the field of Internet, in particular to a method for identifying user characteristics from behavior data.
Background
1. User behavior data
User behavior data is the digital record of all the daily behaviors of a person as a behaving individual. With the rapid development of the internet and the mobile internet, online behavior has become an important part of everyday human behavior, and the corresponding online behavior data accounts for more than 90% of all recordable daily user behavior data. From this viewpoint, online behavior data can be used to represent user behavior data.
Online behavior data can be divided into several categories according to the behavior scenario it belongs to: mobile App behavior, location change behavior, search behavior, web browsing behavior, shopping transaction behavior, social behavior, and so on. Each category differs in source scenario, attributes and generation mode. With the development of internet and mobile internet services, the online user group is large in size (more than 7 times of the daily population) and generates a huge amount of behavior data. A single user can produce thousands of behavior records per day and more than one hundred thousand per year. Recorded user search behavior alone is on the order of billions of records a day.
Such rich, large-scale behavior data can reveal many personal characteristics of a user and has great commercial value. For example, a user's shopping characteristics (purchased products and brand preferences) can be found from search and shopping transaction data, and e-commerce enterprises can make accurate personalized product recommendations based on them. A user's social characteristics (such as interests and values) can be found from social behavior data, and many enterprises can offer better-matched services (such as intelligent friend matching) based on those interests and hobbies.
2. User characteristics
In the field of user research, user characteristics are the traits a user exhibits as a result of personal background and behavior. Each characteristic defines or describes one aspect or inclination of the user. User characteristics cover many aspects, such as natural attributes (e.g., male, born in the 1990s, elderly, overweight, Beijing), life characteristics (job title, occupation, owns a private car …), interests (likes basketball, loves watching movies …), shopping preferences (favorite brands, types of cosmetics used), values, and lifestyle (e.g., likes brand names, pursues quality, petty-bourgeois tastes, high spending power).
User characteristics are a qualitative (non-quantitative), multi-dimensional description of the user obtained after long-term observation. They are derived from the user's original attribute information and long-term behavior, but hide the original attribute details, which protects the user's privacy (for example, from a user's identity card information, the characteristics obtained are "female" and "born in the 1980s", not a specific birthday). They therefore have value for generalization and wider use.
At present, user characteristics borrow an idea from the internet and define specific attributes as tags. Each user characteristic can be regarded as one tag of the user, so that all characteristics of the user are defined by a combination of tags, and analyzing a user's characteristics becomes analyzing the user's tags. Hereafter, the term "user tag" is mainly used in place of "user characteristic".
3. User characteristic (tag) analysis recognition
User tags (user characteristics) carry a large amount of intrinsic user information (such as interests and preferences) and can bring great commercial value (for example, recommending goods and services that match a user's interest tags). How to analyze and accurately identify user tags, and the related methods, have therefore received wide attention in user research and commercial applications since 2014.
User characteristic analysis mainly relies on two mechanisms. (1) Analysis based on large amounts of basic attribute information about users (such as identity card numbers, positions, and residential addresses). This approach has narrow data coverage and can analyze only a limited set of user characteristics, and it is rarely used because it risks revealing user privacy. (2) Analysis based on user behavior data. User characteristics are analyzed and tags extracted by mining user behavior. This approach does not involve user privacy, and the massive user behavior data of the internet and mobile internet provides sufficient data support, so it has become the main mode of analysis.
In the behavior-based analysis mechanism, no direct private data (such as a home address) or real-life social identifier (such as an identity card number) of the user is needed; the characteristics are abstracted and summarized from the user's continuous behavior history. Each user is uniquely identified by a meaningless id (e.g., u001) that cannot be mapped to a specific person in real life, and the user's actual characteristics are inferred and tagged from the long-term behavior data of that id (e.g., mobile App usage, web browsing, shopping transactions). As an intuitive example, suppose we initially know nothing about user u001 but find from half a year of behavior data that the user frequently takes selfies with a photo-beautification App and opens a certain yoga App, tends to browse Basha fashion and green travel websites, and often buys milk powder online. We can then easily conclude that this user's (highly probable) characteristic tags include: female (Ma Lai), fashion, yoga, and an infant at home. In practice, because behavior data comes from many scenarios and is large in scale, and the number of users to be analyzed often exceeds one million, the analysis must be carried out by automatic methods.
The current mainstream approach to automatically analyzing user tags is based on keywords (behavior feature keywords) and is adopted by most internet and e-commerce enterprises. The basic method is as follows:
Keywords in the behavior data are defined, and their classification and associated user tags (user characteristics) are set.
Statistical information (e.g., frequency) on the occurrences of the keywords in the behavior data is calculated and mapped to the frequency of the associated user tags.
The user characteristics with a high statistical frequency are regarded as the user's final characteristics and are retained.
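The three baseline steps above can be sketched as follows. This is only an illustration under stated assumptions: the keyword table `KEYWORD_TAGS`, the function name `baseline_tags` and the `min_count` cutoff are hypothetical and do not come from the patent.

```python
from collections import Counter

# Hypothetical keyword -> user tag table (baseline step 1).
KEYWORD_TAGS = {
    "diet cola": "likes cola",
    "xylitol": "eats xylitol",
    "yoga": "yoga",
}

def baseline_tags(records, min_count=2):
    """Count keyword hits per tag (step 2) and keep frequent tags (step 3)."""
    counts = Counter()
    for text in records:
        lowered = text.lower()
        for keyword, tag in KEYWORD_TAGS.items():
            if keyword in lowered:
                counts[tag] += 1
    # Only tags seen at least min_count times are retained.
    return {tag: n for tag, n in counts.items() if n >= min_count}
```

Each tag is counted in isolation here, which is the limitation the following paragraph describes.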
This method is used to analyze a subset of user tags (shopping and brand preferences) in a specific behavior scenario (shopping transactions), and is well suited to user tag identification and subsequent targeted sales recommendation for e-commerce and internet businesses. However, it is difficult to apply to other, more valuable behavior scenarios (such as App usage and browsing), so a more comprehensive set of user tags cannot be found. Its relatively simple evaluation mechanism is also less accurate and can only analyze surface-level characteristics of the user (usually called shallow user tags); it has difficulty mining deep characteristics (deep tags). For example, if a user often buys diet cola and xylitol, the existing method can only find, in isolation, the tags "likes cola", "prefers the Coca-Cola brand" and "eats xylitol". It cannot reveal the hidden characteristic behind them: the user buys many sugar-free products, which suggests possible diabetes. Such a characteristic is called a deep user tag (a user tag that cannot be deduced directly from user behavior data). Deep tags are clearly more meaningful and have higher application value (product recommendations aimed at diabetic users are more accurate and better accepted).
Disclosure of Invention
The invention aims to provide a method for identifying user characteristics from behavior data that addresses the shortcomings of existing methods for automatically analyzing user characteristics from behavior data. The method builds on a more comprehensive behavior feature library, combines several distributions of each behavior feature (the user's own, that of the category it belongs to, and the global one), and associates features with user characteristics more accurately by expressing the association as a probability. It also uses multi-level deduction to discover deep user tags from shallow characteristics. Compared with existing analysis algorithms, the results are more accurate and go deeper, and the method is general: it applies to all behavior scenarios, which makes it easier to study a more comprehensive set of user characteristics.
In order to achieve the purpose, the invention provides the following technical scheme:
a method of identifying characteristics of a user from behavioural data, comprising the steps of:
1) establishing a behavior characteristic database which comprises a behavior characteristic definition library, a behavior characteristic-user characteristic mapping rule library, behavior characteristic distribution data and a user characteristic deduction library;
the behavior feature definition library defines basic attributes of all the behavior features/user characteristics involved;
the behavior characteristic-user characteristic mapping rule base defines how each behavior characteristic is mapped to the user characteristic;
the behavior feature distribution data is distribution data in which behavior features are calculated from the full-scale behavior data;
the user characteristic deduction library defines the deduction rules from shallow tags to deep tags;
2) for a user, calculating the distribution information of a certain behavior characteristic appearing in the behavior data of the user, and then obtaining the personal distribution, the classification distribution and the global distribution corresponding to the behavior characteristic; taking the classification distribution and the global distribution as a reference, and comprehensively calculating a final distribution result of the behavior characteristics through the personal distribution, the classification distribution and the global distribution by combining a weighting algorithm;
3) evaluating a likelihood evaluation value of the associated user characteristic, expressed in probability, based on the final distribution result of the behavior feature of the user;
4) once all tags related to the user's behavior features have been calculated, the calculation of the basic shallow user characteristics is complete;
5) then, based on the user characteristic deduction library, finding the deep tags that can be deduced from the shallow user characteristics identified for the current user, and calculating the final evaluation result of those deep tags according to the deduction mode, the result being expressed as a probability;
6) all tags of the user calculated by this method, namely the shallow tags and the deep tags together with their evaluation values, are the user characteristics finally identified.
As a further scheme of the invention: the behavior feature distribution data includes: classification distribution data Fc, calculated by counting, for the category to which each behavior feature belongs, the distribution frequency or user proportion of that category in the full behavior data;
and global distribution data Fg, calculated by taking all users in the behavior data who exhibit the behavior feature as the reference group and counting the feature's average distribution over that group.
As a further scheme of the invention: the deduction from shallow tags to a deep tag is either probability-based or distribution-threshold-based; in the probability-based mode, the credibility probability of deducing the deep tag from the shallow tags lies between 0 and 1; in the threshold-based mode, a minimum distribution threshold is defined, and a user whose value exceeds the threshold is deemed to have the deep tag.
As a further scheme of the invention: in step 3), if several behavior features of the user map to the same tag, the final evaluation result of the tag is obtained by combining the likelihood evaluation values of those behavior features according to the independent and identically distributed principle of probability and statistics.
Compared with the prior art, the invention has the beneficial effects that:
The method can analyze and discover the characteristic tags of users (including deep characteristic tags) from massive user behavior data. The model structure and parameters are simple, the algorithmic complexity is low, and good performance and identification results are obtained on experimental test data. The method generalizes and adapts well, its identification results are objective, reliable and comprehensive, and it has good application prospects.
Drawings
FIG. 1 is a diagram of an actual user profile analysis process;
FIG. 2 is a diagram of the correspondence between user characteristics/behavior features and feature keywords.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is completed on a computer, and sequentially comprises the following steps:
The behavior characteristic database is an important resource for automatically calculating user characteristics in this method. It is obtained through a small amount of manual labeling (by user research experts) combined with automatic statistical calculation. The work involved includes:
Step 1.1: Create a behavior feature definition library for behavior features and user characteristics, defining the basic attributes of all behavior features and user characteristics involved. The attributes of the behavior features are shown in Table 1.
TABLE 1
The attributes of the user profile are shown in table 2.
TABLE 2
Step 1.2: a mapping rule base of behavior characteristics to user characteristics is created, defining how each behavior characteristic maps to a user characteristic. There are cases where one user characteristic corresponds to a plurality of behavior characteristics. The mapping rule relationship between the behavior characteristics and the user characteristics is defined as table 3.
TABLE 3
Step 1.3: Calculate the distribution data of the behavior features from the full behavior data. This includes:
Calculating the classification distribution data Fc: for the category to which each behavior feature belongs (Table 1), count the distribution of that category (frequency, user ratio, etc.) in the full behavior data.
Calculating the global distribution data Fg: taking all users who exhibit the behavior feature as the reference group, count the feature's global average distribution over that group (the average frequency of the behavior feature, its share of the total number of users, etc.).
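A minimal sketch of step 1.3, assuming the behavior data has already been reduced to `(user_id, feature)` pairs. The patent leaves the exact statistic open ("frequency/user ratio, etc."), so the choices below (Fc as the category's share of all records, Fg as the mean hit count per matching user) and the name `distribution_stats` are assumptions made for illustration.

```python
from collections import defaultdict

def distribution_stats(events, feature_category):
    """events: iterable of (user_id, feature); feature_category: feature -> category."""
    total = 0
    cat_counts = defaultdict(int)
    per_feature_user = defaultdict(lambda: defaultdict(int))
    for user, feature in events:
        total += 1
        cat_counts[feature_category[feature]] += 1
        per_feature_user[feature][user] += 1
    # Fc (assumed form): share of all records that fall in the feature's category.
    fc = {f: cat_counts[c] / total for f, c in feature_category.items() if total}
    # Fg (assumed form): mean hit count per user, over users who show the feature.
    fg = {f: sum(u.values()) / len(u) for f, u in per_feature_user.items()}
    return fc, fg
```

Because Fc and Fg are later mixed with the personal distribution in a weighted sum, a real implementation would need to put all three on a comparable scale; that normalization is not specified in the text and is omitted here.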
Step 1.4: a library of shallow tag deductions is created for the user characteristics of deep tags, defining how deep user characteristics are found by the behavioral characteristics of shallow tags. Multiple shallow tags are often required to jointly deduce the deep tag case. The deduction rule relationship between the shallow user characteristics and the deep user characteristics is defined as table 4.
TABLE 4
Step 2.1: Compute the user's basic distribution of a behavior feature
All related keywords are obtained for the behavior feature P defined in Table 1, and the user behavior data is queried with those keywords. If the behavior data contains Chinese text (such as the titles of browsed content), it must be word-segmented beforehand (a segmenter such as the ICTCLAS 3.0 Chinese word segmentation system can be used). The matching behavior data records (denoted as the set DSet) are used to analyze the characteristics of the user's behavior feature P.
For user U, the distribution PFu of behavior feature P is computed over the matched record set DSet, for example the total number of occurrences or the average frequency per unit of time (day, month, etc.), and then smoothed (for example by taking the square root) to limit the influence of abnormal extreme values.
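As a rough illustration of step 2.1, the sketch below computes PFu as an average daily frequency and damps it with a square root. The function name `personal_distribution`, the choice of a per-day unit and the square-root smoother are assumptions; the text only requires some frequency statistic followed by smoothing.

```python
import math

def personal_distribution(match_count, days, smooth=math.sqrt):
    """PFu sketch: smoothed average number of matched records per day."""
    if days <= 0:
        raise ValueError("days must be positive")
    if match_count < 0:
        raise ValueError("match_count must be non-negative")
    # The smoother compresses large values so outliers weigh less.
    return smooth(match_count / days)
```

For example, 36 matched records over 9 days give a raw frequency of 4 per day and a smoothed PFu of 2.0.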
Step 2.2: Calculate the final credible distribution Pf of the behavior feature P from the three distributions
For the behavior feature P of user U, the classification distribution data Fc of the category to which P belongs and the global distribution data Fg of all users exhibiting P are queried from the behavior characteristic database. Based on the three distributions PFu, Fc and Fg, the final credible distribution Pf of the behavior feature P is calculated as $Pf = K_1 \cdot PFu + K_2 \cdot Fc + K_3 \cdot Fg$, where $K_1 + K_2 + K_3 = 1.0$. $K_1$ usually lies in the range 0.6 to 0.8, and its exact value is determined by the ratios PFu/Fc and PFu/Fg.
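The weighted combination can be sketched as below. The text does not say how K1 moves within 0.6 to 0.8 as a function of PFu/Fc and PFu/Fg, so the stepping rule here (0.1 added for each ratio above 1) and the equal split of the remainder between K2 and K3 are purely hypothetical, as is the name `credible_distribution`.

```python
def credible_distribution(pfu, fc, fg):
    """Pf = K1*PFu + K2*Fc + K3*Fg with K1 + K2 + K3 = 1.0."""
    k1 = 0.7  # fallback when a reference distribution is missing or zero
    if fc > 0 and fg > 0:
        # Assumed rule: trust the personal value more when it exceeds
        # the category and global references.
        above = (pfu / fc > 1.0) + (pfu / fg > 1.0)  # 0, 1 or 2
        k1 = 0.6 + 0.1 * above
    k2 = k3 = (1.0 - k1) / 2.0  # assumed equal split of the remaining weight
    return k1 * pfu + k2 * fc + k3 * fg
```

With PFu = 2.0 and Fc = Fg = 1.0 the sketch picks K1 = 0.8 and returns about 1.8; with PFu = 0.5 it picks K1 = 0.6 and returns about 0.7.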
Step 2.3: Calculate the likelihood evaluation value TPu of the user characteristic T corresponding to the behavior feature P
Based on the final credible distribution Pf of the behavior feature P, the likelihood evaluation value TPu of the corresponding user characteristic T is calculated.
$TPu = f(Pf, Rate)$, where f is a binomial function, Pf is the final credible distribution of the behavior feature P, and Rate is the derivation probability from the behavior feature P to the corresponding tag (defined in step 1.2).
This gives the estimated likelihood (probability) that the user has the user characteristic T, as derived from the single behavior feature P.
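The concrete form of f is not given in the text. The sketch below is one hypothetical stand-in, not the patent's function: it squashes Pf into the range [0, 1) with Pf / (1 + Pf) and scales the result by Rate, so that the output can be read as a probability. The name `likelihood` is illustrative.

```python
def likelihood(pf, rate):
    """TPu sketch: assumed f(Pf, Rate) = Rate * Pf / (1 + Pf)."""
    if pf < 0:
        raise ValueError("pf must be non-negative")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    # Pf / (1 + Pf) grows with Pf but never reaches 1.
    return rate * pf / (1.0 + pf)
```

For instance, Pf = 1.0 with Rate = 0.8 yields TPu = 0.4 under this assumed form.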
Several behavior features can indicate the same user characteristic (for example, visiting several news sites each indicates that the user reads news). The final evaluation of the user characteristic T therefore has to combine the TPu values of all associated behavior features.
The likelihood evaluation value TPu of the user characteristic corresponding to each behavior feature has been calculated in step 2. Let the set of behavior features from which the user characteristic T can be derived be PSet = (P1, P2, P3, ..., PN), where each behavior feature P maps to T (obtained from Table 3). The evaluation result Tu of the user characteristic is then calculated as $Tu = F(TPu_1, TPu_2, \ldots, TPu_N)$, where $TPu_1, TPu_2, \ldots, TPu_N$ are the evaluation results of all behavior features in PSet. N is typically between 10 and 20.
The evaluation result Tu gives the probability (between 0 and 1) that the user has the (shallow) user characteristic T.
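One common way to combine independent pieces of evidence for the same tag is a noisy-OR, where the tag is absent only if every feature fails to indicate it. The sketch below uses that rule as an assumed form of F; the text only states that the combination relies on an independence assumption, and the name `combine_evidence` is hypothetical.

```python
def combine_evidence(tpu_values):
    """Tu sketch: noisy-OR, Tu = 1 - prod(1 - TPu_i), assuming independence."""
    miss = 1.0
    for p in tpu_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each TPu must lie in [0, 1]")
        miss *= 1.0 - p  # probability that no feature so far indicates T
    return 1.0 - miss
```

Two features with TPu = 0.5 each give Tu = 0.75, and an empty evidence set gives 0.0, so the result always stays between 0 and 1.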
Let UT be the tag evaluation result set of user U, and add the user characteristic T together with its evaluation result Tu to UT.
Steps 2 to 3 are repeated for all shallow (non-deep) user characteristics until every evaluation result Tu related to user U has been calculated. The tag evaluation result set UT then holds the results for all shallow user characteristics of user U.
Step 4: Calculate the evaluation result TDu of the deep user characteristics of user U
Let the set of all shallow user characteristics (note: not behavior features) of user U obtained in the previous step be TLSet = (TL1, TL2, TL3, …), where each TLx is a shallow user characteristic of user U. For each shallow tag TLx, the deduction rule and the corresponding deep tag TagD (defined in Table 2) are looked up in Table 4, and the calculation is carried out according to the rule's deduction mode (probability or distribution threshold), finally producing the evaluation results TDu of all derivable deep user characteristics.
Each TagD, together with its evaluation result TDu, is added to the tag evaluation result set UT of user U.
Repeating this operation generates all possible deep user characteristics for user U.
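The deduction step can be sketched as a rule lookup over the shallow results in UT. The rule layout (`DeepRule`), the scoring in probability mode (credibility times the product of the shallow scores) and the scoring in threshold mode (the weakest shallow score, if it clears the threshold) are all assumptions made for illustration; Table 4 is not reproduced in the text.

```python
from dataclasses import dataclass

@dataclass
class DeepRule:
    deep_tag: str
    shallow_tags: tuple  # shallow tags that must all be present
    mode: str            # "probability" or "threshold"
    value: float         # credibility in [0, 1], or minimum threshold

def deduce_deep_tags(ut, rules):
    """ut maps shallow tag -> Tu; returns deep tag -> TDu for rules that fire."""
    result = {}
    for rule in rules:
        if not rule.shallow_tags or not all(t in ut for t in rule.shallow_tags):
            continue
        scores = [ut[t] for t in rule.shallow_tags]
        if rule.mode == "probability":
            tdu = rule.value
            for s in scores:
                tdu *= s
        elif rule.mode == "threshold":
            if min(scores) < rule.value:
                continue
            tdu = min(scores)
        else:
            raise ValueError(f"unknown deduction mode: {rule.mode}")
        # Keep the strongest evidence if several rules yield the same deep tag.
        result[rule.deep_tag] = max(result.get(rule.deep_tag, 0.0), tdu)
    return result
```

Applied to the diet cola and xylitol example from the Background, a probability rule with credibility 0.5 over shallow scores 0.9 and 0.8 would yield a deep-tag score of about 0.36.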
After the above steps are completed, a tag evaluation result set UT is obtained, that is, an evaluation result set of all tags (including shallow tags and deep tags) of the user U, and the related tags (evaluation values) quantitatively represent the final characteristics of the user U.
The algorithm is implemented in the software "HCR big data user research and analysis platform". The software is developed in Java; it implements the algorithms of the method and carries out the whole process of analyzing large-scale behavior data to obtain user characteristic tags. The main functional modules and processes are:
Tag management module: establishes the user characteristic system and the related settings for different services and scenarios (tag definitions in Table 2 of step 1.1, deduction relationships defined in step 1.4, etc.).
Basic evaluation/labeling module: supports fast manual labeling and management of the basic resources required by the analysis (behavior features, association and deduction settings, etc. from steps 1.1 to 1.3).
Data preprocessing module: performs automatic preprocessing of massive user behavior data, including importing raw data, cleaning non-conforming data, and calculating the distribution data required in step 1.3.
Tag analysis module: the core analysis module that implements the algorithm. It automatically performs the tag analysis on the preprocessed behavior data (all calculations of steps 2 to 4) and records the results in a result library. Because the data volume and computational load are large, the program supports a distributed computing framework and can run concurrently on a server cluster.
Result display module: presents the calculated user characteristic analysis results through Web-based visual statistics and charts to support analysis by researchers.
The actual process flow is shown in FIG. 1.
(1) Define the user characteristics and the associated tag deductions. These settings are made manually by researchers.
(2) Define the behavior features and the associated tag deductions. Part of this work is set manually, and the rest is obtained by statistics. The architecture relating user characteristics and behavior features is shown in FIG. 2.
(3) Preprocess the user behavior data to be analyzed, including basic data cleaning (based on an ETL tool) and calculation of the distribution information.
(4) Select one user and perform the subsequent operations.
(5) Analyze a single behavior feature of the user to obtain an evaluation value of the user characteristic (shallow tag).
(6) Combine the evaluations of the several behavior features related to the user characteristic to obtain its final evaluation value.
(7) Return to step (5) and continue until all shallow tags have been analyzed, then go to the next step.
(8) Based on all the shallow tags obtained, analyze and generate all deep user characteristics of the user.
(9) Output the set of all analyzed tag results as the final analysis result.
(10) Return to step (4).
To verify the effectiveness and versatility of the method of the invention, relevant experiments were performed.
Two important behavior scenarios were selected for the experiments: mobile internet App usage (detailed records of the Apps used by each user) and web browsing (browsing of various internet web pages). For the 2 million selected users, real behavior data sets were extracted: mobile App behavior data (6 months of continuous behavior, 58 billion records) and web browsing behavior data (3 months of continuous browsing history, 1.2 billion records).
After the initial labeling and the construction of a basic tag system (about 150 shallow tags and 20 deep tags), the data was analyzed and tested with the software. The analysis results were then compared with the tag analysis results obtained for the same batch of users with the traditional method. The results are as follows:
Discovery capability for user characteristics: for shallow tags, the discovery capability is similar to that of the traditional method in certain categories (interests and shopping preferences), but across more categories (e.g., natural attributes, lifestyle) the new method can analyze over 50% more tags than the traditional method. For deep tags, the new method can analyze 15 tags, none of which the traditional method can identify.
Accuracy of tag analysis: the tag analysis results shared by the two methods (interest and shopping preference categories) were judged manually. The analysis results of 1000 randomly sampled users were assessed by user researchers, and the accuracy of the new method's results is 23% higher than that of the traditional method.
Adaptability of the algorithm to behavior scenarios: the traditional method works well for online shopping behavior, whereas the new method is suitable for that scenario and can also be applied effectively to mobile App behavior and browsing behavior.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution. The description is written this way for clarity only; those skilled in the art should treat the description as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.
Claims (4)
1. A method of identifying characteristics of a user from behavior data, comprising the steps of:
1) establishing a behavior feature database comprising a behavior feature definition library, a behavior feature-user characteristic mapping rule library, behavior feature distribution data, and a user characteristic deduction library;
the behavior feature definition library defines the basic attributes of all behavior features involved;
the behavior feature-user characteristic mapping rule library defines how each behavior feature is mapped to a user characteristic;
the behavior feature distribution data is the distribution data of the behavior features calculated from the full set of behavior data;
the user characteristic deduction library defines the deduction rules between shallow tags and deep tags;
2) for a user, calculating the distribution information of a given behavior feature appearing in the user's behavior data, and then obtaining the personal distribution, the classification distribution, and the global distribution corresponding to that behavior feature; taking the classification distribution and the global distribution as references, and computing the final distribution result of the behavior feature from the personal, classification, and global distributions by means of a weighting algorithm; wherein step 2) includes:
(i) counting the user's base distribution of the behavior feature
Acquiring all keywords related to the behavior feature P and querying the user behavior data with those keywords; if the behavior data contains Chinese text, performing word segmentation in advance; the matched behavior data records form a set DSet, which is used to analyze the user's characteristics related to the behavior feature P;
for a user U, counting the distribution PFu of the behavior feature P over the matched record set DSet, and smoothing it to avoid the influence of abnormal extreme values;
(ii) calculating the final credible distribution Pf of the behavior feature P based on the three distributions
For the behavior feature P of the user U, querying from the behavior feature database the classification distribution data Fc of the class to which P belongs and the global distribution data Fg of all users related to P, and calculating the final credible distribution Pf of the behavior feature P from PFu, Fc, and Fg, wherein Pf = K1 × PFu + K2 × Fc + K3 × Fg, K1 + K2 + K3 = 1.0, and K1 is between 0.6 and 0.8, its exact value being determined by the ratios PFu/Fc and PFu/Fg;
(iii) calculating the likelihood evaluation value TPu of the user characteristic T corresponding to the behavior feature P
Calculating, based on the final credible distribution Pf of the behavior feature P, the likelihood evaluation value TPu of the corresponding user characteristic T;
TPu = f(Pf, Rate), where f is a binomial function, Pf is the final credible distribution of the behavior feature P, and Rate is the derivation probability between the behavior feature P and the corresponding tag;
thereby deriving, from the behavior feature P, the final likelihood evaluation that the user has the user characteristic T;
3) evaluating, based on the final distribution results of the user's behavior features, the likelihood evaluation value of the associated user characteristic, expressed as a probability;
since multiple behavior features may indicate the same user characteristic, the final evaluation of the user characteristic T must be derived from the TPu values of all associated behavior features;
having calculated in step 2) the likelihood evaluation value TPu of the user characteristic corresponding to each behavior feature, and letting PSet be the set of behavior features P from which the user characteristic T can be derived, each behavior feature P in PSet yielding one evaluation value, the evaluation result Tu of the user characteristic is calculated as follows: Tu = f(TPu1, TPu2, …, TPuN), where TPu1, TPu2, …, TPuN are the evaluation results of all behavior features in PSet and N is between 10 and 20;
the evaluation result Tu defines the probability that the user has the shallow user characteristic T;
letting UT be the tag evaluation result set of the user U, adding the user characteristic T and its evaluation result Tu to UT;
repeating steps 2) to 3) for all shallow user characteristics to complete the calculation of all evaluation results Tu for the user U, so that the tag evaluation result set UT holds the results for all shallow user characteristics of the user U;
4) once all tags related to the user's behavior features have been calculated, the shallow user characteristics are obtained;
5) then, based on the user characteristic deduction library, finding the deep tags that can be deduced from the shallow user characteristics identified for the current user, and calculating, according to the deduction mode, the final evaluation result of each deep tag, expressed as a probability;
for the set TLSet of all shallow user characteristics of the user U obtained in the previous step, where TLx denotes one shallow user characteristic of the user U: for each TLx, finding the derivation rule and the deep tag TagD, performing the calculation according to the corresponding derivation mode, and finally generating the evaluation results TDu of all derivable deep user characteristics;
adding TagD and its evaluation result TDu to the tag evaluation result set UT of the user U;
repeating this operation generates all possible deep user characteristics for the user U;
6) all tags calculated for a given user, namely the shallow tags and the deep tags together with their evaluation values, constitute the finally analyzed user characteristics;
wherein user characteristics, in the field of user research, are characteristics that a user exhibits based on personal background and behavior, each defining a certain aspect or tendency of the user, and the user characteristics comprise natural characteristics, life characteristics, interests, shopping preferences, values, and lifestyle.
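The weighting of step 2)(ii) and the likelihood evaluation of step 2)(iii) in claim 1 can be illustrated with a minimal Python sketch. The claim fixes only that K1 lies between 0.6 and 0.8, that K1 + K2 + K3 = 1.0, and that K1 depends on the ratios PFu/Fc and PFu/Fg; it does not state the mapping, the split between K2 and K3, or the concrete form of f. The names `choose_k1`, `final_distribution`, and `likelihood`, the squashing rule used for K1, the equal split of the remaining weight, and the product form f(Pf, Rate) = Pf × Rate are all assumptions made for illustration, not the patented formulas.

```python
def choose_k1(pfu, fc, fg, k1_min=0.6, k1_max=0.8):
    """Pick K1 in [k1_min, k1_max] from the ratios PFu/Fc and PFu/Fg.

    Hypothetical rule: the further the personal frequency exceeds the
    class and global baselines, the closer K1 moves to k1_max.
    """
    eps = 1e-9  # guards against division by zero
    ratio = 0.5 * (pfu / (fc + eps) + pfu / (fg + eps))
    t = ratio / (1.0 + ratio)  # squashes [0, inf) into [0, 1)
    return k1_min + (k1_max - k1_min) * t


def final_distribution(pfu, fc, fg):
    """Pf = K1*PFu + K2*Fc + K3*Fg with K1 + K2 + K3 = 1.0."""
    k1 = choose_k1(pfu, fc, fg)
    k2 = k3 = (1.0 - k1) / 2.0  # assumption: remainder split equally
    return k1 * pfu + k2 * fc + k3 * fg


def likelihood(pf, rate):
    """TPu = f(Pf, Rate); assumed here to be the product, clipped to [0, 1]."""
    return max(0.0, min(1.0, pf * rate))


if __name__ == "__main__":
    pf = final_distribution(pfu=0.5, fc=0.2, fg=0.1)
    print(pf, likelihood(pf, rate=0.7))
```

Because the three weights are non-negative and sum to 1.0, Pf is a convex combination and always lies between the smallest and largest of PFu, Fc, and Fg, which is one way the class and global distributions can act as references that dampen an extreme personal value.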
2. The method of identifying characteristics of a user from behavior data according to claim 1, wherein the behavior feature distribution data includes: calculating classification distribution data Fc: based on the classification to which each behavior feature belongs, counting the distribution frequency or user proportion of that classification in the full set of behavior data;
calculating global distribution data Fg: taking as the base group all matched users, namely all users whose behavior data contains the behavior feature, and counting the average distribution of the behavior feature over that group.
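One possible reading of claim 2 is sketched below in Python. The record layout (a list of `(user, feature)` pairs), the `feature_class` mapping, and the function names are hypothetical. Fc is computed here as a user proportion, one of the two options the claim allows, and Fg as the mean per-user frequency of the feature among users who exhibit it; both choices are assumptions about how the claim language would be implemented.

```python
from collections import Counter


def class_distribution(records, feature_class, target_class):
    """Fc as user proportion: share of all users with a record in the class."""
    all_users = {user for user, _ in records}
    class_users = {
        user for user, feat in records
        if feature_class.get(feat) == target_class
    }
    return len(class_users) / len(all_users) if all_users else 0.0


def global_distribution(records, feature):
    """Fg: mean per-user frequency of `feature` over users who exhibit it."""
    totals = Counter(user for user, _ in records)
    hits = Counter(user for user, feat in records if feat == feature)
    if not hits:
        return 0.0
    return sum(hits[u] / totals[u] for u in hits) / len(hits)


if __name__ == "__main__":
    records = [
        ("u1", "golf"), ("u1", "news"),
        ("u2", "golf"), ("u2", "golf"),
        ("u3", "news"), ("u3", "tennis"),
    ]
    feature_class = {"golf": "sports", "tennis": "sports", "news": "media"}
    print(class_distribution(records, feature_class, "media"))
    print(global_distribution(records, "golf"))
```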
3. The method of identifying user characteristics from behavior data according to claim 1, wherein it is determined whether the deduction mode from a shallow tag to a deep tag is based on a likelihood probability or on a distribution threshold; if the deduction is based on probability, the credibility probability of deducing the deep tag from the shallow tag is between 0 and 1; if the deduction is based on a distribution threshold, a minimum distribution threshold for the deduction is defined, and a user exceeding this threshold is deemed likely to have the deep tag.
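The two deduction modes of claim 3 can be sketched in Python as follows. The `DeductionRule` structure, the tag names, and the way the deep-tag value TDu is computed are assumptions: in probability mode the sketch multiplies the shallow evaluation Tu by the rule's credibility, and in threshold mode it carries Tu forward once the minimum threshold is reached. When several rules reach the same deep tag, the sketch keeps the largest value, which is also a choice the claims do not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeductionRule:
    shallow_tag: str
    deep_tag: str
    mode: str     # "probability" or "threshold"
    value: float  # credibility in [0, 1], or the minimum threshold


def deduce_deep_tags(shallow_results, rules):
    """Map shallow evaluations {tag: Tu} to deep evaluations {TagD: TDu}."""
    deep = {}
    for rule in rules:
        tu = shallow_results.get(rule.shallow_tag)
        if tu is None:
            continue  # the user does not carry this shallow tag
        if rule.mode == "probability":
            tdu = tu * rule.value  # assumption: scale Tu by the credibility
        elif rule.mode == "threshold":
            if tu < rule.value:
                continue  # below the minimum distribution threshold
            tdu = tu  # assumption: carry Tu forward unchanged
        else:
            raise ValueError(f"unknown deduction mode: {rule.mode}")
        deep[rule.deep_tag] = max(deep.get(rule.deep_tag, 0.0), tdu)
    return deep


if __name__ == "__main__":
    shallow = {"golf_fan": 0.8, "luxury_shopper": 0.4}
    rules = [
        DeductionRule("golf_fan", "high_income", "probability", 0.5),
        DeductionRule("luxury_shopper", "high_income", "threshold", 0.5),
        DeductionRule("golf_fan", "outdoor_lifestyle", "threshold", 0.7),
    ]
    print(deduce_deep_tags(shallow, rules))
```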
4. The method for identifying characteristics of a user from behavior data according to claim 1, wherein if a plurality of behavior features of the user are mapped to the same tag in step 3), the final evaluation result of the tag is obtained by jointly calculating the probability evaluation values of the plurality of behavior features based on the independent and identically distributed principle of probability statistics.
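Claim 4 does not give the combining formula for Tu = f(TPu1, TPu2, …, TPuN). A common way to combine independent probabilistic evidence for the same tag is the noisy-OR rule, Tu = 1 − ∏(1 − TPui), sketched below in Python. Treating the TPu values as independent probabilities and choosing noisy-OR are assumptions made for illustration; the function name `combine_independent` is hypothetical.

```python
import math


def combine_independent(tpu_values):
    """Combine per-feature likelihoods into Tu under an independence assumption.

    Returns the probability that at least one feature truly indicates the
    tag: 1 - prod(1 - TPu_i). An empty input yields 0.0.
    """
    values = list(tpu_values)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - math.prod(1.0 - p for p in values)


if __name__ == "__main__":
    print(combine_independent([0.5, 0.5]))
```

Under this rule, each additional behavior feature that points to the same tag can only raise Tu, and Tu never exceeds 1.0.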
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510701305.XA CN105389714B (en) | 2015-10-23 | 2015-10-23 | Method for identifying user characteristics from behavior data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510701305.XA CN105389714B (en) | 2015-10-23 | 2015-10-23 | Method for identifying user characteristics from behavior data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105389714A CN105389714A (en) | 2016-03-09 |
CN105389714B CN105389714B (en) | 2022-07-05 |
Family
ID=55421972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510701305.XA Active CN105389714B (en) | 2015-10-23 | 2015-10-23 | Method for identifying user characteristics from behavior data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105389714B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056444A (en) * | 2016-05-25 | 2016-10-26 | 腾讯科技(深圳)有限公司 | Data processing method and device |
CN106127515A (en) * | 2016-06-22 | 2016-11-16 | 北京网智天元科技股份有限公司 | A kind of passenger portrait and the method and device of data analysis |
CN107016026B (en) * | 2016-11-11 | 2020-07-24 | 阿里巴巴集团控股有限公司 | User tag determination method, information push method, user tag determination device, information push device |
CN108491490A (en) * | 2018-03-14 | 2018-09-04 | 南京易好信息技术有限公司 | Electric business platform Commercial goods labels Division identification system and method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103778555A (en) * | 2014-01-21 | 2014-05-07 | 北京集奥聚合科技有限公司 | User attribute mining method and system based on user tags |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104679771B (en) * | 2013-11-29 | 2018-09-18 | 阿里巴巴集团控股有限公司 | A kind of individuation data searching method and device |
Non-Patent Citations (2)
Title |
---|
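Data-mining-based community website user behavior analysis ***; 黄碗明; China Master's Theses Full-text Database, Information Science and Technology; 2012-07-15 (No. 07); full text * |
Research and implementation of microblog user behavior analysis technology; 李政泽; China Master's Theses Full-text Database, Information Science and Technology; 2014-12-15 (No. 12); pp. I139-68 * |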
Also Published As
Publication number | Publication date |
---|---|
CN105389714A (en) | 2016-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11574139B2 (en) | Information pushing method, storage medium and server | |
CN107424043B (en) | Product recommendation method and device and electronic equipment | |
US10783534B2 (en) | Method, system and computer readable medium for creating a profile of a user based on user behavior | |
WO2019214245A1 (en) | Information pushing method and apparatus, and terminal device and storage medium | |
CN108550068B (en) | Personalized commodity recommendation method and system based on user behavior analysis | |
WO2021027595A1 (en) | User portrait generation method and apparatus, computer device, and computer-readable storage medium | |
CN111062757A (en) | Information recommendation method and system based on multi-path optimization matching | |
WO2019149145A1 (en) | Compliant report class sorting method and apparatus | |
CN106682686A (en) | User gender prediction method based on mobile phone Internet-surfing behavior | |
CN105893406A (en) | Group user profiling method and system | |
US9607340B2 (en) | Method and system for implementing author profiling | |
CN104077723B (en) | A kind of social networks commending system and method | |
CN112269805A (en) | Data processing method, device, equipment and medium | |
CN107515915A (en) | User based on user behavior data identifies correlating method | |
CN105389714B (en) | Method for identifying user characteristics from behavior data | |
CN112632405B (en) | Recommendation method, recommendation device, recommendation equipment and storage medium | |
CN111159561A (en) | Method for constructing recommendation engine according to user behaviors and user portrait | |
CN114840766A (en) | User portrait construction method, system, equipment and storage medium | |
Ding et al. | Establishing smartphone user behavior model based on energy consumption data | |
CN107070702B (en) | User account correlation method and device based on cooperative game support vector machine | |
US20150142782A1 (en) | Method for associating metadata with images | |
CN117455529A (en) | User electricity utilization characteristic image construction method and system based on big data technology | |
Zhang et al. | Discovering consumers’ purchase intentions based on mobile search behaviors | |
CN116186119A (en) | User behavior analysis method, device, equipment and storage medium | |
CN116501957A (en) | User tag portrait processing method, user portrait system, apparatus and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||