CN105389714B - Method for identifying user characteristics from behavior data - Google Patents


Info

Publication number
CN105389714B
Authority
CN
China
Prior art keywords
user
behavior
distribution
characteristic
data
Prior art date
Legal status
Active
Application number
CN201510701305.XA
Other languages
Chinese (zh)
Other versions
CN105389714A (en)
Inventor
马亮
周鹏飞
Current Assignee
Beijing Huichen Capital Information Co ltd
Original Assignee
Beijing Huichen Capital Information Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Huichen Capital Information Co ltd
Priority to CN201510701305.XA
Publication of CN105389714A
Application granted
Publication of CN105389714B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for identifying user characteristics from behavior data, comprising the following steps: establishing a behavior characteristic database; calculating the distribution information of a given behavior feature in the user's behavior data, obtaining the personal distribution, classification distribution, and global distribution corresponding to that behavior feature, and combining them to calculate the final distribution result of the behavior feature; evaluating the likelihood value of the associated user characteristic, which completes the calculation of the shallow user characteristics; and calculating the final evaluation result of each deep tag possessed by the user. All the tags obtained constitute the finally analyzed user characteristics. The method has a simple model structure with few parameters and low algorithmic complexity, achieves good performance and spam webpage identification results on experimental test data, generalizes and adapts well, and gives objective, reliable, and comprehensive identification results.

Description

Method for identifying user characteristics from behavior data
Technical Field
The invention relates to the Internet field, and in particular to a method for identifying user characteristics from behavior data.
Background
1. User behavior data
User behavior data is the digital record of all the daily behaviors of a person as an individual actor. With the rapid development of the Internet and the mobile Internet, online behavior has become an important part of people's daily behavior, and online behavior data accounts for more than 90% of all recordable daily user behavior data. From this viewpoint, online behavior data can be taken to represent user behavior data.
Online behavior data can be divided into several categories according to the scenario in which it occurs: mobile App behavior, location change behavior, search behavior, web browsing behavior, shopping transaction behavior, social behavior, and so on. Each type of data differs in its source scenario, attributes, and manner of generation. With the development of Internet/mobile Internet services, the online user population has become very large (more than 7 times the daily population), and the volume of behavior data generated is huge. A single user may generate up to thousands of behavior records per day and more than one hundred thousand per year. Recorded user search behavior alone amounts to billions of records per day.
Such rich, large-scale behavior data can reveal many personal characteristics of users and has great commercial value. For example, a user's shopping characteristics (purchased products and brand preferences) can be discovered from search and shopping transaction data, and e-commerce enterprises can use them for accurate personalized product recommendation. A user's social characteristics (such as interests and values) can be discovered from social behavior data, and many enterprises can offer better-matched services (such as intelligent friend matching) based on the user's interests and hobbies.
2. User characteristics
In the field of user research, user characteristics are the characteristics of a user derived from his or her background and behaviors. A characteristic defines or describes one aspect or tendency of the user. User characteristics cover many aspects, such as natural attributes (e.g., male, born in the 1990s, elderly, overweight, lives in Beijing), life characteristics (job title, occupation, owns a private car …), interests (likes basketball, loves watching movies …), shopping preferences (favorite brands, type of cosmetics used), and values and lifestyle (e.g., brand-conscious, pursues quality, petty-bourgeois taste, high spending power).
User characteristics are a qualitative (non-quantitative), multi-dimensional description of the user formed after long-term observation. They are derived from the user's original attribute information and long-term behavior but hide the original attribute details, which protects user privacy (for example, from a user's ID card information one can obtain the characteristics "female" and "born in the 1980s", which do not correspond to a specific birthday). They therefore have generalization and wider application value.
Currently, user characteristics borrow the Internet concept of tagging and express specific attributes as tags. Each user characteristic can be regarded as a tag of the user, so that all the characteristics of a user can be defined by a combination of tags, and analyzing a user's characteristics becomes analyzing the user's tags. Hereafter, user characteristics are mainly referred to as user tags.
3. User characteristic (tag) analysis and identification
Because user tags (user characteristics) capture a large amount of intrinsic user information (such as interest preferences) and can bring great commercial value (such as recommending goods and services that match a user's interest tags), how to analyze and accurately identify user tags, and the related methods, have received wide attention in user research and commercial applications since 2014.
User characteristic analysis mainly relies on one of two mechanisms. (1) Analysis based on large amounts of basic user attribute information (such as ID card numbers, locations, and residential addresses). This approach has the disadvantages of narrow data coverage and a limited set of analyzable user characteristics, and it is rarely used because it risks revealing user privacy. (2) Analysis based on user behavior data. User characteristic tags are extracted by mining user behavior; this approach does not involve user privacy, and the massive behavior data of the Internet/mobile Internet provides ample data support. It has therefore become the main analysis approach today.
In the analysis mechanism based on user behavior, no direct private data of the user (such as a home address) and no real-life social identifier (such as an ID number) is needed; the user's characteristics are abstracted and summarized from his or her continuous behavior history. Each user is identified only by a meaningless numeric id (e.g., u001) that cannot be linked to a specific real-world person, and the user's real characteristics are inferred and tagged from the long-term behavior data of that id (e.g., mobile App usage, web browsing, shopping transactions). As an intuitive example, we initially know nothing about user u001, but find from half a year of behavior data that the user frequently takes selfies with the Meitu XiuXiu photo App and regularly opens a certain yoga App, likes browsing Bazaar fashion and green travel websites, and often buys imported milk powder when shopping online. We can then readily conclude that this user's (highly likely) characteristic tags include: female (young mother), fashion, yoga, and an infant at home. In practical applications, because behavior data spans many scenarios and is large in scale, the number of users to be analyzed often exceeds one million, so the analysis must be performed by an automatic method.
The current mainstream approach to automatically analyzing user tags is keyword-based (keywords being behavior features) and is mostly adopted by Internet/e-commerce enterprises. The basic procedure is as follows:
Define the keywords that occur in behaviors, and set their classifications and associated user tags (user characteristics).
Calculate statistics (e.g., frequency) of keyword occurrences in the behavior data and map them to frequencies of the associated user tags.
Keep the user characteristics with high statistical frequency as the user's final characteristics.
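For comparison with the method of the invention, the three-step keyword baseline above can be sketched in Python. This is a minimal illustration under assumed details (plain substring matching, a fixed `min_count` frequency cutoff, and the hypothetical name `keyword_tags`), not code from any particular e-commerce system.

```python
from collections import Counter


def keyword_tags(records, keyword_to_tag, min_count=3):
    """Keyword baseline: count keyword hits per tag and keep frequent tags.

    records: behavior texts of one user; keyword_to_tag: keyword -> user tag.
    Each record that contains a keyword adds one hit to that keyword's tag.
    """
    tag_counts = Counter()
    for text in records:
        for keyword, tag in keyword_to_tag.items():
            if keyword in text:  # plain substring match (assumed)
                tag_counts[tag] += 1
    # Final step of the baseline: keep only tags with a high enough frequency.
    return {tag: n for tag, n in tag_counts.items() if n >= min_count}


records = ["buy coca-cola 2L", "coca-cola zero", "xylitol gum", "coca-cola can"]
rules = {"coca-cola": "likes cola", "xylitol": "eats xylitol"}
print(keyword_tags(records, rules, min_count=2))  # {'likes cola': 3}
```

Note that the baseline can only return tags listed in `keyword_to_tag`; it has no way to combine several tags into a deeper one.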
This method analyzes some user tags (shopping and brand preference tags) in a specific behavior scenario (shopping transactions) and is well suited to user tag identification and subsequent targeted sales recommendation in e-commerce and Internet settings. However, it is difficult to apply to other, more valuable behavior scenarios (such as App usage or web browsing), so more comprehensive user tags cannot be discovered. Moreover, its relatively simple evaluation mechanism is not only less accurate but can analyze only the user's surface characteristics (usually called shallow user tags), and it has difficulty mining deep characteristics (deep tags). For example, if a user often buys diet cola and xylitol, the existing method can only find, in isolation, the tags "likes cola", "prefers the Coca-Cola brand", and "eats xylitol", but cannot reveal the user's hidden characteristic: the user buys many sugar-free products, suggesting that he or she may be diabetic. Such a characteristic is called a deep user tag (a user tag that cannot be deduced directly from user behavior data). Clearly, deep tags are more meaningful and have higher application value (for example, product recommendations for diabetic users are more accurate and better accepted).
Disclosure of Invention
The invention aims to provide a method for identifying user characteristics from behavior data that overcomes the shortcomings of existing methods for automatically analyzing user characteristics from behavior data. The method is based on a more comprehensive user behavior feature library; it introduces several distributions of each behavior feature (the user's own distribution, that of the category the feature belongs to, and the global distribution), and achieves a more accurate association between behavior features and user characteristics through probabilistic characterization. It also adopts a multi-level derivation method that further discovers deep user tags from shallow characteristics. Compared with existing analysis algorithms, the analysis results of the invention are more accurate and deeper; the method is general and applicable to all behavior scenarios, which makes it easier to study user characteristics comprehensively.
In order to achieve the purpose, the invention provides the following technical scheme:
A method for identifying user characteristics from behavior data, comprising the following steps:
1) establishing a behavior characteristic database, which comprises a behavior feature definition library, a behavior feature-user characteristic mapping rule library, behavior feature distribution data, and a user characteristic deduction library;
the behavior feature definition library defines the basic attributes of all the behavior features/user characteristics involved;
the behavior feature-user characteristic mapping rule library defines how each behavior feature is mapped to user characteristics;
the behavior feature distribution data is the distribution of each behavior feature calculated from the full behavior data;
the user characteristic deduction library defines the rules for deducing deep tags from shallow tags;
2) for a user, calculating the distribution information of a given behavior feature in the user's behavior data, and then obtaining the personal distribution, classification distribution, and global distribution corresponding to the behavior feature; using the classification distribution and the global distribution as references, calculating the final distribution result of the behavior feature from the personal, classification, and global distributions with a weighting algorithm;
3) based on the final distribution result of the user's behavior feature, evaluating the likelihood value of the associated user characteristic, expressed as a probability;
4) after the tags related to all of the user's behavior features have been calculated, the basic shallow user characteristics are obtained;
5) then, based on the user characteristic deduction library, finding the deep tags that can be deduced from the shallow user characteristics identified for the current user, and calculating the final evaluation result of each of the user's deep tags according to its deduction mode, the final evaluation result being expressed as a probability;
6) all the tags of the user calculated in this way, namely the shallow tags and deep tags together with their evaluation values, constitute the finally analyzed user characteristics.
As a further scheme of the invention, the behavior feature distribution data includes classification distribution data Fc, obtained by counting, for the category to which each behavior feature belongs, the distribution frequency or user proportion of that category in the full behavior data;
and global distribution data Fg, obtained by taking all users in the behavior data who exhibit the behavior feature as the reference group and computing the corresponding global average distribution.
As a further scheme of the invention, a deep tag is deduced from shallow tags in either a probability-based mode or a distribution-threshold mode. In the probability-based mode, a credibility probability between 0 and 1 is given for deducing the deep tag from the shallow tags; in the threshold mode, a minimum distribution threshold is set, and a user who exceeds it is deemed likely to have the deep tag.
As a further scheme of the invention, in step 3), if several of the user's behavior features map to the same tag, the final evaluation result of the tag is calculated by combining the likelihood values of those behavior features under the independent and identically distributed (i.i.d.) assumption of probability theory.
Compared with the prior art, the invention has the beneficial effects that:
The method can analyze and discover the characteristic tags of users (including deep characteristic tags) from massive user behavior data. Its model structure and parameters are simple, its algorithmic complexity is low, and it achieves good performance and spam webpage identification results on experimental test data. The method generalizes and adapts well, and its identification results are objective, reliable, and comprehensive, giving it good application prospects.
Drawings
FIG. 1 is a diagram of an actual user characteristic analysis process;
FIG. 2 is a diagram of the correspondence between user characteristics/behavior features and characteristic keywords.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is implemented on a computer and comprises the following steps, performed in sequence:
Step 1: Establishing a behavior characteristic database
The behavior characteristic database is an important resource for automatically calculating user characteristics in this method. It is obtained through a small amount of manual labeling (by user research experts) combined with automatic statistical calculation. The related work includes:
Step 1.1: Create a definition library of behavior features and user characteristics that defines the basic attributes of all the behavior features/user characteristics involved. The attributes of behavior features are shown in Table 1.
TABLE 1: Attributes of behavior features (table image not reproduced)
The attributes of user characteristics are shown in Table 2.
TABLE 2: Attributes of user characteristics (table image not reproduced)
Step 1.2: a mapping rule base of behavior characteristics to user characteristics is created, defining how each behavior characteristic maps to a user characteristic. There are cases where one user characteristic corresponds to a plurality of behavior characteristics. The mapping rule relationship between the behavior characteristics and the user characteristics is defined as table 3.
TABLE 3
Figure BSA0000122475020000063
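Because Table 3 is not reproduced, the following Python sketch assumes that each mapping rule records only a behavior feature, the user characteristic (tag) it maps to, and the probability Rate that the feature implies the tag (the Rate used in Step 2.3). The names `MappingRule` and `build_pset_index` are hypothetical. Grouping the rules by tag yields, for each user characteristic T, the feature set PSet used in Step 3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    feature: str  # behavior feature P (hypothetical identifier)
    tag: str      # user characteristic T that P maps to
    rate: float   # assumed field: probability that P implies T ("Rate" in Step 2.3)


def build_pset_index(rules):
    """Group mapping rules by tag, so PSet(T) = [(P, Rate), ...] can be looked up directly."""
    index = defaultdict(list)
    for rule in rules:
        if not 0.0 <= rule.rate <= 1.0:
            raise ValueError(f"rate out of range for {rule.feature} -> {rule.tag}")
        index[rule.tag].append((rule.feature, rule.rate))
    return dict(index)


rules = [
    MappingRule("visits_news_site_a", "reads_news", 0.7),
    MappingRule("visits_news_site_b", "reads_news", 0.6),
    MappingRule("uses_yoga_app", "likes_yoga", 0.8),
]
pset = build_pset_index(rules)
print(pset["reads_news"])  # [('visits_news_site_a', 0.7), ('visits_news_site_b', 0.6)]
```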
Step 1.3: Calculate the distribution data of the behavior features from the full behavior data. This comprises:
Calculating classification distribution data Fc: based on the category to which each behavior feature belongs (Table 1), count the distribution (frequency, user proportion, etc.) of that category in the full behavior data.
Calculating global distribution data Fg: taking all users who exhibit the behavior feature as the reference group, compute the corresponding global average (such as the average frequency of the behavior feature or its proportion of the total number of users).
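The text leaves the exact statistics open ("frequency/user ratio, etc."). The sketch below assumes that both Fc and Fg are expressed as average occurrences per user, the same unit as the personal distribution PFu, so that the three can be weighted together in Step 2.2. The event format and the function name `distribution_data` are assumptions.

```python
from collections import Counter


def distribution_data(events, category_of):
    """Compute Fc (per category) and Fg (per feature) from the full behavior data.

    events: iterable of (user_id, feature) occurrences; category_of: feature -> category.
    Assumed definitions: Fg[P] = mean count of P over users who exhibit P;
    Fc[c] = mean count per (user, feature) pair over all features in category c.
    """
    pair_counts = Counter(events)  # (user, feature) -> number of occurrences
    feat_total, feat_users = Counter(), Counter()
    cat_total, cat_pairs = Counter(), Counter()
    for (_user, feature), count in pair_counts.items():
        feat_total[feature] += count
        feat_users[feature] += 1
        category = category_of[feature]
        cat_total[category] += count
        cat_pairs[category] += 1
    fg = {f: feat_total[f] / feat_users[f] for f in feat_total}
    fc = {c: cat_total[c] / cat_pairs[c] for c in cat_total}
    return fc, fg


events = [("u1", "news_a"), ("u1", "news_a"), ("u2", "news_a"),
          ("u1", "news_b"), ("u2", "news_b"), ("u2", "news_b"), ("u2", "news_b")]
fc, fg = distribution_data(events, {"news_a": "news", "news_b": "news"})
print(fc, fg)  # {'news': 1.75} {'news_a': 1.5, 'news_b': 2.0}
```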
Step 1.4: Create a library for deducing deep-tag user characteristics from shallow tags, defining how deep user characteristics are discovered from shallow-tag user characteristics. A deep tag often has to be deduced jointly from several shallow tags. The deduction rules between shallow and deep user characteristics are defined in Table 4.
TABLE 4: Deduction rules from shallow to deep user characteristics (table image not reproduced)
Step 2: Calculating the user characteristics corresponding to a single behavior feature
Step 2.1: Computing the user's base distribution of behavior feature P
Obtain all the keywords related to behavior feature P as defined in Table 1, and query the user behavior data with these keywords. If the behavior data involves Chinese text (such as the titles of browsed content), word segmentation must be performed beforehand (a segmenter such as the ICTCLAS 3.0 Chinese word segmentation system can be used). The matching behavior data records (denoted as the set DSet) are used to analyze the characteristics of the user's behavior feature P.
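A minimal sketch of building DSet for one behavior feature P. The record layout (`user`, `day`, `text` fields) and `match_records` are hypothetical; the `segment` parameter stands in for a real segmenter such as ICTCLAS, and the default whitespace split is only suitable for text that is already segmented. Matching whole tokens rather than substrings is why segmentation matters: "yogurt" must not count as "yoga".

```python
from typing import Callable, Iterable


def match_records(records: Iterable[dict], keywords: set[str],
                  segment: Callable[[str], list[str]] = str.split) -> list[dict]:
    """Return DSet: the records whose segmented text contains a keyword of feature P.

    `segment` stands in for a word segmenter; whitespace splitting is only a
    placeholder for text that has already been segmented.
    """
    dset = []
    for record in records:
        tokens = set(segment(record["text"]))
        if tokens & keywords:  # whole-token match, not substring match
            dset.append(record)
    return dset


records = [
    {"user": "u001", "day": 1, "text": "yoga class booking"},
    {"user": "u001", "day": 2, "text": "yogurt recipe"},
    {"user": "u001", "day": 3, "text": "open yoga app"},
]
print([r["day"] for r in match_records(records, {"yoga"})])  # [1, 3]
```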
For user U and the matched record set DSet, compute the distribution PFu of behavior feature P for that user, such as the total number of occurrences and the average frequency per unit of time (e.g., per day or per month), and apply smoothing (such as taking the square root) to reduce the influence of abnormal extreme values.
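One possible reading of Step 2.1, sketched in Python: count the DSet records per day, take the square root of each day's count so that a single burst day cannot dominate, and average over the observation window. The per-day granularity and the name `personal_distribution` are assumptions; the text also allows monthly units or total counts.

```python
import math
from collections import Counter


def personal_distribution(dset_days, window_days):
    """PFu: average daily frequency of feature P, with square-root smoothing per day.

    dset_days: the day index of each record in DSet for user U;
    window_days: length of the observation window in days.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    per_day = Counter(dset_days)  # day -> number of matched records
    # The square root damps days with abnormally many records before averaging.
    smoothed_total = sum(math.sqrt(n) for n in per_day.values())
    return smoothed_total / window_days


days = [1] * 4 + [2] + [5] * 9  # 14 records on 3 active days
print(personal_distribution(days, window_days=3))  # 2.0
```

Without smoothing, the same 14 records over 3 days would give a mean of about 4.67 per day.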
Step 2.2: Calculating the final credible distribution Pf of behavior feature P from the three distributions
For behavior feature P of user U, query the behavior characteristic database for the classification distribution data Fc of the category to which P belongs and the global distribution data Fg of all users who exhibit P. From the three distributions PFu, Fc, and Fg, calculate the final credible distribution Pf of behavior feature P:
$$Pf = K_1 \cdot PFu + K_2 \cdot Fc + K_3 \cdot Fg, \qquad K_1 + K_2 + K_3 = 1.0$$
where $K_1$ usually lies between 0.6 and 0.8, and its exact value varies with the ratios PFu/Fc and PFu/Fg.
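The weighted sum fixes only the form of Pf and the range of K1; how K1 moves with PFu/Fc and PFu/Fg is not specified. The sketch below assumes one such rule (lower K1 as PFu deviates further from the references, saturating at a tenfold deviation) and splits the remaining weight equally between K2 and K3. The function name and the log10 deviation measure are hypothetical.

```python
import math


def credible_distribution(pfu, fc, fg, k1_min=0.6, k1_max=0.8):
    """Pf = K1*PFu + K2*Fc + K3*Fg with K1 + K2 + K3 = 1.

    The rule for choosing K1 inside [k1_min, k1_max] is an assumption: the further
    PFu deviates from Fc or Fg (up to a factor of 10), the lower K1, so outliers
    are pulled toward the references. K2 and K3 split the remainder equally.
    """
    if min(pfu, fc, fg) <= 0:
        raise ValueError("all distributions must be positive")
    deviation = max(abs(math.log10(pfu / fc)), abs(math.log10(pfu / fg)))
    k1 = k1_max - (k1_max - k1_min) * min(deviation, 1.0)
    k2 = k3 = (1.0 - k1) / 2.0
    return k1 * pfu + k2 * fc + k3 * fg, (k1, k2, k3)


pf, weights = credible_distribution(pfu=20.0, fc=2.0, fg=2.0)
print(round(pf, 6), [round(k, 6) for k in weights])  # 12.8 [0.6, 0.2, 0.2]
```

A user whose PFu matches the references keeps K1 = 0.8, so Pf stays close to the user's own distribution.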
Step 2.3: Calculating the likelihood value TPu of the user characteristic T corresponding to behavior feature P
Based on the final credible distribution Pf of behavior feature P, calculate the likelihood value TPu that the user has the corresponding user characteristic T:
$$TPu = f(Pf, Rate)$$
where f is a binomial function, Pf is the final credible distribution of behavior feature P, and Rate is the probability, defined in the mapping rules of Step 1.2, that behavior feature P implies the corresponding tag Tag.
This yields, from a single behavior feature P, the final estimated probability that the user has user characteristic T.
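The text does not give the explicit form of f. The following sketch assumes a saturating form, TPu = Rate * (1 - exp(-Pf / scale)), which rises with the credible distribution, is zero when the feature never occurs, and never exceeds the rule's Rate. The form, the `scale` parameter, and the name `likelihood` are all assumptions.

```python
import math


def likelihood(pf, rate, scale=1.0):
    """TPu = f(Pf, Rate), assumed here to be Rate * (1 - exp(-Pf / scale)).

    The result grows with Pf, never exceeds Rate, and is 0 when Pf is 0.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be a probability")
    if pf <= 0:
        return 0.0
    return rate * (1.0 - math.exp(-pf / scale))


print(round(likelihood(math.log(2), rate=0.8), 6))  # 0.4
```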
Step 3: Calculating the evaluation result Tu of a user characteristic derived from multiple behavior features
Several behavior features can indicate the same user characteristic (e.g., visiting several different news sites indicates that the user reads news). Therefore, the final evaluation of user characteristic T must be based on the TPu values of all the associated behavior features.
The likelihood value TPu of the user characteristic has already been calculated for each behavior feature in Step 2. Suppose the set of behavior features from which user characteristic T can be derived is PSet = (P1, P2, P3, ..., PN), each mapped to T by the rules of Table 3. The evaluation result Tu of the user characteristic is then calculated as
$$Tu = f(TPu_1, TPu_2, \ldots, TPu_N)$$
where $TPu_1, TPu_2, \ldots, TPu_N$ are the evaluation results of all the behavior features in PSet. N is typically between 10 and 20.
The evaluation result Tu gives the final probability (between 0 and 1) that the user has the (shallow) user characteristic T.
Let UT be the tag evaluation result set of user U, and add user characteristic T (together with its evaluation result Tu) to UT.
Repeat Steps 2 and 3 for all shallow user characteristics (non-deep user characteristics) to complete the calculation of all evaluation results Tu for user U. The tag evaluation result set UT then contains the results for all shallow user characteristics of user U.
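The invention states only that the TPu values are combined under the i.i.d. assumption of probability theory. One standard combination under an independence assumption is the noisy-OR, Tu = 1 - prod(1 - TPu_i), sketched below; it is a swapped-in choice, not necessarily the patent's f. The helper names are hypothetical, and tags with an empty PSet are simply left out of UT.

```python
import math


def combine_evidence(tpu_values):
    """Tu = f(TPu_1, ..., TPu_N), assumed to be noisy-OR: 1 - prod(1 - TPu_i).

    Treating the behavior features as independent evidence, Tu is the probability
    that at least one of them correctly indicates characteristic T.
    """
    return 1.0 - math.prod(1.0 - p for p in tpu_values)


def shallow_tag_results(tpu_by_tag):
    """Build the shallow part of UT: tag T -> Tu, from tag -> list of TPu over PSet(T)."""
    return {tag: combine_evidence(values) for tag, values in tpu_by_tag.items() if values}


ut = shallow_tag_results({"reads_news": [0.5, 0.5], "likes_yoga": [0.2], "plays_golf": []})
print({tag: round(tu, 6) for tag, tu in ut.items()})  # {'reads_news': 0.75, 'likes_yoga': 0.2}
```

Each additional supporting feature can only raise Tu, and Tu stays below 1 unless some TPu equals 1.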
Step 4: Calculating the evaluation result TDu of the deep user characteristics of user U
Let TLSet = (TL1, TL2, TL3, …) be the set of shallow user characteristics (note: user characteristics, not behavior features) of user U obtained in the previous step, where each TLx is a shallow user characteristic of U. For each TLx, look up in Table 4 the deduction rules and the deep tags TagD (deep tags are defined in Table 2) that it participates in, perform the calculation according to the associated deduction mode (probability or distribution threshold), and finally generate the evaluation results TDu of all derivable deep user characteristics.
Add each TagD (with its evaluation result TDu) to the tag evaluation result set UT of user U.
Repeating this operation generates all possible deep user characteristics for user U.
After the above steps are completed, the tag evaluation result set UT contains the evaluation results of all tags (shallow and deep) of user U, and these tags, with their evaluation values, quantitatively represent the final characteristics of user U.
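A sketch of Step 4 under stated assumptions. Table 4 is not reproduced, so each hypothetical `DeepRule` holds the required shallow tags, the deduction mode, and a single number (the credibility probability in probability mode, or the minimum threshold in threshold mode). How TDu is computed from the shallow Tu values is also assumed: a product scaled by the credibility probability in probability mode, and the weakest supporting Tu in threshold mode. The example reuses the diet-cola and young-mother cases from the Background.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DeepRule:
    deep_tag: str              # TagD
    required: tuple[str, ...]  # shallow tags that jointly imply TagD
    mode: str                  # "probability" or "threshold"
    value: float               # credibility probability, or minimum threshold


def derive_deep_tags(ut, rules):
    """Return TagD -> TDu from the shallow results Tu stored in UT.

    Assumed combination rules: in probability mode TDu = value * prod(Tu);
    in threshold mode TDu = min(Tu) if every required Tu exceeds the threshold.
    """
    derived = {}
    for rule in rules:
        if not all(tag in ut for tag in rule.required):
            continue  # some required shallow tag was not found for this user
        scores = [ut[tag] for tag in rule.required]
        if rule.mode == "probability":
            tdu = rule.value * math.prod(scores)
        elif rule.mode == "threshold":
            if min(scores) <= rule.value:
                continue
            tdu = min(scores)
        else:
            raise ValueError(f"unknown deduction mode {rule.mode!r}")
        # Several rules may derive the same deep tag; keep the strongest result.
        derived[rule.deep_tag] = max(tdu, derived.get(rule.deep_tag, 0.0))
    return derived


ut = {"buys_diet_cola": 0.9, "buys_xylitol": 0.8, "fashion": 0.7, "buys_milk_powder": 0.6}
rules = [
    DeepRule("possibly_diabetic", ("buys_diet_cola", "buys_xylitol"), "probability", 0.5),
    DeepRule("young_mother", ("fashion", "buys_milk_powder"), "threshold", 0.5),
]
ut.update(derive_deep_tags(ut, rules))
```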
The algorithm is implemented in the software HCR big data user research and analysis platform. The software is developed in Java; it implements the algorithms of this method and carries out the entire process of analyzing behavior big data to obtain user characteristic tags. Its main functional modules are:
Tag management module: used to establish the user characteristic system and configure it for different services and scenarios (tag definitions in Table 2 of Step 1.1, deduction relationships defined in Step 1.4, etc.).
Basic evaluation/labeling module: provides fast manual labeling and management of the basic resources required for analysis (behavior features, association and deduction settings, etc., as in Steps 1.1 to 1.3).
Data preprocessing module: performs automatic preprocessing of massive user behavior data, including importing raw data, cleaning non-conforming data, and calculating the distribution data required in Step 1.3.
Tag analysis module: the core module implementing the algorithm. It automatically performs the tag analysis on the preprocessed behavior data (all calculations of Steps 2 to 4) and records the results in a result library. Because the data volume and computational load are large, the program supports a distributed computing framework and can run concurrently on a server cluster.
Result display module: displays the calculated user characteristic analysis results as web-based visual statistics and charts to support researchers' analysis.
The actual process flow is shown in FIG. 1.
(1) Define the user characteristics and the associated tag deductions. These settings are made manually by researchers.
(2) Define the behavior features and the associated tag deductions. Part of this work is set manually and the rest is obtained by statistics. The architecture relating user characteristics and behavior features is shown in FIG. 2.
(3) Preprocess the user behavior data to be analyzed, including basic data cleansing (based on an ETL tool) and calculation of the relevant distribution information.
(4) Select one user and perform the following operations.
(5) Analyze each single behavior feature of the user to obtain evaluation values of user characteristics (shallow tags).
(6) Jointly evaluate the several behavior features related to each user characteristic to obtain the final evaluation value of that user characteristic.
(7) Return to step (5) and continue until all shallow tags have been analyzed, then go to the next step.
(8) Based on all the shallow tags obtained, analyze and generate all deep user characteristics of the user.
(9) Output the set of all analyzed tag results as the final analysis result.
(10) Return to step (4).
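The per-user control flow of steps (4) to (10) can be sketched as a loop. The two callables are hypothetical hooks standing in for Steps 2 and 3 (shallow evaluation) and Step 4 (deep deduction); the sketch shows only the ordering of the steps, not the HCR platform's actual code.

```python
from typing import Callable, Dict, Optional, Sequence

Tags = Dict[str, float]


def analyze_users(users: Sequence[str],
                  shallow_tags: Sequence[str],
                  evaluate_shallow: Callable[[str, str], Optional[float]],
                  derive_deep: Callable[[Tags], Tags]) -> Dict[str, Tags]:
    """Per-user loop of steps (4)-(10); both callables are hypothetical hooks.

    evaluate_shallow(user, tag) returns Tu, or None if the tag has no evidence;
    derive_deep(ut) returns the deep tags deduced from the shallow results.
    """
    results = {}
    for user in users:                 # (4) select a user; (10) move to the next
        ut: Tags = {}
        for tag in shallow_tags:       # (5)-(7) evaluate every shallow tag
            tu = evaluate_shallow(user, tag)
            if tu is not None:
                ut[tag] = tu
        ut.update(derive_deep(ut))     # (8) deduce deep tags from shallow ones
        results[user] = ut             # (9) output the tag result set UT
    return results


scores = {("u001", "yoga"): 0.8, ("u001", "fashion"): 0.7}
result = analyze_users(
    ["u001", "u002"], ["yoga", "fashion"],
    evaluate_shallow=lambda user, tag: scores.get((user, tag)),
    derive_deep=lambda ut: {"active_lifestyle": 0.5} if "yoga" in ut else {},
)
print(result)  # {'u001': {'yoga': 0.8, 'fashion': 0.7, 'active_lifestyle': 0.5}, 'u002': {}}
```

Because each user is processed independently, the outer loop is the natural unit to distribute across a server cluster.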
To verify the effectiveness and versatility of the method of the invention, relevant experiments were performed.
Two important behavior scenarios were selected for the experiments: mobile Internet App usage (detailed records of the Apps used by each user) and web browsing (browsing of various Internet web pages). For 2 million selected users, real behavior data sets were extracted: mobile App behavior data (6 months of continuous behavior, 58 billion records) and web browsing behavior data (3 months of continuous browsing history, 1.2 billion records).
After the initial labeling and the establishment of a basic tag system (about 150 shallow tags and 20 deep tags), the data were analyzed and tested with the software. The analysis results were then compared with the tag analysis results obtained for the same users with the traditional method. The results are as follows:
Discovery capability for user characteristics: for certain categories (interests, shopping preferences), the new method discovers shallow tags about as well as the traditional method, but in other categories (e.g., natural attributes, lifestyle) it analyzes more than 50% more tags than the traditional method. For deep tags, the new method analyzes 15 tags, none of which the traditional method can identify.
Accuracy of tag analysis: the tag results produced by both methods (interest and shopping preference categories) were judged manually. Analysis results for 1,000 randomly sampled users were judged by user researchers, and the accuracy of the new method's results was 23% higher than that of the traditional method.
Adaptability to behavior scenarios: the traditional method works well for online shopping behavior, whereas the new method not only suits that scenario but also applies effectively to mobile App usage and web browsing scenarios.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this description is organized by embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity. Those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments understandable to those skilled in the art.

Claims (4)

1. A method of identifying characteristics of a user from behavioral data, comprising the steps of:
1) establishing a behavior feature database comprising a behavior feature definition library, a behavior feature-user characteristic mapping rule library, behavior feature distribution data and a user characteristic deduction library;
the behavior feature definition library defines the basic attributes of every behavior feature involved;
the behavior feature-user characteristic mapping rule library defines how each behavior feature is mapped to user characteristics;
the behavior feature distribution data are the distributions of the behavior features calculated from the full behavior data;
the user characteristic deduction library defines the rules for deducing deep tags from shallow tags;
2) for a given user, calculating the distribution of a given behavior feature in the user's behavior data, thereby obtaining the personal distribution, the classification distribution and the global distribution corresponding to that behavior feature; using the classification distribution and the global distribution as references, calculating the final distribution of the behavior feature from the personal, classification and global distributions with a weighting algorithm; wherein step 2) comprises:
(i) computing the user's base distribution of the behavior feature
Obtaining all keywords related to the behavior feature P and querying the user behavior data with these keywords (if the behavior data are in Chinese, word segmentation is performed beforehand); the matched behavior data records form the set DSet, which is used to analyze the user's behavior feature P;
for a user U, counting the distribution PFu of the behavior feature P over the matched record set DSet and smoothing it so that abnormal extreme values have no undue influence;
(ii) calculating the final credible distribution Pf of the behavior feature P from the three distributions
For the behavior feature P of the user U, querying the behavior feature database for the classification distribution data Fc of the category to which P belongs and the global distribution data Fg of all users related to P, and calculating the final credible distribution Pf of P from PFu, Fc and Fg as Pf = K1·PFu + K2·Fc + K3·Fg, where K1 + K2 + K3 = 1.0 and K1 lies between 0.6 and 0.8, its exact value being determined by the ratios PFu/Fc and PFu/Fg;
(iii) calculating the likelihood evaluation value TPu of the user characteristic T corresponding to the behavior feature P
Calculating, from the final credible distribution Pf of the behavior feature P, the likelihood evaluation value TPu that the user has the corresponding user characteristic T;
TPu = f(Pf, Rate), where f is a function of two variables, Pf is the final credible distribution of the behavior feature P, and Rate is the probability with which the behavior feature P implies the corresponding label Tag;
thereby obtaining the final likelihood, derived from the behavior feature P, that the user has the user characteristic T;
3) evaluating, from the final distributions of the user's behavior features, the likelihood of each associated user characteristic, expressed as a probability;
since several different behavior features can indicate the same user characteristic, the final evaluation of the user characteristic T must be based on the TPu values of all associated behavior features;
the likelihood evaluation value TPu having been calculated in step 2) for the user characteristic corresponding to each behavior feature, let PSet be the set of behavior features P from which the user characteristic T can be derived, each behavior feature P yielding one TPu value; the evaluation result Tu of the user characteristic is then calculated as Tu = f(TPu1, TPu2, …, TPuN), where TPu1, TPu2, …, TPuN are the evaluation results of all behavior features in PSet and N is between 10 and 20;
the evaluation result Tu is the final probability that the user has the shallow user characteristic T;
letting UT denote the tag evaluation result set of the user U, adding the user characteristic T together with its evaluation result Tu to UT;
repeating steps 2) and 3) for every shallow user characteristic until all evaluation results Tu for the user U have been calculated, so that UT contains the results for all shallow user characteristics of the user U;
4) once all labels related to the user's behavior features have been calculated, the calculation of the shallow user characteristics is complete;
5) then, using the user characteristic deduction library, finding the deep labels that can be deduced from the shallow user characteristics identified for the current user, and calculating the final evaluation result of each deep label according to its deduction mode, the final evaluation result being expressed as a probability;
letting TLSet denote the set of all shallow user characteristics of the user U obtained in the previous step, with TLx an element of TLSet: for each shallow user characteristic TLx, finding the corresponding derivation rule and deep tag TagD, performing the calculation according to the applicable derivation mode, and finally generating the evaluation result TDu of every derivable deep user characteristic;
adding TagD and its evaluation result TDu to the tag evaluation result set UT of the user U;
repeating this loop generates all possible deep user characteristics for the user U;
6) all labels calculated for the user, i.e. the shallow labels, the deep labels and their evaluation values, constitute the finally analyzed user characteristics;
in the field of user research, user characteristics are characteristics of a user that arise from the user's own background and behavior and describe a particular aspect or tendency of the user; they comprise natural characteristics, life characteristics, interests, shopping preferences, values and lifestyle.
2. The method of identifying characteristics of a user from behavioral data according to claim 1, wherein the behavior feature distribution data include: classification distribution data Fc, calculated by counting, for the category to which each behavior feature belongs, the frequency of that category or the proportion of users exhibiting it in the full behavior data;
and global distribution data Fg, calculated by counting the average distribution over the group of matched users, namely all users whose behavior data contain the behavior feature.
3. The method of identifying user characteristics from behavioral data according to claim 1, wherein the mode by which a deep tag is deduced from a shallow tag is either probability-based or distribution-threshold-based; if the deduction is probability-based, the credibility probability that the shallow tag implies the deep tag lies between 0 and 1; if the deduction is based on a distribution threshold, a minimum distribution threshold for the deduction is specified, above which the user is deemed to have the deep tag.
4. The method for identifying characteristics of a user from behavior data according to claim 1, wherein, if several behavior features of the user map to the same tag in step 3), the final evaluation result of the tag is calculated by combining the probability evaluation values of those behavior features under the independent and identically distributed (i.i.d.) principle of probability statistics.
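The behavior feature database of step 1) of claim 1 can be pictured as four lookup tables. The following is a minimal Python sketch under assumed data layouts; the class name BehaviorFeatureDatabase, its field names and the pset helper are hypothetical and not part of the patent. It also shows how PSet (the behavior features that map to a given user characteristic T, used in step 3)) could be read off the mapping rule library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BehaviorFeatureDatabase:
    """Hypothetical in-memory layout of the behavior feature database of step 1)."""

    # Behavior feature definition library: feature id -> basic attributes (e.g. keywords, category).
    definitions: Dict[str, dict] = field(default_factory=dict)
    # Mapping rule library: feature id -> list of (user characteristic tag, Rate).
    mapping_rules: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)
    # Distribution data computed on the full behavior data: feature id -> (Fc, Fg).
    distributions: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    # Deduction library: shallow tag -> list of (deep tag, mode, parameter).
    deduction_rules: Dict[str, List[Tuple[str, str, float]]] = field(default_factory=dict)

    def pset(self, tag: str) -> List[str]:
        """Return PSet: the behavior features whose mapping rules point at `tag`."""
        return [
            feature
            for feature, rules in self.mapping_rules.items()
            if any(t == tag for t, _rate in rules)
        ]
```

For example, with mapping rules {"P1": [("T1", 0.8)], "P3": [("T1", 0.6)]}, pset("T1") returns ["P1", "P3"].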
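Claim 2 defines the classification distribution Fc and the global distribution Fg only in words. One possible reading, sketched below in Python, takes the user-proportion variant of Fc and averages the per-user distribution PFu over the matched users for Fg. The function names and input formats (a map from user id to the set of categories in that user's data, and a map from user id to PFu) are assumptions for illustration.

```python
from statistics import mean
from typing import Dict, Iterable, Set


def classification_distribution(user_categories: Dict[str, Set[str]], category: str) -> float:
    """Fc, user-proportion variant: share of all users whose behavior data touch `category`."""
    if not user_categories:
        return 0.0
    hits = sum(1 for cats in user_categories.values() if category in cats)
    return hits / len(user_categories)


def global_distribution(pfu_by_user: Dict[str, float], matched_users: Iterable[str]) -> float:
    """Fg: average PFu over the matched users, i.e. the users who exhibit the feature."""
    values = [pfu_by_user[u] for u in matched_users if u in pfu_by_user]
    return mean(values) if values else 0.0
```

The frequency variant of Fc mentioned in claim 2 would count category occurrences across all records instead of users.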
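Step 2)(i) of claim 1 matches a user's records against the keywords of a behavior feature P and smooths the resulting distribution PFu, without naming a smoothing method. The sketch below assumes records are already word-segmented into token lists and uses additive (pseudo-count) smoothing toward a reference rate as a stand-in; match_records, personal_distribution, prior and strength are hypothetical names, and the patent may smooth differently.

```python
from typing import List, Sequence, Set


def match_records(records: Sequence[List[str]], keywords: Set[str]) -> List[List[str]]:
    """DSet: the user's records containing at least one keyword of feature P.

    Records are assumed to be token lists, i.e. Chinese text has already been word-segmented.
    """
    return [tokens for tokens in records if keywords.intersection(tokens)]


def personal_distribution(n_matched: int, n_total: int, prior: float, strength: float = 5.0) -> float:
    """PFu as a smoothed share of matched records (assumed smoothing scheme).

    Adds `strength` pseudo-records at rate `prior`, which pulls users with very few
    records toward the prior instead of producing extreme values such as 0.0 or 1.0.
    """
    return (n_matched + strength * prior) / (n_total + strength)
```

A user with a single matching record gets PFu = (1 + 5 × 0.2) / 6 ≈ 0.33 with prior 0.2, rather than an extreme 1.0; the effect fades as the user's record count grows.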
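The weighted combination in step 2)(ii), Pf = K1·PFu + K2·Fc + K3·Fg with K1 + K2 + K3 = 1.0 and K1 between 0.6 and 0.8, is straightforward to express in code. The claim says K1 is tuned from the ratios PFu/Fc and PFu/Fg but gives no formula, so this sketch leaves K1 to the caller and splits the remaining weight between Fc and Fg with a hypothetical k2_share parameter. Treating the distributions as scalars is also an assumption.

```python
def credible_distribution(pfu: float, fc: float, fg: float, k1: float = 0.7, k2_share: float = 0.5) -> float:
    """Pf = K1*PFu + K2*Fc + K3*Fg with K1 + K2 + K3 = 1.

    K1 must lie in the claimed 0.6-0.8 range; how K1 is tuned from PFu/Fc and PFu/Fg
    is left to the caller. The remaining weight 1 - K1 is split between Fc and Fg
    by `k2_share` (hypothetical parameter).
    """
    if not 0.6 <= k1 <= 0.8:
        raise ValueError("K1 must be between 0.6 and 0.8")
    if not 0.0 <= k2_share <= 1.0:
        raise ValueError("k2_share must be between 0 and 1")
    k2 = (1.0 - k1) * k2_share
    k3 = 1.0 - k1 - k2
    return k1 * pfu + k2 * fc + k3 * fg
```

With PFu = 0.625, Fc = 0.3, Fg = 0.2 and the defaults K1 = 0.7, K2 = K3 = 0.15, Pf is 0.5125. Because the weights sum to 1, Pf always lies between the smallest and largest of the three inputs.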
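Step 2)(iii) defines TPu = f(Pf, Rate) but does not specify f. A simple choice, assumed here and not taken from the patent, is the product Pf × Rate clipped to [0, 1], so that a fully credible feature (Pf = 1) passes its mapping probability Rate through unchanged. The function name likelihood_tpu is hypothetical.

```python
def likelihood_tpu(pf: float, rate: float) -> float:
    """Hypothetical f(Pf, Rate): product of the credible distribution and the mapping
    probability Rate, clipped to [0, 1] so that it can be read as a probability."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("Rate is a probability and must lie in [0, 1]")
    return min(1.0, max(0.0, pf * rate))
```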
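Step 3) combines the TPu values of all behavior features in PSet into Tu = f(TPu1, TPu2, …, TPuN), and claim 4 states only that the combination follows the independent and identically distributed principle. One standard combination for independent evidence is the noisy-OR rule, Tu = 1 - prod(1 - TPu_i), sketched below as an assumption rather than the patent's exact formula; combine_evidence is a hypothetical name.

```python
from math import prod
from typing import Sequence


def combine_evidence(tpu_values: Sequence[float]) -> float:
    """Tu = 1 - prod(1 - TPu_i) (noisy-OR).

    Treats each behavior feature in PSet as independent evidence for T, so Tu is the
    probability that at least one of them indicates T.
    """
    for v in tpu_values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("each TPu must be a probability in [0, 1]")
    return 1.0 - prod(1.0 - v for v in tpu_values)
```

For two features with TPu values 0.41 and 0.5, Tu = 1 - 0.59 × 0.5 = 0.705. Under this rule, adding a supporting feature can never lower Tu.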
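Step 5) of claim 1 and claim 3 together describe deducing deep tags TagD from the shallow results in UT, either with a credibility probability or with a minimum distribution threshold. The sketch below assumes a rule table mapping each shallow tag to (TagD, mode, parameter) entries. In probability mode it multiplies Tu by the credibility, and in threshold mode it assigns 1.0 when Tu exceeds the threshold. Keeping the maximum TDu when several shallow tags imply the same deep tag is also an assumption, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: shallow tag -> list of (deep tag TagD, mode, parameter).
# mode "probability": parameter is the credibility (0-1) that the shallow tag implies TagD.
# mode "threshold":   the shallow result Tu must exceed parameter for TagD to be assigned.
DeductionRules = Dict[str, List[Tuple[str, str, float]]]


def deduce(tu: float, mode: str, parameter: float) -> float:
    """Evaluate one deduction rule for a shallow result Tu."""
    if mode == "probability":
        return tu * parameter
    if mode == "threshold":
        return 1.0 if tu > parameter else 0.0
    raise ValueError(f"unknown deduction mode: {mode}")


def derive_deep_tags(ut: Dict[str, float], rules: DeductionRules) -> Dict[str, float]:
    """Loop over TLSet (the shallow tags in UT) and return TDu for every derivable TagD.

    Deep tags that evaluate to 0 are omitted; if several shallow tags lead to the
    same TagD, the largest TDu is kept.
    """
    deep: Dict[str, float] = {}
    for tlx, tu in ut.items():
        for tag_d, mode, parameter in rules.get(tlx, []):
            tdu = deduce(tu, mode, parameter)
            if tdu > 0.0:
                deep[tag_d] = max(deep.get(tag_d, 0.0), tdu)
    return deep
```

The returned dictionary can then be merged into UT (for example with ut.update(deep)), as step 5) describes.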
CN201510701305.XA 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data Active CN105389714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510701305.XA CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510701305.XA CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Publications (2)

Publication Number Publication Date
CN105389714A (en) 2016-03-09
CN105389714B (en) 2022-07-05

Family

ID=55421972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510701305.XA Active CN105389714B (en) 2015-10-23 2015-10-23 Method for identifying user characteristics from behavior data

Country Status (1)

Country Link
CN (1) CN105389714B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056444A (en) * 2016-05-25 2016-10-26 腾讯科技(深圳)有限公司 Data processing method and device
CN106127515A (en) * 2016-06-22 2016-11-16 北京网智天元科技股份有限公司 Method and device for passenger profiling and data analysis
CN107016026B (en) * 2016-11-11 2020-07-24 阿里巴巴集团控股有限公司 User tag determination method, information push method, user tag determination device, information push device
CN108491490A (en) * 2018-03-14 2018-09-04 南京易好信息技术有限公司 E-commerce platform product label classification and identification system and method

Citations (1)

Publication number Priority date Publication date Assignee Title
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN104679771B (en) * 2013-11-29 2018-09-18 阿里巴巴集团控股有限公司 Personalized data search method and device

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN103778555A (en) * 2014-01-21 2014-05-07 北京集奥聚合科技有限公司 User attribute mining method and system based on user tags

Non-Patent Citations (2)

Title
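Data-mining-based user behavior analysis *** for community websites; Huang Wanming (黄碗明); China Master's Theses Full-text Database, Information Science and Technology; 2012-07-15 (No. 07); full text *
Research and implementation of microblog user behavior analysis technology; Li Zhengze (李政泽); China Master's Theses Full-text Database, Information Science and Technology; 2014-12-15 (No. 12); pp. I139-68 *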

Also Published As

Publication number Publication date
CN105389714A (en) 2016-03-09

Similar Documents

Publication Publication Date Title
US11574139B2 (en) Information pushing method, storage medium and server
CN107424043B (en) Product recommendation method and device and electronic equipment
US10783534B2 (en) Method, system and computer readable medium for creating a profile of a user based on user behavior
WO2019214245A1 (en) Information pushing method and apparatus, and terminal device and storage medium
CN108550068B (en) Personalized commodity recommendation method and system based on user behavior analysis
WO2021027595A1 (en) User portrait generation method and apparatus, computer device, and computer-readable storage medium
CN111062757A (en) Information recommendation method and system based on multi-path optimization matching
WO2019149145A1 (en) Compliant report class sorting method and apparatus
CN106682686A (en) User gender prediction method based on mobile phone Internet-surfing behavior
CN105893406A (en) Group user profiling method and system
US9607340B2 (en) Method and system for implementing author profiling
CN104077723B (en) Social network recommendation system and method
CN112269805A (en) Data processing method, device, equipment and medium
CN107515915A (en) User identification and association method based on user behavior data
CN105389714B (en) Method for identifying user characteristics from behavior data
CN112632405B (en) Recommendation method, recommendation device, recommendation equipment and storage medium
CN111159561A (en) Method for constructing recommendation engine according to user behaviors and user portrait
CN114840766A (en) User portrait construction method, system, equipment and storage medium
Ding et al. Establishing smartphone user behavior model based on energy consumption data
CN107070702B (en) User account correlation method and device based on cooperative game support vector machine
US20150142782A1 (en) Method for associating metadata with images
CN117455529A (en) User electricity utilization characteristic image construction method and system based on big data technology
Zhang et al. Discovering consumers’ purchase intentions based on mobile search behaviors
CN116186119A (en) User behavior analysis method, device, equipment and storage medium
CN116501957A (en) User tag portrait processing method, user portrait system, apparatus and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant