CN104281635A - Method for predicting basic attributes of mobile user based on privacy feedback - Google Patents

Method for predicting basic attributes of mobile user based on privacy feedback

Info

Publication number
CN104281635A
Authority
CN
China
Prior art keywords
user
privacy
matrix
feedback
web page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410092727.7A
Other languages
Chinese (zh)
Inventor
程红蓉
夏勇
秦臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-03-13
Filing date
2014-03-13
Publication date
2015-01-14
Application filed by University of Electronic Science and Technology of China
Priority to CN201410092727.7A
Publication of CN104281635A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for predicting basic attributes of a mobile user based on privacy feedback, which predicts basic attributes such as the user's age and gender by analyzing the content the mobile user browses. The method comprises the following steps: classifying the user's browsing behavior based on the user's browsing log to obtain a user behavior preference click matrix, and obtaining the user's relation feedback matrix with an LFM (latent factor model) method; analyzing the relation between browsing behavior and the user, so that the user's basic attributes can be predicted from the browsing behavior; and performing classified prediction of the user's basic attributes with a Bayesian network model. In the method, gender is treated as a two-class problem (male and female), and age, after segmentation, is treated as a multi-class problem. The beneficial effect is that basic attributes such as gender and age can be predicted by analyzing the browsing behavior of the mobile user. The age prediction accuracy can exceed 85 percent, and the gender prediction accuracy can exceed 92 percent.

Description

Method for predicting basic attributes of a mobile user based on privacy feedback
Technical field
The present invention relates to Internet technology, and in particular to a method for predicting the basic attributes of mobile users based on privacy feedback.
Background art
In network applications, the basic attributes of users play an important role. User basic attribute prediction means predicting attributes such as a user's age, gender, income, geographic location, education level, and religious belief by analyzing the user's browsing behavior and search content. With the arrival of Web 2.0 and the rapid development of the mobile Internet, users' basic attributes are becoming more and more important in network applications, and the related work has become a research focus. For example, the personalized search service provided by Google returns a personalized result list based on the user's geographic location and search history, so as to provide the user with a personalized search service.
Research on user basic attribute prediction has concentrated mainly on users' web logs (blogs) and users' search content. Research on web logs predicts the author's gender and age by studying the writing style and word usage habits of the log; the methods used are mainly text classification techniques, such as SVM text classification. Research on search content analyzes the link between what a user searches for and the user's basic attributes in order to predict those attributes; the methods used are generally statistical analysis and association rule analysis. However, neither association analysis based on search content nor classification based on writing habits achieves satisfactory results: recall and accuracy remain very low.
Summary of the invention
The object of the invention is to provide a method for predicting the basic attributes of mobile users based on privacy feedback. With the embodiments provided by the invention, a user's basic attributes can be predicted by analyzing the browsing behavior of the mobile user.
The invention predicts basic attributes such as the user's age and gender by analyzing the content the mobile user browses. Starting from the user's browsing log, each web page is assigned to a category according to its content, and each visit by a user to a page of a category is counted as one vote by that user for that category, which yields the user's web page category click matrix. Through the user ID, the user's basic attributes are associated with the click and browsing records, and the relation between browsing behavior and the user is analyzed, so that the user's basic attributes can be predicted from browsing behavior. The invention treats gender as a two-class problem (male and female) and treats age, after segmentation, as a multi-class problem. The browsing record of each user is regarded as a text, and naive Bayes is used to model the user's basic attributes and predict them. However, the number of web page categories is usually large, whereas a user's interests are relatively stable over a period of time. To resolve this tension between the sparsity of the data and the limited range of user interests, the method is improved with privacy feedback. Based on naive Bayes, privacy feedback, and a neighbor model, an algorithm is proposed that implements the method for predicting the basic attributes of mobile users based on privacy feedback.
The steps of the method are:
1. Crawl the content of the URLs visited by the user, assign each web page to a category by keyword matching, and obtain (user ID, web page category) pairs;
2. Convert the (user ID, web page category) pairs into the user click matrix R, and process matrix R with the TF-IDF statistical method;
3. Using the user ID as the key, associate the user click matrix with the users' basic attributes, and set the basic attribute of each user as the class label;
4. Calculate the prior probability of each web page category;
5. Row-normalize matrix R, and decompose the normalized matrix with the SVD method to obtain the user privacy feedback matrix P and the web page category privacy feedback matrix Q;
6. Combine the web page category privacy feedback matrix Q with the neighbor model to obtain the top N neighbours of each web page category, and correct the prior probability of the category with the prior probabilities of its neighbours;
7. Use the naive Bayes model to predict the user's basic attributes;
8. Combine the user privacy feedback matrix P with the neighbor model to obtain the top M neighbours of the user, correct the user's posterior probability with the neighbours' posterior probabilities, and make the final prediction for the test sample;
9. Output the prediction results for the test samples.
Finally, implementing the present invention has the following beneficial effects:
The beneficial effect of the embodiments of the invention is that basic attributes such as the user's gender and age can be predicted by analyzing the browsing behavior of the mobile user; the prediction of gender can reach an accuracy of more than 80%, and the prediction of age can reach an accuracy of more than 85%.
Description of the drawing
The drawing shows the algorithm flow of the method for predicting the basic attributes of mobile users based on privacy feedback proposed by the invention.
Embodiment
A specific embodiment of the invention is described below with reference to the drawing, so that those skilled in the art can better understand the invention.
In this embodiment, as shown in the drawing, the algorithm flow of the proposed method is as follows:
Step 101: crawl the content of the URLs visited by the user, assign each web page to a category by keyword matching, and obtain (user ID, web page category) pairs;
After keyword processing, each visit record is converted into one (user ID, web page category) pair.
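The keyword matching of step 101 can be sketched as follows. This is only an illustration under stated assumptions: the category names, the keyword lists, and the rule of picking the category with the most keyword hits are hypothetical, since the text does not publish its vocabulary or matching rule.

```python
# Hypothetical keyword lists; the patent text does not give its category vocabulary.
CATEGORY_KEYWORDS = {
    "sports": ["football", "basketball", "match"],
    "finance": ["stock", "fund", "bank"],
    "beauty": ["makeup", "skincare"],
}


def classify_page(text, default="other"):
    """Return the category whose keywords occur most often in the page text."""
    lowered = text.lower()
    best, best_hits = default, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(lowered.count(k) for k in keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


def to_pairs(visits):
    """Map (user_id, page_text) visit records to (user_id, category) pairs."""
    return [(user_id, classify_page(text)) for user_id, text in visits]


visits = [
    ("u1", "Basketball match tonight"),
    ("u2", "Stock and fund prices rise"),
    ("u1", "Weather report"),
]
pairs = to_pairs(visits)
```

Pages that match no keyword fall into a catch-all category here; whether the original method keeps or discards such visits is not stated.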
Step 102: convert the (user ID, web page category) pairs into the user click matrix R, and process matrix R with the TF-IDF statistical method;
The (user ID, web page category) pairs obtained from the access log are counted to obtain the total number n of web page categories. The categories visited by the same user are counted on one row: one visit is regarded as one vote for that category, and repeated visits as repeated votes. This finally gives the user click matrix, which is then statistically processed with the TF-IDF method.
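A minimal sketch of step 102 is given below. The text does not say which TF-IDF variant is used, so the weighting here is an assumption: each user row is treated as a document and each category as a term, with tf = count / row total and idf = log(number of users / number of users who visited the category). The function names are illustrative.

```python
import math
from collections import Counter


def build_click_matrix(pairs):
    """Count votes: R[u][c] is how often user u visited category c."""
    users = sorted({u for u, _ in pairs})
    categories = sorted({c for _, c in pairs})
    counts = Counter(pairs)
    R = [[counts[(u, c)] for c in categories] for u in users]
    return users, categories, R


def tfidf(R):
    """Weight the click counts; the tf and idf definitions are assumptions."""
    n_users, n_cats = len(R), len(R[0])
    # Document frequency: number of users with at least one click per category.
    df = [sum(1 for row in R if row[j] > 0) for j in range(n_cats)]
    weighted = []
    for row in R:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(n_users / df[j]) if total and df[j] else 0.0
            for j in range(n_cats)
        ])
    return weighted


pairs = [("u1", "sports"), ("u1", "sports"), ("u1", "news"),
         ("u2", "news"), ("u2", "beauty")]
users, categories, R = build_click_matrix(pairs)
W = tfidf(R)
```

With this idf definition, a category visited by every user receives weight zero, which matches the usual intent of TF-IDF: categories that do not distinguish users carry no signal.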
Step 103: using the user ID as the key, associate the user click matrix with the users' basic attributes, and set the basic attribute of each user as the class label;
Gender has two classes, male and female. Age has five classes: teenager (<18), youth (18-24), young adult (25-34), middle-aged (35-49), and elderly (>50).
Step 104: calculate the prior probability of each web page category;
From the user click matrix and the users' basic attributes, the probability that each web page category is visited by users of the corresponding basic attribute class is calculated; this is the prior probability of the web page category.
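One way to read step 104 is as an estimate of P(category | attribute class) from the labelled click counts. The sketch below follows that reading; the add-alpha smoothing and the function name are assumptions added so that categories unseen for a class do not get probability zero.

```python
def category_priors(R, labels, alpha=1.0):
    """Estimate P(category | class) from click counts with add-alpha smoothing."""
    n_cats = len(R[0])
    totals = {}
    for row, label in zip(R, labels):
        acc = totals.setdefault(label, [0.0] * n_cats)
        for j, value in enumerate(row):
            acc[j] += value
    priors = {}
    for label, acc in totals.items():
        denom = sum(acc) + alpha * n_cats
        priors[label] = [(value + alpha) / denom for value in acc]
    return priors


# Rows are users, columns are web page categories (illustrative data).
R = [[3, 0, 1], [2, 1, 0], [0, 4, 1]]
labels = ["male", "male", "female"]
priors = category_priors(R, labels)
# priors["male"] is [0.6, 0.2, 0.2]: clicks (5, 1, 1) plus 1 each, divided by 10.
```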
Step 105: row-normalize matrix R, and decompose the normalized matrix with the SVD method to obtain the user privacy feedback matrix P and the web page category privacy feedback matrix Q;
An SVD-style decomposition is applied to the normalized click matrix. Stochastic gradient descent is used to solve for the user privacy feedback matrix P and the web page category privacy feedback matrix Q; a suitable number of iterations and latent dimension K can be chosen as needed during the solution.
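The decomposition in step 105 can be illustrated with a small latent factor model trained by stochastic gradient descent, so that R is approximated by the product of P and the transpose of Q. This is a sketch, not the patented implementation: the learning rate, the L2 regularization term, the random initialization, and the choice to fit every cell including zeros are all assumptions.

```python
import math
import random


def normalize_rows(R):
    """Scale each user row so that it sums to 1 (all-zero rows stay zero)."""
    out = []
    for row in R:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def factorize(R, k=2, epochs=200, lr=0.05, reg=0.02, seed=0):
    """Fit R[u][j] ~ dot(P[u], Q[j]) by SGD over all cells."""
    rng = random.Random(seed)
    n_users, n_cats = len(R), len(R[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cats)]
    cells = [(u, j) for u in range(n_users) for j in range(n_cats)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for u, j in cells:
            err = R[u][j] - sum(P[u][f] * Q[j][f] for f in range(k))
            for f in range(k):
                pu, qj = P[u][f], Q[j][f]
                P[u][f] += lr * (err * qj - reg * pu)
                Q[j][f] += lr * (err * pu - reg * qj)
    return P, Q


def rmse(R, P, Q):
    """Root-mean-square reconstruction error over all cells."""
    sq = [
        (R[u][j] - sum(a * b for a, b in zip(P[u], Q[j]))) ** 2
        for u in range(len(R)) for j in range(len(R[0]))
    ]
    return math.sqrt(sum(sq) / len(sq))


Rn = normalize_rows([[3, 0, 1], [2, 1, 0], [0, 4, 1]])
P, Q = factorize(Rn, k=2)
```

Here `k` plays the role of the dimension K and `epochs` the number of iterations mentioned in the text; both would normally be tuned on held-out data.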
Step 106: combine the web page category privacy feedback matrix Q with the neighbor model to obtain the top N neighbours of each web page category, and correct the prior probability of the category with the prior probabilities of its neighbours;
The matrix Q obtained from the SVD decomposition is used as the vector model of the web page categories. The adjusted cosine similarity is used to compute the similarity between web page categories, which gives the top T neighbours of each category. The prior probability of the category is then corrected with the prior probabilities of its neighbours and used for the naive Bayes prediction.
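The neighbour-based correction of step 106 might look like the sketch below. Two points are assumptions rather than details from the text: plain cosine similarity stands in for the adjusted cosine similarity, whose exact form is not given, and the correction is taken to be a weighted blend of a category's own prior with the similarity-weighted mean of its neighbours' priors (the `weight` parameter is hypothetical).

```python
import math


def cosine(a, b):
    """Plain cosine similarity; a stand-in for the adjusted variant."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_neighbours(vectors, index, t):
    """Return up to t (similarity, index) pairs with positive similarity."""
    sims = [(cosine(vectors[index], v), j)
            for j, v in enumerate(vectors) if j != index]
    sims = [(s, j) for s, j in sims if s > 0]
    sims.sort(key=lambda sj: (-sj[0], sj[1]))
    return sims[:t]


def correct_priors(prior, Q, t=2, weight=0.5):
    """Blend each category prior with its neighbours' priors, then renormalize."""
    blended = []
    for j, p in enumerate(prior):
        neighbours = top_neighbours(Q, j, t)
        total = sum(s for s, _ in neighbours)
        if total == 0:
            blended.append(p)
            continue
        neighbour_mean = sum(s * prior[i] for s, i in neighbours) / total
        blended.append((1 - weight) * p + weight * neighbour_mean)
    norm = sum(blended)
    return [v / norm for v in blended] if norm else blended


# Rows of Q are category vectors; prior holds P(category | class) for one class.
Q = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prior = [0.6, 0.2, 0.2]
corrected = correct_priors(prior, Q, t=1)
```

In this toy data the first two categories are near-duplicates in the latent space, so their priors are pulled toward each other, which is the smoothing effect the neighbour model is meant to provide for sparse categories.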
Step 107: use the naive Bayes model to predict the user's basic attributes;
For each user, based on the web pages visited, Bayes' formula is used to calculate the probability that the user belongs to each basic attribute class; this is the user's posterior probability. Following the maximum likelihood idea, the class with the largest probability is selected as the user's class for the corresponding basic attribute.
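The posterior calculation of step 107 can be sketched as a multinomial naive Bayes classifier over category click counts. The multinomial form, the log-space computation, and the example probabilities are assumptions for illustration; the text only states that Bayes' formula is applied to the pages a user visited.

```python
import math


def posterior(clicks, class_prior, category_prob):
    """Return P(class | clicks) for every class.

    clicks[j] is the user's click count for category j, class_prior[c] is
    P(class c), and category_prob[c][j] is P(category j | class c).
    """
    log_scores = {}
    for label, p_class in class_prior.items():
        score = math.log(p_class)
        for count, p in zip(clicks, category_prob[label]):
            score += count * math.log(p)
        log_scores[label] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(scaled.values())
    return {label: v / z for label, v in scaled.items()}


class_prior = {"male": 0.5, "female": 0.5}
category_prob = {"male": [0.6, 0.2, 0.2], "female": [0.125, 0.625, 0.25]}
post = posterior([2, 0, 1], class_prior, category_prob)
predicted = max(post, key=post.get)
```

Working in log space avoids underflow when a user has many clicks, since the product of many small probabilities quickly drops below floating-point range.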
Step 108: combine the user privacy feedback matrix P with the neighbor model to obtain the top M neighbours of the user, correct the user's posterior probability with the neighbours' posterior probabilities, and make the final prediction for the test sample;
The user privacy feedback matrix P represents the users' privacy feedback in a particular space. Based on user similarity, the neighbor model is used to obtain the top M neighbours of a user; the user's own posterior probability is corrected with the neighbours' posterior probabilities, and the final prediction is made for the test sample.
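Step 108 can be sketched in the same spirit. The text does not give the correction formula or say how a test user's row of P is obtained, so this example assumes the latent vector is already available, uses plain cosine similarity between user vectors, and blends the user's posterior with the similarity-weighted mean posterior of the top M neighbours (the `weight` parameter is hypothetical).

```python
import math


def cosine(a, b):
    """Plain cosine similarity between two latent vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def refine_posterior(user_vec, user_post, other_vecs, other_posts, m=2, weight=0.5):
    """Blend a user's posterior with that of the m most similar users."""
    sims = sorted(
        ((cosine(user_vec, v), i) for i, v in enumerate(other_vecs)),
        key=lambda si: (-si[0], si[1]),
    )[:m]
    sims = [(s, i) for s, i in sims if s > 0]
    total = sum(s for s, _ in sims)
    if total == 0:
        return dict(user_post)
    blended = {}
    for label, p in user_post.items():
        neighbour_mean = sum(s * other_posts[i][label] for s, i in sims) / total
        blended[label] = (1 - weight) * p + weight * neighbour_mean
    z = sum(blended.values())
    return {label: v / z for label, v in blended.items()}


user_vec = [1.0, 0.1]
user_post = {"male": 0.55, "female": 0.45}
other_vecs = [[0.9, 0.2], [0.1, 1.0], [1.0, 0.0]]
other_posts = [
    {"male": 0.9, "female": 0.1},
    {"male": 0.1, "female": 0.9},
    {"male": 0.8, "female": 0.2},
]
refined = refine_posterior(user_vec, user_post, other_vecs, other_posts, m=2)
final_label = max(refined, key=refined.get)
```

In this example the two nearest neighbours both lean strongly toward one class, so an uncertain posterior for the test user is sharpened in that direction before the final label is chosen.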
Step 109: output the prediction results for the test samples.
The prediction results are output.
Although an illustrative embodiment of the invention has been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of that embodiment. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.

Claims (1)

1. A method for predicting basic attributes of a mobile user based on privacy feedback, characterized in that: from the logs and searches of mobile phone users, the browsed URL content is classified by keyword, and each access by a user is converted into a (user ID, web page category) pair; the (user ID, web page category) pairs are converted into a click matrix, and the click matrix is processed with the TF-IDF statistical method; the basic attributes of each user are associated through the user ID and used as the class label; the prior probability of each web page category is calculated; the click matrix is row-normalized and decomposed by SVD to obtain the user privacy feedback matrix and the web page category privacy feedback matrix; the T neighbours of each web page category are obtained from the web page category privacy feedback matrix, and the prior probability of the category is corrected with the prior probabilities of its neighbours; naive Bayes is used to calculate the posterior probability that the user belongs to each class; the N neighbours of the user are obtained by combining the user privacy feedback matrix with the neighbor model, the posterior probabilities of the user's classes are corrected with the posterior probabilities of the neighbours, and the user is assigned to the class with the largest posterior probability, thereby predicting the user's basic attributes.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410092727.7A 2014-03-13 2014-03-13 Method for predicting basic attributes of mobile user based on privacy feedback

Publications (1)

Publication Number Publication Date
CN104281635A 2015-01-14

Family ID: 52256511

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150114