CN106875278A - Social network user portrait method based on random forest - Google Patents

Social network user portrait method based on random forest

Publication number
CN106875278A
Authority
CN
China
Prior art keywords
attribute
label
social network
random forest
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710038836.4A
Other languages
Chinese (zh)
Other versions
CN106875278B (en)
Inventor
琚春华
胡坤
鲍福光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Gongshang University
Priority to CN201710038836.4A
Publication of CN106875278A
Application granted
Publication of CN106875278B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a social network user portrait method based on random forest, comprising the following steps: obtaining multi-source attribute data from online social networking sites; labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection; according to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within a threshold range to generate merged attribute labels, and then training samples with the random forest algorithm; obtaining the voting modes, assigning a weight to each obtained voting mode, and sorting by weight in descending order to obtain all label weight values; and retaining the labels within a preset threshold to form a new label attribute set used to portray attributes in the user's social network. The invention aims to use a random forest model to partition users' attribute labels, effectively mitigating the shortcomings and complexity of traditional attribute partitioning based on small-sample sampling.

Description

Social network user portrait method based on random forest
Technical field
The present invention relates to the technical field of online social networks, and in particular to a social network user portrait method based on random forest.
Background art
Research on online social networks has been a major field of academic study in recent years. China has the largest population of Internet users in the world, so a large amount of data was generated during the early promotion stage of the Internet and continues to be generated in its current stage of use. The vast majority of these data resources lie idle and cannot be properly processed or put to commercial use, which causes huge losses and also hinders the further development of social networks. Major Internet companies have invested substantial funds and manpower in a series of studies on online social relationships, and the reasonable development and utilization of Internet data resources is of great significance.
Summary of the invention
The present invention provides a social network user portrait method based on random forest. Its purpose is to use a random forest model to partition users' attribute labels, effectively mitigating the shortcomings and complexity of traditional attribute partitioning based on small-sample sampling.
To solve the above problems, an embodiment of the present invention provides a social network user portrait method based on random forest, comprising the following steps:
Obtaining multi-source attribute data from online social networking sites;
Labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection;
According to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within the threshold range to generate merged attribute labels, and then training samples with the random forest algorithm;
Obtaining the voting modes, assigning a weight to each obtained voting mode, and sorting by weight in descending order to obtain all label weight values;
Retaining the labels within a preset threshold to form a new label attribute set used to portray attributes in the user's social network.
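The voting and weighting step can be sketched as follows. This is a minimal illustration rather than the patented procedure: the input format (one predicted label per tree) and the choice of weight (the share of trees voting for a label) are assumptions, since the text does not state how the weight is derived from the votes, and `label_weights` is a hypothetical helper name.

```python
from collections import Counter


def label_weights(tree_votes):
    """Turn per-tree votes into (label, weight) pairs, heaviest first.

    tree_votes: one predicted label per tree (assumed input format).
    The weight is the share of trees that voted for the label, which is
    an assumption made for this sketch.
    """
    counts = Counter(tree_votes)
    total = len(tree_votes)
    weights = {label: n / total for label, n in counts.items()}
    # Sort by weight in descending order; ties are broken by label name.
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical votes from a six-tree forest for one user.
votes = ["sports", "music", "sports", "travel", "sports", "music"]
ranked = label_weights(votes)
```

Here `ranked` lists every label with its weight in descending order, which is the form the later retention step consumes.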
As an embodiment, the method further comprises the following step:
Setting a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, similarity detection for that set is terminated.
As an embodiment, the minimum detection termination threshold is 0.15.
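One possible reading of the termination rule is sketched below: a target attribute set is compared with candidate sets in order, and the scan stops as soon as a score drops below 0.15. The Jaccard index stands in for the similarity function, whose formula is not reproduced in this text, and the names `jaccard` and `scan_with_early_stop` as well as the sample data are hypothetical.

```python
def jaccard(a, b):
    """Placeholder similarity |a & b| / |a | b| (assumed, not the patent's formula)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def scan_with_early_stop(target, candidates, stop_threshold=0.15, similarity=jaccard):
    """Score candidates in order and stop at the first score below the threshold."""
    scores = []
    for name, values in candidates:
        score = similarity(target, values)
        if score < stop_threshold:
            break  # terminate similarity detection for this set
        scores.append((name, score))
    return scores


# Hypothetical attribute value sets.
target = {"a", "b", "c", "d"}
candidates = [
    ("city", {"a", "b", "c", "d"}),
    ("hometown", {"a", "b", "c"}),
    ("hobby", {"x", "y", "z"}),      # similarity 0.0 stops the scan here
    ("location", {"a", "b", "c", "d"}),
]
kept = scan_with_early_stop(target, candidates)
```

Under this reading the last candidate is never compared, which is the saving the termination threshold is meant to provide; a variant that skips only the low-scoring pair and continues would also fit the wording.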
As an embodiment, the similarity function is:
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the two attribute functions with the higher label similarity.
As an embodiment, α takes the value 0.001.
As an embodiment, the step of retaining the labels within the preset threshold to form a new label attribute set used to portray attributes in the user's social network comprises the following steps:
Setting a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded;
Sorting the retained labels by label weight value in descending order to form the new label attribute set.
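The retention step can be illustrated with a short sketch. The vote counts, weights, and threshold value below are invented for illustration, and `build_tag_set` is a hypothetical helper; only the rule itself (discard labels whose voting mode is below the threshold, then sort the rest by weight in descending order) comes from the text.

```python
def build_tag_set(vote_counts, weights, mode_threshold):
    """Drop labels whose vote count is below the threshold, then sort by weight."""
    kept = [label for label, votes in vote_counts.items() if votes >= mode_threshold]
    # Descending weight; ties are broken by label name for a stable order.
    return sorted(kept, key=lambda label: (-weights[label], label))


# Hypothetical results from a 100-tree forest.
vote_counts = {"sports": 60, "music": 30, "travel": 4, "food": 6}
weights = {"sports": 0.60, "music": 0.30, "travel": 0.04, "food": 0.06}
tag_set = build_tag_set(vote_counts, weights, mode_threshold=5)
```

With a threshold of 5, "travel" is treated as unrepresentative and dropped, and the remaining labels form the new label attribute set in weight order.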
As an embodiment, the similarity threshold range is [0.9,1].
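A minimal sketch of merging attribute sets whose similarity falls in [0.9, 1] is given below. It again uses the Jaccard index as a stand-in for the similarity function, and it groups attributes transitively with a small union-find structure; both choices are assumptions, as are the names `jaccard` and `merge_similar` and the sample data.

```python
from itertools import combinations


def jaccard(a, b):
    """Placeholder similarity |a & b| / |a | b| (assumed, not the patent's formula)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_similar(attribute_sets, low=0.9, high=1.0, similarity=jaccard):
    """Group attribute names whose pairwise similarity lies in [low, high]."""
    parent = {name: name for name in attribute_sets}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(attribute_sets, 2):
        if low <= similarity(attribute_sets[a], attribute_sets[b]) <= high:
            parent[find(b)] = find(a)

    groups = {}
    for name in attribute_sets:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical attribute value sets from two sources plus an unrelated one.
attribute_sets = {
    "city": set(range(10)),
    "location": set(range(10)),
    "hobby": {100, 101},
}
merged = merge_similar(attribute_sets)
```

Each resulting group would receive one merged attribute label before the random forest is trained; how the merged label is named is left open here.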
Compared with the prior art, the beneficial effect of the present invention is that a random forest model is used to partition users' attribute labels, effectively mitigating the shortcomings and complexity of traditional attribute partitioning based on small-sample sampling.
Brief description of the drawings
Fig. 1 is a flow chart of the social network user portrait method based on random forest of the present invention.
Detailed description of the embodiments
The above and other technical features and advantages of the present invention are described clearly and completely below with reference to the accompanying drawing. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them.
As shown in the figure, a social network user portrait method based on random forest comprises the following steps:
S100: Obtain multi-source attribute data from online social networking sites and import it into a data storage system;
S101: Label the data attribute sets of the original multi-source attributes with original attribute labels, and call a similarity function to traverse the sets of different attributes for similarity detection. The similarity function is:
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the two attribute functions with the higher label similarity. In practice, however, α is generally very small and is corrected continually according to test values on the samples. Experimental results show that when α is increased by an order of magnitude, very few features are selected, whereas when α is reduced by an order of magnitude, the resulting values are almost unchanged. Therefore α = 0.001 is used in this embodiment;
S102: Set a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, terminate similarity detection for that set. The minimum detection termination threshold is 0.15;
S103: According to the decision tree of the original single-layer multi-source attributes, merge the sets whose similarity falls within the threshold range to generate merged attribute labels, and then train samples with the random forest algorithm. The similarity threshold range is [0.9,1];
S104: Obtain the voting modes, assign a weight to each obtained voting mode, and sort by weight in descending order to obtain all label weight values;
S105: Retain the labels within a preset threshold to form a new label attribute set used to portray attributes in the user's social network. Specifically: set a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded; the retained labels are sorted by label weight value in descending order to form the new label attribute set, which is used for user portraits in the social network.
Compared with the prior art, the beneficial effect of the present invention is that a random forest model is used to partition users' attribute labels, effectively mitigating the shortcomings and complexity of traditional attribute partitioning based on small-sample sampling.
The specific embodiments described above explain the purpose, technical solution, and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection. In particular, for those skilled in the art, any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A social network user portrait method based on random forest, characterized in that it comprises the following steps:
Obtaining multi-source attribute data from online social networking sites;
Labeling the data attribute sets of the original multi-source attributes with original attribute labels, and calling a similarity function to traverse the data attribute sets of different attributes for similarity detection;
According to the decision tree of the original single-layer multi-source attributes, merging the data attribute sets whose similarity falls within the threshold range to generate merged attribute labels, and then training samples with the random forest algorithm;
Obtaining the voting modes, assigning a weight to each obtained voting mode, and sorting by weight in descending order to obtain all label weight values;
Retaining the labels within a preset threshold to form a new label attribute set used to portray attributes in the user's social network.
2. The social network user portrait method based on random forest according to claim 1, characterized in that it further comprises the following step:
Setting a minimum detection termination threshold; when the similarity is below the minimum detection termination threshold, similarity detection for that set is terminated.
3. The social network user portrait method based on random forest according to claim 2, characterized in that the minimum detection termination threshold is 0.15.
4. The social network user portrait method based on random forest according to claim 1, characterized in that the similarity function is:
where α is the similarity adjustment parameter, α ∈ [0,1], and ω(x) denotes the two attribute functions with the higher label similarity.
5. The social network user portrait method based on random forest according to claim 4, characterized in that α takes the value 0.001.
6. The social network user portrait method based on random forest according to claim 1, characterized in that the step of retaining the labels within the preset threshold to form a new label attribute set used to portray attributes in the user's social network comprises the following steps:
Setting a label mode threshold; when the voting mode obtained by the random forest algorithm is below the label mode threshold, the label is considered unrepresentative and is discarded;
Sorting the retained labels by label weight value in descending order to form the new label attribute set.
7. The social network user portrait method based on random forest according to claim 1, characterized in that the similarity threshold range is [0.9,1].
CN201710038836.4A 2017-01-19 2017-01-19 Social network user image drawing method based on random forest Active CN106875278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710038836.4A CN106875278B (en) 2017-01-19 2017-01-19 Social network user image drawing method based on random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710038836.4A CN106875278B (en) 2017-01-19 2017-01-19 Social network user image drawing method based on random forest

Publications (2)

Publication Number Publication Date
CN106875278A 2017-06-20
CN106875278B 2020-11-03

Family

ID=59157771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710038836.4A Active CN106875278B (en) 2017-01-19 2017-01-19 Social network user image drawing method based on random forest

Country Status (1)

Country Link
CN (1) CN106875278B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596444A (en) * 2018-04-02 2018-09-28 清华大学 The method and device of large scale community network user sampling based on diversification strategy
CN108876470A (en) * 2018-06-29 2018-11-23 腾讯科技(深圳)有限公司 Tagging user extended method, computer equipment and storage medium
CN109635190A (en) * 2018-11-28 2019-04-16 四川亨通网智科技有限公司 User characteristics method for digging based on position and behavior Conjoint Analysis
CN109785034A (en) * 2018-11-13 2019-05-21 北京码牛科技有限公司 User's portrait generation method, device, electronic equipment and computer-readable medium
CN110659921A (en) * 2018-06-28 2020-01-07 上海传漾广告有限公司 Method and system for analyzing correlation between network advertisement audience behaviors and audience interests
CN112307831A (en) * 2019-07-31 2021-02-02 广州弘度信息科技有限公司 Violent movement detection method based on human body key point detection and tracking
CN113076476A (en) * 2021-04-01 2021-07-06 重庆邮电大学 User portrait construction method of microblog heterogeneous information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678659A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 E-commerce website cheat user identification method and system based on random forest algorithm
CN105824912A (en) * 2016-03-15 2016-08-03 平安科技(深圳)有限公司 Personalized recommending method and device based on user portrait
CN105868773A (en) * 2016-03-23 2016-08-17 华南理工大学 Hierarchical random forest based multi-tag classification method
US20160328837A1 (en) * 2015-05-08 2016-11-10 Kla-Tencor Corporation Method and System for Defect Classification

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678659A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 E-commerce website cheat user identification method and system based on random forest algorithm
US20160328837A1 (en) * 2015-05-08 2016-11-10 Kla-Tencor Corporation Method and System for Defect Classification
CN105824912A (en) * 2016-03-15 2016-08-03 平安科技(深圳)有限公司 Personalized recommending method and device based on user portrait
CN105868773A (en) * 2016-03-23 2016-08-17 华南理工大学 Hierarchical random forest based multi-tag classification method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FENG LIU et al.: "MLRF: Multi-label Classification Through Random Forest with Label-Set Partition", Advanced Intelligent Computing Theories and Applications (ICIC 2015) *
刘勘 et al.: "Research on identifying Weibo bot users based on random forest classification", Acta Scientiarum Naturalium Universitatis Pekinensis *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596444A (en) * 2018-04-02 2018-09-28 清华大学 The method and device of large scale community network user sampling based on diversification strategy
CN108596444B (en) * 2018-04-02 2021-06-29 清华大学 Method and device for sampling large-scale social network users based on diversified strategies
CN110659921A (en) * 2018-06-28 2020-01-07 上海传漾广告有限公司 Method and system for analyzing correlation between network advertisement audience behaviors and audience interests
CN108876470A (en) * 2018-06-29 2018-11-23 腾讯科技(深圳)有限公司 Tagging user extended method, computer equipment and storage medium
CN109785034A (en) * 2018-11-13 2019-05-21 北京码牛科技有限公司 User's portrait generation method, device, electronic equipment and computer-readable medium
CN109635190A (en) * 2018-11-28 2019-04-16 四川亨通网智科技有限公司 User characteristics method for digging based on position and behavior Conjoint Analysis
CN112307831A (en) * 2019-07-31 2021-02-02 广州弘度信息科技有限公司 Violent movement detection method based on human body key point detection and tracking
CN112307831B (en) * 2019-07-31 2023-04-14 广州弘度信息科技有限公司 Violent movement detection method based on human body key point detection and tracking
CN113076476A (en) * 2021-04-01 2021-07-06 重庆邮电大学 User portrait construction method of microblog heterogeneous information
CN113076476B (en) * 2021-04-01 2021-11-30 重庆邮电大学 User portrait construction method of microblog heterogeneous information

Also Published As

Publication number Publication date
CN106875278B (en) 2020-11-03

Similar Documents

Publication Publication Date Title
CN106875278A (en) Social network user portrait method based on random forest
CN102289522B (en) Method of intelligently classifying texts
CN106845717B (en) Energy efficiency evaluation method based on multi-model fusion strategy
CN105893609B (en) A kind of mobile APP recommended method based on weighted blend
CN102521248B (en) Network user classification method and device
CN105760888B (en) A kind of neighborhood rough set integrated learning approach based on hierarchical cluster attribute
CN109961095B (en) Image labeling system and method based on unsupervised deep learning
CN109299258A (en) A kind of public sentiment event detecting method, device and equipment
CN101853470A (en) Collaborative filtering method based on socialized label
CN105677640A (en) Domain concept extraction method for open texts
CN107480213B (en) Community detection and user relation prediction method based on time sequence text network
CN103390046A (en) Multi-scale dictionary natural scene image classification method based on latent Dirichlet model
CN108804516A (en) Similar users search device, method and computer readable storage medium
CN108416535A (en) The method of patent valve estimating based on deep learning
CN104090936A (en) News recommendation method based on hypergraph sequencing
CN112559764A (en) Content recommendation method based on domain knowledge graph
CN105046274A (en) Automatic labeling method for electronic commerce commodity category
CN105183748A (en) Combined forecasting method based on content and score
CN104252616A (en) Human face marking method, device and equipment
CN109191210A (en) A kind of broadband target user's recognition methods based on Adaboost algorithm
CN111723666A (en) Signal identification method and device based on semi-supervised learning
CN110334278A (en) A kind of web services recommended method based on improvement deep learning
CN110059222A (en) A kind of video tab adding method based on collaborative filtering
WO2022188646A1 (en) Graph data processing method and apparatus, and device, storage medium and program product
CN111625838A (en) Vulnerability scene identification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant