CN105260877A - E-mail-based method for acquiring data of user portrait - Google Patents

Publication number
CN105260877A
Authority
CN
China
Prior art keywords
email
user tag
user
weight
spam
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510611139.4A
Other languages
Chinese (zh)
Inventor
陶智明
张颖
梁家盛
张荣圣
谭自强
马幸晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CENTURY DRAGON INFORMATION NETWORK Co Ltd
Original Assignee
CENTURY DRAGON INFORMATION NETWORK Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-09-22
Filing date
2015-09-22
Publication date
2016-01-20
Application filed by CENTURY DRAGON INFORMATION NETWORK Co Ltd
Priority to CN201510611139.4A
Publication of CN105260877A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides an e-mail-based method for acquiring user portrait data, and a system thereof. In the method, a user label keyword is first obtained from an e-mail, together with the user label in a user label database that corresponds to the keyword and the weight of that user label. An e-mail anti-spam system then judges whether the e-mail is spam. Finally, the weight of the user label is corrected according to the judgment result of the anti-spam system. Because e-mail is a relatively formal and rigorous form of communication, user label keywords and related information obtained from e-mail reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore more concentrated and more closely related, the user labels finally derived are more targeted, and the accuracy of the user portrait is improved.

Description

E-mail-based method for acquiring user portrait data
Technical field
The present invention relates to the field of user portraits, and in particular to a method for acquiring user portrait data based on e-mail.
Background
As the internet gradually enters the big data era, it inevitably changes and reshapes the behavior of both enterprises and consumers. The biggest change is that nearly all consumer behavior becomes "visible" to enterprises. With the in-depth research and application of big data technology, enterprises increasingly focus on how to use big data for precision marketing and thereby mine potential commercial value. The concept of the "user portrait" arose in this context.
A user portrait, i.e. the labeling of user information, is an abstraction of a user's overall commercial profile that an enterprise builds by collecting and analyzing data on the consumer's main information, such as social attributes, living habits, and consumption behavior. It can be regarded as the basic way in which enterprises apply big data technology. User portraits give an enterprise a sufficient information base and help it quickly find precise user groups, user needs, and other broader feedback.
The core work of building a user portrait is labeling the user. A tag is usually a manually defined, highly refined feature identifier, such as age, gender, region, or user preference. Taken together, all of a user's tags essentially outline a three-dimensional "portrait" of that user.
At present, most basic data sources for user portraits are scattered and only loosely related to each other, so the user tags finally derived from them are poorly targeted. The user portraits summarized from such tags are therefore of low accuracy and give unsatisfactory results.
Summary of the invention
In view of the low accuracy of user portraits in the prior art described above, the object of the present invention is to provide a method and a system for acquiring user portrait data based on e-mail that improve the accuracy of user portraits.
A method for acquiring user portrait data based on e-mail comprises the following steps:
Obtaining a user tag keyword from an e-mail;
Obtaining, from a user tag database, the user tag corresponding to the user tag keyword and the weight corresponding to the user tag;
Obtaining spam determination criteria from an e-mail anti-spam system and performing spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight;
Correcting the weight corresponding to the user tag according to the spam determination result.
The present invention also includes a system for acquiring user portrait data based on e-mail, comprising:
A keyword acquisition module, for obtaining a user tag keyword from an e-mail;
A user tag acquisition module, for obtaining, from a user tag database, the user tag corresponding to the user tag keyword and the weight corresponding to the user tag;
A spam determination module, for obtaining spam determination criteria from an e-mail anti-spam system and performing spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight;
A weight correction module, for correcting the weight corresponding to the user tag according to the spam determination result.
The method and system of the present invention first obtain the user tag keyword from the e-mail, together with the corresponding user tag in the user tag database and the weight of that tag, then use the e-mail anti-spam system to perform spam determination on the e-mail, and finally correct the weight corresponding to the user tag according to the spam determination result. Because e-mail is a relatively formal and rigorous form of communication, user tag keywords and related information obtained from e-mail content reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore concentrated and closely related, and the user tags finally derived are clearly targeted, which improves the accuracy of the user portrait.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for acquiring user portrait data based on e-mail according to one embodiment;
Fig. 2 is a schematic structural diagram of the system for acquiring user portrait data based on e-mail according to one embodiment.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of the method for acquiring user portrait data based on e-mail according to one embodiment.
A method for acquiring user portrait data based on e-mail comprises the following steps:
S102: obtain a user tag keyword from an e-mail.
In this step, the user tag keyword is obtained from e-mail. Because e-mail is a relatively formal and rigorous form of communication, user tag keywords and related information obtained from e-mail content reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore concentrated and closely related, which improves the cohesion of the basic user portrait data.
In one embodiment, the step of obtaining the user tag keyword from the e-mail comprises:
Obtaining the content of the e-mail and matching it against preset user tag keywords;
Obtaining the words in the e-mail content that satisfy the match; these are the user tag keywords of the e-mail.
In this embodiment, the user tag keywords obtained conform more closely to the descriptions of the preset user tags, which further improves the cohesion of the basic user portrait data.
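The matching embodiment above can be sketched as follows. This is a minimal illustration only: the patent does not specify a tokenizer or a keyword list, so the regular expression, the `PRESET_KEYWORDS` set, and the function name `extract_tag_keywords` are all assumptions.

```python
import re

# Hypothetical preset user tag keywords; the patent does not list concrete ones.
PRESET_KEYWORDS = {"mortgage", "tennis", "kindergarten", "invoice"}


def extract_tag_keywords(email_body: str, preset=PRESET_KEYWORDS) -> list[str]:
    """Return preset keywords found in the e-mail body, in order of first appearance."""
    found = []
    # Assumed tokenization: lowercase alphanumeric runs. Real mail (and Chinese
    # text in particular) would need a proper word segmenter.
    for word in re.findall(r"[a-z0-9']+", email_body.lower()):
        if word in preset and word not in found:
            found.append(word)
    return found
```

For example, `extract_tag_keywords("Tennis at six? Bring the invoice.")` yields the two preset keywords that occur in the text and ignores every other word.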
S104: obtain, from the user tag database, the user tag corresponding to the user tag keyword and the tag weight corresponding to the user tag.
This step provides the basic data for the subsequent spam determination step, which improves the accuracy of the spam determination.
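A sketch of the lookup in step S104, assuming the user tag database can be modeled as two in-memory dictionaries: a keyword-to-tag index and a per-user tag-to-weight table. Both structures and the name `lookup_tags` are hypothetical; the patent does not describe the database layout.

```python
def lookup_tags(keywords, keyword_index, tag_weights):
    """Map extracted keywords to user tags and return {tag: current weight}.

    keyword_index: hypothetical mapping keyword -> tag name.
    tag_weights:   hypothetical mapping tag name -> weight for one user.
    Keywords without a tag are skipped; tags without a stored weight start at 0.0
    (an assumed default).
    """
    result = {}
    for keyword in keywords:
        tag = keyword_index.get(keyword)
        if tag is not None:
            result[tag] = tag_weights.get(tag, 0.0)
    return result
```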
S106: obtain spam determination criteria from the e-mail anti-spam system and perform spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight.
Performing spam determination on the e-mail using criteria based on the user tag and the tag weight reduces the spam misjudgment rate on the one hand, which improves the accuracy of the spam determination; on the other hand, it raises the priority of the e-mail anti-spam determination and reduces the number of subsequent determination steps in the system, which saves system resources.
In one embodiment, the step of performing spam determination on the e-mail comprises:
Adding the user tag keyword to the e-mail header;
Using the e-mail anti-spam system to perform spam determination on the content of the e-mail header.
This embodiment makes spam determination on the e-mail more convenient, which saves system resources.
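The header-based embodiment can be illustrated with Python's standard `email.message.EmailMessage`. The header name `X-User-Tag-Keywords` and the blocklist check are assumptions made for this sketch; the patent names neither a header field nor the concrete criteria the anti-spam system applies.

```python
from email.message import EmailMessage

TAG_HEADER = "X-User-Tag-Keywords"  # hypothetical header name, not from the patent


def annotate_with_keywords(msg: EmailMessage, keywords) -> EmailMessage:
    """Write the user tag keywords into a custom header, replacing any old value."""
    del msg[TAG_HEADER]  # no error if the header is absent
    msg[TAG_HEADER] = ", ".join(keywords)
    return msg


def is_spam_by_header(msg: EmailMessage, blocked_keywords) -> bool:
    """Stand-in criterion: spam if any header keyword is on a blocklist.

    Only the header is inspected, so the body never has to be re-parsed.
    """
    raw = str(msg.get(TAG_HEADER, ""))
    keywords = {k.strip() for k in raw.split(",") if k.strip()}
    return bool(keywords & set(blocked_keywords))
```

Reading only the header is what makes this variant cheap: the keyword extraction runs once, and the anti-spam check works on a short string instead of the full message.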
S108: correct the weight corresponding to the user tag according to the spam determination result.
Correcting the weight corresponding to the user tag according to the spam determination result improves the accuracy of the user portrait.
In one embodiment, the step of correcting the weight corresponding to the user tag according to the spam determination result comprises:
According to the spam determination result, if the e-mail is determined to be spam, decreasing the weight of the user tag; otherwise, increasing the weight of the user tag.
For an e-mail determined to be spam, the tag weight of the corresponding user tag is decreased; for an e-mail determined not to be spam, the tag weight of the corresponding user tag is increased. This makes the resulting user tag weights more accurate, which improves the accuracy of the user portrait.
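The correction rule reduces to a few lines. The patent states only the direction of the adjustment, so the fixed step size and the clamping to the range 0 to 1 are assumptions of this sketch.

```python
def correct_weight(weight: float, is_spam: bool,
                   step: float = 0.1, lo: float = 0.0, hi: float = 1.0) -> float:
    """Decrease the tag weight for spam, increase it otherwise.

    step, lo and hi are assumed values; the patent gives no magnitudes or bounds.
    """
    adjusted = weight - step if is_spam else weight + step
    return max(lo, min(hi, adjusted))
```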
Taken together, the above embodiments first obtain the user tag keyword from the e-mail, together with the corresponding user tag in the user tag database and the weight of that tag, then use the e-mail anti-spam system to perform spam determination on the e-mail, and finally correct the weight corresponding to the user tag according to the spam determination result. Because e-mail is a relatively formal and rigorous form of communication, user tag keywords and related information obtained from e-mail content reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore concentrated and closely related, and the user tags finally derived are clearly targeted, which improves the accuracy of the user portrait.
In one embodiment, the method for acquiring user portrait data based on e-mail further comprises:
Obtaining the historical data of a user tag in the user tag database;
Compiling statistics on the historical data and determining a historical data grade according to the statistics;
Correcting the weight of the user tag according to the historical data grade and its corresponding preset grade weight.
This embodiment further improves the accuracy of the user tag weights, which further improves the accuracy of the user portrait.
In one embodiment, the historical data of the user tag comprises the historical keywords of the user tag, the number of times a keyword's weight was increased, the number of times a keyword's weight was decreased, the mail sending frequency, the mail interception frequency, and the historical spam determination results.
With this historical data, the basic data collected for the user portrait is more concentrated and more closely related, and the user tags finally derived are better targeted.
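One way to read the history-based correction is as a three-step function: summarize the history, map the summary to a grade, and scale the weight by a preset factor for that grade. The sketch below uses only two of the listed history fields (the increase and decrease counts); the grade names, thresholds, and factors are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TagHistory:
    increase_count: int  # times the tag's keywords led to a weight increase
    decrease_count: int  # times they led to a weight decrease


# Hypothetical preset grade weights (multipliers).
GRADE_FACTORS = {"high": 1.2, "medium": 1.0, "low": 0.8}


def grade_history(history: TagHistory) -> str:
    """Grade the history by the share of increases (assumed thresholds)."""
    total = history.increase_count + history.decrease_count
    if total == 0:
        return "medium"  # no evidence either way
    ratio = history.increase_count / total
    if ratio >= 0.7:
        return "high"
    if ratio >= 0.3:
        return "medium"
    return "low"


def apply_history(weight: float, history: TagHistory) -> float:
    """Scale the tag weight by the preset factor of its history grade."""
    return weight * GRADE_FACTORS[grade_history(history)]
```

A fuller version would fold in the other fields as well, for instance lowering the grade when the mail interception frequency is high.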
The present invention also provides a system for acquiring user portrait data based on e-mail which, as shown in Fig. 2, comprises:
A keyword acquisition module 202, for obtaining a user tag keyword from an e-mail.
The keyword acquisition module 202 obtains the user tag keyword from e-mail. Because e-mail is a relatively formal and rigorous form of communication, user tag keywords and related information obtained from e-mail content reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore concentrated and closely related, which improves the cohesion of the basic user portrait data.
A user tag acquisition module 204, for obtaining, from the user tag database, the user tag corresponding to the user tag keyword and the weight corresponding to the user tag.
The user tag acquisition module 204 provides the basic data for the subsequent spam determination, which improves the accuracy of the spam determination.
A spam determination module 206, for obtaining spam determination criteria from the e-mail anti-spam system and performing spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight.
The spam determination module 206 performs spam determination on the e-mail using criteria based on the user tag and the tag weight. On the one hand this reduces the spam misjudgment rate, which improves the accuracy of the spam determination; on the other hand it raises the priority of the e-mail anti-spam determination and reduces the number of subsequent determination steps in the system, which saves system resources.
A weight correction module 208, for correcting the weight corresponding to the user tag according to the spam determination result.
The weight correction module 208 corrects the weight corresponding to the user tag according to the spam determination result, which improves the accuracy of the user portrait.
In the above embodiment, because e-mail is a relatively formal and rigorous form of communication, user tag keywords and related information obtained from e-mail content reflect a person's attributes more accurately. The basic data collected for the user portrait is therefore concentrated and closely related, and the user tags finally derived are clearly targeted, which improves the accuracy of the user portrait.
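How the four modules hand data to one another can be sketched as a single function. Everything concrete here is assumed for illustration: the whitespace tokenizer, the dictionary-based tag database, the pluggable `spam_check` callback standing in for the anti-spam system, and the fixed adjustment step.

```python
from typing import Callable


def run_pipeline(email_body: str,
                 preset_keywords: set,
                 keyword_to_tag: dict,
                 tag_weights: dict,
                 spam_check: Callable[[list], bool],
                 step: float = 0.1) -> dict:
    """Return a corrected copy of tag_weights for one e-mail."""
    # Module 202: keyword acquisition (naive tokenization, an assumption).
    words = [w.strip(".,!?;:") for w in email_body.lower().split()]
    keywords = [w for w in words if w in preset_keywords]

    # Module 204: user tag acquisition from the (hypothetical) tag database.
    tags = {keyword_to_tag[k] for k in keywords if k in keyword_to_tag}

    # Module 206: spam determination, delegated to the supplied callback.
    spam = spam_check(keywords)

    # Module 208: weight correction, down for spam and up otherwise.
    updated = dict(tag_weights)
    for tag in tags:
        delta = -step if spam else step
        updated[tag] = updated.get(tag, 0.0) + delta
    return updated
```

Passing the spam check in as a callback keeps the sketch independent of any particular anti-spam product, which matches the patent's treatment of the anti-spam system as an existing external component.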
In one embodiment, the keyword acquisition module 202 comprises:
A keyword matching module, for obtaining the content of the e-mail and matching it against preset user tag keywords;
A user tag acquisition submodule, for obtaining the words in the e-mail content that satisfy the match; these are the user tag keywords of the e-mail.
In this embodiment, the user tag keywords obtained conform more closely to the descriptions of the preset user tags, which further improves the cohesion of the basic user portrait data.
In one embodiment, the spam determination module 206 comprises:
A keyword adding module, for adding the user tag keyword to the e-mail header;
A determination submodule, for using the e-mail anti-spam system to perform spam determination on the content of the e-mail header.
This embodiment makes spam determination on the e-mail more convenient, which saves system resources.
In one embodiment, the weight correction module 208 comprises:
A weight-decreasing correction module, for decreasing the weight of the user tag if, according to the spam determination result, the e-mail is determined to be spam;
A weight-increasing correction module, for increasing the weight of the user tag if, according to the spam determination result, the e-mail is determined not to be spam.
In this embodiment, for an e-mail determined to be spam, the tag weight of the corresponding user tag is decreased; for an e-mail determined not to be spam, the tag weight of the corresponding user tag is increased. This makes the resulting user tag weights more accurate, which improves the accuracy of the user portrait.
In one embodiment, the system for acquiring user portrait data based on e-mail further comprises:
A historical data acquisition module, for obtaining the historical data of a user tag in the user tag database;
A grade statistics module, for compiling statistics on the historical data and determining a historical data grade according to the statistics;
A historical data weight correction module, for correcting the weight of the user tag according to the historical data grade and its corresponding preset grade weight.
This embodiment further improves the accuracy of the user tag weights, which further improves the accuracy of the user portrait.
In one embodiment, the historical data of the user tag comprises the historical keywords of the user tag, the number of times a keyword's weight was increased, the number of times a keyword's weight was decreased, the mail sending frequency, the mail interception frequency, and the historical spam determination results.
With this historical data, the basic data collected for the user portrait is more concentrated and more closely related, and the user tags finally derived are better targeted.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments represent only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims (10)

1. A method for acquiring user portrait data based on e-mail, characterized in that it comprises the following steps:
Obtaining a user tag keyword from an e-mail;
Obtaining, from a user tag database, the user tag corresponding to the user tag keyword and the tag weight corresponding to the user tag;
Obtaining spam determination criteria from an e-mail anti-spam system and performing spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight;
Correcting the weight corresponding to the user tag according to the spam determination result.
2. The method for acquiring user portrait data based on e-mail according to claim 1, characterized in that the step of obtaining the user tag keyword from the e-mail comprises:
Obtaining the content of the e-mail and matching it against preset user tag keywords;
Obtaining the words in the e-mail content that satisfy the match as the user tag keywords of the e-mail.
3. The method for acquiring user portrait data based on e-mail according to claim 1, characterized in that the step of performing spam determination on the e-mail comprises:
Adding the user tag keyword to the e-mail header;
Using the e-mail anti-spam system to perform spam determination on the content of the e-mail header.
4. The method for acquiring user portrait data based on e-mail according to claim 1, characterized in that the step of correcting the weight corresponding to the user tag according to the spam determination result comprises:
According to the spam determination result, if the e-mail is determined to be spam, decreasing the weight of the user tag; otherwise, increasing the weight of the user tag.
5. The method for acquiring user portrait data based on e-mail according to claim 1, characterized in that it further comprises:
Obtaining the historical data of a user tag in the user tag database;
Compiling statistics on the historical data and determining a historical data grade according to the statistics;
Correcting the weight of the user tag according to the historical data grade and its corresponding preset grade weight.
6. A system for acquiring user portrait data based on e-mail, characterized in that it comprises:
A keyword acquisition module, for obtaining a user tag keyword from an e-mail;
A user tag acquisition module, for obtaining, from a user tag database, the user tag corresponding to the user tag keyword and the weight corresponding to the user tag;
A spam determination module, for obtaining spam determination criteria from an e-mail anti-spam system and performing spam determination on the e-mail according to those criteria, wherein the spam determination criteria include criteria based on the user tag and the tag weight;
A weight correction module, for correcting the weight corresponding to the user tag according to the spam determination result.
7. The system for acquiring user portrait data based on e-mail according to claim 6, characterized in that the keyword acquisition module comprises:
A keyword matching module, for obtaining the content of the e-mail and matching it against preset user tag keywords;
A user tag acquisition submodule, for obtaining the words in the e-mail content that satisfy the match as the user tag keywords of the e-mail.
8. The system for acquiring user portrait data based on e-mail according to claim 6, characterized in that the spam determination module comprises:
A keyword adding module, for adding the user tag keyword to the e-mail header;
A determination submodule, for using the e-mail anti-spam system to perform spam determination on the content of the e-mail header.
9. The system for acquiring user portrait data based on e-mail according to claim 6, characterized in that the weight correction module comprises:
A weight-decreasing correction module, for decreasing the weight of the user tag if, according to the spam determination result, the e-mail is determined to be spam;
A weight-increasing correction module, for increasing the weight of the user tag if, according to the spam determination result, the e-mail is determined not to be spam.
10. The system for acquiring user portrait data based on e-mail according to claim 6, characterized in that it further comprises:
A historical data acquisition module, for obtaining the historical data of a user tag in the user tag database;
A grade statistics module, for compiling statistics on the historical data and determining a historical data grade according to the statistics;
A historical data weight correction module, for correcting the weight of the user tag according to the historical data grade and its corresponding preset grade weight.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510611139.4A CN105260877A (en) 2015-09-22 2015-09-22 E-mail-based method for acquiring data of user portrait

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510611139.4A CN105260877A (en) 2015-09-22 2015-09-22 E-mail-based method for acquiring data of user portrait

Publications (1)

Publication Number Publication Date
CN105260877A (en) 2016-01-20

Family

ID=55100555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510611139.4A Pending CN105260877A (en) 2015-09-22 2015-09-22 E-mail-based method for acquiring data of user portrait

Country Status (1)

Country Link
CN (1) CN105260877A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218355A (en) * 2012-01-18 2013-07-24 腾讯科技(深圳)有限公司 Method and device for generating tags for user

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109391535A (en) * 2017-08-02 2019-02-26 阿里巴巴集团控股有限公司 The contact person of domain grade determines method, spam judgment method and device
CN108133393A (en) * 2017-12-28 2018-06-08 新智数字科技有限公司 Data processing method and system
CN108388572A (en) * 2018-01-10 2018-08-10 链家网(北京)科技有限公司 A kind of user's portrait access method
CN109548005A (en) * 2018-11-27 2019-03-29 浙江每日互动网络科技股份有限公司 The system for obtaining mobile terminal label information
CN109548005B (en) * 2018-11-27 2021-10-01 每日互动股份有限公司 System for acquiring tag information of mobile terminal
CN113284509A (en) * 2021-05-06 2021-08-20 北京百度网讯科技有限公司 Method and device for acquiring accuracy of voice annotation and electronic equipment
CN113284509B (en) * 2021-05-06 2024-01-16 北京百度网讯科技有限公司 Method and device for obtaining accuracy of voice annotation and electronic equipment
CN114331368A (en) * 2021-12-31 2022-04-12 深圳市云登智能有限公司 Mail processing method and related equipment thereof
CN114693245A (en) * 2022-03-02 2022-07-01 深圳市小满科技有限公司 User portrait generation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN105260877A (en) E-mail-based method for acquiring data of user portrait
JP5759228B2 (en) A method for calculating semantic similarity between messages and conversations based on extended entity extraction
CN102479223B (en) Data query method and system
US11270316B2 (en) Systems, methods, and apparatuses for implementing automatic entry of customer relationship management (CRM) data into a CRM database system
US9015254B2 (en) Method and system for calculating email and email participant prominence
JP2018511116A (en) Method and device for selecting data content to be pushed to a terminal
CN102915307A (en) Device and method for recommending personalized information and information processing system
CN104881770A (en) Express bill information identification system and express bill information identification method
US11210284B2 (en) Method, system, apparatus, and computer-readable storage medium for sharing account resources
US20120143806A1 (en) Electronic Communications Triage
US10157228B2 (en) Communication system including a confidence level for a contact type and method of using same
CN105701488A (en) Identity card identification method
CN102419975B (en) A kind of data digging method based on speech recognition and system
US20120233197A1 (en) Social network system and member searching and analyzing method in social network
CN103001994B (en) friend recommendation method and device
CN110516057B (en) Petition question answering method and device
CN106599060B (en) Method and system for acquiring user portrait
US10296509B2 (en) Method, system and apparatus for managing contact data
CN105631016A (en) Guide type retrieval method and system
KR101930394B1 (en) Method of providing transparent quotation service by comparing quotation of builder with published unit price
CN104598780A (en) Account identification method and system
CN105453081A (en) Answering people-related questions
CN105701171A (en) User attribute based personalized big data searching method and system
CN107071181B (en) Method for automatically matching communication contact persons
CN112732923B (en) Express delivery logistics service semantic extraction method based on knowledge graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160120
