WO2015021937A1

WO2015021937A1 - Method and device for user recommendation

Info

Publication number: WO2015021937A1
Application number: PCT/CN2014/084402
Authority: WO
Inventors: 程刚
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2013-08-14
Filing date: 2014-08-14
Publication date: 2015-02-19
Also published as: CN104376010A; CN104376010B

Abstract

Provided in the present invention is a method for user recommendation. The method comprises: reading an interest tag of a user and a score corresponding to the interest tag; reading a specialization tag of the user and a score corresponding to the specialization tag; generating a degree of match between two users on the basis of the score corresponding to the interest tag and of the score corresponding to the specialization tag; and, selecting, on the basis of the degree of match, a to-be-recommended user for recommendation. The user recommendation method allows for reduced pushing of redundant information, thus conserving network resources. In addition, also provided is a device for user recommendation.

Description

User recommendation method and device

Cross-reference to related applications

The present application claims priority to Chinese Patent Application No. 201310354181.3, entitled "User recommendation method and device", filed with the Chinese Patent Office on August 14, 2013, the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of network technologies, and in particular to a user recommendation method and device.

Background

Traditional user recommendation methods usually rely on friend relationships. For example, if two users have friends in common or follow some of the same people, one of the users can be recommended to the other.

However, the inventors have found that the prior art has at least the following technical problem:

Although recommendation based on friend relationships can effectively expand a user's social relationships, relying only on the user's friends or the people the user follows means that the people recommended often do not meet the user's needs. This makes user recommendation blind, inevitably causes a large amount of redundant information to be pushed, and thus wastes network resources.

Summary of the invention

In view of this, it is necessary to address the above technical problem by providing a user recommendation method and device that reduce the pushing of redundant information and thereby save network resources. A user recommendation method comprises:

reading an interest tag of a user and a score corresponding to the interest tag;

reading a specialization tag of the user and a score corresponding to the specialization tag;

generating a degree of match between two users on the basis of the score corresponding to the interest tag and the score corresponding to the specialization tag; and

selecting, on the basis of the degree of match, a to-be-recommended user for recommendation.

A user recommendation device comprises:

an interest tag reading module, configured to read an interest tag of a user and a score corresponding to the interest tag;

a specialization tag reading module, configured to read a specialization tag of the user and a score corresponding to the specialization tag;

a first degree-of-match generation module, configured to generate a degree of match between two users on the basis of the score corresponding to the interest tag and the score corresponding to the specialization tag; and

a user recommendation module, configured to select, on the basis of the degree of match, a to-be-recommended user for recommendation.

Because an interest tag represents a field or term the user is interested in, and a specialization tag represents a field or term the user is skilled in, the above method and device generate the degree of match between two users from the scores corresponding to the interest tags and the specialization tags, so that the interests and specializations of the two users are matched against each other. To-be-recommended users are then selected according to the degree of match. A person recommended to the user is therefore likely to be skilled in content the user is interested in, or interested in content the user is skilled in, so the recommended people better meet the user's needs. This avoids blind user recommendation, reduces the pushing of redundant information, and saves network resources.

Brief description of the drawings

FIG. 1 is a schematic flowchart of a user recommendation method in one embodiment;

FIG. 2 is a schematic flowchart of mining a user's interest tags in one embodiment;

FIG. 3 is a schematic flowchart of mining a user's interest tags in another embodiment;

FIG. 4 is a schematic flowchart of mining a user's specialization tags from profession-related data in one embodiment;

FIG. 5 is a schematic flowchart of mining a user's specialization tags from profession-related data in another embodiment;

FIG. 6 is a schematic flowchart of mining a user's specialization tags from profession-related data and personal information in one embodiment;

FIG. 7 is a schematic flowchart of mining a user's specialization categories, provided on the basis of the embodiment shown in FIG. 6;

FIG. 8 is a schematic flowchart of generating the degree of match between two users in one embodiment;

FIG. 9 is a schematic flowchart of generating the degree of match between two users in another embodiment;

FIG. 10 is a structural block diagram of a user recommendation device in one embodiment;

FIG. 11 is a structural block diagram of a user recommendation device in another embodiment;

FIG. 12 is a structural block diagram of an interest tag mining module in one embodiment;

FIG. 13 is a structural block diagram of an interest tag mining module in another embodiment;

FIG. 14 is a structural block diagram of a user recommendation device in yet another embodiment;

FIG. 15 is a structural block diagram of a first specialization tag mining module in one embodiment;

FIG. 16 is a structural block diagram of a first specialization tag mining module in another embodiment;

FIG. 17 is a structural block diagram of a user recommendation device in still another embodiment;

FIG. 18 is a structural block diagram of a second specialization tag mining module in one embodiment;

FIG. 19 is a structural block diagram of a second specialization tag mining module in another embodiment;

FIG. 20 is a structural block diagram of a degree-of-match generation module in one embodiment;

FIG. 21 is a structural block diagram of a degree-of-match generation module in another embodiment;

FIG. 22 is a schematic diagram of a computing environment in which embodiments of the present invention may be implemented.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely explain the invention and are not intended to limit it.

In one embodiment, a user recommendation method is provided. The method can be applied to various servers, such as the server 1101 shown in FIG. 22. For ease of description, this embodiment is described using the matching between two users as an example. As shown in FIG. 1, the method includes:

Step 102: read the user's interest tags and the scores corresponding to the interest tags.

The server stores in advance, for each user identifier, the interest tags and their corresponding scores. An interest tag comprises tag words and the categories to which tag words belong, and characterizes content the user is interested in. For example, the interest tag may be "surgical disease", indicating that the user is interested in content about surgical diseases, or "military", indicating that the user is interested in content in the military category. The score corresponding to an interest tag indicates how interested the user is in content related to that tag.

Step 104: read the user's specialization tags and the scores corresponding to the specialization tags.

The server stores in advance, for each user identifier, the specialization tags and their corresponding scores. A specialization tag comprises tag words and the categories to which tag words belong, and characterizes content the user is skilled in. For example, the specialization tag may be "law", indicating that the user is skilled in content in the legal category. The score corresponding to a specialization tag indicates how skilled the user is in content related to that tag.

In this embodiment, the user's interest tags can be mined in advance from the user's large volume of online behavior data, and the scores corresponding to the interest tags can be obtained by processing the documents in that data. The user's specialization tags can be mined in advance from the user's large volume of profession-related data, and the corresponding scores can be obtained by processing the documents in that data. Once each user's interest tags, interest tag scores, specialization tags, and specialization tag scores have been obtained, they can be stored on the server so that they can be read and processed when users are recommended.

Step 106: generate the degree of match between the two users from the scores corresponding to the interest tags and the scores corresponding to the specialization tags.

Specifically, the degree of match between two users expresses how well one user's interest tags match the other user's specialization tags, and how well that user's specialization tags match the other user's interest tags. For two users, the interest tags of one user can be matched against the specialization tags of the other to obtain the similarity between them, and the specialization tags of the first user can be matched against the interest tags of the other to obtain a second similarity. These similarities are then combined with the scores corresponding to the interest tags and the specialization tags to generate the degree of match between the two users.
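
As an illustration only, the two-way matching described in step 106 can be sketched as follows. This passage does not give a formula, so the scoring rule used here (exact tag equality as the similarity, and a sum of products of the interest score and the specialization score over shared tags) is an assumption made for this sketch, as are the function names and the sample data.

```python
def directed_match(interest, specialization):
    """Match one user's interest tags against another user's specialization tags.

    Both arguments map tag -> score. Assumption: two tags count as similar
    only when they are identical, and each shared tag contributes the
    product of the two scores.
    """
    shared = interest.keys() & specialization.keys()
    return sum(interest[t] * specialization[t] for t in shared)


def match_degree(user_a, user_b):
    """Two-way match: A's interests vs. B's specializations, plus the reverse."""
    return (directed_match(user_a["interest"], user_b["specialization"])
            + directed_match(user_b["interest"], user_a["specialization"]))


# Hypothetical users and scores, for illustration only.
alice = {"interest": {"law": 0.6, "military": 0.4},
         "specialization": {"surgical disease": 1.0}}
bob = {"interest": {"surgical disease": 0.5, "education": 0.5},
       "specialization": {"law": 0.8, "technology": 0.2}}

degree = match_degree(alice, bob)
```

Under this assumed rule the result is the same whichever of the two users is passed first, which fits the description of the degree of match as a property of the pair.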

Step 108: select to-be-recommended users for recommendation according to the degree of match.

In this embodiment, for each user on the server, the degree of match between that user and every other user on the server can be generated. When determining the user's recommendation list, a preset number of to-be-recommended users with the highest degrees of match can then be selected for recommendation. For example, the 100 to-be-recommended users with the highest degree of match with the user may be selected. Further, the personal information of the selected to-be-recommended users can be obtained, including information such as each user's nickname and avatar in the SNS community, and sent over the network to the user's terminal. The terminal is, for example, any one of the computing platform 1104, desktop computer 1105, notebook 1106, mobile computing device (for example, a personal digital assistant, PDA) 1107, and mobile phone 1108 shown in FIG. 22.

In this embodiment, the degree of match between two users is generated from the scores corresponding to the interest tags and the specialization tags, so that the interests and specializations of the two users are matched against each other, and to-be-recommended users are selected according to the degree of match. A person recommended to the user is therefore likely to be skilled in content the user is interested in, or interested in content the user is skilled in, so the recommended people better meet the user's needs. This avoids blind user recommendation, reduces the pushing of redundant information, and saves network resources.
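
The selection in step 108 amounts to a top-N query over precomputed degrees of match. A minimal sketch follows, assuming the degrees of match for one target user are already available as a dictionary; the function name, the user identifiers, and the values are hypothetical.

```python
import heapq


def select_recommendations(match_degrees, n=100):
    """Return the n candidate user IDs with the highest degree of match.

    match_degrees maps candidate user ID -> degree of match with the
    target user. The default n=100 follows the example in the text.
    """
    return heapq.nlargest(n, match_degrees, key=match_degrees.get)


# Hypothetical degrees of match between one target user and four candidates.
degrees = {"u1": 0.42, "u2": 0.98, "u3": 0.10, "u4": 0.77}
top_two = select_recommendations(degrees, n=2)
```

`heapq.nlargest` avoids sorting the whole candidate set when n is small relative to the number of users, which may matter when every user on the server is a candidate.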

In one embodiment, the user recommendation method further includes: collecting the user's online behavior data, and mining the user's interest tags from the online behavior data.

A user's online behavior data is the data formed when the user uses various network applications. It includes, but is not limited to, the user's search keywords on search websites, microblog posts, logs and comments published in SNS communities, group chat records, questions or answers in Q&A communities, and posts or replies on forums. For each user, the online behavior data corresponding to the user identifier can be obtained from different service servers; the user's interest tags are then mined from this data and stored in association with the user identifier. The service server is, for example, the server 1102 or 1103 shown in FIG. 22. Such a server may itself include a database, as shown at 1102, or have a database attached separately, as shown at 1103.

Further, in one embodiment, as shown in FIG. 2, mining the user's interest tags from the online behavior data includes:

Step 202: perform word segmentation on the documents in the online behavior data.

In this embodiment, the documents in the user's online behavior data can be extracted, and a conventional word segmentation method can then be used to split their content. Some common adverbs, verbs, and nouns, such as "你" (you), "我" (I), "的", and "得", are removed, leaving a number of tag words.

Step 204: for each tag word obtained from the segmentation, calculate the ratio of the tag word's frequency to the sum of the frequencies of all of the user's tag words, and take this ratio as the score corresponding to the tag word.

In step 204, the word frequency of each tag word obtained from the segmentation, that is, how often the tag word occurs, is counted, and the score corresponding to each tag word is calculated according to the following formula:

Ins(x) = pv(x) / pv(all)

where Ins(x) denotes the score corresponding to tag word x, pv(x) denotes the word frequency of tag word x, and pv(all) denotes the sum of the word frequencies of all of the user's tag words.

Step 206: select tag words as the user's interest tags according to the scores corresponding to the tag words. Specifically, a preset number of tag words with the largest scores can be selected as the user's interest tags; for example, the 10 tag words with the largest scores.
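
Steps 202 to 206 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: word segmentation is assumed to have been done already by an external segmenter, the stop-word list contains only the four examples named in the text, and the function names and sample documents are hypothetical.

```python
from collections import Counter

# Assumed stop-word list; the text names only these four examples.
STOP_WORDS = {"你", "我", "的", "得"}


def tag_scores(segmented_docs, stop_words=STOP_WORDS):
    """Compute Ins(x) = pv(x) / pv(all) for every tag word x.

    segmented_docs is a list of documents, each already split into words
    (step 202 is assumed to be handled by an external segmenter).
    """
    counts = Counter(word for doc in segmented_docs
                     for word in doc if word not in stop_words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def top_tags(scores, k=10):
    """Step 206: keep the k tag words with the largest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical, already-segmented documents from one user's online behavior
# (军事 = military, 新闻 = news, 历史 = history).
docs = [["我", "军事", "的", "新闻"], ["军事", "历史"], ["你", "军事", "新闻"]]
scores = tag_scores(docs)
interest_tags = top_tags(scores, k=2)
```

Because each score is a share of the user's total tag-word frequency, the scores of one user sum to 1, which keeps users with very different activity levels on a comparable scale.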

In another embodiment, as shown in FIG. 3, mining the user's interest tags from the online behavior data includes:

Step 302: perform word segmentation on the documents in the online behavior data.

Step 304: classify the tag words obtained from the segmentation.

Specifically, the tag words obtained from the segmentation can be classified manually or by a conventional machine learning method. For example, the categories to which tag words belong include science and technology, education, military, and medicine.

Step 306: for each tag word obtained from the segmentation, calculate the ratio of the tag word's frequency to the sum of the frequencies of all of the user's tag words, and take this ratio as the score corresponding to the tag word.

Step 308: calculate the score corresponding to each category to which tag words belong from the scores corresponding to the tag words. In this embodiment, the user's tag words under each category and their scores can be obtained; the user's score for a category is the sum of the scores of the tag words under that category. For example, suppose the tag words fall into three categories, A, B, and C. The tag words and scores under category A are {tagA1: 3 points}, {tagA2: 2 points}, and {tagA3: 3 points}; under category B, {tagB1: 2 points} and {tagB2: 1 point}; and under category C, {tagC1: 3 points}. The user's score is then 8 points for category A, 3 points for category B, and 3 points for category C.

Step 310: select categories as the user's interest categories according to the scores corresponding to the categories to which tag words belong.

Specifically, a preset number of categories with the largest scores can be selected as the user's interest categories; for example, the 2 categories with the largest scores. In this embodiment, tag words can also be selected as the user's interest tags according to their scores; for example, the 8 tag words with the largest scores. Each user's interest tags thus include both the categories and the tag words the user is interested in, so that when the degree of match is subsequently generated, it can be calculated from the scores corresponding to the categories as well as from the scores corresponding to the tag words.
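
The category scoring and selection of steps 308 and 310 can be sketched as follows, assuming the per-tag scores and a tag-to-category mapping (the result of step 304) are already available. The function names, tag words, category names, and scores are all hypothetical.

```python
from collections import defaultdict


def category_scores(tag_scores, tag_category):
    """Step 308: a category's score is the sum of the scores of its tag words."""
    totals = defaultdict(float)
    for tag, score in tag_scores.items():
        totals[tag_category[tag]] += score
    return dict(totals)


def top_categories(cat_scores, k=2):
    """Step 310: keep the k categories with the largest scores."""
    return sorted(cat_scores, key=cat_scores.get, reverse=True)[:k]


# Hypothetical per-tag scores for one user (ratios as produced by step 306).
tags = {"rocket": 0.3, "satellite": 0.2, "exam": 0.25,
        "tuition": 0.1, "surgery": 0.15}
# Assumed tag-to-category mapping, as produced by step 304.
categories = {"rocket": "technology", "satellite": "technology",
              "exam": "education", "tuition": "education",
              "surgery": "medicine"}

per_category = category_scores(tags, categories)
interest_categories = top_categories(per_category, k=2)
```

With these inputs, "technology" and "education" are kept as the interest categories, while "medicine" falls below the cut-off of two categories.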

In one embodiment, the user recommendation method further includes: collecting the user's profession-related data, and mining the user's specialization tags from the profession-related data.

A user's profession-related data is the profession-related data the user generates when using various network applications, and includes at least one of Q&A community data and professional forum data. Q&A community data refers to the user's questions and answers in Q&A communities, and professional forum data refers to the user's posts and replies on professional forums. Further, the profession-related data corresponding to the user identifier can be obtained from profession-related service servers; each user's specialization tags are then mined from this data and stored in association with the user identifier.

Further, in one embodiment, mining the user's specialization tags from the profession-related data includes:

Step 402: perform word segmentation on the documents in the profession-related data.

As described above, the documents in the user's profession-related data can be extracted, and a conventional word segmentation method can then be used to split their content. Some common adverbs, verbs, and nouns, such as "你" (you), "我" (I), "的", and "得", are removed, leaving a number of tag words.

Step 404: for each tag word obtained from the segmentation, calculate the ratio of the tag word's frequency to the sum of the frequencies of all of the user's tag words, and take this ratio as the score corresponding to the tag word.

In step 404, the word frequency of each tag word obtained from the segmentation, that is, how often the tag word occurs, is counted, and the score corresponding to each tag word is calculated according to the following formula:

Expert(x) = ev(x) / ev(all)

where Expert(x) denotes the score corresponding to tag word x, ev(x) denotes the word frequency of tag word x, and ev(all) denotes the sum of the word frequencies of all of the user's tag words.

Step 406: select tag words as the user's specialization tags according to the scores corresponding to the tag words. Specifically, a preset number of tag words with the largest scores can be selected as the user's specialization tags; for example, the 10 tag words with the largest scores.

In another embodiment, as shown in FIG. 5, mining the user's specialization tags from the profession-related data includes:

Step 502: perform word segmentation on the documents in the profession-related data.

Step 504: classify the tag words obtained from the segmentation.

Specifically, the tag words obtained from the segmentation can be classified manually or by a conventional machine learning method. For example, the categories to which tag words belong include science and technology, education, military, and medicine.

Step 506: for each tag word obtained from the segmentation, calculate the ratio of the tag word's frequency to the sum of the frequencies of all of the user's tag words, and take this ratio as the score corresponding to the tag word.

Step 508: calculate the score corresponding to each category to which tag words belong from the scores corresponding to the tag words. In this embodiment, the user's tag words under each category and their scores can be obtained; the user's score for a category is the sum of the scores of the tag words under that category.

Step 510: select categories as the user's specialization tags according to the scores corresponding to the categories to which tag words belong.

Specifically, a preset number of categories with the largest scores can be selected as the user's specialization categories; for example, the 2 categories with the largest scores. In this embodiment, tag words can also be selected as the user's specialization tags according to their scores; for example, the 8 tag words with the largest scores. Each user's specialization tags thus include both the categories and the tag words the user is skilled in, so that when the degree of match is subsequently generated, it can be calculated from the scores corresponding to the categories as well as from the scores corresponding to the tag words.

In another embodiment, another way of mining the user's specialization tags is provided. Specifically, the user recommendation method further includes: collecting the user's profession-related data and personal information, and mining the user's specialization tags from the profession-related data and the personal information.

As described above, the profession-related data includes at least one of Q&A community data and professional forum data. The user's personal information includes, but is not limited to, information such as the user's education, work, age, and occupation. Specifically, the personal information corresponding to the user identifier can be obtained from different service servers. It may be personal information the user filled in when logging in to a network application, or personal information from a group to which the user belongs.

Further, in one embodiment, as shown in FIG. 6, mining the user's specialization tags from the profession-related data and the personal information includes:

Step 602: perform word segmentation on the documents in the profession-related data.

Step 604: for each tag word obtained from the segmentation, calculate the ratio of the tag word's frequency to the sum of the frequencies of all of the user's tag words, and take this ratio as the word frequency probability corresponding to the tag word.

The word segmentation and the calculation of the word frequency probability are as described above and are not repeated here.

Step 606: obtain the corresponding tag words from the personal information, and calculate from the personal information the confidence corresponding to each tag word obtained.

Specifically, tag words corresponding to certain profession-related terms can be set in advance, so that the corresponding tag words can be obtained from the profession-related terms in the user's personal information. For example, if the user's occupation is "lawyer", the tag word obtained for "lawyer" is "law"; likewise, if the user belongs to the group "XX Law Firm", the tag word obtained is "law". Further, a confidence function with values from 0 to 1 can be set in advance, and its value for a tag word can be determined from the source of the profession-related term in the personal information. For example, if the occupation was filled in by the user, the confidence of the tag word corresponding to that occupation is 1. If the user's group has 10 members in total and 8 of them have the occupation "lawyer", the confidence of the user's tag word "law" is 0.8.
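
The two confidence cases in step 606 can be sketched as follows. The text leaves the confidence function open, so this sketch assumes the simplest reading of its two examples: confidence 1 for a self-reported occupation, and the share of group members with a matching occupation otherwise. The vocabulary mapping and function names are hypothetical.

```python
# Assumed mapping from profession-related vocabulary to tag words.
PROFESSION_TO_TAG = {"lawyer": "law", "surgeon": "medicine"}


def self_reported_confidence(occupation):
    """An occupation filled in by the user gives its tag word confidence 1."""
    tag = PROFESSION_TO_TAG.get(occupation)
    return {tag: 1.0} if tag else {}


def group_confidence(member_occupations):
    """Confidence from a group: the share of members whose occupation maps to the tag."""
    if not member_occupations:
        return {}
    counts = {}
    for occupation in member_occupations:
        tag = PROFESSION_TO_TAG.get(occupation)
        if tag:
            counts[tag] = counts.get(tag, 0) + 1
    return {tag: n / len(member_occupations) for tag, n in counts.items()}


# The group example from the text: 10 members, 8 of whom are lawyers.
# The occupations of the other two members are made up for this sketch.
group = ["lawyer"] * 8 + ["teacher", "engineer"]
confidence = group_confidence(group)
```

For the group above, the sketch yields a confidence of 0.8 for the tag word "law", matching the value given in the text.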

步骤 608，根据标签词对应的词频概率和置信度进行拟合，得到标签词对应的分值。 Step 608: Fitting according to the word frequency probability and the confidence degree corresponding to the tag word, and obtaining a score corresponding to the tag word.

具体的，可按照如下公式进行拟合来计算标签词对应的分值： Specifically, the score corresponding to the tag word can be calculated by fitting according to the following formula:

Fin_expert (χ) = γ * Expert (x) + λ * Profession (x) Fin_expert (χ) = γ * Expert (x) + λ * Profession (x)

where Fin_expert(x) is the score corresponding to tag word x, Expert(x) is the word-frequency probability of tag word x, Profession(x) is the confidence of tag word x, and γ and λ are constants with γ + λ = 1. Preferably, γ is 0.7 and λ is 0.3.
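As a worked illustration of the fitting in step 608, the sketch below evaluates the weighted sum with the preferred constants. The function name `fused_score` and its argument names are assumptions made for illustration only.

```python
def fused_score(expert, profession, gamma=0.7, lam=0.3):
    """Fin_expert(x) = gamma * Expert(x) + lam * Profession(x), with gamma + lam = 1."""
    if abs(gamma + lam - 1.0) > 1e-9:
        raise ValueError("gamma and lam must sum to 1")
    return gamma * expert + lam * profession
```

For example, a tag word with word-frequency probability 0.5 and confidence 0.8 receives the score 0.7 × 0.5 + 0.3 × 0.8 = 0.59.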

Step 610: select tag words as the user's specialization tags according to the scores corresponding to the tag words. After the score of each tag word has been calculated, a preset number of tag words with the highest scores may be selected as the user's specialization tags. In this embodiment, the user's specialization tags are mined from the user's profession-related data and personal information; the tags mined in this way better reflect what the user is good at and are therefore more accurate.
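The selection in step 610 amounts to a top-N pick over the scored tag words. A minimal sketch follows; the function name is hypothetical and the preset number is passed in as `n`.

```python
import heapq


def select_top_tags(tag_scores, n):
    """Return the n (tag word, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, tag_scores.items(), key=lambda item: item[1])
```

`heapq.nlargest` avoids sorting the whole dictionary when only a few tag words are kept.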

Further, in one embodiment, building on the embodiment shown in FIG. 6, mining the user's specialization tags from the profession-related data and the personal information further includes:

Step 702: classify the tag words obtained after word segmentation.

Step 704: calculate the score of the category to which a tag word belongs from the scores corresponding to the tag words.

Step 706: select categories as the user's specialization categories according to the scores of the categories to which the tag words belong.
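Steps 702 to 706 can be sketched as below. Two points are assumptions not fixed by this passage: the tag-to-category mapping `tag_to_category` is a hypothetical preset table, and the category score is taken to be the sum of the scores of its tag words.

```python
from collections import defaultdict


def category_scores(tag_scores, tag_to_category):
    """Steps 702 and 704: group tag words by category and sum their scores (assumed rule)."""
    totals = defaultdict(float)
    for tag, score in tag_scores.items():
        category = tag_to_category.get(tag)
        if category is not None:  # tag words without a known category are skipped
            totals[category] += score
    return dict(totals)


def select_top_categories(cat_scores, n):
    """Step 706: keep the n categories with the highest scores."""
    return sorted(cat_scores, key=cat_scores.get, reverse=True)[:n]
```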

In this embodiment, the specialization tags mined from the user's profession-related data and personal information include specialization categories in addition to specialization tag words. The degree of match between two users can subsequently be calculated from the scores of the specialization tag words and from the specialization categories, so that the users recommended better meet the user's needs, which further reduces the pushing of redundant information and saves network resources.

In one embodiment, as shown in FIG. 8, generating the degree of match between two users from the scores corresponding to the interest tags and the scores corresponding to the specialization tags includes:

Step 802: match the first user's interest tags against the second user's specialization tags, and obtain a first similarity between the first user's interest tags and the second user's specialization tags.

Specifically, when the first user's interest tags are matched against the second user's specialization tags, machine learning may be used to obtain the first similarity. For example, the number of times an interest tag and a specialization tag occur together in the online behavior data of a large number of users may be counted and used to calculate the first similarity. In one embodiment, it may instead be determined whether the first user's interest tag and the second user's specialization tag are identical: if they are, the first similarity is 1; otherwise it is 0.

Step 804: match the first user's specialization tags against the second user's interest tags, and obtain a second similarity between the first user's specialization tags and the second user's interest tags.

Specifically, when the first user's specialization tags are matched against the second user's interest tags, machine learning may likewise be used to obtain the second similarity. In one embodiment, it may instead be determined whether the first user's specialization tag and the second user's interest tag are identical: if they are, the second similarity is 1; otherwise it is 0.
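The two ways of obtaining a similarity described for steps 802 and 804 can be sketched as follows. The exact-match rule is taken directly from the text. The co-occurrence variant is only one possible reading: the text says co-occurrence counts are used but does not give the normalization, so dividing by the largest count is an assumption, as are the function names and the input format (a list of observed tag pairs).

```python
from collections import Counter


def exact_similarity(tag_x, tag_y):
    """Similarity is 1 when the two tag words are identical, otherwise 0."""
    return 1.0 if tag_x == tag_y else 0.0


def cooccurrence_similarity(observed_pairs):
    """Map each (interest tag, specialization tag) pair to a value in (0, 1].

    observed_pairs lists every co-occurrence seen in the behavior data.
    Normalizing by the most frequent pair is an illustrative assumption.
    """
    counts = Counter(observed_pairs)
    if not counts:
        return {}
    peak = max(counts.values())
    return {pair: count / peak for pair, count in counts.items()}
```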

Step 806: calculate the degree of match between the first user and the second user from the scores corresponding to the first user's interest tags, the scores corresponding to the second user's specialization tags, the scores corresponding to the first user's specialization tags, the scores corresponding to the second user's interest tags, the first similarity, and the second similarity.

In this embodiment, an interest tag is a tag word that indicates an interest, and a specialization tag is a tag word that indicates a specialization. In one embodiment, when the first user's interest tags are matched against the second user's specialization tags, the score of the first user's interest tag, the score of the second user's specialization tag, and the first similarity are multiplied together; when the first user's specialization tags are matched against the second user's interest tags, the score of the first user's specialization tag, the score of the second user's interest tag, and the second similarity are multiplied together; finally, all of the products obtained are added, and the sum is the degree of match between the first user and the second user.
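The multiply-then-add procedure just described can be sketched as follows. Each user's tags are represented as a dictionary from tag word to score, and the similarity is passed in as a function; these representations and all names are assumptions for illustration. The default similarity is the exact-match rule.

```python
def directional_sum(source_tags, target_tags, sim):
    """Sum of score * score * similarity over every pair of tags."""
    return sum(
        w_x * w_y * sim(x, y)
        for x, w_x in source_tags.items()
        for y, w_y in target_tags.items()
    )


def degree_of_match(a_interest, a_special, b_interest, b_special,
                    sim=lambda x, y: 1.0 if x == y else 0.0):
    """Cross-match interests against specializations in both directions."""
    first = directional_sum(a_interest, b_special, sim)   # uses the first similarity
    second = directional_sum(a_special, b_interest, sim)  # uses the second similarity
    return first + second
```

The two directions are kept as separate sums so that a different similarity or weighting can be applied to each.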

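Further, in one embodiment, in step 806 the degree of match between the first user and the second user may be calculated according to the following formula (the formula is illegible in the source text; the form below is a reconstruction consistent with the definitions that follow it):

$$\text{match\_score}(a,b) = \alpha \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y) + \beta \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y)$$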

where match_score(a, b) is the degree of match between the first user a and the second user b, n is the number of tags of the first user a, m is the number of tags of the second user b, and α and β are constants. Preferably, α and β are equal and both are 0.5.

When the interest tags of the first user a are matched against the specialization tags of the second user b, match(x, y) is the first similarity, w_x is the score corresponding to an interest tag of the first user a, and w_y is the score corresponding to a specialization tag of the second user b. When the specialization tags of the first user a are matched against the interest tags of the second user b, match(x, y) is the second similarity, w_x is the score corresponding to a specialization tag of the first user a, and w_y is the score corresponding to an interest tag of the second user b.

In a preferred embodiment, match(x, y) takes the value 1 or 0. That is, it is 1 when the first user's interest tag is identical to the second user's specialization tag, or when the first user's specialization tag is identical to the second user's interest tag, and 0 otherwise. This simplifies the calculation and improves processing efficiency.
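One way to see the efficiency gain of the 0/1 rule: only tag words shared by both sides contribute, so the double loop over all tag pairs reduces to an intersection of keys. The sketch below assumes that α weights the interest-to-specialization direction and β the reverse direction; that placement, the dictionary representation, and the function name are illustrative assumptions.

```python
def match_score_binary(a_interest, a_special, b_interest, b_special,
                       alpha=0.5, beta=0.5):
    """Degree of match when match(x, y) is 1 for identical tag words and 0 otherwise."""
    # Only tag words present on both sides have a non-zero product.
    first = sum(a_interest[t] * b_special[t]
                for t in a_interest.keys() & b_special.keys())
    second = sum(a_special[t] * b_interest[t]
                 for t in a_special.keys() & b_interest.keys())
    return alpha * first + beta * second
```

With dictionary lookups, the cost grows with the number of tags rather than with the product n × m.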

In another embodiment, the interest tags include interest categories and the specialization tags include specialization categories. In this embodiment, the interest categories and the specialization categories may be used to calculate the degree of match between two users. Generating the degree of match between two users from the scores corresponding to the interest tags and the scores corresponding to the specialization tags then includes: generating the degree of match between the two users from the scores corresponding to the interest categories and the scores corresponding to the specialization categories.

Specifically, in one embodiment, as shown in FIG. 9, generating the degree of match between two users from the scores corresponding to the interest categories and the scores corresponding to the specialization categories includes:

Step 902: match the first user's interest tags against the second user's specialization tags, and obtain a first similarity between the first user's interest tags and the second user's specialization tags.

Step 904: match the first user's specialization tags against the second user's interest tags, and obtain a second similarity between the first user's specialization tags and the second user's interest tags.

Step 906: calculate the degree of match between the first user and the second user from the scores corresponding to the first user's interest categories, the scores corresponding to the second user's specialization categories, the scores corresponding to the first user's specialization categories, the scores corresponding to the second user's interest categories, the first similarity, and the second similarity.

In one embodiment, when the first user's interest tags are matched against the second user's specialization tags, the score of the first user's interest category, the score of the second user's specialization category, and the first similarity are multiplied together; when the first user's specialization tags are matched against the second user's interest tags, the score of the first user's specialization category, the score of the second user's interest category, and the second similarity are multiplied together; finally, all of the products obtained are added, and the sum is the degree of match between the first user and the second user.

Further, in one embodiment, in step 906 the degree of match between the first user and the second user may be calculated according to the following formula (the formula is missing from the source text; the form below is a reconstruction consistent with the definitions that follow it):

$$\text{match\_score}(a,b) = \alpha \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y) + \beta \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y)$$

where match_score(a, b) is the degree of match between the first user a and the second user b, n is the number of categories of the first user a, m is the number of categories of the second user b, and α and β are constants. When the interest tags of the first user a are matched against the specialization tags of the second user b, match(x, y) is the first similarity, w_x is the score corresponding to an interest category of the first user a, and w_y is the score corresponding to a specialization category of the second user b.

When the specialization tags of the first user a are matched against the interest tags of the second user b, match(x, y) is the second similarity, w_x is the score corresponding to a specialization category of the first user a, and w_y is the score corresponding to an interest category of the second user b.

In a preferred embodiment, match(x, y) takes the value 1 or 0. That is, it is 1 when the first user's interest tag is identical to the second user's specialization tag, or when the first user's specialization tag is identical to the second user's interest tag, and 0 otherwise. This simplifies the calculation and improves processing efficiency.

In this embodiment, the scores corresponding to the interest categories and the scores corresponding to the specialization categories may also take part in calculating the degree of match between two users. Further, the degree of match calculated from tag words and the degree of match calculated from categories may be combined into a comprehensive degree of match between the two users, and a preset number of users with the highest comprehensive degree of match are finally selected for recommendation. For example, a user set B = {b1, b2, b3, ..., bn} is matched against user a to obtain the degree of match between each user in B and user a, and the 100 users with the highest degree of match are then recommended to user a. In this embodiment, using both categories and tag words in the calculation improves the accuracy of the recommendation. Cross-matching interests against specializations in this way avoids blind user recommendation, which reduces the pushing of redundant information and saves network resources.
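The final selection can be sketched as below. The text does not say how the tag-level and category-level degrees of match are combined, so the weighted average in `combined_match` is purely an assumption, as are the function names. The selection itself follows the example of recommending the highest-scoring users from set B.

```python
import heapq


def combined_match(tag_match, category_match, weight=0.5):
    """Assumed combination rule: weighted average of the two degrees of match."""
    return weight * tag_match + (1.0 - weight) * category_match


def recommend(match_by_user, k=100):
    """Return the ids of the k users with the highest degree of match, best first."""
    best = heapq.nlargest(k, match_by_user.items(), key=lambda item: item[1])
    return [user_id for user_id, _ in best]
```

Here `match_by_user` would hold, for each candidate in B, its comprehensive degree of match with user a.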

As shown in FIG. 10, in one embodiment a user recommendation device is also provided, including:

An interest tag reading module 1002, configured to read a user's interest tags and the scores corresponding to the interest tags.

A specialization tag reading module 1004, configured to read a user's specialization tags and the scores corresponding to the specialization tags.

A degree-of-match generation module 1006, configured to generate the degree of match between two users from the scores corresponding to the interest tags and the scores corresponding to the specialization tags.

A user recommendation module 1008, configured to select, according to the degree of match, users to be recommended and to recommend them.

In one embodiment, the interest tag reading module 1002, the specialization tag reading module 1004, the degree-of-match generation module 1006, and the user recommendation module 1008 may be used to implement steps 102 to 108 shown in FIG. 1.
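The division into four modules can be pictured as a small pipeline. The sketch below is one hypothetical arrangement, not the device itself: the class name, the callable signatures, and the use of user ids as strings are all assumptions, and each field only plays the role of the module named in its comment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

TagScores = Dict[str, float]


@dataclass
class UserRecommender:
    read_interest: Callable[[str], TagScores]        # role of module 1002
    read_specialization: Callable[[str], TagScores]  # role of module 1004
    match: Callable[[TagScores, TagScores, TagScores, TagScores], float]  # role of module 1006
    top_k: int = 100                                 # preset number used by module 1008

    def recommend(self, user: str, candidates: Iterable[str]) -> List[str]:
        """Role of module 1008: rank candidates by degree of match and keep the best."""
        u_interest = self.read_interest(user)
        u_special = self.read_specialization(user)
        scored = [
            (self.match(u_interest, u_special,
                        self.read_interest(c), self.read_specialization(c)), c)
            for c in candidates
            if c != user  # a user is not recommended to themselves
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for _, c in scored[: self.top_k]]
```

Passing the readers and the match function in as callables mirrors the remark that modules may be split or merged as needed.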

In another embodiment, as shown in FIG. 11, building on the embodiment shown in FIG. 10, the user recommendation device further includes:

An online behavior data collection module 1000-1 (not shown in the figure), configured to collect the user's online behavior data.

An interest tag mining module 1001, configured to mine the user's interest tags from the online behavior data.

Further, in one embodiment, as shown in FIG. 12, the interest tag mining module 1001 specifically includes:

A first word segmentation module 1001a, configured to perform word segmentation on the documents in the online behavior data.

A first score calculation module 1001b, configured to calculate the score corresponding to each tag word obtained after word segmentation as the ratio of the word frequency of the tag word to the total word frequency of all of the user's tag words.

An interest tag selection module 1001c, configured to select tag words as the user's interest tags according to the scores corresponding to the tag words.
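The work of modules 1001a to 1001c can be sketched as follows. Real word segmentation of Chinese text needs a dedicated segmenter; the whitespace split used by default here is only a stand-in, and the function names are hypothetical.

```python
from collections import Counter


def tag_word_scores(documents, tokenize=str.split):
    """Score of a tag word = its frequency / total frequency of all tag words."""
    counts = Counter(word for doc in documents for word in tokenize(doc))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def select_interest_tags(scores, n):
    """Keep the n tag words with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Because the scores are ratios over the same total, they sum to 1 for each user.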

In another embodiment, as shown in FIG. 13, building on the embodiment shown in FIG. 12, the interest tag mining module 1001 further includes:

A first classification module 1001d, configured to classify the tag words obtained after word segmentation.

A first category score calculation module 1001e, configured to calculate the score of the category to which a tag word belongs from the scores corresponding to the tag words.

An interest category selection module 1001f, configured to select categories as the user's interest categories according to the scores of the categories to which the tag words belong.

As shown in FIG. 14, in another embodiment the user recommendation device further includes:

A profession-related data collection module 1000-2 (not shown in the figure), configured to collect the user's profession-related data.

A first specialization tag mining module 1003, configured to mine the user's specialization tags from the profession-related data.

Further, in one embodiment, as shown in FIG. 15, the first specialization tag mining module 1003 includes:

A second word segmentation module 1003a, configured to perform word segmentation on the documents in the profession-related data.

A second score calculation module 1003b, configured to calculate the score corresponding to each tag word obtained after word segmentation as the ratio of the word frequency of the tag word to the total word frequency of all of the user's tag words.

A first specialization tag selection module 1003c, configured to select tag words as the user's specialization tags according to the scores corresponding to the tag words.

In another embodiment, as shown in FIG. 16, building on the embodiment shown in FIG. 15, the first specialization tag mining module 1003 further includes:

A second classification module 1003d, configured to classify the tag words obtained after word segmentation.

A second category score calculation module 1003e, configured to calculate the score of the category to which a tag word belongs from the scores corresponding to the tag words.

A first specialization category selection module 1003f, configured to select categories as the user's specialization categories according to the scores of the categories to which the tag words belong.

In one embodiment, as shown in FIG. 17, the user recommendation device further includes:

A personal information collection module 1000-3 (not shown in the figure), configured to collect the user's personal information.

A second specialization tag mining module 1005, configured to mine the user's specialization tags from the profession-related data and the personal information.

Further, in one embodiment, as shown in FIG. 18, the second specialization tag mining module 1005 includes:

A third word segmentation module 1005a, configured to perform word segmentation on the documents in the profession-related data.

A word-frequency probability calculation module 1005b, configured to calculate the word-frequency probability of each tag word obtained by word segmentation as the ratio of the word frequency of the tag word to the total word frequency of all of the user's tag words.

A confidence calculation module 1005c, configured to obtain the corresponding tag words from the personal information and to calculate, from the personal information, the confidence of each tag word obtained.

A third score calculation module 1005d, configured to fit the word-frequency probability and the confidence of each tag word to obtain the score corresponding to the tag word.

A second specialization tag selection module 1005e, configured to select tag words as the user's specialization tags according to the scores corresponding to the tag words.

In another embodiment, as shown in FIG. 19, building on the embodiment shown in FIG. 18, the second specialization tag mining module 1005 further includes:

A third classification module 1005f, configured to classify the tag words obtained after word segmentation.

A third category score calculation module 1005g, configured to calculate the score of the category to which a tag word belongs from the scores corresponding to the tag words.

A second specialization category selection module 1005h, configured to select categories as the user's specialization categories according to the scores of the categories to which the tag words belong.

Specifically, in one embodiment, as shown in FIG. 20, the degree-of-match generation module 1006 includes:

A first matching module 1006a, configured to match the first user's interest tags against the second user's specialization tags and obtain a first similarity between the first user's interest tags and the second user's specialization tags, and to match the first user's specialization tags against the second user's interest tags and obtain a second similarity between the first user's specialization tags and the second user's interest tags.

A first degree-of-match calculation module 1006b, configured to calculate the degree of match between the first user and the second user from the scores corresponding to the first user's interest tags, the scores corresponding to the second user's specialization tags, the scores corresponding to the first user's specialization tags, the scores corresponding to the second user's interest tags, the first similarity, and the second similarity.

Further, in one embodiment, the first degree-of-match calculation module 1006b is configured to calculate the degree of match between the first user and the second user according to the following formula (the formula is missing from the source text; the form below is a reconstruction consistent with the definitions that follow it):

$$\text{match\_score}(a,b) = \alpha \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y) + \beta \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y)$$

where match_score(a, b) is the degree of match between the first user a and the second user b, n is the number of tags of the first user a, m is the number of tags of the second user b, and α and β are constants. When the interest tags of the first user a are matched against the specialization tags of the second user b, match(x, y) is the first similarity, w_x is the score corresponding to an interest tag of the first user a, and w_y is the score corresponding to a specialization tag of the second user b.

When the specialization tags of the first user a are matched against the interest tags of the second user b, match(x, y) is the second similarity, w_x is the score corresponding to a specialization tag of the first user a, and w_y is the score corresponding to an interest tag of the second user b.

In another embodiment, the interest tags include interest categories and the specialization tags include specialization categories; the degree-of-match generation module 1006 is further configured to generate the degree of match between two users from the scores corresponding to the interest categories and the scores corresponding to the specialization categories. Further, as shown in FIG. 21, the degree-of-match generation module 1006 includes:

A second matching module 1006c, configured to match the first user's interest tags against the second user's specialization tags and obtain a first similarity between the first user's interest tags and the second user's specialization tags, and to match the first user's specialization tags against the second user's interest tags and obtain a second similarity between the first user's specialization tags and the second user's interest tags.

A second degree-of-match calculation module 1006d, configured to calculate the degree of match between the first user and the second user from the scores corresponding to the first user's interest categories, the scores corresponding to the second user's specialization categories, the scores corresponding to the first user's specialization categories, the scores corresponding to the second user's interest categories, the first similarity, and the second similarity.

Further, in one embodiment, the second degree-of-match calculation module 1006d is configured to calculate the degree of match between the first user and the second user according to the following formula (the formula is illegible in the source text; the form below is a reconstruction consistent with the definitions that follow it):

$$\text{match\_score}(a,b) = \alpha \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y) + \beta \sum_{x=1}^{n}\sum_{y=1}^{m} w_x\, w_y\, \text{match}(x,y)$$

where match_score(a, b) is the degree of match between the first user a and the second user b, n is the number of categories of the first user a, m is the number of categories of the second user b, and α and β are constants. When the interest tags of the first user a are matched against the specialization tags of the second user b, match(x, y) is the first similarity, w_x is the score corresponding to an interest category of the first user a, and w_y is the score corresponding to a specialization category of the second user b.

When the specialization tags of the first user a are matched against the interest tags of the second user b, match(x, y) is the second similarity, w_x is the score corresponding to a specialization category of the first user a, and w_y is the score corresponding to an interest category of the second user b.

It should be understood that the above shows one exemplary embodiment in which the modules are divided by function. In practice, one or more modules may be divided into smaller sub-modules as needed, or one or more modules may be merged into a larger module as needed, to perform all or some of the functions described above.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments, and the flowcharts and block diagrams in the figures, may be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

The embodiments described above represent only several implementations of the present invention. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present invention, and these all fall within the scope of protection of the present invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims

1. A user recommendation method, the method comprising:

reading a user's interest tags and the scores corresponding to the interest tags;

reading a user's specialization tags and the scores corresponding to the specialization tags;

generating a degree of match between users from the scores corresponding to the interest tags and the scores corresponding to the specialization tags; and

selecting, according to the degree of match, users to be recommended and recommending them.

2. The method according to claim 1, wherein the method further comprises: collecting the user's online behavior data;

performing word segmentation on the documents in the online behavior data;

calculating the ratio of the word frequency of a tag word obtained after word segmentation to the sum of the word frequencies of all of the user's tag words, as the score corresponding to the tag word; and

selecting tag words as the user's interest tags according to the scores corresponding to the tag words.

3. The method according to claim 2, wherein the interest tags include the user's interest categories, and the method further comprises:

classifying the tag words obtained after word segmentation;

calculating the score of the category to which a tag word belongs from the scores corresponding to the tag words; and

selecting categories as the user's interest categories according to the scores of the categories to which the tag words belong.

4. The method according to claim 1, wherein the method further comprises: collecting the user's profession-related data, wherein the profession-related data includes at least one of question-and-answer community data and professional forum data;

performing word segmentation on the documents in the profession-related data;

calculating the ratio of the word frequency of a tag word obtained after word segmentation to the sum of the word frequencies of all of the user's tag words, as the score corresponding to the tag word; and

selecting tag words as the user's specialization tags according to the scores corresponding to the tag words.

5. The method according to claim 4, wherein the method further comprises: collecting the user's personal information;

calculating the ratio of the word frequency of a tag word obtained after word segmentation to the sum of the word frequencies of all of the user's tag words, as the word-frequency probability corresponding to the tag word;

obtaining the corresponding tag words from the personal information, and calculating, from the personal information, the confidence of each tag word obtained; and

fitting the word-frequency probability and the confidence of the tag word to obtain the score corresponding to the tag word.

6、根据权利要求 5 所述的方法，其特征在于，通过如下公式进行所述拟合. 6. The method according to claim 5, characterized in that the fitting is performed through the following formula.

Fin_expert(x) = γ × Expert(x) + λ × Profession(x)

其中， Fin_expert (x)为所述标签词 x 对应的分值， Expert (x)为所述标签词 x 对应的词频概率， Profession (x)为所述标签词 x 所对应的置信度， γ和 λ 为常数，且 γ + λ =1。 Here, Fin_expert(x) is the score corresponding to the tag word x, Expert(x) is the word frequency probability corresponding to the tag word x, Profession(x) is the confidence corresponding to the tag word x, and γ and λ are constants with γ + λ = 1.
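
The fitting in claim 6 is a weighted combination of the word frequency probability and the confidence, with weights that sum to 1. A minimal sketch follows; the function name, the default weights of 0.5, and the sample inputs are assumptions chosen only for illustration.

```python
import math


def fit_proficiency_score(expert, profession, gamma=0.5, lam=0.5):
    """Combine the word frequency probability (expert) and the
    confidence derived from personal information (profession).

    gamma and lam are the constants of the claim and must sum to 1;
    the default values here are illustrative assumptions.
    """
    if not math.isclose(gamma + lam, 1.0):
        raise ValueError("gamma + lam must equal 1")
    return gamma * expert + lam * profession


# Made-up inputs: a tag word seen in 40% of the user's professional
# text, with a confidence of 0.9 from the user's personal information.
score = fit_proficiency_score(expert=0.4, profession=0.9, gamma=0.5, lam=0.5)
```

Raising γ makes the score follow observed behavior more closely, while raising λ gives more weight to what the user's personal information states.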

7、根据权利要求 4或 5 所述的方法，其特征在于，所述擅长标签包括用户的擅长类别，所述方法还包括： 7. The method according to claim 4 or 5, wherein the proficiency tag includes the user's proficiency category, and the method further includes:

对所述分词后得到的标签进行归类； Classify the labels obtained after the word segmentation;

根据所述标签词对应的分值计算标签词所属类别对应的分值；根据所述标签词所属类别对应的分值选取类别作为用户的擅长类别。 Calculate the score corresponding to the category to which the tag word belongs based on the score corresponding to the tag word; select a category as the user's proficiency category based on the score corresponding to the category to which the tag word belongs.

8、根据权利要求 1 所述的方法，其特征在于，所述根据兴趣标签对应的分值和擅长标签对应的分值生成两个用户之间的匹配度，包括： 8. The method according to claim 1, wherein the matching degree between the two users is generated based on the score corresponding to the interest tag and the score corresponding to the proficiency tag, including:

将第一用户的兴趣标签匹配第二用户的擅长标签，获取所述第一用户的兴趣标签与所述第二用户的擅长标签之间的第一相似度； Match the interest tag of the first user to the proficiency tag of the second user, and obtain the first similarity between the interest tag of the first user and the proficiency tag of the second user;

将第一用户的擅长标签匹配第二用户的兴趣标签，获取所述第一用户的擅长标签与第二用户的兴趣标签之间的第二相似度； Match the first user's proficiency tag with the second user's interest tag, and obtain the second similarity between the first user's proficiency tag and the second user's interest tag;

根据所述第一用户的兴趣标签对应的分值、第二用户的擅长标签对应的分值、第一用户的擅长标签对应的分值、第二用户的兴趣标签对应的分值、所述第一相似度和第二相似度计算第一用户和第二用户之间的匹配度。 Calculate the matching degree between the first user and the second user according to the score corresponding to the interest tag of the first user, the score corresponding to the proficiency tag of the second user, the score corresponding to the proficiency tag of the first user, the score corresponding to the interest tag of the second user, the first similarity, and the second similarity.

9、根据权利要求 8所述的方法，其特征在于，所述兴趣标签包括兴趣类别，所述擅长标签包括擅长类别； 9. The method according to claim 8, wherein the interest tag includes an interest category, and the proficiency tag includes a proficiency category;

根据所述第一用户的兴趣类别对应的分值、第二用户的擅长类别对应的分值、第一用户的擅长类别对应的分值、第二用户的兴趣类别对应的分值、所述第一相似度和第二相似度计算第一用户和第二用户之间的匹配度。 Calculate the matching degree between the first user and the second user according to the score corresponding to the interest category of the first user, the score corresponding to the proficiency category of the second user, the score corresponding to the proficiency category of the first user, the score corresponding to the interest category of the second user, the first similarity, and the second similarity.

10、根据权利要求 8或 9 所述的方法，其特征在于，按照如下公式计算所述第一用户和第二用户之间的匹配度： 10. The method according to claim 8 or 9, characterized in that the matching degree between the first user and the second user is calculated according to the following formula:

其中， match _ score (a，b)为第一用户 a 与第二用户 b之间的匹配度，当 n 为第一用户 a 的标签个数， m 为第二用户 b 的标签个数时， α和 β为常数；当将第一用户 a 的兴趣标签匹配第二用户 b 的擅长标签时， match (x， y)为所述第一相似度， ^为第一用户 a 的兴趣标签对应的分值， ^ 为第二用户 b 的擅长标签对应的分值；当将第一用户 a 的擅长标签匹配第二用户 b 的兴趣标签时， match (x， y)为所述第二相似度， ^为第一用户 a 的擅长标签的对应的分值， ^为第二用户 b 的兴趣标签对应的分值； Here, match_score(a, b) is the matching degree between the first user a and the second user b. When n is the number of tags of the first user a and m is the number of tags of the second user b, α and β are constants. When the interest tag of the first user a is matched against the proficiency tag of the second user b, match(x, y) is the first similarity, and the two score terms of the formula are, respectively, the score corresponding to the interest tag of the first user a and the score corresponding to the proficiency tag of the second user b. When the proficiency tag of the first user a is matched against the interest tag of the second user b, match(x, y) is the second similarity, and the two score terms are, respectively, the score corresponding to the proficiency tag of the first user a and the score corresponding to the interest tag of the second user b;

当 n 为第一用户 a 的类别个数， m 为第二用户 b 的类别个数， α和 β 为常数；当将第一用户 a 的兴趣标签匹配第二用户 b 的擅长标签时， match (x， y)为所述第一相似度， ^为第一用户 a 的兴趣类别对应的分值， ^为第二用户 b 的擅长类别对应的分值；当将第一用户 a 的擅长标签匹配第二用户 b 的兴趣标签时， match (x， y)为所述第二相似度， ^为第一用户 a 的擅长类别的对应的分值， ^为第二用户 b 的兴趣类别对应的分值。 When n is the number of categories of the first user a and m is the number of categories of the second user b, α and β are constants. When the interest tag of the first user a is matched against the proficiency tag of the second user b, match(x, y) is the first similarity, and the two score terms of the formula are, respectively, the score corresponding to the interest category of the first user a and the score corresponding to the proficiency category of the second user b. When the proficiency tag of the first user a is matched against the interest tag of the second user b, match(x, y) is the second similarity, and the two score terms are, respectively, the score corresponding to the proficiency category of the first user a and the score corresponding to the interest category of the second user b.
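
The formula of claim 10 is not reproduced in this text, so the following is only a sketch of how the named quantities could fit together. It assumes a weighted double sum in which α weights the interest-to-proficiency direction and β weights the proficiency-to-interest direction; that form, the exact-match similarity function, and all names are assumptions, not the claimed formula.

```python
def exact_match(x, y):
    """Hypothetical similarity: 1.0 for identical tags, otherwise 0.0."""
    return 1.0 if x == y else 0.0


def directional_score(src_tags, dst_tags, similarity):
    """Sum similarity(x, y) * score(x) * score(y) over all tag pairs.

    src_tags and dst_tags map each tag to its score.
    """
    return sum(
        similarity(x, sx_tag) * sx * sy
        for x, sx in src_tags.items()
        for sx_tag, sy in dst_tags.items()
    )


def match_score(a_interest, a_proficiency, b_interest, b_proficiency,
                alpha=0.5, beta=0.5, similarity=exact_match):
    """Assumed combination: alpha weights a's interests against b's
    proficiencies, beta weights a's proficiencies against b's interests."""
    first = directional_score(a_interest, b_proficiency, similarity)
    second = directional_score(a_proficiency, b_interest, similarity)
    return alpha * first + beta * second


# Made-up users: a wants to learn Python and can teach cooking,
# b can teach Python and wants to learn cooking.
a_interest = {"python": 0.5, "travel": 0.5}
a_proficiency = {"cooking": 1.0}
b_interest = {"cooking": 0.5, "music": 0.5}
b_proficiency = {"python": 1.0}
result = match_score(a_interest, a_proficiency, b_interest, b_proficiency)
```

The same functions apply to the category variant by passing category-to-score mappings instead of tag-to-score mappings.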

11、一种用户推荐装置，其特征在于，所述装置包括： 11. A user recommendation device, characterized in that the device includes:

兴趣标签读取模块，用于读取用户的兴趣标签和所述兴趣标签对应的分值； Interest tag reading module, used to read the user's interest tag and the score corresponding to the interest tag;

擅长标签读取模块，用于读取用户的擅长标签和所述擅长标签对应的分值； The proficiency tag reading module is used to read the user's proficiency tags and the scores corresponding to the proficiency tags;

匹配度生成模块，用于根据所述兴趣标签对应的分值和所述擅长标签对应的分值生成用户之间的匹配度； A matching degree generation module, configured to generate a matching degree between users based on the score corresponding to the interest tag and the score corresponding to the proficiency tag;

用户推荐模块，用于根据所述匹配度选取待推荐用户进行推荐。 A user recommendation module is used to select users to be recommended for recommendation based on the matching degree.
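
The last module selects the users to recommend from the matching degrees. The claim does not fix a selection rule, so the sketch below assumes a top-k cut with a minimum threshold; `recommend`, `top_k`, and `threshold` are hypothetical names.

```python
import heapq


def recommend(match_scores, top_k=2, threshold=0.0):
    """Return up to top_k candidate users whose matching degree exceeds
    threshold, ordered from best match to worst.

    match_scores maps a candidate user ID to its matching degree with
    the target user. Both the threshold and top_k are assumptions.
    """
    eligible = [(score, user) for user, score in match_scores.items()
                if score > threshold]
    best = heapq.nlargest(top_k, eligible)
    return [user for _, user in best]


# Made-up matching degrees between a target user and three candidates.
candidates = {"u2": 0.8, "u3": 0.1, "u4": 0.5}
recommended = recommend(candidates, top_k=2)
```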

12、根据权利要求 11 所述的装置，其特征在于，所述装置还包括：线上行为数据收集模块，用于收集用户的线上行为数据；兴趣标签挖掘模块，用于根据所述线上行为数据挖掘用户的兴趣标签，其中，所述兴趣标签挖掘模块包括： 12. The device according to claim 11, characterized in that, the device further includes: an online behavior data collection module, used to collect the user's online behavior data; An interest tag mining module is used to mine users' interest tags based on the online behavior data, wherein the interest tag mining module includes:

第一分词模块，用于对所述线上行为数据中的文档进行分词； The first word segmentation module is used to segment documents in the online behavior data;

第一分值计算模块，用于计算分词后得到的标签词的词频与用户的所有标签词的词频总和这两者之间的比率，作为所述标签词对应的分值； The first score calculation module is used to calculate the ratio between the word frequency of the tag word obtained after word segmentation and the sum of the word frequencies of all tag words of the user, as the score corresponding to the tag word;

兴趣标签选取模块，用于根据所述标签词对应的分值选取标签词作为用户的兴趣标签。 The interest tag selection module is used to select tag words as the user's interest tags based on the scores corresponding to the tag words.

13、根据权利要求 12 所述的装置，其特征在于，所述兴趣标签包括用户的兴趣类别，所述兴趣标签挖掘模块还包括： 13. The device according to claim 12, wherein the interest tag includes the user's interest category, and the interest tag mining module further includes:

第一归类模块，用于对所述分词后得到的标签词进行归类； The first classification module is used to classify the tag words obtained after the word segmentation;

第一类别分值计算模块，用于根据所述标签词对应的分值计算标签词所属类别对应的分值； The first category score calculation module is used to calculate the score corresponding to the category to which the tag word belongs based on the score corresponding to the tag word;

兴趣类别选取模块，用于根据所述标签词所属类别对应的分值选取类别作为用户的兴趣类别。 The interest category selection module is used to select a category as the user's interest category based on the score corresponding to the category to which the tag word belongs.

14、根据权利要求 11 所述的装置，其特征在于，所述装置还包括：专业相关数据收集模块，用于收集用户的专业相关数据，其中，所述专业相关数据包括问答社区数据、专业论坛数据中的至少一种； 14. The device according to claim 11, characterized in that the device further includes: a professional-related data collection module, configured to collect the user's professional-related data, wherein the professional-related data includes at least one of question-and-answer community data and professional forum data;

第一擅长标签挖掘模块，用于根据所述专业相关数据挖掘用户的擅长标签， a first proficiency tag mining module, configured to mine the user's proficiency tags based on the professional-related data,

其中，所述第一擅长标签挖掘模块包括：第二分词模块，用于对所述专业相关数据中的文档进行分词； wherein the first proficiency tag mining module includes: a second word segmentation module, configured to segment documents in the professional-related data;

第二分值计算模块，用于计算分词后得到的标签词的词频与用户的所有标签词的词频总和这两者之间的比率，作为所述标签词对应的分值； a second score calculation module, configured to calculate the ratio between the word frequency of a tag word obtained after word segmentation and the sum of the word frequencies of all tag words of the user, as the score corresponding to the tag word;

第一擅长标签选取模块，用于根据所述标签词对应的分值选取标签词作为用户的擅长标签。 a first proficiency tag selection module, configured to select tag words as the user's proficiency tags according to the scores corresponding to the tag words.

15、根据权利要求 14 所述的装置，其特征在于，所述装置还包括：个人信息收集模块，用于收集用户的个人信息； 15. The device according to claim 14, characterized in that the device further includes: a personal information collection module, used to collect the user's personal information;

词频概率计算模块，用于计算分词后得到的标签词的词频与用户的所有标签词的词频总和这两者之间的比率，作为所述标签词对应的词频概率；置信度计算模块，用于根据所述个人信息获取对应的标签词，以及根据所述个人信息计算获取的标签词所对应的置信度； a word frequency probability calculation module, configured to calculate the ratio between the word frequency of a tag word obtained after word segmentation and the sum of the word frequencies of all tag words of the user, as the word frequency probability corresponding to the tag word; a confidence calculation module, configured to obtain corresponding tag words according to the personal information, and to calculate, according to the personal information, the confidence corresponding to the obtained tag words;

第三分值计算模块，用于对标签词对应的词频概率和置信度进行拟合，得到所述标签词对应的分值。 The third score calculation module is used to fit the word frequency probability and confidence level corresponding to the tag word to obtain the score corresponding to the tag word.

16、根据权利要求 15 所述的装置，其特征在于，所述第三分值计算模块通过如下公式进行所述拟合得到所述标签词对应的分值： 16. The device according to claim 15, characterized in that the third score calculation module performs the fitting through the following formula to obtain the score corresponding to the tag word:

17、根据权利要求 14或 15 所述的装置，其特征在于，所述擅长标签包括用户的擅长类别，所述第一擅长标签挖掘模块还包括：第三归类模块，用于对所述分词后得到的标签进行归类； 17. The device according to claim 14 or 15, wherein the proficiency tag includes the user's proficiency category, and the first proficiency tag mining module further includes: The third classification module is used to classify the labels obtained after the word segmentation;

第三类别分值计算模块，用于根据所述标签词对应的分值计算标签词所属类别对应的分值； The third category score calculation module is used to calculate the score corresponding to the category to which the tag word belongs based on the score corresponding to the tag word;

第二擅长类别选取模块，用于根据所述标签词所述类别对应的分值选取类别作为用户的擅长类别。 a second proficiency category selection module, configured to select a category as the user's proficiency category based on the score corresponding to the category to which the tag word belongs.

18、根据权利要求 11 所述的装置，其特征在于，所述匹配度生成模块包括： 18. The device according to claim 11, characterized in that the matching degree generation module includes:

第一匹配模块，用于将第一用户的兴趣标签匹配第二用户的擅长标签，获取所述第一用户的兴趣标签与所述第二用户的擅长标签之间的第一相似度；以及用于将第一用户的擅长标签匹配第二用户的兴趣标签，获取所述第一用户的擅长标签与第二用户的兴趣标签之间的第二相似度； a first matching module, configured to match the interest tag of the first user against the proficiency tag of the second user and obtain the first similarity between the interest tag of the first user and the proficiency tag of the second user; and further configured to match the proficiency tag of the first user against the interest tag of the second user and obtain the second similarity between the proficiency tag of the first user and the interest tag of the second user;

第一匹配度计算模块，用于根据所述第一用户的兴趣标签对应的分值、第二用户的擅长标签对应的分值、第一用户的擅长标签对应的分值、第二用户的兴趣标签对应的分值、所述第一相似度和第二相似度计算第一用户和第二用户之间的匹配度。 a first matching degree calculation module, configured to calculate the matching degree between the first user and the second user according to the score corresponding to the interest tag of the first user, the score corresponding to the proficiency tag of the second user, the score corresponding to the proficiency tag of the first user, the score corresponding to the interest tag of the second user, the first similarity, and the second similarity.

19、根据权利要求 18 所述的装置，其特征在于，所述兴趣标签包括兴趣类别，所述擅长标签包括擅长类别； 19. The device according to claim 18, wherein the interest tag includes an interest category, and the proficiency tag includes a proficiency category;

所述匹配度生成模块还用于根据所述兴趣类别对应的分值和擅长类别对应的分值生成两个用户之间的匹配度， the matching degree generation module is further configured to generate the matching degree between two users based on the score corresponding to the interest category and the score corresponding to the proficiency category,

其中，所述匹配度生成模块还包括： Wherein, the matching degree generation module also includes:

第二匹配度计算模块，用于根据所述第一用户的兴趣类别对应的分值、第二用户的擅长类别对应的分值、第一用户的擅长类别对应的分值、第二用户的兴趣类别对应的分值、所述第一相似度和第二相似度计算第一用户和第二用户之间的匹配度。 a second matching degree calculation module, configured to calculate the matching degree between the first user and the second user according to the score corresponding to the interest category of the first user, the score corresponding to the proficiency category of the second user, the score corresponding to the proficiency category of the first user, the score corresponding to the interest category of the second user, the first similarity, and the second similarity.

20、根据权利要求 18或 19 所述的装置，其特征在于，所述装置按照如下公式计算所述第一用户和第二用户之间的匹配度： 20. The device according to claim 18 or 19, characterized in that the device calculates the matching degree between the first user and the second user according to the following formula:

[Formula for match_score(a, b): not recoverable from the extracted text]

其中， match _ score (a，b)为第一用户 a 与第二用户 b之间的匹配度，当由所述第一匹配度计算模块计算所述第一用户和第二用户之间的匹配度时， n 为第一用户 a 的标签个数， m 为第二用户 b 的标签个数时， α和 β为常数；当将第一用户 a 的兴趣标签匹配第二用户 b 的擅长标签时， match (x， y)为所述第一相似度， ^为第一用户 a 的兴趣标签对应的分值， ^ 为第二用户 b 的擅长标签对应的分值；当将第一用户 a 的擅长标签匹配第二用户 b 的兴趣标签时， match (x， y)为所述第二相似度， ^为第一用户 a 的擅长标签的对应的分值， ^为第二用户 b 的兴趣标签对应的分值； Here, match_score(a, b) is the matching degree between the first user a and the second user b. When the first matching degree calculation module calculates the matching degree between the first user and the second user, n is the number of tags of the first user a, m is the number of tags of the second user b, and α and β are constants. When the interest tag of the first user a is matched against the proficiency tag of the second user b, match(x, y) is the first similarity, and the two score terms of the formula are, respectively, the score corresponding to the interest tag of the first user a and the score corresponding to the proficiency tag of the second user b. When the proficiency tag of the first user a is matched against the interest tag of the second user b, match(x, y) is the second similarity, and the two score terms are, respectively, the score corresponding to the proficiency tag of the first user a and the score corresponding to the interest tag of the second user b;

当由所述第二匹配度计算模块计算所述第一用户和第二用户之间的匹配度时， n 为第一用户 a 的类别个数， m 为第二用户 b 的类别个数， α和 β 为常数；当将第一用户 a 的兴趣标签匹配第二用户 b 的擅长标签时， match (x， y)为所述第一相似度， ^为第一用户 a 的兴趣类别对应的分值， ^为第二用户 b 的擅长类别对应的分值；当将第一用户 a 的擅长标签匹配第二用户 b 的兴趣标签时， match (x， y)为所述第二相似度， ^为第一用户 a 的擅长类别的对应的分值， ^为第二用户 b 的兴趣类别对应的分值。 When the second matching degree calculation module calculates the matching degree between the first user and the second user, n is the number of categories of the first user a, m is the number of categories of the second user b, and α and β are constants. When the interest tag of the first user a is matched against the proficiency tag of the second user b, match(x, y) is the first similarity, and the two score terms of the formula are, respectively, the score corresponding to the interest category of the first user a and the score corresponding to the proficiency category of the second user b. When the proficiency tag of the first user a is matched against the interest tag of the second user b, match(x, y) is the second similarity, and the two score terms are, respectively, the score corresponding to the proficiency category of the first user a and the score corresponding to the interest category of the second user b.

21. 一种存储一个或多个指令序列的非暂时计算机可读存储介质，所述一个或多个指令序列在被一个或多个处理器执行时使得所述一个或多个处理器： 21. A non-transitory computer-readable storage medium storing one or more sequences of instructions that, when executed by one or more processors, cause the one or more processors to: