WO2020042501A1 - Fund manager community division method, system, computer device and storage medium - Google Patents

Fund manager community division method, system, computer device and storage medium

Info

Publication number
WO2020042501A1
Authority
WO
WIPO (PCT)
Prior art keywords
community
fund
entity
nodes
relationship
Prior art date
Application number
PCT/CN2018/124590
Other languages
English (en)
French (fr)
Inventor
陈泽晖
胡逸凡
谢云
黄鸿顺
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020042501A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 - Asset management; Financial planning or analysis

Definitions

  • The present application relates to the field of financial technology, and in particular to a fund manager community division method, system, computer device, and storage medium.
  • A knowledge graph is a visualization or map of a knowledge domain. Through data mining, information processing, knowledge measurement, and graphic drawing, it displays a complex knowledge domain, reveals the dynamic laws by which that domain develops, and provides a practical, valuable reference for research in the discipline. By coverage, knowledge graphs can be divided into general knowledge graphs and industry knowledge graphs. General knowledge graphs are mainly used in fields such as intelligent search. Industry knowledge graphs usually need to be built on industry-specific data.
  • The fund knowledge graph is an industry knowledge graph in the financial field. It provides investors with a visual reference tool, but it is only a stack of relationships and does not further mine the network relationships among people.
  • A fund manager community division method includes:
  • Extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
  • Acquiring the relationships between every two of the entities and merging them into a weight W; setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
  • Saving the community division result into a community division table, where the community division table is located in the graph database.
  • A fund manager community division system includes:
  • An entity extraction unit, configured to extract multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
  • A weight acquisition unit, configured to acquire the relationships between every two of the entities and merge them into a weight W;
  • A community division unit, configured to set each entity as a node and each node as an initial community, set the weight W between two nodes as the degree, and invoke the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
  • A saving unit, configured to save the community division result into a community division table, where the community division table is located in the graph database.
  • A computer device includes a memory and a processor.
  • The memory stores computer-readable instructions.
  • When executed by the processor, the computer-readable instructions cause the processor to perform the following steps:
  • Extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
  • Acquiring the relationships between every two of the entities and merging them into a weight W; setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
  • Saving the community division result into a community division table, where the community division table is located in the graph database.
  • A storage medium stores computer-readable instructions.
  • When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
  • Extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
  • Acquiring the relationships between every two of the entities and merging them into a weight W; setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
  • Saving the community division result into a community division table, where the community division table is located in the graph database.
  • The above fund manager community division method, apparatus, computer device, and storage medium extract multiple entities from the fund knowledge graph.
  • The fund knowledge graph is stored in a graph database in graph form.
  • The fund knowledge graph includes entities and relationships.
  • The relationships between every two entities are merged into a weight W; each entity is set as a node, each node is set as an initial community, the weight W between two nodes is set as the degree, and the fast-clustering Fast Newman algorithm is invoked.
  • The nodes are divided into communities to obtain the community division result; the community division result is saved into the community division table, which is located in the graph database.
  • This application uses the Fast Newman method to perform community clustering.
  • The Fast Newman algorithm is applied to the practical case of fund managers. Because different relationships between fund managers carry different weights, the optimal community division state can be found and taken as the final community division result, thereby determining and presenting the network relationships among people.
  • FIG. 1 is a flowchart of a fund manager community division method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of step S1 in an embodiment.
  • FIG. 3 is a flowchart of step S4 in an embodiment.
  • FIG. 4 is a structural diagram of a fund manager community division system according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a fund manager community division method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps:
  • Step S1, extracting multiple entities: multiple entities are extracted from the fund knowledge graph, which is stored in a graph database in graph form and includes entities and relationships.
  • A knowledge graph (Knowledge Graph), also called a scientific knowledge graph, is known in library and information science as knowledge domain visualization or a knowledge domain map. It is a family of graphs that show how knowledge develops and how it is structured, using visualization techniques to describe knowledge resources and their carriers and to mine, analyze, construct, draw, and display knowledge and its interconnections. In short, a knowledge graph is a knowledge system described by entities, attributes, and relationships.
  • The fund knowledge graph in this step is a knowledge graph of fund managers.
  • The fund manager is the entity; the manager's affiliated company, graduate school, mentor, the code or abbreviation of each fund company managed, and the name of each fund managed are the relationships.
  • The type of fund currently managed, the size of the fund currently managed, the investment style, the investment cycle, the unit net value, and the cumulative net value are the attributes.
  • In this step, several entities are extracted from the fund knowledge graph pre-stored in the graph database; each entity is a fund manager.
  • Step S2, obtaining weights: the relationships between every two entities are acquired and merged into a weight W.
  • The relationships between one entity and another include the fund manager's affiliated company, graduate school, mentor, the code or abbreviation of each fund company managed, the name of each fund managed, and so on.
  • Different relationships between two entities carry different weights, and two entities may share no relationship, exactly one relationship, or two or more relationships. In this step, W is the sum of the weights of all relationships between the two entities.
  • Step S3, community division: each entity is set as a node, each node is set as an initial community, the weight W between two nodes is set as the degree, and the fast-clustering Fast Newman algorithm is invoked to divide the nodes into communities and obtain the community division result.
  • The fast-clustering Fast Newman algorithm is a clustering algorithm published in 2004, referred to as the F-B algorithm.
  • The Fast Newman algorithm starts by treating each node as a community. In each iteration it merges the two communities whose merger yields the largest Q value, until the entire network has merged into a single community.
  • The whole process can be represented as a dendrogram, from which the level with the largest Q value is selected as the final community structure. Specifically, the modularity Q value is calculated for combining two initial communities, the two communities whose merger increases Q the most or decreases it the least are merged into a new community, and calculation and merging are repeated until all communities have merged into one large community; the division with the largest Q value seen during merging is the result.
  • Modularity measures whether a community division is a relatively good result.
  • In a relatively good result, nodes inside a community are highly similar to one another, and nodes in different communities are much less similar.
  • Modularity is defined as the ratio of the total number of edges inside communities to the total number of edges in the network, minus an expected value.
  • The expected value is the same ratio, edges inside communities to edges in the network, that the same community assignment would produce if the network were a random network.
  • Modularity is therefore determined by the Q value.
  • When the Q value is at its maximum, the network division is the most satisfactory one.
  • The Q value ranges from 0 to 1.
  • The larger the Q value, the more accurate the community structure of the division. In actual network analysis, the peak Q value generally falls between 0.3 and 0.7.
  • Step S4, saving the community division result: the community division result is saved into the community division table, which is located in the graph database.
  • The community division result obtained in step S3 is saved for later queries.
  • In this embodiment, the fast-clustering Fast Newman algorithm divides the entities in the fund knowledge graph into communities, identifies the optimal community division result, and saves it, thereby mining and partitioning the network relationships among people.
  • In an embodiment, as shown in FIG. 2, the generation process of the fund knowledge graph in step S1 includes:
  • Step S101, knowledge extraction: multiple pieces of fund knowledge data are extracted from external information sources and set as a knowledge metabase. The fund knowledge data includes the fund manager, affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, name of each fund managed, type of fund currently managed, size of the fund currently managed, investment style, investment cycle, unit net value, and cumulative net value.
  • When the external information source is a database, the data in the database is structured data.
  • The data in the database is extracted with a configured rule script to obtain multiple pieces of fund knowledge data.
  • When the external information source is a website, the chart data on the website is semi-structured data.
  • Data is extracted with a crawler or regular expression matching to obtain multiple pieces of fund knowledge data.
  • When the external information source is a fund research report, a fund manager resume, or community comments, the source is unstructured text data. Data is extracted through natural language processing to obtain multiple pieces of fund knowledge data.
  • Step S102, knowledge merging: the fund manager in the knowledge metabase is set as the unified mark, and if two pieces of fund knowledge data have the same unified mark, the two pieces are merged.
  • Step S103, knowledge storage: the fund manager is set as the entity; the affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, and name of each fund managed are set as relationships; the type of fund currently managed, the size of the fund currently managed,
  • the investment style, investment cycle, unit net value, and cumulative net value are set as attributes; and the knowledge metabase is stored in a graph database in graph form to generate the fund knowledge graph.
  • This embodiment generates the fund knowledge graph through knowledge extraction, knowledge merging, and knowledge storage, providing entities and relationships for the subsequent community division.
  • In an embodiment, merging the relationships between an entity and other entities into a weight W in step S2 includes:
  • When the weight W between an entity and another entity has been obtained, it can be understood as a weighted-edge decomposition: an edge of weight W is split into W edges of weight 1.
  • Because fund managers tend to band together, this application uses weighted relationships; that is, the relationships between fund managers carry weights.
  • The weight is obtained by merging the relationships, so the subsequent community division can determine how important the connection between two nodes is.
  • In an embodiment, the formula for calculating the modularity Q value in step S3 uses the following symbols:
  • v and w are any two nodes, and there are m connections between the two nodes in total.
  • 2m is the degree of the entire network.
  • e_ij denotes the edges with one node in community i and the other node in community j, so e_ii denotes the ratio of the number of edges inside community i to the number of edges in the entire network,
  • that is, the degree inside a community divided by the degree of the entire network.
  • a_i denotes the ratio of the degree of the nodes in community i to the degree of the entire network.
  • This embodiment yields the modularity Q value obtained when two communities are combined; the Q value ranges from 0 to 1.
  • In an embodiment, as shown in FIG. 3, step S4 includes:
  • Step S401, obtaining a request: a request for the community information of a given fund manager, entered by the user, is obtained through a preset user query interface.
  • A query window or a query field can be provided on the user query interface.
  • The user enters the name of a fund manager, which produces the community information request for that fund manager.
  • Step S402, querying the community division table: the graph database is accessed and the community division table is queried by fund manager.
  • The community division table contains the fund manager and the corresponding community division result.
  • The community division result consists of the relationships and attributes of the fund manager, and also covers the divided relationships between this fund manager and other fund managers.
  • Step S403, data conversion and return: the nodes and relationships of the community to which the fund manager belongs are extracted and sent in JSON format to the data visualization software D3.js. D3.js converts the nodes and relationships into a visual chart, which is returned to the user query interface.
  • JSON (JavaScript Object Notation) is a lighter and simpler data exchange format than XML.
  • JSON is native to JavaScript; no special API or toolkit is required to process JSON data in JavaScript.
  • The rules of JSON are: an object is an unordered set of "name/value" pairs. An object begins with "{" (left brace) and ends with "}" (right brace). Each "name" is followed by ":" (colon), and "name/value" pairs are separated by "," (comma).
  • JSON does not require the server to send header information with a specific content type, which makes parameter passing in JSON simpler and more practical and better suited to the data transfer between the database and D3.js in this step.
  • After the nodes and relationships of the fund manager's community are extracted, a data conversion script can convert the data into JSON. A data conversion script is a script that converts data into JSON format; third-party tools can also be used for the conversion.
  • The data visualization software D3.js allows charts to be designed freely, is suitable for displaying a rich variety of chart styles, is completely free, and is open source. Because it offers so many chart types, it can meet almost all development needs.
  • This embodiment uses D3.js to convert the nodes and relationships into visual charts for users to view, which better shows the community division and relationships of a given fund manager.
  • In the fund manager community division method of this application, after multiple entities are extracted from the fund knowledge graph, the weight W between each entity and the other entities is obtained, and the Fast Newman algorithm divides the entities into communities to obtain the final community division result, which is stored in the community division table. To make it easy to query an entity's community division result, the method also interacts with the user through a preset user query interface and presents the community division result as a visual chart.
  • The entities referred to above are fund managers in this application.
  • The community division result obtained by the method of this application is the division of fund managers into communities with other fund managers.
  • In the fund industry, fund managers often form factions because they share a mentor, a graduate school, or an employer. Two related fund managers also tend to band together, and they usually influence each other when buying and selling funds.
  • The method of this application therefore divides closely related fund managers into communities. The network contains different communities.
  • Connections between fund managers inside a community are relatively dense, while connections between fund managers in different communities
  • are relatively sparse.
  • In an embodiment, a fund manager community division system is proposed. As shown in FIG. 4, it includes the following units:
  • An entity extraction unit, configured to extract multiple entities from the fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
  • A weight acquisition unit, configured to acquire the relationships between every two entities and merge them into a weight W;
  • A community division unit, configured to set each entity as a node and each node as an initial community, set the weight W between two nodes as the degree, and invoke the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain the community division result;
  • A saving unit, configured to save the community division result into a community division table, where the community division table is located in the graph database.
  • In an embodiment, the entity extraction unit includes:
  • A knowledge metabase module, configured to extract multiple pieces of fund knowledge data from external information sources and set them as a knowledge metabase. The fund knowledge data includes the fund manager, affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, name of each fund managed, type of fund currently managed, size of the fund currently managed, investment style, investment cycle, unit net value, and cumulative net value;
  • A unified marking module, configured to set the fund manager in the knowledge metabase as the unified mark and, if two pieces of fund knowledge data have the same unified mark, merge the two pieces;
  • A fund knowledge graph generation module, configured to set the fund manager as the entity; set the affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, and name of each fund managed as relationships; set the type of fund currently managed,
  • the size of the fund currently managed, the investment style, investment cycle, unit net value, and cumulative net value as attributes; and store the knowledge metabase in a graph database in graph form to generate the fund knowledge graph.
  • In an embodiment, the knowledge metabase module is further configured as follows. When the external information source is a database, the data in the database is structured data and is extracted with a configured rule script to obtain multiple pieces of fund knowledge data. When the external information source is a website, the chart data on the website is semi-structured data and is extracted with a crawler or regular expression matching. When the external information source is a fund research report, a fund manager resume, or community comments, the source is unstructured text data and is extracted through natural language processing.
  • In an embodiment, the community division unit is further configured to calculate the modularity Q value for combining two initial communities, merge the two communities whose merger increases Q the most or decreases it the least into a new community, repeat the calculation and merging until all communities have merged into one large community, and find the community division result with the largest Q value seen during merging.
  • In an embodiment, the formula for calculating the modularity Q value uses the following symbols:
  • v and w are any two nodes, and there are m connections between the two nodes in total.
  • 2m is the degree of the entire network.
  • e_ij denotes the edges with one node in community i and the other node in community j, so e_ii denotes the ratio of the number of edges inside community i to the number of edges in the entire network,
  • that is, the degree inside a community divided by the degree of the entire network.
  • a_i denotes the ratio of the degree of the nodes in community i to the degree of the entire network.
  • In an embodiment, the system further includes a query unit configured to obtain, through a preset user query interface, a request entered by the user for the community information of a given fund manager; access the graph database and query the community division table by fund manager; and extract the nodes and relationships of the community to which the fund manager belongs and send them in JSON format to the data visualization software D3.js.
  • D3.js converts the nodes and relationships into a visual chart, which is returned to the user query interface.
  • a computer device which includes a memory and a processor.
  • the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the processor is caused to implement the foregoing when the computer-readable instructions are executed.
  • a storage medium storing computer-readable instructions.
  • the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the fund manager community in the foregoing embodiments. Steps in partitioning method.
  • the storage medium may be a non-volatile storage medium.
  • the program may be stored in a computer-readable storage medium.
  • the storage medium may include: Read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Technology Law (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

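A fund manager community division method, system, computer device and storage medium. The method includes: extracting multiple entities from a fund knowledge graph, the fund knowledge graph including entities and relationships (S1); acquiring the relationships between two entities and merging them into a weight W (S2); setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result (S3); and saving the community division result into a community division table (S4). Community clustering is performed with the Fast Newman method to find the optimal community division state, which is taken as the final community division result, thereby determining and presenting the network relationships among people.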

Description

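Fund manager community division method, system, computer device and storage medium
This application claims priority to Chinese patent application No. 2018109775850, filed with the Chinese Patent Office on August 27, 2018 and entitled "Fund manager community division method, system, computer device and storage medium", the entire contents of which are incorporated herein by reference.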
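Technical Field
The present application relates to the field of financial technology, and in particular to a fund manager community division method, system, computer device and storage medium.
Background
A knowledge graph is a visualization or map of a knowledge domain. Through data mining, information processing, knowledge measurement, and graphic drawing, it displays a complex knowledge domain, reveals the dynamic laws by which that domain develops, and provides a practical, valuable reference for research in the discipline. By coverage, knowledge graphs can be divided into general knowledge graphs and industry knowledge graphs. General knowledge graphs are mainly used in fields such as intelligent search. Industry knowledge graphs usually have to be built from the data of a specific industry.
The fund knowledge graph is one such industry knowledge graph in the financial field. It provides investors with a visual reference tool, but it is only a stack of relationships and does not further mine the network relationships among people.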
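Summary
In view of this, it is necessary to provide a fund manager community division method, system, computer device and storage medium to address the problem that the fund knowledge graph lacks the network relationships among people.
A fund manager community division method includes:
extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
acquiring the relationships between every two of the entities and merging them into a weight W;
setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
saving the community division result into a community division table, where the community division table is located in the graph database.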
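A fund manager community division system includes:
an entity extraction unit, configured to extract multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
a weight acquisition unit, configured to acquire the relationships between every two of the entities and merge them into a weight W;
a community division unit, configured to set each entity as a node and each node as an initial community, set the weight W between two nodes as the degree, and invoke the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
a saving unit, configured to save the community division result into a community division table, where the community division table is located in the graph database.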
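A computer device includes a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
acquiring the relationships between every two of the entities and merging them into a weight W;
setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
saving the community division result into a community division table, where the community division table is located in the graph database.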
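A storage medium stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
extracting multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships;
acquiring the relationships between every two of the entities and merging them into a weight W;
setting each entity as a node and each node as an initial community, setting the weight W between two nodes as the degree, and invoking the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
saving the community division result into a community division table, where the community division table is located in the graph database.
The above fund manager community division method, apparatus, computer device and storage medium extract multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in a graph database in graph form and includes entities and relationships; acquire the relationships between every two entities and merge them into a weight W; set each entity as a node and each node as an initial community, set the weight W between two nodes as the degree, and invoke the fast-clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result; and save the community division result into a community division table located in the graph database. This application performs community clustering with the Fast Newman method and applies the algorithm to the practical case of fund managers: because different relationships between fund managers carry different weights, the optimal community division state can be found and taken as the final community division result, thereby determining and presenting the network relationships among people.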
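Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present application.
FIG. 1 is a flowchart of a fund manager community division method in an embodiment of the present application;
FIG. 2 is a flowchart of step S1 in an embodiment;
FIG. 3 is a flowchart of step S4 in an embodiment;
FIG. 4 is a structural diagram of a fund manager community division system in an embodiment of the present application.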
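Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.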
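FIG. 1 is a flowchart of a fund manager community division method in an embodiment of the present application. As shown in FIG. 1, a fund manager community division method includes the following steps:
Step S1, extracting multiple entities: multiple entities are extracted from the fund knowledge graph, which is stored in a graph database in graph form and includes entities and relationships.
A knowledge graph (Knowledge Graph/Vault), also called a scientific knowledge graph, is known in library and information science as knowledge domain visualization or a knowledge domain map. It is a family of graphs that show how knowledge develops and how it is structured, using visualization techniques to describe knowledge resources and their carriers and to mine, analyze, construct, draw, and display knowledge and its interconnections. In short, a knowledge graph is a knowledge system described by entities, attributes, and relationships.
The fund knowledge graph in this step is a knowledge graph of fund managers. The fund manager is the entity; the manager's affiliated company, graduate school, mentor, the code or abbreviation of each fund company managed, and the name of each fund managed are the relationships; and the type of fund currently managed, the size of the fund currently managed, the investment style, the investment cycle, the unit net value, and the cumulative net value are the attributes. In this step, several entities are extracted from the fund knowledge graph pre-stored in the graph database; each entity is a fund manager.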
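Step S2, obtaining weights: the relationships between every two entities are acquired and merged into a weight W.
The relationships between one entity and another include the fund manager's affiliated company, graduate school, mentor, the code or abbreviation of each fund company managed, the name of each fund managed, and so on. Different relationships between two entities carry different weights, and two entities may share no relationship, exactly one relationship, or two or more relationships. In this step, W is the sum of the weights of all relationships between the two entities.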
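Step S3, community division: each entity is set as a node, each node is set as an initial community, the weight W between two nodes is set as the degree, and the fast-clustering Fast Newman algorithm is invoked to divide the nodes into communities and obtain the community division result.
The fast-clustering Fast Newman algorithm is a clustering algorithm published in 2004, referred to as the F-B algorithm. It starts by treating each node as a community, and in each iteration merges the two communities whose merger yields the largest Q value, until the entire network has merged into a single community. The whole process can be represented as a dendrogram, from which the level with the largest Q value is selected as the final community structure. Specifically:
The modularity Q value is calculated for combining two initial communities, and the two communities whose merger increases Q the most or decreases it the least are merged into a new community. Calculation and merging are repeated until all communities have merged into one large community, and the community division with the largest Q value seen during merging is taken as the result.
Modularity measures whether a community division is a relatively good result. In a relatively good result, nodes inside a community are highly similar to one another, and nodes in different communities are much less similar. Modularity is defined as the ratio of the total number of edges inside communities to the total number of edges in the network, minus an expected value. The expected value is the same ratio that the same community assignment would produce if the network were a random network. Modularity is therefore determined by the Q value, and the division at which Q reaches its maximum is the most satisfactory division of the network. The Q value ranges from 0 to 1; the larger the Q value, the more accurate the community structure of the division. In actual network analysis, the peak Q value generally falls between 0.3 and 0.7.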
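The merge loop described in step S3 can be sketched in a few lines of Python. This is a minimal illustration of greedy modularity agglomeration, not the applicant's implementation: the function name `fast_newman`, the edge-dictionary input, and the example graph are all assumptions. It tracks e_ij (the share of edge weight joining communities i and j) and a_i (the share of edge ends attached to community i), and uses the standard merge gain ΔQ = 2(e_ij - a_i a_j). Using the weight W directly is equivalent to splitting an edge of weight W into W unit edges.

```python
def fast_newman(edges):
    """Greedy modularity agglomeration (illustrative sketch).

    edges maps (u, v) node-name pairs to a positive weight W; self-loops are
    assumed absent. Returns (best_partition, best_q).
    """
    total = float(sum(edges.values()))            # m, the total edge weight
    nodes = sorted({n for pair in edges for n in pair})
    members = {n: {n} for n in nodes}             # community id -> member nodes
    e = {n: {} for n in nodes}                    # e[i][j] for i != j
    a = {n: 0.0 for n in nodes}                   # a[i]
    for (u, v), w in edges.items():
        half = w / (2.0 * total)
        e[u][v] = e[u].get(v, 0.0) + half
        e[v][u] = e[v].get(u, 0.0) + half
        a[u] += half
        a[v] += half

    q = -sum(x * x for x in a.values())           # every node alone: e_ii = 0
    best_q, best = q, [set(c) for c in members.values()]

    while len(members) > 1:
        # Only connected community pairs can raise Q, so only they are scored.
        candidates = [(2.0 * (e[i][j] - a[i] * a[j]), i, j)
                      for i in e for j in e[i] if i < j]
        if not candidates:                        # remaining communities are disconnected
            break
        dq, i, j = max(candidates)                # largest increase or smallest decrease
        q += dq
        for k, w in e.pop(j).items():             # fold community j into community i
            del e[k][j]
            if k != i:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[i][k]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        if q > best_q:                            # remember the dendrogram level with max Q
            best_q, best = q, [set(c) for c in members.values()]
    return best, best_q


# Two triangles joined by a single edge (hypothetical data).
example_edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
                 ("d", "e"): 1, ("d", "f"): 1, ("e", "f"): 1,
                 ("c", "d"): 1}
partition, q = fast_newman(example_edges)
print(sorted(sorted(c) for c in partition), round(q, 4))
# [['a', 'b', 'c'], ['d', 'e', 'f']] 0.3571
```

The loop keeps merging until one community remains, as the text describes, but returns the partition recorded at the highest Q rather than the final one.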
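Step S4, saving the community division result: the community division result is saved into the community division table, which is located in the graph database.
The community division result obtained in step S3 is saved for later queries.
In this embodiment, the fast-clustering Fast Newman algorithm divides the entities in the fund knowledge graph into communities, identifies the optimal community division result, and saves it, thereby mining and partitioning the network relationships among people.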
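In an embodiment, as shown in FIG. 2, the generation process of the fund knowledge graph in step S1 includes:
Step S101, knowledge extraction: multiple pieces of fund knowledge data are extracted from external information sources and set as a knowledge metabase. The fund knowledge data includes the fund manager, affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, name of each fund managed, type of fund currently managed, size of the fund currently managed, investment style, investment cycle, unit net value, and cumulative net value.
In this step, when the external information source is a database, the data in the database is structured data and is extracted with a configured rule script to obtain multiple pieces of fund knowledge data. When the external information source is a website, the chart data on the website is semi-structured data and is extracted with a crawler or regular expression matching. When the external information source is a fund research report, a fund manager resume, or community comments, the source is unstructured text data and is extracted through natural language processing.
Step S102, knowledge merging: the fund manager in the knowledge metabase is set as the unified mark, and if two pieces of fund knowledge data have the same unified mark, the two pieces are merged.
Step S103, knowledge storage: the fund manager is set as the entity; the affiliated company, graduate school, mentor, code or abbreviation of each fund company managed, and name of each fund managed are set as relationships; the type of fund currently managed, the size of the fund currently managed, the investment style, investment cycle, unit net value, and cumulative net value are set as attributes; and the knowledge metabase is stored in a graph database in graph form to generate the fund knowledge graph.
This embodiment generates the fund knowledge graph through knowledge extraction, knowledge merging, and knowledge storage, providing entities and relationships for the subsequent community division.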
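Steps S102 and S103 can be pictured with a small in-memory sketch. Everything here is hypothetical: the field names, the `merge_records` helper, and the sample records are assumptions for illustration, and a plain dictionary stands in for the graph database, whose product and API the source does not specify.

```python
from collections import defaultdict

# Hypothetical field names for the relationships and attributes listed in S103.
RELATION_FIELDS = ("company", "school", "mentor", "fund_company_code", "fund_name")
ATTRIBUTE_FIELDS = ("fund_type", "fund_size", "investment_style",
                    "investment_cycle", "unit_nav", "cumulative_nav")


def merge_records(records):
    """Merge fund knowledge records that share the same unified mark (the manager)."""
    merged = defaultdict(lambda: {"relations": defaultdict(set), "attributes": {}})
    for record in records:
        entity = merged[record["manager"]]        # unified mark, step S102
        for field, value in record.items():
            if field in RELATION_FIELDS:
                entity["relations"][field].add(value)   # relationships accumulate
            elif field in ATTRIBUTE_FIELDS:
                entity["attributes"][field] = value     # later records overwrite earlier ones
    return merged


records = [
    {"manager": "Manager A", "company": "Co A", "fund_name": "Fund 1"},
    {"manager": "Manager A", "school": "Univ X", "fund_name": "Fund 2", "fund_type": "equity"},
    {"manager": "Manager B", "company": "Co A"},
]
graph = merge_records(records)
print(len(graph), sorted(graph["Manager A"]["relations"]["fund_name"]))
```

Storing relationship values as sets reflects that a manager may have managed several funds; overwriting attributes on conflict is one possible policy, chosen here only to keep the sketch short.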
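In an embodiment, merging the relationships between an entity and other entities into a weight W in step S2 includes:
If an entity and another entity belong to the same company, the affiliated-company relationship between them has weight W=1. If they graduated from the same school, the graduate-school relationship between them has weight W=1. If they share a mentor, the mentor relationship between them has weight W=2. If the code or abbreviation of a fund company they have managed is the same, that relationship has weight W=2. If the name of a fund they have managed is the same, that relationship has weight W=2. The weights of all relationships between the two entities are summed to form the weight W. When the weight W between two entities has been obtained, it can be understood as a weighted-edge decomposition: an edge of weight W is split into W edges of weight 1.
Two entities may share several relationships. For example, if they graduated from the same school and share a mentor, the graduate-school relationship contributes W=1 and the mentor relationship contributes W=2, so the weight of the relationship between the two entities is W=3.
Because fund managers tend to band together, this application uses weighted relationships; that is, the relationships between fund managers carry weights. In this embodiment, the weight is obtained by merging the relationships, so the subsequent community division can determine how important the connection between two nodes is.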
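The weight rule above is a simple lookup and sum. The sketch below uses the weights stated in this embodiment; the dictionary keys, the `combined_weight` helper, and the two sample managers are hypothetical. It also assumes each relationship type counts once per pair no matter how many values the pair shares, which the source does not spell out.

```python
# Weights per relationship type, as listed in this embodiment.
RELATION_WEIGHTS = {
    "company": 1,            # same affiliated company
    "school": 1,             # same graduate school
    "mentor": 2,             # shared mentor
    "fund_company_code": 2,  # same managed fund company code or abbreviation
    "fund_name": 2,          # same managed fund name
}


def combined_weight(a, b):
    """Sum the weights of every relationship type two managers share."""
    total = 0
    for relation, weight in RELATION_WEIGHTS.items():
        # Each relation maps to a set of values (a manager may have managed several funds).
        if a.get(relation, set()) & b.get(relation, set()):
            total += weight  # assumption: counted once per relationship type
    return total


manager_a = {"company": {"Co A"}, "school": {"Univ X"}, "mentor": {"Prof M"}}
manager_b = {"company": {"Co B"}, "school": {"Univ X"}, "mentor": {"Prof M"}}
print(combined_weight(manager_a, manager_b))  # same school (1) + shared mentor (2) = 3
```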
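In an embodiment, the formula for calculating the modularity Q value in step S3 includes:
[Formula image PCTCN2018124590-appb-000001]
where
[Formula image PCTCN2018124590-appb-000002]
v and w are any two nodes, and there are m connections between the two nodes in total. A_vw=1 when the two nodes are directly connected, otherwise A_vw=0. k_v and k_w denote the degrees of nodes v and w, and 2m is the degree of the entire network. δ(c_v, c_w) indicates whether nodes v and w are in the same community: δ(c_v, c_w)=1 if they are, otherwise δ(c_v, c_w)=0. e_ij denotes the edges with one node in community i and the other node in community j, so e_ii denotes the ratio of the number of edges inside community i to the number of edges in the entire network, that is, the degree inside a community divided by the degree of the entire network, while a_i denotes the ratio of the degree of the nodes in community i to the degree of the entire network.
This embodiment yields the modularity Q value obtained when two communities are combined; the Q value ranges from 0 to 1.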
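The formula images are not reproduced in this text. The symbols defined above match the standard form of Newman's modularity, so, on the assumption that the images show that standard form, the expressions can be written as follows. Here m is read as the total number of edges in the network, consistent with 2m being the degree of the entire network.

```latex
Q \;=\; \frac{1}{2m}\sum_{v,w}\left[A_{vw}-\frac{k_v k_w}{2m}\right]\delta(c_v,c_w)
  \;=\; \sum_i \left(e_{ii}-a_i^{2}\right),
\qquad
e_{ij}=\frac{1}{2m}\sum_{v,w}A_{vw}\,\delta(c_v,i)\,\delta(c_w,j),
\qquad
a_i=\frac{1}{2m}\sum_{v}k_v\,\delta(c_v,i)=\sum_j e_{ij}.
```

Under the same assumption, merging communities i and j changes the modularity by

```latex
\Delta Q \;=\; e_{ij}+e_{ji}-2a_i a_j \;=\; 2\left(e_{ij}-a_i a_j\right),
```

which is the quantity a greedy merge step compares across candidate pairs.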
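In an embodiment, as shown in FIG. 3, step S4 includes:
Step S401, obtaining a request: a request for the community information of a given fund manager, entered by the user, is obtained through a preset user query interface.
A query window or a query field can be provided on the user query interface. The user enters the name of a fund manager, which produces the community information request for that fund manager.
Step S402, querying the community division table: the graph database is accessed and the community division table is queried by fund manager.
The community division table contains the fund manager and the corresponding community division result. The community division result consists of the relationships and attributes of the fund manager, and also covers the divided relationships between this fund manager and other fund managers.
Step S403, data conversion and return: the nodes and relationships of the community to which the fund manager belongs are extracted and sent in JSON format to the data visualization software D3.js. D3.js converts the nodes and relationships into a visual chart, which is returned to the user query interface.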
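JSON (JavaScript Object Notation) is a lighter and simpler data exchange format than XML. JSON is native to JavaScript; no special API or toolkit is required to process JSON data in JavaScript. The rules of JSON are: an object is an unordered set of "name/value" pairs. An object begins with "{" (left brace) and ends with "}" (right brace). Each "name" is followed by ":" (colon), and "name/value" pairs are separated by "," (comma). JSON does not require the server to send header information with a specific content type, which makes parameter passing in JSON simpler and more practical and better suited to the data transfer between the database and D3.js in this step. After the nodes and relationships of the fund manager's community are extracted, a data conversion script can convert the data into JSON. A data conversion script is a script that converts data into JSON format; third-party tools can also be used for the conversion.
The data visualization software D3.js allows charts to be designed freely, is suitable for displaying a rich variety of chart styles, is completely free, and is open source. Because it offers so many chart types, it can meet almost all development needs. This embodiment uses D3.js to convert the nodes and relationships into visual charts for users to view, which better shows the community division and relationships of a given fund manager.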
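Steps S402 and S403 amount to a table lookup followed by serialization. The sketch below is one possible shape for the data conversion script, under stated assumptions: the community division table is modelled as a plain dictionary from manager to community id, the helper name `community_to_json` is hypothetical, and the `nodes`/`links` layout with `source` and `target` keys is the one commonly fed to D3 force-directed graphs rather than a format the source prescribes.

```python
import json


def community_to_json(manager, community_table, edges):
    """Serialize the community of one manager as a nodes/links JSON document."""
    community_id = community_table[manager]                  # S402: look up the manager
    members = sorted(m for m, c in community_table.items() if c == community_id)
    member_set = set(members)
    payload = {
        "nodes": [{"id": m, "community": community_id} for m in members],
        "links": [
            {"source": u, "target": v, "weight": w}
            for (u, v), w in sorted(edges.items())
            if u in member_set and v in member_set            # keep intra-community edges only
        ],
    }
    return json.dumps(payload, ensure_ascii=False)           # S403: JSON for the front end


# Hypothetical division result and weighted relationships.
community_table = {"Manager A": 0, "Manager B": 0, "Manager C": 1}
weighted_edges = {("Manager A", "Manager B"): 3, ("Manager B", "Manager C"): 1}
document = community_to_json("Manager A", community_table, weighted_edges)
print(document)
```

`ensure_ascii=False` keeps Chinese manager names readable in the output instead of escaping them.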
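In the fund manager community division method of this application, after multiple entities are extracted from the fund knowledge graph, the weight W between each entity and the other entities is obtained, and the Fast Newman algorithm divides the entities into communities to obtain the final community division result, which is stored in the community division table. To make it easy to query an entity's community division result, the method also interacts with the user through a preset user query interface and presents the community division result as a visual chart.
The entities referred to above are fund managers in this application, and the community division result obtained by the method is the division of fund managers into communities with other fund managers. In the fund industry, fund managers often form factions because they share a mentor, a graduate school, or an employer. Two related fund managers also tend to band together, and they usually influence each other when buying and selling funds. In a network of many fund managers, people cannot see at a glance how closely two fund managers are related. The method of this application therefore divides closely related fund managers into communities. The network contains different communities: connections between fund managers inside a community are relatively dense, while connections between fund managers in different communities are relatively sparse. When a user looks into a given fund manager, the visual chart produced after community division makes it easy to see the current situation of other fund managers closely related to that manager, which gives the user a better reference when screening funds.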
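In one embodiment, a fund manager community division system is provided. As shown in FIG. 4, it includes the following units:
an entity extraction unit, configured to extract multiple entities from a fund knowledge graph, where the fund knowledge graph is stored in graph form in a graph database and includes entities and relationships;
a weight acquisition unit, configured to obtain the relationships between every two entities and merge them into a weight W;
a community division unit, configured to set each entity as a node, set each node as an initial community, set the weight W between two nodes as the degree, and invoke the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
a saving unit, configured to save the community division result into a community division table located in the graph database.
In one embodiment, the entity extraction unit includes:
a knowledge element base setup module, configured to extract multiple pieces of fund knowledge data from external information sources and set them as a knowledge element base, where the fund knowledge data includes the fund manager, company, graduating institution, mentor, code or short name of fund companies managed, names of funds managed, type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value;
a unified marking module, configured to set the fund manager in the knowledge element base as a unified mark, and to merge two pieces of fund knowledge data if they carry the same unified mark;
a fund knowledge graph generation module, configured to set the fund manager as the entity; set the company, graduating institution, mentor, code or short name of fund companies managed and names of funds managed as relationships; set the type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value as attributes; and store the knowledge element base in graph form in the graph database to generate the fund knowledge graph.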
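The unified-mark merge and the derivation of relationships can be illustrated with a small in-memory Python sketch. It is not the graph-database implementation of the application: the record layout, the field names and the functions `merge_records` and `shared_relations` are hypothetical, and only three of the five relationship fields are shown.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical subset of the relationship fields named in the embodiment.
RELATION_FIELDS = ("company", "school", "mentor")


def merge_records(records):
    """Merge records that carry the same unified mark (the manager name)."""
    merged = defaultdict(dict)
    for rec in records:
        fields = {k: v for k, v in rec.items() if k != "manager"}
        merged[rec["manager"]].update(fields)
    return dict(merged)


def shared_relations(merged):
    """For every pair of managers, list the relation fields with equal values."""
    result = {}
    for a, b in combinations(sorted(merged), 2):
        shared = [f for f in RELATION_FIELDS
                  if f in merged[a] and merged[a].get(f) == merged[b].get(f)]
        if shared:
            result[(a, b)] = shared
    return result


records = [
    {"manager": "A", "company": "X"},
    {"manager": "A", "school": "S"},
    {"manager": "B", "school": "S", "company": "Y"},
]
merged = merge_records(records)
print(merged)
print(shared_relations(merged))
```

The two records for manager A collapse into one entry, and the only relationship that A and B share is the graduating institution.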
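In one embodiment, the knowledge element base setup module is further configured as follows: when the external information source is a database, the data in the database is structured data and is extracted by a preset rule script to obtain multiple pieces of fund knowledge data; when the external information source is a website, the chart data on the website is semi-structured data and is extracted by a crawler or by regular-expression matching to obtain multiple pieces of fund knowledge data; when the external information source is a fund research report, a fund manager résumé or community comments, the source is unstructured text data and is extracted by natural language processing to obtain multiple pieces of fund knowledge data.
In one embodiment, the weight acquisition unit is further configured as follows: if the relationship between an entity and another entity is belonging to the same company, the weight corresponding to that company relationship is W=1; if the two entities graduated from the same institution, the weight corresponding to the graduating-institution relationship is W=1; if they share a common mentor, the weight corresponding to the mentor relationship is W=2; if the code or short name of a fund company they have managed is the same, the weight corresponding to that relationship is W=2; if the name of a fund they have managed is the same, the weight corresponding to that relationship is W=2. The weights of all relationships between the entity and the other entity are summed to form the weight W.
In one embodiment, the community division unit is further configured to calculate the modularity Q when two initial communities are combined, merge the two communities whose merger increases Q the most or decreases it the least into a new community, repeat the calculation and merging until all communities have merged into one large community, and find the community division result at which Q was largest during the merging process.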
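The merging loop can be sketched in plain Python as follows. This is a minimal illustration of greedy agglomerative modularity clustering in the style of Newman's fast algorithm, not the application's implementation. It assumes an undirected weighted edge list without self-loops and node labels that can be ordered, and it only considers merging communities that are joined by at least one edge, so on a disconnected graph it stops before reaching a single community. The name `fast_newman` is hypothetical.

```python
def fast_newman(edges):
    """Greedily merge communities and return (best_partition, best_q).

    edges: list of (u, v, weight) tuples, undirected, without self-loops.
    """
    two_m = 2.0 * sum(w for _, _, w in edges)
    nodes = sorted({n for u, v, _ in edges for n in (u, v)})
    members = {n: {n} for n in nodes}  # community id -> member nodes
    a = {n: 0.0 for n in nodes}        # a_i: degree fraction of community i
    e = {}                             # e[(i, j)], i < j: e_ij between i and j
    for u, v, w in edges:
        a[u] += w / two_m
        a[v] += w / two_m
        key = (min(u, v), max(u, v))
        e[key] = e.get(key, 0.0) + w / two_m

    q = -sum(x * x for x in a.values())  # every node starts as its own community
    best_q, best = q, [set(m) for m in members.values()]
    while e:
        # Pick the connected pair whose merge changes Q by the largest amount,
        # where delta Q = 2 * (e_ij - a_i * a_j).
        (i, j), e_ij = max(e.items(),
                           key=lambda kv: kv[1] - a[kv[0][0]] * a[kv[0][1]])
        q += 2 * (e_ij - a[i] * a[j])
        members[i] |= members.pop(j)   # merge community j into community i
        a[i] += a.pop(j)
        relabeled = {}
        for (x, y), val in e.items():
            x2 = i if x == j else x
            y2 = i if y == j else y
            if x2 == y2:
                continue               # this edge is now internal
            key = (min(x2, y2), max(x2, y2))
            relabeled[key] = relabeled.get(key, 0.0) + val
        e = relabeled
        if q > best_q:                 # remember the partition with the largest Q
            best_q, best = q, [set(m) for m in members.values()]
    return best, best_q


# Two triangles joined by a single edge (hypothetical sample data).
sample = [("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
          ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
          ("c", "d", 1)]
partition, q_max = fast_newman(sample)
print(partition, q_max)
```

On the sample graph the loop keeps merging until one community remains, but the partition it reports is the one recorded when Q peaked, namely the two triangles with Q = 5/14.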
In one embodiment, the formula for calculating the modularity Q includes:
Figure PCTCN2018124590-appb-000003
where
Figure PCTCN2018124590-appb-000004
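v and w are any two nodes; the network contains m edges in total; A_vw = 1 when the two nodes are directly connected and A_vw = 0 otherwise; k_v and k_w denote the degrees of nodes v and w; 2m is the total degree of the network; δ(c_v, c_w) indicates whether nodes v and w are in the same community, with δ(c_v, c_w) = 1 if they are and δ(c_v, c_w) = 0 otherwise; e_ij denotes the edges with one node in community i and the other in community j, so e_ii is the ratio of the number of edges inside community i to the number of edges in the whole network, that is, the internal degree of a community over the degree of the whole network; and a_i is the ratio of the degree of the nodes in community i to the degree of the whole network.
In one embodiment, the system further includes a query unit, configured to obtain, through a preset user query interface, a request entered by the user for the community information of a given fund manager; access the graph database and query the community division table by fund manager; and extract the nodes and relationships of the community in which the fund manager is located and send them in JSON format to the data visualization software D3.js, which converts the nodes and relationships into a visual chart that is returned to the user query interface.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to implement the steps of the fund manager community division method of the above embodiments.
In one embodiment, a storage medium storing computer-readable instructions is provided. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the steps of the fund manager community division method of the above embodiments. The storage medium may be a non-volatile storage medium.
A person of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features are described; however, as long as a combination of these features is not contradictory, it should be considered within the scope of this specification.
The above embodiments express only some exemplary embodiments of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent application shall be governed by the appended claims.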

Claims (20)

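  1. A fund manager community division method, comprising:
    extracting multiple entities from a fund knowledge graph, wherein the fund knowledge graph is stored in graph form in a graph database and includes the entities and relationships;
    obtaining the relationships between every two of the entities and merging them into a weight W;
    setting each entity as a node, setting each node as an initial community, setting the weight W between two of the nodes as the degree, and invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
    saving the community division result into a community division table, wherein the community division table is located in the graph database.
  2. The fund manager community division method according to claim 1, wherein the process of generating the fund knowledge graph comprises:
    extracting multiple pieces of fund knowledge data from external information sources and setting them as a knowledge element base, wherein the fund knowledge data includes the fund manager, company, graduating institution, mentor, code or short name of fund companies managed, names of funds managed, type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value;
    setting the fund manager in the knowledge element base as a unified mark, and merging two pieces of the fund knowledge data if they carry the same unified mark;
    setting the fund manager as the entity; setting the company, graduating institution, mentor, code or short name of fund companies managed and names of funds managed as relationships; setting the type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value as attributes; and storing the knowledge element base in graph form in the graph database to generate the fund knowledge graph.
  3. The fund manager community division method according to claim 2, wherein the extracting multiple pieces of fund knowledge data from external information sources comprises:
    when the external information source is a database, the data in the database is structured data, and the data in the database is extracted by a preset rule script to obtain multiple pieces of the fund knowledge data;
    when the external information source is a website, the chart data on the website is semi-structured data, and the data is extracted by a crawler or by regular-expression matching to obtain multiple pieces of the fund knowledge data;
    when the external information source is a fund research report, a fund manager résumé or community comments, the external information source is unstructured text data, and the data is extracted by natural language processing to obtain multiple pieces of the fund knowledge data.
  4. The fund manager community division method according to claim 1, wherein the obtaining the relationships between every two of the entities and merging them into a weight W comprises:
    if the relationship between the entity and another entity is belonging to the same company, the weight corresponding to the company relationship between the entity and the other entity is W=1;
    if the relationship between the entity and another entity is having graduated from the same graduating institution, the weight corresponding to the graduating-institution relationship between the entity and the other entity is W=1;
    if the relationship between the entity and another entity is having a common mentor, the weight corresponding to the mentor relationship between the entity and the other entity is W=2;
    if the relationship between the entity and another entity is having the same code or short name of a fund company managed, the weight corresponding to that relationship between the entity and the other entity is W=2;
    if the relationship between the entity and another entity is having the same name of a fund managed, the weight corresponding to that relationship between the entity and the other entity is W=2;
    summing all relationships between the entity and the other entity to form the weight W.
  5. The fund manager community division method according to claim 1, wherein the invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result comprises:
    calculating the modularity Q when two of the initial communities are combined, merging the two communities whose merger increases Q the most or decreases it the least into a new community, repeating the calculation and merging until all communities have merged into one large community, and finding the community division result at which Q was largest during the merging process.
  6. The fund manager community division method according to claim 5, wherein the formula for calculating the modularity Q includes:
    Figure PCTCN2018124590-appb-100001
    where
    Figure PCTCN2018124590-appb-100002
    v and w are any two nodes; the network contains m edges in total; A_vw = 1 when the two nodes are directly connected and A_vw = 0 otherwise; k_v and k_w denote the degrees of nodes v and w; 2m is the total degree of the network; δ(c_v, c_w) indicates whether nodes v and w are in the same community, with δ(c_v, c_w) = 1 if they are and δ(c_v, c_w) = 0 otherwise;
    e_ij denotes the edges with one node in community i and the other in community j, so e_ii is the ratio of the number of edges inside community i to the number of edges in the whole network, that is, the internal degree of a community over the degree of the whole network; and a_i is the ratio of the degree of the nodes in community i to the degree of the whole network.
  7. The fund manager community division method according to claim 1, wherein, after the saving the community division result into a community division table located in the graph database, the method further comprises:
    obtaining, through a preset user query interface, a request entered by the user for the community information of a given fund manager;
    accessing the graph database and querying the community division table by the fund manager;
    extracting the nodes and relationships of the community in which the fund manager is located and sending them in JSON format to the data visualization software D3.js, wherein the D3.js software converts the nodes and relationships into a visual chart, which is returned to the user query interface.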
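  8. A fund manager community division system, comprising:
    an entity extraction unit, configured to extract entities from a fund knowledge graph, wherein the fund knowledge graph is stored in graph form in a graph database and includes entities and relationships;
    a weight acquisition unit, configured to obtain the relationships between every two of the entities and merge them into a weight W;
    a community division unit, configured to set each entity as a node, set each node as an initial community, set the weight W between two of the nodes as the degree, and invoke the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
    a saving unit, configured to save the community division result into a community division table, wherein the community division table is located in the graph database.
  9. The fund manager community division system according to claim 8, wherein the entity extraction unit comprises:
    a knowledge element base setup module, configured to extract multiple pieces of fund knowledge data from external information sources and set them as a knowledge element base, wherein the fund knowledge data includes the fund manager, company, graduating institution, mentor, code or short name of fund companies managed, names of funds managed, type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value;
    a unified marking module, configured to set the fund manager in the knowledge element base as a unified mark, and to merge two pieces of the fund knowledge data if they carry the same unified mark;
    a fund knowledge graph generation module, configured to set the fund manager as the entity; set the company, graduating institution, mentor, code or short name of fund companies managed and names of funds managed as relationships; set the type of fund currently managed, size of fund currently managed, investment style, investment cycle, unit net value and cumulative net value as attributes; and store the knowledge element base in graph form in the graph database to generate the fund knowledge graph.
  10. The fund manager community division system according to claim 9, wherein the knowledge element base setup module is further configured as follows: when the external information source is a database, the data in the database is structured data, and the data in the database is extracted by a preset rule script to obtain multiple pieces of the fund knowledge data; when the external information source is a website, the chart data on the website is semi-structured data, and the data is extracted by a crawler or by regular-expression matching to obtain multiple pieces of the fund knowledge data; when the external information source is a fund research report, a fund manager résumé or community comments, the external information source is unstructured text data, and the data is extracted by natural language processing to obtain multiple pieces of the fund knowledge data.
  11. The fund manager community division system according to claim 8, wherein the weight acquisition unit is further configured as follows:
    if the relationship between the entity and another entity is belonging to the same company, the weight corresponding to the company relationship between the entity and the other entity is W=1;
    if the relationship between the entity and another entity is having graduated from the same graduating institution, the weight corresponding to the graduating-institution relationship between the entity and the other entity is W=1;
    if the relationship between the entity and another entity is having a common mentor, the weight corresponding to the mentor relationship between the entity and the other entity is W=2;
    if the relationship between the entity and another entity is having the same code or short name of a fund company managed, the weight corresponding to that relationship between the entity and the other entity is W=2;
    if the relationship between the entity and another entity is having the same name of a fund managed, the weight corresponding to that relationship between the entity and the other entity is W=2;
    summing all relationships between the entity and the other entity to form the weight W.
  12. The fund manager community division system according to claim 8, wherein the community division unit is further configured to calculate the modularity Q when two of the initial communities are combined, merge the two communities whose merger increases Q the most or decreases it the least into a new community, repeat the calculation and merging until all communities have merged into one large community, and find the community division result at which Q was largest during the merging process.
  13. The fund manager community division system according to claim 12, wherein the formula for calculating the modularity Q includes:
    Figure PCTCN2018124590-appb-100003
    where
    Figure PCTCN2018124590-appb-100004
    v and w are any two nodes; the network contains m edges in total; A_vw = 1 when the two nodes are directly connected and A_vw = 0 otherwise; k_v and k_w denote the degrees of nodes v and w; 2m is the total degree of the network; δ(c_v, c_w) indicates whether nodes v and w are in the same community, with δ(c_v, c_w) = 1 if they are and δ(c_v, c_w) = 0 otherwise;
    e_ij denotes the edges with one node in community i and the other in community j, so e_ii is the ratio of the number of edges inside community i to the number of edges in the whole network, that is, the internal degree of a community over the degree of the whole network; and a_i is the ratio of the degree of the nodes in community i to the degree of the whole network.
  14. The fund manager community division system according to claim 8, further comprising a query unit, configured to obtain, through a preset user query interface, a request entered by the user for the community information of a given fund manager; access the graph database and query the community division table by the fund manager; and extract the nodes and relationships of the community in which the fund manager is located and send them in JSON format to the data visualization software D3.js, wherein the D3.js software converts the nodes and relationships into a visual chart, which is returned to the user query interface.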
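  15. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    extracting multiple entities from a fund knowledge graph, wherein the fund knowledge graph is stored in graph form in a graph database and includes the entities and relationships;
    obtaining the relationships between every two of the entities and merging them into a weight W;
    setting each entity as a node, setting each node as an initial community, setting the weight W between two of the nodes as the degree, and invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
    saving the community division result into a community division table, wherein the community division table is located in the graph database.
  16. The computer device according to claim 15, wherein, when invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result, the processor is caused to perform the following steps:
    calculating the modularity Q when two of the initial communities are combined, merging the two communities whose merger increases Q the most or decreases it the least into a new community, repeating the calculation and merging until all communities have merged into one large community, and finding the community division result at which Q was largest during the merging process.
  17. The computer device according to claim 16, wherein the formula for calculating the modularity Q includes:
    Figure PCTCN2018124590-appb-100005
    where
    Figure PCTCN2018124590-appb-100006
    v and w are any two nodes; the network contains m edges in total; A_vw = 1 when the two nodes are directly connected and A_vw = 0 otherwise; k_v and k_w denote the degrees of nodes v and w; 2m is the total degree of the network; δ(c_v, c_w) indicates whether nodes v and w are in the same community, with δ(c_v, c_w) = 1 if they are and δ(c_v, c_w) = 0 otherwise;
    e_ij denotes the edges with one node in community i and the other in community j, so e_ii is the ratio of the number of edges inside community i to the number of edges in the whole network, that is, the internal degree of a community over the degree of the whole network; and a_i is the ratio of the degree of the nodes in community i to the degree of the whole network.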
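  18. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    extracting multiple entities from a fund knowledge graph, wherein the fund knowledge graph is stored in graph form in a graph database and includes the entities and relationships;
    obtaining the relationships between every two of the entities and merging them into a weight W;
    setting each entity as a node, setting each node as an initial community, setting the weight W between two of the nodes as the degree, and invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result;
    saving the community division result into a community division table, wherein the community division table is located in the graph database.
  19. The storage medium according to claim 18, wherein, when invoking the fast clustering Fast Newman algorithm to divide the nodes into communities and obtain a community division result, the one or more processors are caused to perform the following steps:
    calculating the modularity Q when two of the initial communities are combined, merging the two communities whose merger increases Q the most or decreases it the least into a new community, repeating the calculation and merging until all communities have merged into one large community, and finding the community division result at which Q was largest during the merging process.
  20. The storage medium according to claim 19, wherein the formula for calculating the modularity Q includes:
    Figure PCTCN2018124590-appb-100007
    where
    Figure PCTCN2018124590-appb-100008
    v and w are any two nodes; the network contains m edges in total; A_vw = 1 when the two nodes are directly connected and A_vw = 0 otherwise; k_v and k_w denote the degrees of nodes v and w; 2m is the total degree of the network; δ(c_v, c_w) indicates whether nodes v and w are in the same community, with δ(c_v, c_w) = 1 if they are and δ(c_v, c_w) = 0 otherwise;
    e_ij denotes the edges with one node in community i and the other in community j, so e_ii is the ratio of the number of edges inside community i to the number of edges in the whole network, that is, the internal degree of a community over the degree of the whole network; and a_i is the ratio of the degree of the nodes in community i to the degree of the whole network.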
PCT/CN2018/124590 2018-08-27 2018-12-28 基金经理社团划分方法、***、计算机设备和存储介质 WO2020042501A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810977585.0 2018-08-27
CN201810977585.0A CN109359199A (zh) 2018-08-27 2018-08-27 基金经理社团划分方法、***、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2020042501A1 true WO2020042501A1 (zh) 2020-03-05

Family

ID=65349975

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124590 WO2020042501A1 (zh) 2018-08-27 2018-12-28 基金经理社团划分方法、***、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN109359199A (zh)
WO (1) WO2020042501A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116403A (zh) * 2020-09-28 2020-12-22 中国建设银行股份有限公司 一种信息推荐方法、装置及设备

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135890A (zh) * 2019-04-15 2019-08-16 深圳壹账通智能科技有限公司 基于知识关系挖掘的产品数据推送方法及相关设备
CN110427494B (zh) * 2019-07-29 2022-11-15 北京明略软件***有限公司 知识图谱的展示方法、装置、存储介质及电子装置
CN111209317A (zh) * 2020-01-15 2020-05-29 同济大学 一种知识图谱异常社区检测方法及装置
CN113312517A (zh) * 2020-02-26 2021-08-27 京东方科技集团股份有限公司 基金知识图谱获取方法、装置和电子设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298579A (zh) * 2010-06-22 2011-12-28 北京大学 面向科技文献的论文、作者和期刊排序模型及排序方法
CN102521337A (zh) * 2011-12-08 2012-06-27 华中科技大学 一种基于海量知识网络的学术社区***
CN103020302A (zh) * 2012-12-31 2013-04-03 中国科学院自动化研究所 基于复杂网络的学术核心作者挖掘及相关信息抽取方法和***
US9589072B2 (en) * 2011-06-01 2017-03-07 Microsoft Technology Licensing, Llc Discovering expertise using document metadata in part to rank authors

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100318537A1 (en) * 2009-06-12 2010-12-16 Microsoft Corporation Providing knowledge content to users
CN105718528B (zh) * 2016-01-15 2019-06-21 上海交通大学 基于论文间引用关系的学术地图展示方法
CN107273104B (zh) * 2016-04-08 2021-05-28 创新先进技术有限公司 一种配置数据结构的处理方法及装置
CN107016072B (zh) * 2017-03-23 2020-05-15 成都市公安科学技术研究所 基于社交网络知识图谱的知识推理***及方法
CN107133398B (zh) * 2017-04-28 2020-09-01 河海大学 一种基于复杂网络的河流径流量预测方法
CN107194498B (zh) * 2017-04-28 2020-09-01 河海大学 一种水文监测网络的优化方法
CN108287866A (zh) * 2017-12-18 2018-07-17 成都理工大学 一种大规模网络中基于节点密度的社区发现方法
CN108182265B (zh) * 2018-01-09 2021-06-29 清华大学 针对关系网络的多层迭代筛选方法及装置

Also Published As

Publication number Publication date
CN109359199A (zh) 2019-02-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18932180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.06.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18932180

Country of ref document: EP

Kind code of ref document: A1