CN113051410A

CN113051410A - Scientific research cooperative group discovery method based on density clustering

Info

Publication number: CN113051410A
Application number: CN201911380838.7A
Authority: CN
Inventors: 陈盛之; 李千目
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2021-06-29

Abstract

The invention discloses a scientific research cooperative group discovery method based on density clustering. The method comprises the following steps: importing batch paper data and using it as a training set; preprocessing the paper data and extracting expert cooperation information, namely the set of expert names appearing on the same paper; constructing an expert cooperation relationship network from the expert cooperation information, and calculating the shortest path between experts using the Dijkstra algorithm; constructing the cooperation ε-neighborhood of each expert from the shortest paths between experts, where ε denotes the radius of the neighborhood; determining core experts from the number of experts in each expert's cooperation ε-neighborhood, and constructing a core expert set; and obtaining all cooperative group clusters in the expert cooperation network according to the core expert set. The invention can quickly find the expert cooperation teams and core experts in a given paper set, and provides effective support for subsequent bibliometric analysis.

Description

Scientific research cooperative group discovery method based on density clustering

Technical Field

The invention relates to the field of discovery of cooperative groups, in particular to a scientific research cooperative group discovery method based on density clustering.

Background

Scientific research cooperation refers to a form of work in which researchers (individual with individual, individual with group, or group with group) collaborate according to a plan to complete a common research task. Scientific research is complex and arduous collective labor, and the interaction between people in research activities directly affects whether the cooperation and the research plan are completed. For a considerable period of history, scientific research was relatively decentralized and unorganized: it was "free study" conducted by individuals and was largely carried out by "masters" and "inventors".

With the progress of modern scientific research, more and more people recognize the importance of scientific research cooperation. Such cooperation is needed to overcome scientific difficulties and to promote scientific and technological progress, and how to organize and coordinate it is an important problem facing managers. There are many cross-domain collaborations in scientific research, and a collaboration is often embodied in the form of a paper. Tracking and understanding the cooperative team relationships of experts is particularly important for promoting science and education. However, the discovery of scientific research cooperation teams currently relies mainly on community discovery algorithms or manual labeling, which suffer from high algorithmic complexity, low accuracy, and long running time.

Disclosure of Invention

The invention aims to provide a scientific research cooperative group discovery method that can quickly and accurately find the expert cooperative groups and core experts in a given paper set, automatically mining the scientific research groups and their cores by exploiting the good adaptivity of density clustering.

The technical solution for realizing the purpose of the invention is as follows: a scientific research cooperative group discovery method based on density clustering comprises the following steps:

step 1, importing batch paper data and using it as a training set;

step 2, preprocessing the paper data and extracting expert cooperation information, namely the set of expert names on the same paper;

step 3, constructing an expert cooperation relationship network from the expert cooperation information, and calculating the shortest path between experts using the Dijkstra algorithm;

step 4, constructing the cooperation ε-neighborhood of each expert from the shortest paths between experts, where ε denotes the radius of the neighborhood;

step 5, determining core experts from the number of experts in each expert's cooperation ε-neighborhood, and constructing a core expert set;

step 6, obtaining all cooperative group clusters in the expert cooperation network according to the core expert set.
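Step 2 above can be illustrated with a minimal sketch. The source does not specify the format of the imported paper data, so the record layout (a dictionary with an "authors" field), the ";" separator, and the function name extract_coauthor_sets are all assumptions made for illustration only.

```python
def extract_coauthor_sets(papers, sep=";"):
    """Return one set of author names per paper (illustrative step 2).

    `papers` is assumed to be a list of dicts with an "authors" string;
    this layout is hypothetical and not taken from the source.
    """
    coauthor_sets = []
    for record in papers:
        names = {name.strip() for name in record.get("authors", "").split(sep)}
        names.discard("")  # drop empty fragments left by trailing separators
        if names:          # skip papers with no usable author information
            coauthor_sets.append(names)
    return coauthor_sets


papers = [
    {"title": "Paper 1", "authors": "Chen; Li"},
    {"title": "Paper 2", "authors": "Chen; Li; Wang;"},
    {"title": "Paper 3", "authors": ""},
]
coauthor_sets = extract_coauthor_sets(papers)
```

In practice, name disambiguation (different experts sharing a name, or one expert with several spellings) would probably also be needed at this stage; the sketch does not attempt it.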

Further, in step 3 the expert cooperation relationship network is constructed from the expert cooperation information and the shortest path between experts is calculated using the Dijkstra algorithm, specifically as follows:

step 3.1, according to the author lists of the papers, taking the authors as nodes, connecting authors who co-wrote a paper with an edge, and taking the reciprocal of the number of co-written papers as the edge weight, thereby constructing the undirected weighted graph G = (V, E) of the expert cooperation information, where the weight $w_{ij}$ of the edge between node i and node j is obtained by the following formula:

$w_{ij} = \dfrac{1}{\mathrm{Count}(i, j)}$

where V represents the vertex set of the undirected weighted graph of the expert cooperation information, E represents its edge set, and Count(i, j) represents the number of papers co-written by experts i and j;

step 3.2, calculating the shortest path between experts in the undirected weighted graph G of the expert cooperation information using the Dijkstra algorithm, specifically comprising the following steps:

step 3.2.1, inputting the undirected weighted graph G = (V, E) of the expert cooperation information, and inputting the target expert name as the source point $v_0$;

step 3.2.2, representing the undirected weighted graph by an adjacency matrix arcs, where arcs[m][n] is the weight of edge <$v_m$, $v_n$>; if there is no edge <$v_m$, $v_n$>, then arcs[m][n] = ∞, where m, n ∈ {m | $v_m$ ∈ V};

step 3.2.3, letting the set S record the vertices whose shortest paths have been found, with initial value S = {$v_0$};

step 3.2.4, letting the array dist[] record the current shortest path length from the source point $v_0$ to each other vertex $v_i$, with dist[i] initialized to arcs[0][i], where i ∈ {i | $v_i$ ∈ V};

step 3.2.5, selecting from the vertex set V − S the vertex $v_j$ that satisfies dist[j] = min{dist[i] | $v_i$ ∈ V − S}; $v_j$ is the end point of the shortest path from $v_0$ that has just been determined, and the set S is updated as S = S ∪ {$v_j$};

step 3.2.6, updating the shortest path length from the source point $v_0$ to each vertex $v_k$ in the set V − S: if dist[j] + arcs[j][k] < dist[k], setting dist[k] = dist[j] + arcs[j][k];

step 3.2.7, repeating steps 3.2.5 to 3.2.6 until the set V − S is empty;

step 3.2.8, outputting the array dist[], where the shortest distance between expert vertex $v_i$ and the target expert source point $v_0$ is dist[i].
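Steps 3.1 and 3.2 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function names build_arcs and dijkstra, the alphabetical ordering of experts, and the sample data are assumptions. The weights follow the reciprocal rule of step 3.1, and the search follows the adjacency-matrix form of steps 3.2.1 to 3.2.8, except that it stops early once only unreachable vertices remain.

```python
import math
from collections import Counter
from itertools import combinations


def build_arcs(coauthor_sets):
    """Step 3.1: adjacency matrix with arcs[m][n] = 1 / Count(m, n)."""
    experts = sorted(set().union(*coauthor_sets))
    index = {name: m for m, name in enumerate(experts)}
    counts = Counter()
    for names in coauthor_sets:
        # sorted() gives every unordered pair a single canonical key
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    n = len(experts)
    arcs = [[math.inf] * n for _ in range(n)]  # no edge -> infinity (step 3.2.2)
    for (a, b), count in counts.items():
        # undirected graph: store the weight in both directions
        arcs[index[a]][index[b]] = arcs[index[b]][index[a]] = 1.0 / count
    return experts, arcs


def dijkstra(arcs, source=0):
    """Steps 3.2.3 to 3.2.8: shortest distances from vertex `source`."""
    n = len(arcs)
    dist = list(arcs[source])  # step 3.2.4: dist[i] starts as arcs[source][i]
    dist[source] = 0.0
    in_s = [False] * n         # step 3.2.3: S holds the finalized vertices
    in_s[source] = True
    for _ in range(n - 1):
        # step 3.2.5: closest vertex still in V - S
        j = min((i for i in range(n) if not in_s[i]), key=lambda i: dist[i])
        if dist[j] == math.inf:
            break              # only unreachable vertices remain
        in_s[j] = True
        # step 3.2.6: relax paths that pass through v_j
        for k in range(n):
            if not in_s[k] and dist[j] + arcs[j][k] < dist[k]:
                dist[k] = dist[j] + arcs[j][k]
    return dist


# Hypothetical data: Chen-Li co-wrote 2 papers, Li-Wang 4, Chen-Wang 1.
coauthor_sets = (
    [{"Chen", "Li"}] * 2 + [{"Chen", "Wang"}] + [{"Li", "Wang"}] * 4 + [{"Zhao"}]
)
experts, arcs = build_arcs(coauthor_sets)
dist = dijkstra(arcs, source=experts.index("Chen"))
```

Because frequent co-authors get small weights, the path Chen to Li to Wang (0.5 + 0.25) is shorter than the direct Chen to Wang edge (1.0). Running dijkstra once per expert yields the all-pairs distances that step 4 relies on.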

Further, in step 4 the cooperation ε-neighborhood of each expert is constructed from the shortest paths between experts, specifically as follows:

step 4.1, inputting a preset value ε and the expert cooperation relationship network generated in step 3;

step 4.2, traversing the expert set and, for each expert i, selecting all experts whose distance from expert i is not more than ε as the ε-neighborhood of expert i.
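Step 4 can be sketched as follows, where the ε-neighborhood of an expert is the set of experts within shortest-path distance ε. The function name epsilon_neighborhoods and the nested-dictionary layout of the distances are assumptions. The sketch also assumes that an expert belongs to its own neighborhood (its distance to itself is 0), as in the usual DBSCAN convention; the source does not state this explicitly.

```python
import math


def epsilon_neighborhoods(dist, eps):
    """Map each expert to the set of experts within distance eps.

    dist[i][j] is the shortest-path distance between experts i and j.
    Assumption: dist[i][i] == 0, so each expert is in its own neighborhood.
    """
    return {
        i: {j for j, d in row.items() if d <= eps}
        for i, row in dist.items()
    }


INF = math.inf
# Hypothetical all-pairs shortest distances for four experts.
dist = {
    "Chen": {"Chen": 0.0, "Li": 0.5, "Wang": 0.75, "Zhao": INF},
    "Li": {"Chen": 0.5, "Li": 0.0, "Wang": 0.25, "Zhao": INF},
    "Wang": {"Chen": 0.75, "Li": 0.25, "Wang": 0.0, "Zhao": INF},
    "Zhao": {"Chen": INF, "Li": INF, "Wang": INF, "Zhao": 0.0},
}
neighborhoods = epsilon_neighborhoods(dist, eps=0.5)
```

A smaller ε keeps only frequent co-authors in each neighborhood, since edge weights are reciprocals of the co-authored paper counts.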

Further, in step 5 the core experts are determined from the number of experts in each expert's cooperation ε-neighborhood and a core expert set is constructed, specifically as follows:

step 5.1, inputting a preset value MinPts and the ε-neighborhoods of the experts generated in step 4;

step 5.2, counting the experts in an expert's ε-neighborhood and, if the count is greater than the preset value MinPts, regarding the expert as a core expert and adding the expert to the core expert set;

step 5.3, repeating step 5.2 until all experts have been traversed, and outputting the core expert set.
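Step 5 can be sketched as follows. The function name find_core_experts and the sample neighborhoods are assumptions; the strict comparison (count greater than MinPts) follows the wording of step 5.2.

```python
def find_core_experts(neighborhoods, min_pts):
    """Return the experts whose neighborhood has more than min_pts members."""
    return {
        expert
        for expert, members in neighborhoods.items()
        if len(members) > min_pts  # strictly greater, as in step 5.2
    }


# Hypothetical neighborhoods; each expert is assumed to be in its own set.
neighborhoods = {
    "Chen": {"Chen", "Li"},
    "Li": {"Chen", "Li", "Wang"},
    "Wang": {"Li", "Wang"},
    "Zhao": {"Zhao"},
}
core_experts = find_core_experts(neighborhoods, min_pts=2)
```

Whether the expert itself counts toward the threshold depends on how the neighborhood is defined, so MinPts would need to be chosen with that convention in mind.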

Further, in step 6 all cooperative group clusters in the expert cooperation network are obtained according to the core expert set, specifically as follows:

step 6.1, arbitrarily selecting a core expert from the core expert set, finding all experts that are density-reachable from it, and generating a cooperative group cluster of experts; density-reachability between experts is defined as follows:

for expert i and expert j, if there is an expert sequence $P_1, P_2, \ldots, P_n$ in which $P_1 = i$, $P_n = j$, and $P_{m+1}$ is directly density-reachable from $P_m$, then expert j is said to be density-reachable from expert i, where directly density-reachable is defined as follows:

if expert j belongs to the ε-neighborhood of expert i, expert j is directly density-reachable from expert i;

step 6.2, removing the density-reachable experts found in step 6.1 from the remaining core experts;

step 6.3, repeating steps 6.1 to 6.2 on the updated core expert set until all core experts have been traversed or removed;

step 6.4, outputting the cooperative group clusters of the experts.
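Step 6 can be sketched as a breadth-first expansion from each remaining core expert. The function name find_clusters and the sample data are assumptions. By default the sketch follows the reachability definition as worded above, expanding through every expert that is reached. The optional flag expand_only_core is an addition not found in the source: it restricts expansion to core experts, which is how standard DBSCAN defines direct density-reachability.

```python
from collections import deque


def find_clusters(neighborhoods, core_experts, expand_only_core=False):
    """Group experts into clusters of density-reachable members."""
    remaining = set(core_experts)
    clusters = []
    while remaining:
        seed = remaining.pop()  # step 6.1: pick any remaining core expert
        cluster = {seed}
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            if expand_only_core and current not in core_experts:
                continue        # border experts join but do not expand
            for neighbor in neighborhoods[current]:
                if neighbor not in cluster:
                    cluster.add(neighbor)
                    queue.append(neighbor)
        remaining -= cluster    # step 6.2: drop core experts already covered
        clusters.append(cluster)
    return clusters             # step 6.4


# Hypothetical neighborhoods with two groups and one isolated expert.
neighborhoods = {
    "Chen": {"Chen", "Li"},
    "Li": {"Chen", "Li", "Wang"},
    "Wang": {"Li", "Wang"},
    "Sun": {"Sun", "Wu", "Zhou"},
    "Wu": {"Sun", "Wu"},
    "Zhou": {"Sun", "Zhou"},
    "Zhao": {"Zhao"},
}
clusters = find_clusters(neighborhoods, core_experts={"Li", "Sun"})
```

Experts that are not reachable from any core expert, such as Zhao in this data, are left out of every cluster and can be treated as noise.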

Compared with the prior art, the invention has the following notable advantages: (1) by exploiting the good adaptivity of density clustering, scientific research teams and their cores can be mined automatically, and the method is simple, efficient, and accurate; (2) the expert cooperation teams and core experts in a given paper set can be found quickly, providing effective support for subsequent bibliometric analysis.

Drawings

FIG. 1 is a schematic flow chart of the scientific research cooperative group discovery method based on density clustering.

Detailed Description

A scientific research cooperative group discovery method based on density clustering comprises steps 1 to 6 as set out in the Disclosure of Invention above, including sub-steps 3.1 to 3.2.8, 4.1 to 4.2, 5.1 to 5.3, and 6.1 to 6.4.

The invention is further described with reference to the following figures and detailed description.

Examples

With reference to FIG. 1, the scientific research cooperative group discovery method based on density clustering of the invention comprises the following steps:

step 1, importing batch paper data;

step 2, preprocessing the paper data and extracting expert cooperation information;

step 3, constructing an expert cooperation relationship network from the expert cooperation information, and calculating the shortest path between experts using the Dijkstra algorithm, specifically as follows:

step 3.1, according to the author lists of the papers, taking the authors as nodes, connecting authors who co-wrote a paper with an edge, and taking the reciprocal of the number of co-written papers as the edge weight, thereby constructing the undirected weighted graph G = (V, E) of the expert cooperation information, where the weight $w_{ij}$ of the edge between node i and node j is obtained by the following formula:

$w_{ij} = \dfrac{1}{\mathrm{Count}(i, j)}$

where V represents the vertex set of the undirected weighted graph of the expert cooperation information, E represents its edge set, and Count(i, j) represents the number of papers co-written by experts i and j.

step 3.2, calculating the shortest path between experts in the cooperation network G using the Dijkstra algorithm, specifically comprising the following steps:

step 3.2.5, selecting from the vertex set V − S the vertex $v_j$ that satisfies dist[j] = min{dist[i] | $v_i$ ∈ V − S}; $v_j$ is the end point of the shortest path from $v_0$ that has just been determined; letting S = S ∪ {$v_j$};

step 3.2.7, repeating step 3.2.5 and step 3.2.6 until the set V − S is empty;

step 3.2.8, outputting the array dist[], where the shortest distance between expert $v_i$ and the target expert $v_0$ is dist[i].

step 4, constructing the cooperation ε-neighborhood of each expert from the shortest paths between experts, specifically as follows:

step 4.2, traversing the expert set and, for each expert i, selecting all experts whose distance from expert i is not more than ε as the ε-neighborhood of expert i;

step 5, determining core experts from the number of experts in each expert's cooperation ε-neighborhood, and constructing a core expert set, specifically as follows:

step 6, obtaining all cooperative group clusters in the expert cooperation network according to the core expert set, specifically comprising the following steps:

step 6.3, repeating steps 6.1 to 6.2 on the updated core expert set until all core experts have been traversed or removed;

step 6.4, outputting the cooperative group clusters of the experts.
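As an illustration of steps 3 to 6, the following worked example uses hypothetical values that do not come from the invention: three experts A, B and C with Count(A, B) = 2, Count(B, C) = 4 and Count(A, C) = 1, a radius ε = 0.5, and MinPts = 2. The ε-neighborhood $N_\varepsilon$ is assumed to include the expert itself, and d denotes the shortest-path distance.

```latex
\begin{align*}
w_{AB} &= \tfrac{1}{2}, \quad w_{BC} = \tfrac{1}{4}, \quad w_{AC} = 1 \\
d(A, B) &= \tfrac{1}{2}, \quad d(B, C) = \tfrac{1}{4}, \quad
d(A, C) = \min\left(1,\ \tfrac{1}{2} + \tfrac{1}{4}\right) = \tfrac{3}{4} \\
N_\varepsilon(A) &= \{A, B\}, \quad N_\varepsilon(B) = \{A, B, C\}, \quad
N_\varepsilon(C) = \{B, C\} \\
|N_\varepsilon(B)| &= 3 > \mathit{MinPts} = 2
\;\Rightarrow\; B \text{ is the only core expert}
\end{align*}
```

Starting from B, experts A and C are both directly density-reachable, so under these assumed values the single cooperative group cluster is {A, B, C} with B as its core.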

By exploiting the good adaptivity of density clustering, the method can automatically mine scientific research teams and their cores, and it is simple, efficient, and accurate; the expert cooperation teams and core experts in a given paper set can be found quickly, providing effective support for subsequent bibliometric analysis.

Claims

1. A scientific research cooperative group discovery method based on density clustering is characterized by comprising the following steps:

2. The scientific research cooperative group discovery method based on density clustering as claimed in claim 1, wherein in step 3 the expert cooperation relationship network is constructed from the expert cooperation information and the shortest path between experts is calculated using the Dijkstra algorithm, specifically as follows:

step 3.1, according to the author lists of the papers, taking the authors as nodes, connecting authors who co-wrote a paper with an edge, and taking the reciprocal of the number of co-written papers as the edge weight, thereby constructing the undirected weighted graph G = (V, E) of the expert cooperation information, where the weight $w_{ij}$ of the edge between node i and node j is obtained by the following formula:

$w_{ij} = \dfrac{1}{\mathrm{Count}(i, j)}$

step 3.2.1, inputting the undirected weighted graph G = (V, E) of the expert cooperation information, and inputting the target expert name as the source point $v_0$;

step 3.2.7, repeating steps 3.2.5 to 3.2.6 until the set V − S is empty;

3. The scientific research cooperative group discovery method based on density clustering as claimed in claim 1, wherein in step 4 the cooperation ε-neighborhood of each expert is constructed from the shortest paths between experts, specifically as follows:

4. The scientific research cooperative group discovery method based on density clustering as claimed in claim 1, wherein in step 5 the core experts are determined from the number of experts in each expert's cooperation ε-neighborhood and a core expert set is constructed, specifically as follows:

5. The scientific research cooperative group discovery method based on density clustering as claimed in claim 1, wherein in step 6 all cooperative group clusters in the expert cooperation network are obtained according to the core expert set, specifically as follows:

step 6.4, outputting the cooperative group clusters of the experts.