CN111813951A - Key point identification method based on technical map - Google Patents
Key point identification method based on technical map
- Publication number
- CN111813951A (application CN202010559077.8A)
- Authority
- CN
- China
- Prior art keywords
- technical
- papers
- centrality
- key
- indexes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention relates to a key point identification method based on a technical map, comprising the following steps: constructing a technical map; performing centrality calculation on the node data in the technical map to obtain key nodes; reducing the multi-dimensional technical indexes of the node data by principal component analysis; and analyzing the relation between the key nodes and the technical indexes to obtain the key nodes under different dimensions. Compared with the prior art, the method jointly considers network centrality indexes and bibliometric indexes of scientific and technological resources, overcomes the problem that the key node indexes used for technical maps are one-sided and deviate from practice, and quantitatively calculates the relevant indexes of the technical map on the basis of complex network theory. This helps to identify key nodes more accurately, to discover trends in technical research or clues to technical trends, and to provide decision support for scientific and technological innovation.
Description
Technical Field
The invention relates to a data processing method, in particular to a key point identification method based on a technical map.
Background
In a technical map network, identifying the key nodes, namely the key technologies and hot-spot technologies, greatly assists scientific and technological planning work. Traditional discussion of key nodes usually arises in the centrality problem and node importance evaluation of complex networks, where the statistical properties of the network are measured empirically. Identifying key nodes with a single measure or method is strongly one-sided: each measure or method reflects the status of a node in the network from only one aspect, which does not match the actual situation. In an era of rapid internet development, simple combinations of measures cannot meet practical requirements, and higher accuracy in identifying key points is demanded.
In particular, networks are now applied more widely and their applications carry more practical significance, yet measures defined from a purely theoretical angle do not fit practice, which reduces the accuracy of key node identification.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a key point identification method based on a technical map, which solves the problem that the key point indexes used for technical maps are one-sided and deviate from practice.
The purpose of the invention can be realized by the following technical scheme:
A key point identification method based on a technical map comprises the following steps:
constructing a technical map;
performing centrality calculation on the node data in the technical map to obtain key nodes;
simplifying the technical indexes of multiple dimensions of the node data by adopting a principal component analysis method;
and analyzing the relation between the key nodes and the technical indexes to obtain the key nodes under different dimensions.
The technical map is constructed from the scientific and technological achievements of a plurality of websites and databases by entity, relation and attribute extraction followed by knowledge fusion.
The websites and databases comprise at least one of Tongfang Knowledge Network (CNKI), the national research network, a self-built resource library, research and development organization data, policy and regulation data, industry dynamic data, a patent database and an industry standard database.
The centrality comprises degree centrality, closeness centrality and betweenness centrality.
The dimensions of the technical indexes comprise a project-level dimension, a talent-level dimension and a scientific research achievement-level dimension.
The technical indexes of the project-level dimension comprise the total number of projects, fund project categories and scientific research funding investment.
The technical indexes of the talent-level dimension comprise the average age of talents, the average educational background of talents and the number of talents.
In the scientific research achievement-level dimension, the achievements include papers, patents and other achievements.
The technical indexes related to papers comprise the total number of papers, total citation frequency of papers, number of core journal papers, total citation frequency of core journal papers, number of fund-supported papers, total citation frequency of fund-supported papers, proportion of core journal papers, citation frequency of all papers, citation frequency of core journal papers, citation frequency of fund-supported papers and the H index. The technical indexes related to patents comprise the total number of patents and the number of invention patents. The technical indexes related to other achievements comprise achievement awards, achievement appraisal results, the number of standards, and chief or associate chief editorships.
The relation between the key nodes and the technical indexes is analyzed by linear regression.
Compared with the prior art, the method jointly considers network centrality indexes and bibliometric indexes of scientific and technological resources, overcomes the problem that the key node indexes used for technical maps are one-sided and deviate from practice, and quantitatively calculates the relevant indexes of the technical map on the basis of complex network theory. This helps to identify key nodes more accurately, to discover trends in technical research or clues to technical trends, and to provide decision support for scientific and technological innovation.
Drawings
FIG. 1 is a flowchart of the key point identification method based on a technical map according to the present embodiment;
FIG. 2 is the technical map constructed in the present embodiment;
FIG. 3 is a graph showing the cumulative contribution rate of each evaluation index in the present embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Examples
As shown in FIG. 1, a key point identification method based on a technical map includes the following steps:
1) Construction of the technical map
Metadata is acquired from Tongfang Knowledge Network (CNKI), the national research network and a self-built resource base, and is supplemented with external expert and research and development organization data, internal project and scientific and technological achievement data, policy and regulation data, industry dynamic data, patent data and industry standard data. Entities, relations and attributes are extracted; entity disambiguation and coreference resolution are performed on the extracted information; an ontology is extracted; and the technical map is constructed, as shown in FIG. 2.
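The construction step can be illustrated with a minimal Python sketch. It assumes the extraction stage has already produced (head, relation, tail) triples and reduces entity disambiguation to a lookup in an alias table. The `ALIASES` table, the `canonical` and `build_graph` helpers and the sample triples are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical alias table standing in for entity disambiguation
# and coreference resolution (keys are lower-cased mentions).
ALIASES = {
    "pca": "principal component analysis",
}


def canonical(name):
    """Normalize case and whitespace, then resolve known aliases."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)


def build_graph(triples):
    """Return (nodes, edges) where edges[head][tail] is a set of relation labels."""
    nodes = set()
    edges = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        h, t = canonical(head), canonical(tail)
        nodes.update((h, t))
        if h != t:  # ignore self-references created by alias merging
            edges[h][t].add(relation)
    return nodes, edges


# Hypothetical extracted triples; mentions differ in case and abbreviation.
triples = [
    ("Knowledge Graph", "uses", "Entity Extraction"),
    ("knowledge graph", "uses", "PCA"),
    ("Principal Component Analysis", "applied_to", "Knowledge Graph"),
]
nodes, edges = build_graph(triples)
```

Merging mentions before adding edges keeps one node per technology, which matters later because the centrality scores depend on how links are distributed across nodes.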
2) Taking the statistical indexes of complex networks into account, the key nodes are located according to the values of indexes such as degree centrality, closeness centrality and betweenness centrality. Nodes with high betweenness centrality and high frequency are the key technologies of the field and represent the research hot spots of the period.
Degree centrality is the sum of the direct connections between one node and the other nodes. Because the connections of the technical map are directed, it can be divided into in-degree centrality and out-degree centrality. Considering in-degree and out-degree centrality together, the degree centrality of a node is calculated as:
$$C_D(u) = \sum_{v=1}^{n} X_{vu}$$
where u is a node, n is the number of nodes in the graph, and $X_{vu}$ indicates whether nodes v and u are directly connected. Degree centrality is the most direct measure of node centrality in network analysis and reflects the cohesion of a node. The higher the degree centrality of a node, the more important the node is in the network.
Closeness centrality is the reciprocal of the sum of the shortest path distances from one node to all other nodes. It reflects the proximity between a node and the other nodes in the network. The standardized closeness centrality of a node is calculated as:
$$C_C(u) = \frac{n-1}{\sum_{v \neq u} d(u, v)}$$
where u is a node, n is the number of nodes in the graph, and d(u, v) is the shortest path distance between another node v and u. Because the connections of the technical map are directed, closeness centrality can be divided into in-closeness centrality and out-closeness centrality. In-closeness centrality reflects the integration power of the node, and out-closeness centrality reflects its radiation power.
Betweenness centrality is the number of shortest paths that pass through a node, that is, the number of times the node acts as a bridge on the shortest path between any two other nodes. The betweenness centrality of a node is calculated as:
$$C_B(u) = \sum_{s \neq u \neq t} \frac{p(u)}{p}$$
where u is a node, p is the total number of shortest paths between nodes s and t, and p(u) is the number of those shortest paths that pass through node u. The more often a node acts as an "intermediary", the greater its betweenness centrality; such a node acts as a "traffic hub" in the network.
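The three centrality measures described in step 2) can be sketched in plain Python for a small directed graph. This is an illustrative sketch, not the patent's implementation: the adjacency-dict representation, the function names and the sample graph are assumptions. Closeness is computed along outgoing edges (the "radiation" direction), and a node that cannot reach every other node is assigned 0.0, which is also an assumption. Betweenness uses Brandes' algorithm, a standard technique that the patent does not name.

```python
from collections import deque


def degree_centrality(adj):
    """In-degree plus out-degree of each node (unnormalized)."""
    score = {u: len(targets) for u, targets in adj.items()}
    for targets in adj.values():
        for v in targets:
            score[v] += 1
    return score


def closeness_centrality(adj):
    """(n - 1) / sum of shortest-path distances from u along outgoing edges."""
    n = len(adj)
    score = {}
    for u in adj:
        dist = {u: 0}
        queue = deque([u])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        # Assumption: nodes that cannot reach every other node score 0.0.
        score[u] = (n - 1) / total if len(dist) == n and total > 0 else 0.0
    return score


def betweenness_centrality(adj):
    """Sum over ordered pairs (s, t) of p(u) / p, via Brandes' algorithm."""
    score = {u: 0.0 for u in adj}
    for s in adj:
        order = []                      # nodes in non-decreasing distance from s
        preds = {u: [] for u in adj}    # predecessors on shortest paths
        sigma = {u: 0 for u in adj}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {u: 0.0 for u in adj}
        for w in reversed(order):       # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical four-node directed graph; every node appears as a key.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["A"]}
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
```

In-closeness can be obtained from the same function by passing the graph with every edge reversed.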
3) Based on the bibliometrics of scientific and technological resources, indexes are defined from two aspects: scientific research investment and scientific research achievements.
Scientific research investment is divided into scientific research projects and talent echelons. The project indexes comprise the total number of projects, fund projects and scientific research funding investment; the talent echelon indexes comprise the average age of talents, the average educational background of talents and the number of talents.
Scientific research achievements comprise papers, patents, standards, monographs and achievements. For papers, the factors considered are the total number of papers, total citation frequency of papers, number of core journal papers, total citation frequency of core journal papers, number of fund-supported papers, total citation frequency of fund-supported papers, proportion of core journal papers, citation frequency of all papers, citation frequency of core journal papers, citation frequency of fund-supported papers and the H index. For patents, they are the total number of patents and the number of invention patents. For achievements, they are achievement awards and achievement appraisals, together with the number of standards, chief or associate chief editorships, and the like.
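A few of the paper-related indexes listed in step 3), including the H index, can be computed as in the following sketch. The record layout (`citations`, `core`), the function names and the sample data are hypothetical; the patent does not specify how the indexes are stored or calculated.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def paper_indicators(papers):
    """Summarize a list of paper records into a dict of paper-related indexes."""
    citations = [p["citations"] for p in papers]
    core = [p for p in papers if p["core"]]
    return {
        "total_papers": len(papers),
        "total_citations": sum(citations),
        "core_papers": len(core),
        "core_citations": sum(p["citations"] for p in core),
        "core_share": len(core) / len(papers) if papers else 0.0,
        "h_index": h_index(citations),
    }


# Hypothetical papers attached to one technology node.
papers = [
    {"citations": 25, "core": True},
    {"citations": 8, "core": True},
    {"citations": 5, "core": False},
    {"citations": 3, "core": True},
    {"citations": 3, "core": False},
    {"citations": 0, "core": False},
]
indicators = paper_indicators(papers)
```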
4) The multi-dimensional evaluation indexes defined in 2) and 3) are converted by principal component analysis into mutually independent comprehensive evaluation indexes, which eliminates the correlation among the evaluation indexes and reduces the number of indexes needed to evaluate node criticality.
The invention constructs a technical map from the co-occurrence relations of 200 technologies in scientific and technological data, and evaluates the criticality of the nodes from the dimensions of network topology, project level, talent level and scientific research achievements. The 27 evaluation indexes corresponding to each technology are calculated to form a 200 × 27 matrix, and principal component analysis is performed on the matrix to obtain the eigenvalues, contribution rates and cumulative contribution rates. The cumulative contribution rates are shown in FIG. 3.
As can be seen from the figure, the cumulative contribution rate of the first 5 principal components reaches 90.79%, so the first 5 principal components suffice to represent the information contained in the 27 evaluation indexes. Multiplying the evaluation index matrix by the weight matrix of the original indexes corresponding to the first 5 principal components reduces the evaluation matrix to 200 × 5.
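The selection of principal components by cumulative contribution rate, and the subsequent matrix reduction, can be sketched as follows. The sketch assumes the eigenvalues and the loading (weight) matrix have already been obtained from a principal component analysis; the function names, the threshold parameter and all sample numbers are hypothetical and do not reproduce the patent's data.

```python
def select_components(eigenvalues, threshold):
    """Return (k, rates, cumulative): the smallest k whose cumulative
    contribution rate reaches the threshold, the per-component rates in
    descending order, and the cumulative rate reached at k."""
    total = sum(eigenvalues)
    rates = [value / total for value in sorted(eigenvalues, reverse=True)]
    cumulative = 0.0
    for k, rate in enumerate(rates, start=1):
        cumulative += rate
        if cumulative >= threshold:
            return k, rates, cumulative
    return len(rates), rates, cumulative


def project(matrix, loadings):
    """Multiply an (m x p) index matrix by a (p x k) loading matrix."""
    columns = list(zip(*loadings))
    return [
        [sum(x * w for x, w in zip(row, column)) for column in columns]
        for row in matrix
    ]


# Hypothetical eigenvalues of a five-index correlation matrix.
eigenvalues = [4.0, 3.0, 2.0, 0.6, 0.4]
k, rates, cumulative = select_components(eigenvalues, threshold=0.85)

# Hypothetical 2 x 3 index matrix reduced to 2 x 2 by a 3 x 2 loading matrix.
reduced = project([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]])
```

In the embodiment the same two operations would act on 27 eigenvalues and on the 200 × 27 matrix, giving a 200 × 5 result.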
5) Using a linear regression expression, the contribution rates of the first 5 principal components are taken as the weights of the principal components, which yields a comprehensive criticality value for each node. Based on the result of 4), the comprehensive function for evaluating node criticality is:
Z = 0.3284*y1 + 0.1531*y2 + 0.2157*y3 + 0.1196*y4 + 0.0911*y5
The values calculated by the function are sorted to obtain the key nodes, which are marked in the network with conspicuous colors so that they are easy to identify. The method can also be applied to networks formed by other subjects, such as research fields, authors and research institutions, to identify the key nodes in those networks.
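The scoring and ranking in step 5) can be sketched as below. The weights are the coefficients of the function Z given above; they sum to 0.9079, which matches the 90.79% cumulative contribution rate of the first 5 principal components. The function names, node names and component scores are hypothetical.

```python
# Coefficients of Z as stated in the description.
WEIGHTS = [0.3284, 0.1531, 0.2157, 0.1196, 0.0911]


def criticality(components):
    """Weighted sum Z of the five principal component scores of one node."""
    if len(components) != len(WEIGHTS):
        raise ValueError("expected one score per principal component")
    return sum(w * y for w, y in zip(WEIGHTS, components))


def rank_nodes(scores_by_node, top_k=None):
    """Sort node names by descending Z; optionally keep only the top_k."""
    ranked = sorted(
        scores_by_node,
        key=lambda name: criticality(scores_by_node[name]),
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical principal component scores (y1..y5) for three technologies.
scores = {
    "tech_a": [1.0, 0.0, 0.0, 0.0, 0.0],
    "tech_b": [0.0, 1.0, 1.0, 0.0, 0.0],
    "tech_c": [0.0, 0.0, 0.0, 1.0, 1.0],
}
ranking = rank_nodes(scores)
key_nodes = rank_nodes(scores, top_k=1)
```

How many top-ranked nodes count as "key" is not stated in the description, so `top_k` is left as a caller-supplied choice.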
Claims (10)
1. A key point identification method based on a technical map, characterized by comprising the following steps:
constructing a technical map;
performing centrality calculation on the node data in the technical map to obtain key nodes;
simplifying the technical indexes of multiple dimensions of the node data by adopting a principal component analysis method;
and analyzing the relation between the key nodes and the technical indexes to obtain the key nodes under different dimensions.
2. The key point identification method based on a technical map according to claim 1, wherein the technical map is constructed from the scientific and technological achievements of a plurality of websites and databases by entity, relation and attribute extraction followed by knowledge fusion.
3. The method according to claim 2, wherein the websites and databases include at least one of Tongfang Knowledge Network (CNKI), the national research network, a self-built resource base, research and development institution data, policy and regulation data, industry dynamic data, a patent database and an industry standard database.
4. The key point identification method based on a technical map according to claim 1, wherein the centrality comprises degree centrality, closeness centrality and betweenness centrality.
5. The key point identification method based on a technical map according to claim 1, wherein the dimensions of the technical indexes include a project-level dimension, a talent-level dimension and a scientific research achievement-level dimension.
6. The method according to claim 5, wherein the technical indexes of the project-level dimension include the total number of projects, fund project categories and scientific research funding investment.
7. The method according to claim 5, wherein the technical indexes of the talent-level dimension include the average age of talents, the average educational background of talents and the number of talents.
8. The method according to claim 5, wherein the scientific research achievements in the scientific research achievement-level dimension include papers, patents and other achievements.
9. The method according to claim 8, wherein the technical indexes related to papers include the total number of papers, total citation frequency of papers, number of core journal papers, total citation frequency of core journal papers, number of fund-supported papers, total citation frequency of fund-supported papers, proportion of core journal papers, proportion of fund-supported papers, citation frequency of all papers, citation frequency of core journal papers, citation frequency of fund-supported papers and the H index; the technical indexes related to patents include the total number of patents and the number of invention patents; and the technical indexes related to other achievements include achievement awards, achievement appraisal results, the number of standards, and chief or associate chief editorships.
10. The method according to claim 1, wherein the relation between the key nodes and the technical indexes is analyzed by linear regression.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010559077.8A CN111813951A (en) | 2020-06-18 | 2020-06-18 | Key point identification method based on technical map |
PCT/CN2020/136036 WO2021253758A1 (en) | 2020-06-18 | 2020-12-14 | Key node identification method based on technology graph |
AU2020327352A AU2020327352B2 (en) | 2020-06-18 | 2020-12-14 | Key node identification method based on technology graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010559077.8A CN111813951A (en) | 2020-06-18 | 2020-06-18 | Key point identification method based on technical map |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111813951A (en) | 2020-10-23 |
Family
ID=72845160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010559077.8A Pending CN111813951A (en) | 2020-06-18 | 2020-06-18 | Key point identification method based on technical map |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN111813951A (en) |
AU (1) | AU2020327352B2 (en) |
WO (1) | WO2021253758A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021253758A1 (en) * | 2020-06-18 | 2021-12-23 | 国网上海市电力公司 | Key node identification method based on technology graph |
WO2023207013A1 (en) * | 2022-04-26 | 2023-11-02 | 广州广电运通金融电子股份有限公司 | Graph embedding-based relational graph key personnel analysis method and system |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114417837B (en) * | 2022-01-19 | 2024-02-13 | 合肥工业大学 | Scientific and technological big data popularity and frontier measurement method based on subject evolution trend |
CN114567562B (en) * | 2022-03-01 | 2024-02-06 | 重庆邮电大学 | Method for identifying key nodes of coupling network of power grid and communication network |
CN116595192B (en) * | 2023-05-18 | 2023-11-21 | 中国科学技术信息研究所 | Technological front information acquisition method and device, electronic equipment and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106295692A (en) * | 2016-08-05 | 2017-01-04 | 北京航空航天大学 | Product initial failure root primordium recognition methods based on dimensionality reduction Yu support vector machine |
CN109446342A (en) * | 2018-10-30 | 2019-03-08 | 沈阳师范大学 | A kind of education of middle and primary schools knowledge mapping analysis method and system based on He Ximan index |
CN110490331A (en) * | 2019-08-23 | 2019-11-22 | 北京明略软件***有限公司 | The processing method and processing device of knowledge mapping interior joint |
WO2020048058A1 (en) * | 2018-09-03 | 2020-03-12 | 平安科技(深圳)有限公司 | Fund knowledge inference method and system, computer device, and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2008338259A1 (en) * | 2007-12-17 | 2009-06-25 | Leximancer Pty Ltd | Methods for determining a path through concept nodes |
CN110032665B (en) * | 2019-03-25 | 2023-11-17 | 创新先进技术有限公司 | Method and device for determining graph node vector in relational network graph |
CN111813951A (en) * | 2020-06-18 | 2020-10-23 | 国网上海市电力公司 | Key point identification method based on technical map |
-
2020
- 2020-06-18 CN CN202010559077.8A patent/CN111813951A/en active Pending
- 2020-12-14 AU AU2020327352A patent/AU2020327352B2/en active Active
- 2020-12-14 WO PCT/CN2020/136036 patent/WO2021253758A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
AU2020327352B2 (en) | 2023-01-05 |
AU2020327352A1 (en) | 2022-01-20 |
WO2021253758A1 (en) | 2021-12-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||