CN113886587A - Data classification method based on deep learning and map building method - Google Patents

Publication number
CN113886587A
Authority
CN
China
Prior art keywords
article
keywords
matching degree
articles
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111176377.9A
Other languages
Chinese (zh)
Inventor
姚洲鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Fanews Technology Co ltd
Original Assignee
Hangzhou Fanews Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Fanews Technology Co ltd
Priority to CN202111176377.9A
Publication of CN113886587A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data classification method based on deep learning and a map (knowledge graph) building method. The data classification method comprises the following steps: extracting core keywords from a set of basic articles, calculating the weight values of the core keywords, and establishing a first weight correspondence table; extracting the keywords of each basic article and calculating the industry matching degree of each article according to the first weight correspondence table to obtain a first matching degree threshold; iterating the basic articles according to the first matching degree threshold; repeating the steps on the iterated articles to obtain a second weight correspondence table and a second matching degree threshold; and judging whether a new article belongs to the target industry by using the second matching degree threshold. The method extracts keywords from the basic articles, divides them into title keywords and text keywords, and assigns them different adjustment factors, so the industry matching degree can be calculated more effectively. Articles with a higher matching degree then replace those with a lower one during iteration, which frees the space otherwise needed to store historical data and yields the optimal model more quickly.

Description

Data classification method based on deep learning and map building method
Technical Field
The application relates to a data classification method based on deep learning and an establishment method of an industry knowledge graph based on the data classification method, in particular to a self-learning article data classification method based on deep learning.
Background
Text clustering technology can be applied to industry data analysis. Every day a system can collect massive amounts of article data from various fields through a web crawler; summarizing and classifying these articles effectively with an algorithm helps users quickly understand current industry information and carry out efficient further analysis and processing.
At present, data aggregation and data model establishment for a specific industry generally rely on a clustering algorithm to cluster the data, followed by manual statistical classification of the clustered data. However, a clustering algorithm needs to store all historical document information, which creates a storage burden. Moreover, a large amount of new text appears in every industry each day; as the number of texts grows, so does the historical document information, and the analysis and calculation efficiency of the clustering algorithm falls. The clustering algorithm is therefore only suitable for scenarios with a small data volume: efficiency drops when the data volume is large, and the cost of manual classification rises.
Disclosure of Invention
In order to solve the problem of low efficiency when clustering articles in the same industry by a clustering algorithm, the application provides an article classification method, which classifies the articles by using keyword weight and the matching degree of the articles and models.
A data classification method based on deep learning comprises the following steps:
acquiring a plurality of basic articles, extracting a plurality of core keywords from the basic articles, calculating the weight values of the core keywords, and establishing a first weight correspondence table according to the core keywords and the weight values;
extracting title keywords and text keywords in each basic article, inquiring the weight values of the title keywords and the text keywords according to a first weight correspondence table, and calculating the industry matching degree of each basic article according to the weight values of the title keywords and the text keywords;
obtaining a first matching degree threshold value according to the industry matching degree of each basic article;
extracting title keywords and text keywords of an article to be matched, calculating the industry matching degree of the article to be matched, replacing a basic article with the lowest industry matching degree in the basic articles by the article to be matched when the industry matching degree of the article to be matched is greater than a first matching degree threshold value, and stopping iteration to obtain an article classification model when the industry matching degrees of all the basic articles are greater than the first matching degree threshold value;
repeating the steps by using the articles in the article classification model to obtain a second weight corresponding table, and obtaining a second matching degree threshold according to the second weight corresponding table;
and calculating the industry matching degree of the article to be calculated by using the article classification model, and judging that the article to be calculated belongs to the target industry when the industry matching degree of the article to be calculated is greater than the second matching degree threshold value.
Further, the weight value of a core keyword is calculated as follows:
(formula image BDA0003295241200000021, not reproduced in this text)
where wordw[i] is the weight of the i-th keyword in the model, fq[i] is the frequency of the i-th keyword in the basic article, fqm[i] is the frequency of the i-th keyword in all the basic articles, and k is the number of basic articles.
Further, the industry matching degree of each basic article is calculated as follows:
(formula image BDA0003295241200000022, not reproduced in this text)
where titlew is the title adjustment factor, contentw is the text adjustment factor, title[i] is the weight value in the first weight correspondence table of the i-th keyword appearing in the title, and content[i] is the weight value in the first weight correspondence table of the i-th keyword appearing in the text.
Further, an algorithm for extracting a plurality of core keywords from a plurality of the basic articles is a TextRank algorithm.
Further, the plurality of basic articles is acquired as follows: data are collected with a web crawler, stored in an Elasticsearch cluster, and retrieved in full text with the HanLP word segmenter.
Further, the industry matching degree of the article to be calculated is calculated as follows:
(formula image BDA0003295241200000023, not reproduced in this text)
where titlew is the title adjustment factor, contentw is the text adjustment factor, title[i] is the weight value in the second weight correspondence table of the i-th keyword appearing in the title, and content[i] is the weight value in the second weight correspondence table of the i-th keyword appearing in the text.
The invention also provides a map establishing method, which is used for obtaining article data of the target industry by using the data classification method based on deep learning and establishing a knowledge map of the target industry according to the article data.
Further, the knowledge graph of the target industry is established from the article data as follows: articles are sampled according to keyword relevance to obtain sampled articles, a plurality of keywords is extracted from the sampled articles, the derived words of the keywords are calculated through mutual information entropy, and an industry knowledge graph is established from the keywords and the derived words.
Further, the method also comprises calculating the weight values of the keywords and the derived words, and sorting the keywords and the derived words by weight value.
The invention also provides an article classification device comprising a memory and a processor, wherein the memory stores a data processing program, and the data processing program, when read and executed by the processor, executes the deep learning-based data classification method described above.
The invention has the beneficial effects that:
the invention utilizes the basic articles to extract keywords and divide the keywords into title keywords and text keywords, and different adjustment factors are given, so that the industry matching degree of a certain article can be more effectively calculated, and an article classification model is formed. And then, the article classification model is subjected to more accurate replacement iteration by using the article with higher matching degree, all historical data are not required to be stored, the storage space of the system is released, the calculated amount can be reduced by fewer articles, and the optimal model can be obtained more quickly in the process of optimizing the model. And then, quickly identifying the target industry article through a matching degree threshold value. After the latest articles of a certain industry are quickly obtained, the knowledge graph is established according to the fresh articles, the massive data is quickly and efficiently retrieved, and the method is beneficial to quickly concentrating themes in the massive data of various industries and acquiring valuable information.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of an article classification method;
FIG. 2 is a schematic diagram of a method of industry knowledge graph creation.
Detailed Description
In order to make the purpose, features and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the embodiments described below are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The invention is further elucidated with reference to the drawings and the embodiments.
In the description of the present application, it is to be understood that the terms "upper", "lower", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present application.
Example 1
The application provides a data classification method based on deep learning, which comprises the steps of firstly randomly selecting articles to build a model, then utilizing other articles to enable the model to carry out self-learning, obtaining a final article classification model through iteration, and judging industry articles by using the final article classification model. The method specifically comprises the following steps:
Data are collected with a web crawler, stored in an Elasticsearch cluster, and retrieved in full text with the HanLP word segmenter. A plurality of basic articles of the same industry is acquired from the data of an industry website, a plurality of core keywords is extracted from these basic articles with the TextRank algorithm, the weight values of the core keywords are calculated, and a first weight correspondence table is established from the core keywords and their weight values.
Taking the forestry industry as an example, 5000 articles from the China Forestry Information Network are selected as basic articles. A total of 500 core keywords are extracted from all the basic articles, their weight values are calculated, and a first weight correspondence table is established from the serial numbers, the core keywords, and the weight values.
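The keyword extraction step can be illustrated with a minimal TextRank-style sketch. It assumes the article has already been segmented into a list of tokens (for example by a word segmenter) and ranks words by a PageRank iteration over a co-occurrence graph. The function name `textrank_keywords` and the parameters `window`, `damping`, and `iterations` are illustrative choices, not values given in the application.

```python
from collections import defaultdict

def textrank_keywords(tokens, top_n=5, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank over an undirected co-occurrence graph (sketch)."""
    # Link every pair of distinct words that occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)
    # Power iteration: each word shares its score equally among its neighbors.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]

# Hypothetical, already-segmented article text.
tokens = ["forest", "fire", "forest", "grassland",
          "forest", "woodland", "forest", "policy"]
top_keywords = textrank_keywords(tokens, top_n=3)
```

In practice a stop-word filter and a part-of-speech filter would normally be applied before building the graph; they are omitted here to keep the sketch short.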
And extracting title keywords and text keywords in each basic article, inquiring the weight values of the title keywords and the text keywords according to the first weight correspondence table, and calculating the industry matching degree of each basic article according to the weight values of the title keywords and the text keywords.
Obtaining a first matching degree threshold value according to the industry matching degree of each basic article;
The title keywords and text keywords of the articles to be matched are extracted and the industry matching degree of each article to be matched is calculated. When the industry matching degree of an article to be matched is greater than the first matching degree threshold, that article replaces the basic article with the lowest industry matching degree. The iteration stops when the industry matching degrees of all basic articles are greater than the first matching degree threshold, which yields the article classification model. This is the self-updating learning process of the model: its aim is to form the model from more similar articles with a higher matching degree, so that articles examined later are classified more accurately.
And repeating the steps by using the articles in the article classification model to obtain a second weight corresponding table, and obtaining a second matching degree threshold according to the second weight corresponding table.
And calculating the industry matching degree of the article to be calculated by using the article classification model, and judging that the article to be calculated belongs to the target industry when the industry matching degree of the article to be calculated is greater than the second matching degree threshold value.
The weight value of a core keyword is calculated as follows:
(formula image BDA0003295241200000051, not reproduced in this text)
where wordw[i] is the weight of the i-th keyword in the model, fq[i] is the frequency of the i-th keyword in the basic article, fqm[i] is the frequency of the i-th keyword in all the basic articles, and k is the number of basic articles.
The first weight correspondence table of the 500 core keywords is obtained with the above formula, where each fq[i] corresponds to a wordw[i], for example:
Serial number    Core keyword    Weight
1                Forest          985
2                Woodlands       658
3                Grassland       780
4                ...             ...
As shown in the table above, the core keyword corresponding to wordw[1] is "forest", and its weight value is 985.
The industry matching degree of each basic article is calculated as follows:
(formula image BDA0003295241200000052, not reproduced in this text)
titlew is the title adjustment factor, which controls the relative importance of the title in the calculation and is generally set to 1;
contentw is the text adjustment factor, which controls the relative importance of the text content in the calculation and is generally set to 3;
title[i] is the weight value, in the first weight correspondence table, of the i-th keyword appearing in the title. For example, if the title of the article is 'forest or grassland occupying the main body in vegetation in China', the core keywords in the title include forest and grassland; looking them up in the first weight correspondence table, title[1] is the weight value of forest, namely the weight value 985 of wordw[1];
content[i] is the weight value, in the first weight correspondence table, of the i-th keyword appearing in the text.
Calculated in this way, the industry matching degrees of the 5000 basic articles range from 8000 to 20000, so the first matching degree threshold can be set to a value between 8000 and 20000; in this embodiment it is set to 12000.
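The matching degree formula itself is an image that is not reproduced in this text. The sketch below therefore assumes the simplest form consistent with the symbol definitions: the sum of the title keyword weights scaled by titlew plus the sum of the text keyword weights scaled by contentw. This weighted-sum form, the function name `industry_matching_degree`, and the sample keyword lists are assumptions for illustration; the weights are the three example rows of the first weight correspondence table.

```python
def industry_matching_degree(title_keywords, text_keywords, weight_table,
                             titlew=1.0, contentw=3.0):
    """Assumed form: titlew * sum(title[i]) + contentw * sum(content[i])."""
    # Keywords missing from the weight correspondence table contribute 0.
    title_sum = sum(weight_table.get(k, 0) for k in title_keywords)
    content_sum = sum(weight_table.get(k, 0) for k in text_keywords)
    return titlew * title_sum + contentw * content_sum

# Example rows of the first weight correspondence table.
weight_table = {"forest": 985, "woodlands": 658, "grassland": 780}

# Hypothetical article: two keywords in the title, two in the text.
score = industry_matching_degree(["forest", "grassland"],
                                 ["forest", "woodlands"],
                                 weight_table)
print(score)  # 1 * (985 + 780) + 3 * (985 + 658) = 6694.0
```

With the default factors of 1 and 3 stated in the embodiment, keywords found in the text contribute three times as much as the same keywords found in the title.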
A batch of articles to be matched is then obtained. The title keywords and text keywords of each article to be matched are extracted and its industry matching degree is calculated with the formula above. When that matching degree is greater than the first matching degree threshold of 12000, the article replaces the basic article with the lowest industry matching degree, that is, initially the basic article whose matching degree is 8000. The iteration continues in this manner with the other articles to be matched and stops when the industry matching degrees of all articles in the model are greater than the first matching degree threshold of 12000, which yields the final article classification model.
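The replacement iteration described above can be sketched as follows. The sketch assumes that matching degrees have already been computed for every basic article and every article to be matched; the function name `refine_base_articles` and the article identifiers are hypothetical.

```python
def refine_base_articles(base_scores, candidate_scores, threshold):
    """Replace the lowest-scoring basic article with each qualifying candidate.

    base_scores and candidate_scores map an article id to its industry
    matching degree. Iteration stops once every article kept in the model
    scores above the threshold, or when the candidates run out.
    """
    model = dict(base_scores)  # work on a copy, leave the input untouched
    for cand_id, cand_score in candidate_scores.items():
        if all(score > threshold for score in model.values()):
            break  # stopping condition from the embodiment
        if cand_score > threshold:
            lowest_id = min(model, key=model.get)
            del model[lowest_id]          # drop the weakest basic article
            model[cand_id] = cand_score   # keep the model size constant
    return model

# Hypothetical scores; 12000 is the first threshold used in the embodiment.
base = {"a": 8000, "b": 13000, "c": 20000}
candidates = {"x": 9000, "y": 15000, "z": 18000}
model = refine_base_articles(base, candidates, threshold=12000)
```

Because each accepted article replaces exactly one basic article, the number of stored articles never grows, which matches the stated goal of not keeping all historical data.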
Taking the articles in the current article classification model as basic articles, the steps of calculating the weight values and calculating the matching degrees are repeated, finally giving a second weight correspondence table and a second matching degree threshold, which are used afterwards to judge whether a new article belongs to the target industry. It should be noted that the second matching degree threshold may be the lowest industry matching degree among all articles in the updated article classification model, or a value between the lowest and the highest matching degree; it can be set as required.
When a new article appears, its industry matching degree is calculated and judged against the second matching degree threshold, specifically:
(formula image BDA0003295241200000061, not reproduced in this text)
titlew is the title adjustment factor, which controls the relative importance of the title in the calculation and is generally set to 1;
contentw is the text adjustment factor, which controls the relative importance of the text content in the calculation and is generally set to 3;
title[i] is the weight value, in the second weight correspondence table, of the i-th keyword appearing in the title. For example, if the title of the article is 'forest or grassland occupying the main body in vegetation in China', the core keywords in the title include forest and grassland; looking them up in the second weight correspondence table, title[1] is the weight value of forest, namely the weight value 985 of wordw[1];
content[i] is the weight value, in the second weight correspondence table, of the i-th keyword appearing in the text.
When the industry matching degree of the new article is greater than the second matching degree threshold, the new article belongs to the target industry and can be added to the data from which the industry knowledge graph is subsequently built.
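The threshold choice and the final decision can be sketched in a few lines. The embodiment allows the second threshold to be the lowest matching degree in the model or any value between the lowest and the highest; the `position` parameter below (0 for the lowest, 1 for the highest) is a hypothetical way of expressing that choice, and both function names are illustrative.

```python
def second_threshold(model_scores, position=0.0):
    """Pick a threshold between the lowest and highest matching degree.

    position=0.0 gives the lowest score in the model, position=1.0 the highest.
    """
    lowest, highest = min(model_scores), max(model_scores)
    return lowest + position * (highest - lowest)

def belongs_to_target_industry(score, threshold):
    """An article is accepted only if its score is strictly above the threshold."""
    return score > threshold

# Hypothetical matching degrees of the articles kept in the model.
model_scores = [13000, 15000, 20000]
threshold = second_threshold(model_scores)          # lowest score in the model
accepted = belongs_to_target_industry(14000, threshold)
```

A higher `position` makes the classifier stricter, which trades recall for precision.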
Example 2
This embodiment provides a method for establishing a knowledge graph of a target industry from the article data obtained with the data classification method. The method specifically comprises the following steps:
Articles are sampled according to keyword relevance to obtain sampled articles, a plurality of keywords is extracted from the sampled articles, and the derived words of the keywords are calculated through mutual information entropy. The weight values of all keywords and derived words are then calculated, the keywords and derived words are sorted by weight value, and a topological relation between the keywords and the derived words is established to form a mesh structure diagram, which gives the industry knowledge graph.
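The application does not spell out how mutual information is computed. One common reading is pointwise mutual information (PMI) between a keyword and the words that co-occur with it in the same article; the sketch below uses that reading as an assumption. The function names `pmi_table` and `derived_words` and the sample documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_table(documents):
    """Document-level PMI for every pair of words that co-occur at least once."""
    n_docs = len(documents)
    word_count = Counter()
    pair_count = Counter()
    for doc in documents:
        words = sorted(set(doc))            # count each word once per document
        word_count.update(words)
        pair_count.update(combinations(words, 2))
    # PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ), with document frequencies.
    return {
        (a, b): math.log((n_ab * n_docs) / (word_count[a] * word_count[b]))
        for (a, b), n_ab in pair_count.items()
    }

def derived_words(keyword, pmi, top_n=3):
    """Return the words most strongly associated with `keyword`."""
    related = []
    for (a, b), value in pmi.items():
        if keyword == a:
            related.append((b, value))
        elif keyword == b:
            related.append((a, value))
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in related[:top_n]]

# Hypothetical keyword lists of four sampled articles.
docs = [["forest", "fire"], ["forest", "fire"],
        ["forest", "policy"], ["grassland", "policy"]]
candidates = derived_words("forest", pmi_table(docs), top_n=1)
```

Each keyword and its derived words can then be stored as a node with edges to its derived words, which gives the mesh structure the embodiment describes.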
The weight value is calculated as follows:
(formula image BDA0003295241200000071, not reproduced in this text)
where Weight represents the weight;
subsetFreq represents the word frequency of the word in the result set with high relevance;
subsetSize represents the size of the result set;
superFreq represents the word frequency of the word in the entire database;
superSize represents the size of the entire database;
natureBoost represents the part-of-speech weight, where nouns and verbs are weighted more heavily than other parts of speech;
fieldBoost represents the field weight, where the title is weighted more heavily than the content.
As shown in FIG. 2, the larger the weight of a keyword, the larger the circle that represents it, and the circle of a derived word is smaller than that of its keyword. Specific methods of establishment can be found in the description sections of the patent applications published under the numbers "CN 112100399A" and "CN 112100330A".
And creating a knowledge graph model by each created keyword and the corresponding derived word group, and sequencing all the keywords and the derived words from top to bottom according to the self weight sequence, so that a user can intuitively inquire the derived word groups related to the keywords, look up the weight of each derived word in the derived word groups, and further perform corresponding search by adopting the knowledge graph model. Therefore, the node set can be obtained by adopting the knowledge graph model, searching can be carried out according to the node set, graph searching corresponding to the node set in the mass data can be rapidly and efficiently found, and the method is beneficial to rapid subject concentration in the mass data and valuable information acquisition in various industries.
Example 3
This embodiment provides an article classification device, comprising a memory and a processor, wherein the memory is used for storing a data processing program, and the data processing program, when read and executed by the processor, executes the deep learning-based data classification method of embodiment 1.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed.
The units may or may not be physically separate, and components displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing.
More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wire segments, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wireline, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1. A data classification method based on deep learning, characterized by comprising the following steps:
acquiring a plurality of basic articles, extracting a plurality of core keywords from the basic articles, calculating a weight value for each core keyword, and establishing a first weight correspondence table from the core keywords and their weight values;
extracting the title keywords and text keywords of each basic article, looking up the weight values of the title keywords and text keywords in the first weight correspondence table, and calculating the industry matching degree of each basic article from those weight values;
obtaining a first matching degree threshold from the industry matching degrees of the basic articles;
extracting the title keywords and text keywords of an article to be matched and calculating its industry matching degree; when the industry matching degree of the article to be matched is greater than the first matching degree threshold, replacing the basic article with the lowest industry matching degree with the article to be matched; and stopping the iteration when the industry matching degrees of all the basic articles are greater than the first matching degree threshold, thereby obtaining an article classification model;
repeating the above steps with the articles in the article classification model to obtain a second weight correspondence table, and obtaining a second matching degree threshold according to the second weight correspondence table;
and calculating the industry matching degree of an article to be calculated by using the article classification model, and determining that the article to be calculated belongs to the target industry when its industry matching degree is greater than the second matching degree threshold.
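The replacement loop of claim 1 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claim does not state how the first matching degree threshold is derived from the matching degrees, so the mean is assumed here; the names refine_basic_articles, Article and Scorer are hypothetical, and the scoring function is passed in by the caller because the claimed formulas are given only as images.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical article representation: (title keywords, body-text keywords).
Article = Tuple[List[str], List[str]]
Scorer = Callable[[Article, Dict[str, float]], float]


def refine_basic_articles(
    basic: Sequence[Article],
    candidates: Sequence[Article],
    weights: Dict[str, float],
    score: Scorer,
) -> Tuple[List[Article], float]:
    """Replace the weakest basic articles with stronger candidates.

    Returns the refined article set and the threshold that was used.
    """
    articles = list(basic)
    degrees = [score(a, weights) for a in articles]
    # Assumption: the first threshold is the mean matching degree of the
    # initial basic articles. The claim leaves the derivation open.
    threshold = sum(degrees) / len(degrees)

    for candidate in candidates:
        # Stop iterating once every basic article exceeds the threshold.
        if all(d > threshold for d in degrees):
            break
        degree = score(candidate, weights)
        if degree > threshold:
            # Swap out the basic article with the lowest matching degree.
            lowest = min(range(len(articles)), key=degrees.__getitem__)
            articles[lowest] = candidate
            degrees[lowest] = degree
    return articles, threshold
```

In this sketch the weight table stays fixed during the loop; recomputing it from the refined set corresponds to the later step that yields the second weight correspondence table.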
2. The deep learning-based data classification method according to claim 1, wherein the weight value of a core keyword is calculated as follows:
[Formula image FDA0003295241190000011; the formula is not reproduced in this text]
where wordw[i] denotes the weight of the i-th keyword in the model, fq[i] denotes the frequency of the i-th keyword in a basic article, fqm[i] denotes the frequency of the i-th keyword across all the basic articles, and k denotes the number of basic articles.
3. The deep learning-based data classification method according to claim 2, wherein the industry matching degree of each basic article is calculated as follows:
[Formula image FDA0003295241190000012; the formula is not reproduced in this text]
where titlew is the title adjusting factor, contentw is the text adjusting factor, title[i] is the weight value in the first weight correspondence table of the i-th keyword appearing in the title, and content[i] is the weight value in the first weight correspondence table of the i-th keyword appearing in the text.
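Because the matching degree formula survives only as an image reference, its exact form cannot be confirmed from this text. The Python sketch below shows one plausible reading of the variable definitions, namely a weighted sum of title keyword weights and text keyword weights. The function name industry_match_degree, the default factor values and the treatment of unknown keywords (weight 0) are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def industry_match_degree(
    title_keywords: Iterable[str],
    text_keywords: Iterable[str],
    weight_table: Dict[str, float],
    titlew: float = 2.0,    # assumed value of the title adjusting factor
    contentw: float = 1.0,  # assumed value of the text adjusting factor
) -> float:
    """Assumed form: titlew * sum(title[i]) + contentw * sum(content[i])."""
    # Keywords missing from the weight table contribute nothing (assumption).
    title_sum = sum(weight_table.get(k, 0.0) for k in title_keywords)
    content_sum = sum(weight_table.get(k, 0.0) for k in text_keywords)
    return titlew * title_sum + contentw * content_sum
```

For example, with a weight table of {"bank": 0.5, "loan": 0.25}, a title containing "bank" and a text containing "loan" twice gives 2.0 * 0.5 + 1.0 * 0.5 = 1.5 under these assumed factors.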
4. The deep learning-based data classification method according to claim 1, wherein the core keywords are extracted from the plurality of basic articles using the TextRank algorithm.
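For orientation, TextRank ranks words by running a PageRank-style iteration over a word co-occurrence graph. The following Python sketch is a simplified, unweighted variant and is not taken from the patent. It assumes the input has already been segmented into tokens and filtered for stop words; the function name textrank_keywords and the window, damping and iteration defaults are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(
    tokens: List[str],
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 50,
    top_n: int = 5,
) -> List[str]:
    """Return the top_n words ranked by a simplified TextRank score."""
    # Build an undirected co-occurrence graph: two words are linked when
    # they appear within `window` positions of each other.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # PageRank-style update: each word shares its score equally among
    # its neighbors; every node here has at least one neighbor.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1.0 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

A word that co-occurs with many distinct words, such as a recurring industry term, accumulates score from all of them and rises to the top of the ranking.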
5. The deep learning-based data classification method according to claim 1, wherein acquiring the plurality of basic articles comprises: collecting data with a web crawler, storing the data in an Elasticsearch cluster, and performing full-text retrieval using the HanLP word segmenter.
6. The deep learning-based data classification method according to claim 2, wherein the industry matching degree of the article to be calculated is calculated as follows:
[Formula image FDA0003295241190000021; the formula is not reproduced in this text]
where titlew is the title adjusting factor, contentw is the text adjusting factor, title[i] is the weight value in the second weight correspondence table of the i-th keyword appearing in the title, and content[i] is the weight value in the second weight correspondence table of the i-th keyword appearing in the text.
7. A graph establishing method, characterized in that article data of a target industry are obtained by applying the deep learning-based data classification method of any one of claims 1 to 6, and a target industry knowledge graph is established from the article data.
8. The graph establishing method according to claim 7, wherein establishing the target industry knowledge graph from the article data specifically comprises: sampling articles according to keyword relevance to obtain sampled articles, extracting a plurality of keywords from the sampled articles, calculating derivative words of the keywords by means of mutual information entropy, and establishing the industry knowledge graph from the keywords and the derivative words.
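The claim does not spell out how mutual information is applied. As one possible reading, the Python sketch below scores candidate derivative terms of a keyword by document-level pointwise mutual information (PMI), which is a substitute technique chosen here for illustration rather than the method confirmed by the source. The function name derivative_terms and its parameters are hypothetical, and each article is assumed to be available as a list of already-segmented terms.

```python
import math
from collections import Counter
from typing import List, Tuple


def derivative_terms(
    docs: List[List[str]],
    keyword: str,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Rank terms co-occurring with `keyword` by document-level PMI."""
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    # Document frequency of every term.
    doc_freq = Counter(term for terms in doc_sets for term in terms)
    # Number of documents in which each term appears together with the keyword.
    co_freq = Counter(
        term
        for terms in doc_sets
        if keyword in terms
        for term in terms
        if term != keyword
    )

    scored = []
    for term, together in co_freq.items():
        p_joint = together / n_docs
        p_keyword = doc_freq[keyword] / n_docs
        p_term = doc_freq[term] / n_docs
        # PMI = log( p(x, y) / (p(x) * p(y)) ); positive means the two terms
        # co-occur more often than independence would predict.
        scored.append((term, math.log(p_joint / (p_keyword * p_term))))

    # Highest PMI first, ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Terms with the highest positive scores would be attached to the keyword as derivative nodes in the graph; terms with negative scores co-occur less often than chance and would normally be discarded.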
9. The graph establishing method according to claim 8, further comprising calculating weight values of the keywords and derivative words, and sorting the keywords and derivative words by weight value.
10. An article classification device, comprising a memory and a processor, wherein the memory is used for storing a data processing program, and the data processing program is used for executing the deep learning-based data classification method of any one of claims 1 to 6 when being read and executed by the processor.
CN202111176377.9A 2021-10-09 2021-10-09 Data classification method based on deep learning and map building method Pending CN113886587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111176377.9A CN113886587A (en) 2021-10-09 2021-10-09 Data classification method based on deep learning and map building method

Publications (1)

Publication Number Publication Date
CN113886587A true CN113886587A (en) 2022-01-04

Family

ID=79005894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111176377.9A Pending CN113886587A (en) 2021-10-09 2021-10-09 Data classification method based on deep learning and map building method

Country Status (1)

Country Link
CN (1) CN113886587A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817425A (en) * 2022-06-28 2022-07-29 成都交大大数据科技有限公司 Method, device and equipment for classifying cold and hot data and readable storage medium
CN114817425B (en) * 2022-06-28 2022-09-02 成都交大大数据科技有限公司 Method, device and equipment for classifying cold and hot data and readable storage medium
CN115641149A (en) * 2022-08-27 2023-01-24 北京华宜信科技有限公司 Customized data asset valuation method
CN115641149B (en) * 2022-08-27 2023-06-27 北京华宜信科技有限公司 Customized data asset valuation method
CN116484027A (en) * 2023-06-20 2023-07-25 北京中科智易科技股份有限公司 Military equipment map system established based on knowledge map technology
CN116484027B (en) * 2023-06-20 2023-08-22 北京中科智易科技股份有限公司 Military equipment map system established based on knowledge map technology
CN116910277A (en) * 2023-09-13 2023-10-20 之江实验室 Knowledge graph construction method, resource searching method, computer equipment and medium
CN116910277B (en) * 2023-09-13 2024-02-27 之江实验室 Knowledge graph construction method, resource searching method, computer equipment and medium

Similar Documents

Publication Publication Date Title
CN113886587A (en) Data classification method based on deep learning and map building method
US8560531B2 (en) Search tool that utilizes scientific metadata matched against user-entered parameters
CN107291895B (en) Quick hierarchical document query method
CN110457405B (en) Database auditing method based on blood relationship
CN113420190A (en) Merchant risk identification method, device, equipment and storage medium
CN111914159B (en) Information recommendation method and terminal
CN112035598A (en) Intelligent semantic retrieval method and system and electronic equipment
CN102411563A (en) Method, device and system for identifying target words
CN107122382A (en) A kind of patent classification method based on specification
CN110046889B (en) Method and device for detecting abnormal behavior body and server
CN109376352A (en) A kind of patent text modeling method based on word2vec and semantic similarity
CN105787097A (en) Distributed index establishment method and system based on text clustering
CN111326236A (en) Medical image automatic processing system
CN111221954A (en) Method, device, storage medium and terminal for constructing household appliance maintenance question-answer library
CN110688593A (en) Social media account identification method and system
CN110609952A (en) Data acquisition method and system and computer equipment
CN114327964A (en) Method, device, equipment and storage medium for processing fault reasons of service system
CN114265927A (en) Data query method and device, storage medium and electronic device
CN116881430A (en) Industrial chain identification method and device, electronic equipment and readable storage medium
CN113127464B (en) Agricultural big data environment feature processing method and device and electronic equipment
CN109657060B (en) Safety production accident case pushing method and system
CN111026940A (en) Network public opinion and risk information monitoring system and electronic equipment for power grid electromagnetic environment
CN108664548B (en) Network access behavior characteristic group dynamic mining method and system under degradation condition
CN116401338A (en) Design feature extraction and attention mechanism based on data asset intelligent retrieval input and output requirements and method thereof
CN115712720A (en) Rainfall dynamic early warning method based on knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination