CN114117215A - Government affair data personalized recommendation system based on mixed mode - Google Patents
Government affair data personalized recommendation system based on mixed mode
- Publication number
- CN114117215A (application CN202111383044.3A)
- Authority
- CN
- China
- Prior art keywords
- government affair
- affair data
- data
- recommendation
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a government affair data personalized recommendation system based on a mixed mode, which comprises a data retrieval and preprocessing module, a theme extraction and text clustering module, a content-based government affair data recommendation module, a collaborative filtering-based government affair data recommendation module and a mixed method-based government affair data recommendation module.
Description
Technical Field
The invention relates to a government affair big data analysis technology, in particular to a government affair data personalized recommendation system based on a mixed mode.
Background
A government open data platform is an official platform built by government departments to publish the data they hold to the public. Such a platform works like a one-stop market where the public can obtain the data they need from different government departments. However, faced with huge and heterogeneous government data, data users find it difficult to locate the data that meets their needs among massive government datasets, so the problem of low data-utilization efficiency becomes increasingly severe.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a government affair data personalized recommendation system based on a mixed mode.
To achieve this purpose, the invention adopts the following technical solution:
A mixed-mode-based government affair data personalized recommendation system comprises a data retrieval and preprocessing module, a topic extraction and text clustering module, a content-based government affair data recommendation module, a collaborative-filtering-based government affair data recommendation module, and a hybrid-method-based government affair data recommendation module, wherein:
the data retrieval and preprocessing module performs natural language processing on the data retrieved for a specific industry field and converts the text into word vectors; it then sorts, updates, and iterates over the word segmentation datasets to obtain an optimal word segmentation result set;
the topic extraction and text clustering module extracts topics from the government affair data using an LDA (latent Dirichlet allocation) model to obtain a document-topic probability matrix, and summarizes each topic by its most relevant semantic words;
it calculates the initial cluster centers for the K-means algorithm from the document-topic probability matrix, then performs text clustering with the K-means algorithm, setting the number of clusters, the initial cluster centers, and the number of iterations, to partition the government affair data into clusters;
the content-based government affair data recommendation module identifies the focus topics of the target user, performs calculations based on average topic similarity, and aggregates the government affair data into a content-based recommendation list;
the collaborative-filtering-based government affair data recommendation module identifies the potential interest topics of the target user, determines the set of other users interested in the field, and recommends to the target user the government affair data those users follow, forming a collaborative-filtering-based recommendation list;
and the hybrid-method-based government affair data recommendation module combines the content-based and collaborative-filtering-based recommendation methods, using combination strategies such as weighting, mixing, and feature combination, to present the optimal recommendation result.
Preferably, in the data retrieval and preprocessing module, the jiebaR package of the R language is used to perform natural language processing on the government affair dataset, such as text word segmentation, stop-word removal, and word screening; the word segmentation result set is then optimized and sorted through dictionary updating, multi-round iteration, and the like.
Preferably, in the topic extraction and text clustering module, the lda package of the R language is used to extract topics from the experimental corpus; the LDAvis package is used to visualize the topics; the number of topics and the alpha and beta values are adjusted, the optimal number of topics is determined by multidimensional scaling analysis, and the quality of the topic model's extraction results is assessed; the LDA model is then fused with the K-means algorithm: the initial cluster centers are determined over the K topic dimensions from the document-topic probability matrix extracted by LDA, the number of clusters and the number of iterations are set, and the patent text is partitioned into clusters.
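The text does not state how the initial cluster centers are derived from the document-topic probability matrix. The following Python sketch shows one possible reading and is not the patented procedure: it assumes (hypothetically) that each topic's initial center is the mean of the rows whose dominant topic is that topic, and it uses plain Python rather than the R packages and SPSS named in the text. The names `seed_centers`, `kmeans`, and the toy matrix are illustrative only.

```python
from math import dist

def seed_centers(doc_topic):
    """One initial center per topic (assumption): mean of the rows dominated by that topic."""
    k = len(doc_topic[0])
    groups = [[] for _ in range(k)]
    for row in doc_topic:
        groups[row.index(max(row))].append(row)
    centers = []
    for topic, rows in enumerate(groups):
        if rows:
            centers.append([sum(col) / len(rows) for col in zip(*rows)])
        else:
            # No document is dominated by this topic: fall back to its unit vector.
            centers.append([1.0 if j == topic else 0.0 for j in range(k)])
    return centers

def kmeans(doc_topic, initial_centers, iterations=10):
    """Plain K-means with a fixed iteration count and caller-supplied initial centers."""
    centers = [list(c) for c in initial_centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(len(centers)), key=lambda c: dist(row, centers[c]))
                  for row in doc_topic]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [row for row, lab in zip(doc_topic, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy document-topic probability matrix: 5 documents, 3 topics (made-up values).
doc_topic = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.10, 0.10, 0.80],
]
labels, centers = kmeans(doc_topic, seed_centers(doc_topic))
print(labels)
```

Seeding from the topic model fixes the number of clusters at the number of topics and makes the clustering deterministic, which is one plausible motivation for fusing LDA with K-means.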
Preferably, in the content-based government affair data recommendation module, the focus topics for which the target user has the most browses and downloads are identified, and the government affair datasets under those topics are collected; cosine similarity is then calculated over these datasets, and they are ranked to form a government affair data list in descending order of average cosine similarity.
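The content-based step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each dataset is assumed to be represented by a numeric vector (for example its document-topic row), the focus topic is taken to be the cluster with the most browse or download events, and "average cosine similarity" is interpreted as the mean similarity to the datasets the user already accessed in that topic. All names and values (`content_based`, `events`, the dataset ids) are hypothetical.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def content_based(vectors, cluster_of, events, top_n=3):
    """Rank unseen datasets in the user's focus topic by average cosine similarity."""
    # Focus topic: the cluster with the most browse/download events.
    focus = Counter(cluster_of[d] for d in events).most_common(1)[0][0]
    history = sorted({d for d in events if cluster_of[d] == focus})
    candidates = [d for d in vectors if cluster_of[d] == focus and d not in history]
    scored = [
        (d, sum(cosine(vectors[d], vectors[h]) for h in history) / len(history))
        for d in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Made-up data: five datasets in two clusters, and three access events.
vectors = {
    "d1": [0.90, 0.10], "d2": [0.80, 0.20], "d3": [0.70, 0.30],
    "d4": [0.10, 0.90], "d5": [0.95, 0.05],
}
cluster_of = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 0}
events = ["d1", "d1", "d4"]

recommended = content_based(vectors, cluster_of, events)
print([d for d, _ in recommended])
```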
Preferably, in the collaborative-filtering-based government affair data recommendation module, the potential interest topics of the target user are identified, the other interested users under those topics are found, and cosine similarity is calculated and ranked to form a government affair data list in descending order of average cosine similarity.
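A rough sketch of the collaborative filtering step follows. The text does not say which vectors the cosine similarity is computed over, so this example assumes (hypothetically) that users are compared through their per-topic activity counts and that a dataset's score is the average similarity of the neighbours who accessed it. The function `collaborative` and all data are illustrative.

```python
from math import sqrt

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def collaborative(target, topic_counts, accessed, cluster_of, topic):
    """Recommend datasets in `topic` that similar, interested users have accessed."""
    # Other interested users: anyone else with activity in the potential interest topic.
    neighbours = [u for u in topic_counts
                  if u != target and topic_counts[u][topic] > 0]
    sims = {}
    for user in neighbours:
        sim = cosine(topic_counts[target], topic_counts[user])
        for dataset in accessed[user]:
            if cluster_of[dataset] == topic and dataset not in accessed[target]:
                sims.setdefault(dataset, []).append(sim)
    ranked = [(d, sum(s) / len(s)) for d, s in sims.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Made-up activity counts per topic (browses plus downloads) for three users.
topic_counts = {"target": [5, 1, 0], "u1": [4, 3, 0], "u2": [0, 6, 2]}
accessed = {"target": {"a"}, "u1": {"a", "b"}, "u2": {"c"}}
cluster_of = {"a": 0, "b": 1, "c": 1}

cf_list = collaborative("target", topic_counts, accessed, cluster_of, topic=1)
print([d for d, _ in cf_list])
```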
Preferably, the hybrid-method-based government affair data recommendation module combines the content-based and collaborative-filtering-based recommendation methods, so that it both emphasizes the target user's focus topics and takes into account the target user's needs regarding potential interest topics, forming the optimal government affair data recommendation result.
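Of the combination strategies mentioned (weighting, mixing, feature combination), weighting is the simplest to illustrate. The sketch below assumes that both modules output a score per dataset on a comparable scale; the weights 0.6 and 0.4 are hypothetical and do not come from the patent.

```python
def hybrid(content_scores, cf_scores, w_content=0.6, w_cf=0.4, top_n=5):
    """Weighted combination of two score dictionaries keyed by dataset id."""
    ids = set(content_scores) | set(cf_scores)
    combined = {
        d: w_content * content_scores.get(d, 0.0) + w_cf * cf_scores.get(d, 0.0)
        for d in ids
    }
    # Sort by score (descending), breaking ties by id for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Made-up scores from the two recommendation modules.
content_scores = {"d5": 0.99, "d2": 0.98}
cf_scores = {"b": 0.90, "d2": 0.50}

final_list = hybrid(content_scores, cf_scores)
print([d for d, _ in final_list])
```

A dataset that appears in both lists accumulates both contributions, so it tends to rise to the top; raising `w_content` favors the focus topics, while raising `w_cf` favors the potential interest topics.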
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The LDA model is an unsupervised machine learning technique. The invention uses the LDA model to extract topics from the patent text. The model assumes that words are generated from a mixture of topics and that each topic is a multinomial distribution over a fixed vocabulary; the topics are shared by all documents in the collection, and each document has its own topic proportions, sampled from a Dirichlet distribution. As a generative model, its structure is complete and clear, and efficient probabilistic inference algorithms allow it to process large-scale data, so it is currently a widely researched and widely used topic identification model.
2. The K-means clustering algorithm is an unsupervised learning algorithm and one of the ten classical data mining algorithms. The invention uses the K-means algorithm to partition the data texts. Because the technical topics discussed in the data texts are specific and in-depth, each data text is assigned to only one topic cluster during clustering. Cluster analysis is an important research subject in knowledge discovery; it aims to divide a dataset into several classes so that intra-class differences are small and inter-class differences are large. As a partition-based algorithm, K-means is conceptually simple, easy to implement, and close to linear in time complexity; it is efficient and scalable for large-scale data mining and is widely used in text clustering research.
3. Cosine similarity is one of the most widely used similarity measures and is suitable for calculating the similarity between patent texts. Mathematically, the difference between two individuals is measured by the cosine of the angle between their two vectors in vector space; text vectors are constructed from word-frequency vectors and then compared. Cosine similarity emphasizes the difference in direction between two samples, whereas Euclidean distance is computed from the absolute values of the features in each dimension and therefore requires all dimensions to be on the same scale. For this reason, cosine similarity is adopted for calculating the data text similarity between enterprises and colleges and universities.
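The contrast between the two measures can be seen on small word-frequency vectors. In this illustrative Python example (the vectors are made up), `b` has the same word proportions as `a` but comes from a text ten times longer, while `c` has a different word profile of similar length.

```python
from math import dist, sqrt

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

a = [2, 1, 0]    # word frequencies of a short text
b = [20, 10, 0]  # same proportions, ten times longer
c = [0, 1, 2]    # different word profile, similar length

# Cosine similarity ranks b as the closest match to a (same direction).
print(cosine(a, b), cosine(a, c))
# Euclidean distance ranks c as closer to a, because it reacts to magnitude.
print(dist(a, b), dist(a, c))
```

Cosine similarity treats `a` and `b` as essentially identical, whereas Euclidean distance places `c` nearer to `a` than `b` is, which is the scale sensitivity described above.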
The core technology of the invention runs through the whole process of "data retrieval - data processing - data storage - data analysis - data application" and covers natural language processing, topic modeling, text clustering, data visualization, and the like. It is of great significance for promoting the opening of government affair data to society, resolving the difficulties in the open use of government affair data, and supporting the construction of public digital culture.
Drawings
FIG. 1 is a block diagram of an embodiment of the present invention.
Detailed Description
To help those of ordinary skill in the art understand and practice the invention, it is described in further detail below with reference to the accompanying drawings. The examples are set forth merely to illustrate and explain the invention and are not intended to limit it.
As shown in FIG. 1, a mixed-mode-based government affair data personalized recommendation system includes a data retrieval and preprocessing module, a topic extraction and text clustering module, a content-based government affair data recommendation module, a collaborative-filtering-based government affair data recommendation module, and a hybrid-method-based government affair data recommendation module, wherein:
the data retrieval and preprocessing module takes a provincial government affair data open platform as the data source, retrieves government affair data in a specific industry field, and cleans the data. It extracts the government affair data names to form an analysis corpus, constructs a professional dictionary for the technical field, and performs natural language processing such as word segmentation, stop-word removal, and word screening using the jiebaR package of the R language;
the topic extraction and text clustering module performs topic modeling using the lda package of the R language; the topic results are visualized with the LDAvis package, and the quality of the topic model's extraction results is assessed based on multidimensional scaling analysis. To keep the topics relatively independent and the similarity between topics small, the number of topics is set to 10, and the alpha and beta values are fixed at 0.02 and 0.7;
topics are extracted with the LDA model to obtain the most relevant semantic words under each topic, and the topics are summarized; the document-topic probability matrix is trained, and the number of clusters and the initial cluster centers for the K-means algorithm are calculated; the document-topic probability matrix is imported into SPSS, and the number of clusters and the initial cluster centers are set to obtain the text clustering result for the government affair data;
the content-based government affair data recommendation module identifies the focus topics of the target user, performs calculations based on average topic similarity, and aggregates the government affair data into a content-based recommendation list;
the collaborative-filtering-based government affair data recommendation module identifies the potential interest topics of the target user, determines the set of other users interested in the field, and recommends to the target user the government affair data those users follow, forming a collaborative-filtering-based recommendation list;
and the hybrid-method-based government affair data recommendation module combines the content-based and collaborative-filtering-based recommendation methods, using combination strategies such as weighting, mixing, and feature combination, to present the optimal recommendation result.
The government affair data recommendation module analyzes the number of times the target user has browsed and downloaded government affair data under each technical topic, determines the target user's focus topics and potential interest topics, and identifies the other interested users under the potential interest topics; based on the analysis results, it forms a government affair data recommendation list for the target user using a hybrid of content-based and collaborative-filtering-based recommendation.
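The split into focus topics and potential interest topics can be sketched as a simple ranking of per-topic activity. The cutoff rule is not specified in the text, so this example assumes (hypothetically) that the `n_focus` most active topics are the focus topics and the remainder are potential interest topics; the counts are made up.

```python
def split_topics(activity, n_focus=4):
    """Split topics into focus and potential-interest lists by activity count."""
    # Sort by count (descending), breaking ties alphabetically for a stable order.
    ranked = sorted(activity, key=lambda topic: (-activity[topic], topic))
    return ranked[:n_focus], ranked[n_focus:]

# Hypothetical browse-plus-download counts per topic for one target user.
activity = {
    "smart contracts": 12,
    "identity authentication": 9,
    "encryption technology": 7,
    "consensus mechanism": 5,
    "data tracing": 2,
    "Token": 1,
}

focus, potential = split_topics(activity, n_focus=4)
print(focus)
print(potential)
```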
One example of use follows. The specific implementation process is:
(1) Data retrieval and preprocessing: blockchain is taken as the industry field under study, and "blockchain" is used as the search term on the Shanghai public data open platform to obtain related texts, policies, and data; Shen Technology (Shenzhen) Limited is selected as the target user;
(2) Topic modeling and text clustering: when the industry-field topics are extracted, the 10 most relevant words are extracted for each of the 10 topics to summarize them; when the government affair data are clustered, topic 3 is eliminated, leaving 9 important topics (blockchain deployment, medical industry application, smart contracts, identity authentication, encryption technology, consensus mechanism, data tracing, Token, and blockchain finance); the initial cluster centers for the K-means algorithm are calculated (0.721157151, 0.724248556, 0.713041588, 0.733758854, 0.72711089, 0.736014371, 0.703095687, 0.702814238, 0.69800075 and 0.734391872); the document-topic probability matrix is trained and imported into SPSS, and the number of clusters and the initial cluster centers are set to obtain the distribution of government affair data under each topic;
(3) Government affair data recommendation and display: according to the government affair data text clustering result, the 4 topics with the most downloads and browses are determined to be the focus topics of the target user, the other 6 topics, which have a smaller distribution, are determined to be potential interest topics, and the other interested users with the most browses and downloads under the potential interest topics are identified; the content-based recommendation built on the focus topics and the collaborative filtering recommendation built on the other interested users are then fused to form the final government affair data recommendation list.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A government affair data personalized recommendation system based on a mixed mode, characterized by comprising:
a data retrieval and preprocessing module, a topic extraction and text clustering module, a content-based government affair data recommendation module, a collaborative-filtering-based government affair data recommendation module, and a hybrid-method-based government affair data recommendation module, wherein:
the data retrieval and preprocessing module performs natural language processing on the data retrieved for a specific industry field and converts the text into word vectors; it then sorts, updates, and iterates over the word segmentation datasets to obtain an optimal word segmentation result set;
the topic extraction and text clustering module extracts topics from the government affair data using an LDA (latent Dirichlet allocation) model to obtain a document-topic probability matrix, and summarizes each topic by its most relevant semantic words;
it calculates the initial cluster centers for the K-means algorithm from the document-topic probability matrix, then performs text clustering with the K-means algorithm, setting the number of clusters, the initial cluster centers, and the number of iterations, to partition the government affair data into clusters;
the content-based government affair data recommendation module identifies the focus topics of the target user, performs calculations based on average topic similarity, and aggregates the government affair data into a content-based recommendation list; the collaborative-filtering-based government affair data recommendation module identifies the potential interest topics of the target user, determines the set of other users interested in the field, and recommends to the target user the government affair data those users follow, forming a collaborative-filtering-based recommendation list;
and the hybrid-method-based government affair data recommendation module combines the content-based and collaborative-filtering-based recommendation methods, using combination strategies such as weighting, mixing, and feature combination, to present the optimal recommendation result.
2. The government affair data personalized recommendation system based on the mixed mode according to claim 1, wherein in the data retrieval and preprocessing module, the jiebaR package of the R language is used to perform natural language processing on the government affair dataset, such as text word segmentation, stop-word removal, and word screening; and the word segmentation result set is optimized and sorted through dictionary updating, multi-round iteration, and the like.
3. The government affair data personalized recommendation system based on the mixed mode according to claim 1, wherein in the topic extraction and text clustering module, the lda package of the R language is used to extract topics from the experimental corpus; the LDAvis package is used to visualize the topics; the number of topics and the alpha and beta values are adjusted, the optimal number of topics is determined by multidimensional scaling analysis, and the quality of the topic model's extraction results is assessed; and the LDA model is fused with the K-means algorithm, the initial cluster centers are determined over the K topic dimensions from the document-topic probability matrix extracted by LDA, the number of clusters and the number of iterations are set, and the patent text is partitioned into clusters.
4. The government affair data personalized recommendation system based on the mixed mode according to claim 1, wherein in the content-based government affair data recommendation module, the focus topics for which the target user has the most browses and downloads are identified, and the government affair datasets under those topics are collected; and cosine similarity is calculated over these datasets, and they are ranked to form a government affair data list in descending order of average cosine similarity.
5. The government affair data personalized recommendation system based on the mixed mode according to claim 1, wherein in the collaborative-filtering-based government affair data recommendation module, the potential interest topics of the target user are identified, the other interested users under those topics are found, and cosine similarity is calculated and ranked to form a government affair data list in descending order of average cosine similarity.
6. The government affair data personalized recommendation system based on the mixed mode according to claim 1, wherein the hybrid-method-based government affair data recommendation module combines the content-based and collaborative-filtering-based recommendation methods, so that it both emphasizes the target user's focus topics and takes into account the target user's needs regarding potential interest topics, thereby forming the optimal government affair data recommendation result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111383044.3A CN114117215A (en) | 2021-11-22 | 2021-11-22 | Government affair data personalized recommendation system based on mixed mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111383044.3A CN114117215A (en) | 2021-11-22 | 2021-11-22 | Government affair data personalized recommendation system based on mixed mode |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114117215A (en) | 2022-03-01 |
Family
ID=80439033
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111383044.3A Pending CN114117215A (en) | 2021-11-22 | 2021-11-22 | Government affair data personalized recommendation system based on mixed mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114117215A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115098596A (en) * | 2022-05-25 | 2022-09-23 | 开普数智科技(广东)有限公司 | Government affair related data combing method, device and equipment and readable storage medium |
-
2021
- 2021-11-22 CN CN202111383044.3A patent/CN114117215A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||