CN113343118A - Hot event discovery method under mixed new media - Google Patents
- Publication number: CN113343118A
- Application number: CN202110444596.4A
- Authority: CN (China)
- Prior art keywords: event, topic, topics, modeling, news
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a method for discovering hot events in a mixed new media environment, comprising the following steps: first, word segmentation and time slicing are performed on data from online news portal websites within a specific time period, and topic events are discovered and mined based on a probabilistic topic model; then, according to the topic, keywords, named entities, and other information of each event, social information related to the event and the associated user behavior relationship data are retrieved from social network media; finally, whether the event is a hot event is judged from the number of reports on the event in the news portals and the scale of its propagation in the social network. The method supports practical applications such as network event retrieval, online public opinion monitoring, emergency detection, and related security decision-making.
Description
Technical Field
The invention relates to a method for discovering social hot events in a mixed media environment, belonging to the technical field of internet monitoring.
Background
Currently, social networks (such as microblogs (Weibo) and WeChat) are the most active, content-rich, and influential form of new social media, and together with online news portals they form a mixed online new media environment. Some social events first become known through news portal reports, are then forwarded and amplified across social media, trigger heated discussion and contests of online public opinion among netizens, and finally become Internet social hot events.
The invention constructs a mixed new media environment by jointly considering the functions of, and the interaction between, new social media and news portal websites on the Internet. On this basis, event topics are discovered by mining news portal websites, event-oriented news corpus data and social media data are obtained, and social hot events are identified, helping people understand the current state and future development trend of social hot events in the network environment. The results support practical applications such as network event retrieval, online public opinion monitoring, emergency detection, and related security decision-making.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a model that can effectively extract the latent topic information in documents and judge whether a topic is a hot event.
To solve this problem, the technical scheme adopted by the invention is as follows: after the data is preprocessed, each document is given a vectorized representation and modeled by a neural topic model, and the topics obtained by the modeling are then merged.
To achieve this purpose, the technical scheme of the invention is a method for discovering hot events under mixed new media. The method jointly considers the functions and relations of new social media and news portal websites on the Internet, constructs a mixed new media environment, obtains news corpus data and social media data oriented to social hot events, and discovers topics from the mixed media data, thereby helping people understand the current state and future development trend of social hot events in the network environment. The method comprises the following steps:
step 1) preprocessing the collected news data, including removing hyperlinks, stop words, punctuation marks, digits, and other useless information, and performing word segmentation with the HanLP natural language processing tool;
step 2) distributing the documents into time slices in chronological order, with a time interval of 1 day, to facilitate subsequent evolution analysis; for every event, the documents within 30 days of the occurrence of the event are examined, i.e., 30 time slices;
step 3) text vectorization, namely representing each text with a document representation pre-trained by BERT to improve the coherence of the topics;
step 4) topic modeling, namely performing topic modeling with a neural topic model in which the input bag-of-words representation is replaced by contextual embeddings;
step 5) merging the topics obtained by the modeling in step 4);
step 6) after event detection on the news portal websites is completed, associating each event with its microblog content and with the social relationships of the users involved in the social network;
step 7) judging, according to a given criterion, that the event is a hot event when a given threshold is exceeded.
The division of time slices in step 2) strongly affects how the evolution of an event over a period of time, and the pattern of change in its heat, are processed. In the invention the observation window is fixed at 30 days, but it can also be set adaptively according to the time span of the crawled news content.
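The time slicing of step 2) can be sketched as follows. This is a minimal illustration, assuming each document arrives as a `(publish_date, text)` pair; the function name `slice_by_day` and the sample documents are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from datetime import date


def slice_by_day(docs, event_start, window_days=30):
    """Group (publish_date, text) pairs into consecutive one-day slices.

    Slice 0 is the day the event starts; documents outside the
    observation window are dropped.
    """
    slices = defaultdict(list)
    for publish_date, text in docs:
        index = (publish_date - event_start).days
        if 0 <= index < window_days:
            slices[index].append(text)
    # Return a dense list so that empty days remain visible downstream.
    return [slices.get(i, []) for i in range(window_days)]


# Hypothetical sample data, for illustration only.
docs = [
    (date(2021, 4, 1), "report A"),
    (date(2021, 4, 1), "report B"),
    (date(2021, 4, 3), "report C"),
    (date(2021, 5, 15), "outside the 30-day window"),
]
time_slices = slice_by_day(docs, event_start=date(2021, 4, 1))
print(len(time_slices), [len(s) for s in time_slices[:3]])  # 30 [2, 0, 1]
```

Passing a different `window_days` value corresponds to the adaptive setting mentioned in the text; how that value would be chosen from the crawled data is not specified in the source.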
The text vectorization in step 3) replaces the bag-of-words input of the topic model with contextual embeddings; that is, a neural encoding layer that produces document representations pre-trained by the BERT language model is introduced before the topic modeling process. First, a dictionary of the topic corpus is built by calling the BERT_serving package, and a BERT word vector model is trained. Each document then yields a matrix of word vectors, and the resulting data is stored for the subsequent topic modeling task.
In step 4), the text data vectorized in step 3) is used as the contextual embedding input of the model. The neural topic model used in the invention is a generative model based on the neural variational inference framework and inspired by the variational autoencoder (VAE). A Gaussian distribution is chosen to generate the latent parameters, and the Gaussian parameters can be obtained by linear transformations.
In step 5), the topics are merged after modeling. A threshold ζ is set on the distance between two topics: if the distance between two topics is smaller than the threshold, they are judged to be the same topic and are merged; otherwise, the two topics are different and are not merged.
In steps 6) and 7), the microblog platform provides rich topic classification and content tag information. The time, named entity, and keyword information obtained during event detection is combined to search the microblog platform for content related to the key information of the event; the cosine distance between the event key information and the content, classification, and tags of each search result is then calculated to detect the similarity between the event and the microblog, and the event-news-microblog association is established. To judge hot events, the invention also takes the social network attributes of the event into account, and calculates the heat value of each topic obtained in step 5) using formula (1):
$$H_e = \alpha \frac{N_e}{N} + \beta \frac{S_e}{S} + \gamma \frac{C_e}{C} \tag{1}$$
where $N_e$, $S_e$, and $C_e$ denote the number of news reports, user forwards, and comments for event e, and N, S, and C denote the corresponding totals; α, β, and γ are weighting coefficients (e.g., 0.6, 0.2, 0.2) set according to the importance of these factors. When the combined heat value (in the range [0, 1]) exceeds 0.4, that is, when the weighted share of reports and discussion attributable to event e exceeds 40%, event e is judged to be a hot event.
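The heat-value judgment can be sketched in a few lines. The sketch assumes that formula (1) is the weighted sum of the three shares N_e/N, S_e/S, and C_e/C, which is what the variable definitions and the [0, 1] range suggest; the function names and the sample counts are hypothetical.

```python
def heat_value(n_e, s_e, c_e, n_total, s_total, c_total,
               alpha=0.6, beta=0.2, gamma=0.2):
    """Weighted share of news reports, forwards, and comments for one event.

    Assumed form: alpha*N_e/N + beta*S_e/S + gamma*C_e/C. With weights that
    sum to 1, the result stays within [0, 1].
    """
    def share(part, total):
        # Guard against an empty corpus; treating it as zero share is an assumption.
        return part / total if total else 0.0

    return (alpha * share(n_e, n_total)
            + beta * share(s_e, s_total)
            + gamma * share(c_e, c_total))


def is_hot_event(heat, threshold=0.4):
    """Apply the 0.4 threshold given as an example in the text."""
    return heat > threshold


# Hypothetical counts: 50 of 100 reports, 3000 of 10000 forwards, 1200 of 4000 comments.
h = heat_value(n_e=50, s_e=3000, c_e=1200,
               n_total=100, s_total=10000, c_total=4000)
print(round(h, 2), is_hot_event(h))  # 0.42 True
```

Here the event accounts for half of the reports and 30% of the forwards and comments, giving 0.6 × 0.5 + 0.2 × 0.3 + 0.2 × 0.3 = 0.42, just above the threshold.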
Compared with the prior art, the invention has the following advantages:
1. The invention improves topic modeling under mixed media: it jointly considers the functions and relations of new social media and news portal websites on the Internet, obtains news corpus data and social media data oriented to social hot events, and discovers current hot topics from the mixed media data.
2. The NTM (neural topic model) is built on the variational autoencoder framework. Because the encoder and decoder of a variational autoencoder can be trained jointly by back propagation, the mathematical derivation required to train the NTM is less complex than for traditional probabilistic models, and the model is easy to extend.
3. The NTM used by the invention takes BERT-based document representations as input. Its topic modeling part consists of an encoder and a decoder, and the way it generates topics resembles a data reconstruction process. The bag-of-words input of the topic model is replaced by contextual embeddings, i.e., a neural encoding layer producing document representations pre-trained by the BERT language model is introduced before topic modeling, which improves the interpretability and coherence of the topics.
4. The method crawls news reports from major news media over a period of time using given keywords, so the evolution of the news over that period can be tracked; the time slices of the evolution are divided adaptively, and whether topics are merged is used to judge the stage changes of a hot event.
Drawings
Fig. 1 is a flowchart illustrating a hot event determination process according to the present invention.
FIG. 2 is a topic model diagram of the present invention.
Detailed Description
The invention is further illustrated by the following description of specific embodiments, which are intended to be illustrative only and not to be limiting of the scope of the invention, and various equivalent modifications of the invention will occur to those skilled in the art upon reading the present disclosure and fall within the limits of the appended claims.
Example 1: referring to Figs. 1 and 2, a method for discovering hot events under mixed new media comprises steps 1) to 7) exactly as set out in the Disclosure of Invention above, including the time-slice division of step 2), the text vectorization of step 3), the neural topic modeling of step 4), the topic merging of step 5), and the microblog association and heat-value judgment of steps 6) and 7) using formula (1).
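The event-to-microblog association of step 6) relies on a cosine measure between the event key information and each search result. The following is a minimal sketch over token counts, assuming both sides have already been segmented into tokens; the `min_similarity` cut-off of 0.3, the function names, and the sample data are hypothetical, and the patent does not state how the text fields are vectorized.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between two token lists, using raw term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def associate(event_tokens, microblogs, min_similarity=0.3):
    """Return ids of microblogs that are close enough to the event key information."""
    return [blog_id for blog_id, tokens in microblogs.items()
            if cosine_similarity(event_tokens, tokens) >= min_similarity]


# Hypothetical event keywords and microblog search results.
event = ["earthquake", "sichuan", "rescue"]
microblogs = {
    "m1": ["sichuan", "earthquake", "rescue", "team"],
    "m2": ["concert", "tickets"],
}
linked = associate(event, microblogs)
print(linked)  # ['m1']
```

In practice the tokens for each microblog would combine its content, classification, and tags, as the text describes; a cosine distance is simply one minus the similarity computed here.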
Application example 1: referring to fig. 2, the method for topic modeling of a document based on a neural topic model according to the present invention includes the following steps:
Step 1. Encoding procedure
The encoder estimates the Gaussian distribution from which the topic mixture θ of document d is derived:
1) A document representation s is obtained by processing the document with BERT.
$$s = \mathrm{BERT}(d) \tag{1}$$
2) The document representation s is concatenated with the bag-of-words representation BoW of document d and projected to the hidden layer.
$$h = [s, \mathrm{BoW}] \tag{2}$$
3) μ and log σ, the mean and log standard deviation that parameterize the Gaussian distribution, are obtained by two independent multi-layer feed-forward neural networks, where f(·) denotes a neural perceptron with a ReLU activation function, and the weights $W_1$, $W_2$ and biases $b_1$, $b_2$ are learnable parameters shared across inputs.
$$\mu = W_1 f(h) + b_1 \tag{3}$$
$$\log\sigma = W_2 f(h) + b_2 \tag{4}$$
4) The latent variable z is drawn as $z \sim N(\mu, \sigma^2)$, where $N(\mu, \sigma^2)$ is a multidimensional Gaussian distribution whose components are mutually independent Gaussian random variables. The latent variable z can be expressed as:
$$z = \mu + \sigma \odot \epsilon \tag{5}$$
where ε can be regarded as an auxiliary noise variable sampled from the standard normal distribution N(0, I).
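The sampling step can be illustrated with a small sketch. It assumes that equation (5) is the usual reparameterization z = μ + σ · ε applied component by component, with σ recovered as exp(log σ); the function name `sample_latent` and the numeric values are hypothetical.

```python
import math
import random


def sample_latent(mu, log_sigma, rng=random):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), one component at a time.

    The noise is returned as well so the relationship can be inspected.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    z = [m + math.exp(ls) * e for m, ls, e in zip(mu, log_sigma, eps)]
    return z, eps


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 2.0]           # hypothetical encoder outputs
log_sigma = [0.0, -0.7, -1.2]
z, eps = sample_latent(mu, log_sigma, rng)
print(z)
```

Because the randomness is isolated in ε, z is a deterministic function of μ and log σ for a given noise draw, which is what lets gradients flow back to the encoder parameters during training.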
Step 2. decoding procedure
Assume that a given corpus C contains K topics, each topic k being represented by a topic-word distribution $\phi^{(k)}$, and that each document d in C corresponds to a topic mixture represented by a variable θ, where θ is a K-dimensional distribution vector constructed by a Gaussian softmax. The decoder therefore simulates the generation of each document d in the following steps:
1) The topic mixture θ is derived from the latent variable z, where $w_\theta$ is a trainable parameter.
$$\theta = \mathrm{softmax}(w_\theta z) \tag{6}$$
2) Each word w in document d is drawn according to the variable θ, where the weight matrix of $f_\phi$ corresponds to the topic-word distributions $\phi^{(k)}$.
$$w \sim \mathrm{softmax}(f_\phi(\theta)) \tag{7}$$
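The two decoding steps can be sketched as follows. The sketch assumes that f_φ is a single linear layer whose weight matrix has one row per vocabulary word and one column per topic, and that the word distribution is the softmax of its output; the matrices `w_theta` and `w_phi` and all numeric values are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def decode(z, w_theta, w_phi):
    """theta = softmax(w_theta z); word distribution = softmax(w_phi theta)."""
    theta = softmax(matvec(w_theta, z))
    word_probs = softmax(matvec(w_phi, theta))
    return theta, word_probs


z = [0.2, -0.1]                                # latent sample (2 dimensions)
w_theta = [[1.0, 0.0], [0.0, 1.0]]             # K = 2 topics
w_phi = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # V = 3 words, one column per topic
theta, word_probs = decode(z, w_theta, w_phi)
print(theta, word_probs)
```

With these toy weights, topic 0 dominates the mixture, so the word tied to topic 0 receives the highest probability and the word tied to topic 1 the lowest.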
In summary, based on the variational lower bound, the objective function of the NTM defined by the invention is:
$$L_{NTM} = E_{q(z|d)}[p(d|z)] - D_{KL}[q(z|d) \,\|\, p(z|\mu,\sigma)] \tag{8}$$
The first term in equation (8) is the reconstruction loss and the second term is the Kullback-Leibler divergence loss, in which p(z | μ, σ) denotes the standard normal prior. q(z | d) and p(d | z) correspond to the encoding process and the decoding process, respectively.
To allow back propagation during model training, the re-parameterization technique of equation (5) is used: the noise ε is sampled from the normal distribution N(0, I) to obtain z, and from it θ. The gradient of $L_{NTM}$ is then computed, and the Adam algorithm is used for gradient descent.
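The Kullback-Leibler term in equation (8) has a well-known closed form when the encoder distribution is a diagonal Gaussian and the prior is the standard normal N(0, I), as the text states. A short worked form is given below; the component index j and the latent dimension J are notation introduced here, not taken from the patent.

```latex
D_{KL}\left[\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right) \,\|\, \mathcal{N}(0, I)\right]
  = \frac{1}{2} \sum_{j=1}^{J} \left( \sigma_j^2 + \mu_j^2 - 1 - \log \sigma_j^2 \right)
  = -\frac{1}{2} \sum_{j=1}^{J} \left( 1 + 2\log\sigma_j - \mu_j^2 - \sigma_j^2 \right)
```

Since the encoder outputs μ and log σ directly, this term can be evaluated without sampling; only the reconstruction term needs the sampled z.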
Step 3. merging of the same topics
The same topic is usually identified by calculating a distance between distributions. Because the topics obtained by modeling are distributions over the same dimensions, and because the distance between two topic distributions is fixed and does not depend on the order of the topics, the similarity between topics can be measured by the symmetric Kullback-Leibler distance.
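Let $w_i$ be the i-th word of the vocabulary and $\phi^{(k)}$ the topic-word distribution of the k-th topic, so that $\phi^{(k)}(w_i)$ is the probability of $w_i$ in topic k. The KL distance from topic $k_1$ to topic $k_2$ can then be calculated by equation (9):
$$D_{KL}(k_1 \,\|\, k_2) = \sum_i \phi^{(k_1)}(w_i) \log \frac{\phi^{(k_1)}(w_i)}{\phi^{(k_2)}(w_i)} \tag{9}$$
The symmetric KL distance is then calculated from the KL distance:
$$D_{sKL}(k_1, k_2) = \frac{1}{2}\left[D_{KL}(k_1 \,\|\, k_2) + D_{KL}(k_2 \,\|\, k_1)\right] \tag{10}$$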
As can be seen from equations (9) and (10), the smaller the KL distance between two topics, i.e., the closer it is to 0, the closer the two probability distributions are and the higher the similarity between the topics; the larger the KL distance, the greater the difference between the two distributions. A threshold ζ is set: if the KL distance between two topics is smaller than the threshold, the two topics are judged to be the same topic and are merged; otherwise, the two topics are different and are not merged.
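Topic merging by symmetric KL distance can be sketched as follows. Several points are assumptions rather than details from the patent: the symmetric distance is taken as the average of the two directed KL distances, topics are merged when the distance falls below ζ (following the statement that a smaller KL distance means higher similarity), and the grouping is a simple greedy pass. The function names and the toy distributions are hypothetical.

```python
import math


def kl(p, q, eps=1e-12):
    """Directed KL distance between two discrete distributions of equal length.

    The small eps avoids division by zero; it is a smoothing choice made here.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of the two directed KL distances (assumed symmetric form)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def merge_topics(topics, zeta):
    """Greedy grouping: a topic joins the first group whose representative is within zeta."""
    groups = []  # each group is a list of topic indices; the first index is its representative
    for i, topic in enumerate(topics):
        for group in groups:
            if symmetric_kl(topics[group[0]], topic) < zeta:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical topic-word distributions over a three-word vocabulary.
topics = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.10, 0.20, 0.70],
]
groups = merge_topics(topics, zeta=0.05)
print(groups)  # [[0, 1], [2]]
```

The first two distributions differ by a symmetric KL distance of roughly 0.001 and are grouped together, while the third is more than 1.0 away from them and stays separate.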
It should be noted that the above embodiments are only preferred embodiments of the invention and are not intended to limit its scope; all equivalent substitutions or replacements made on the basis of the above technical solutions fall within the scope of protection of the invention.
Claims (6)
1. A method for discovering hot events under mixed new media, characterized by comprising the following steps:
step 1) preprocessing the collected news data, including removing hyperlinks, stop words, punctuation marks, digits, and other useless information, and performing word segmentation with the HanLP natural language processing tool;
step 2) distributing the documents into time slices in chronological order, with a time interval of 1 day, to facilitate subsequent evolution analysis; for every event, the documents within 30 days of the occurrence of the event are examined, i.e., 30 time slices;
step 3) text vectorization, namely representing each text with a document representation pre-trained by BERT to improve the coherence of the topics;
step 4) topic modeling, namely performing topic modeling with a neural topic model in which the input bag-of-words representation is replaced by contextual embeddings;
step 5) merging the topics obtained by the modeling in step 4);
step 6) after event detection on the news portal websites is completed, associating each event with its microblog content and with the social relationships of the users involved in the social network;
step 7) calculating the heat value of the topic, and judging the topic to be a hot event when the heat value exceeds a given threshold.
2. The method for discovering social hot events in a mixed media environment according to claim 1, wherein the division of time slices in step 2) strongly affects how the evolution of an event over a period of time and the pattern of change in its heat are processed, and the observation window is either fixed at 30 days or set adaptively according to the time span of the crawled news content.
3. The method for discovering social hot events in a mixed media environment according to claim 1, wherein in step 3) the text vectorization replaces the bag-of-words input of the topic model with contextual embeddings, that is, a neural encoding layer producing document representations pre-trained by the BERT language model is introduced before the topic modeling process; first, a dictionary of the self-constructed topic corpus is built by calling the BERT_serving package and a BERT word vector model is trained; each document then yields a matrix of word vectors, and the resulting data is stored for the subsequent topic modeling task.
4. The method for discovering social hot events in a mixed media environment according to claim 1, wherein during topic modeling in step 4) the text data vectorized in step 3) is used as the contextual embedding input of the model; the neural topic model used is a generative model based on the neural variational inference framework and inspired by the variational autoencoder, in which a Gaussian distribution is chosen to generate the latent parameters and the Gaussian parameters can be obtained by linear transformations.
5. The method for discovering social hot events in a mixed media environment according to claim 1, wherein in step 5) the topics are merged after modeling; a threshold ζ is set on the distance between two topics, and if the distance between two topics is smaller than the threshold, they are judged to be the same topic and are merged; otherwise, the two topics are different and are not merged.
6. The method for discovering social hot events in a mixed media environment according to claim 1, wherein in steps 6) and 7) the microblog platform provides rich topic classification and content tag information; the time, named entity, and keyword information obtained during event detection is combined to search the microblog platform for content related to the key information of the event; the cosine distance between the event key information and the content, classification, and tags of each search result is then calculated to detect the similarity between the event and the microblog, and the event-news-microblog association is established; to judge hot events, the social network attributes of the event are taken into account, and the heat value of each topic obtained in step 5) is calculated using formula (1):
$$H_e = \alpha \frac{N_e}{N} + \beta \frac{S_e}{S} + \gamma \frac{C_e}{C} \tag{1}$$
where $N_e$, $S_e$, and $C_e$ denote the number of news reports, user forwards, and comments for event e, and N, S, and C denote the corresponding totals; α, β, and γ are weighting coefficients (e.g., 0.6, 0.2, 0.2) set according to the importance of these factors; when the combined heat value (in the range [0, 1]) exceeds 0.4, that is, when the weighted share of reports and discussion attributable to event e exceeds 40%, event e is judged to be a hot event.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110444596.4A CN113343118A (en) | 2021-04-23 | 2021-04-23 | Hot event discovery method under mixed new media |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113343118A (en) | 2021-09-03 |
Family
ID=77468472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110444596.4A Pending CN113343118A (en) | 2021-04-23 | 2021-04-23 | Hot event discovery method under mixed new media |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113343118A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||