CN111324725A - Topic acquisition method, terminal and computer readable storage medium - Google Patents

Info

Publication number
CN111324725A
Authority
CN
China
Prior art keywords
topic
event element
words
similarity
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010096076.4A
Other languages
Chinese (zh)
Other versions
CN111324725B (en)
Inventor
余正涛
彭仁杰
高盛祥
陈玮
毛存礼
朱恩昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN202010096076.4A
Publication of CN111324725A
Application granted
Publication of CN111324725B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a topic acquisition method, a terminal and a computer readable storage medium. The method comprises the following steps: inputting a target text; obtaining a first topic set of the target text according to a preset topic model, wherein the first topic set comprises at least one topic word; analyzing the target text to obtain a first event element set of the target text, wherein the first event element set comprises at least one event element, and an event element refers to event information corresponding to the target text; obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set; calculating the correlation between the second topic set and the words in the target file, and calculating the similarity between the first event element set and the words in the target file; and optimizing the second topic set according to the correlation and the similarity to obtain a target topic set. The method can improve the correlation between topics and events.

Description

Topic acquisition method, terminal and computer readable storage medium
Technical Field
The invention relates to a computer technology, in particular to a topic acquisition method, a terminal and a computer readable storage medium.
Background
With the development of the Internet and the accumulation of online content, the generation, transmission and consumption of content have become deeply integrated into people's lives, and content analysis and processing have attracted growing attention. Analyzing text with natural language processing, machine learning and similar methods can help users with tasks such as public opinion analysis and data marketing. Topic discovery analyzes text data to find the topics it contains, so that the events people are paying attention to can be mined quickly and effectively; it has gradually become a popular research direction.
In a topic discovery method, topics can be sampled using information from the entire corpus, which yields a topic distribution over the whole corpus and handles the text sparsity problem well. Topic acquisition based on the word-pair (biterm) topic model (BTM), for example, usually takes this approach. However, because topic sampling is performed over the entire text collection, the topics obtained are relatively divergent, and their correlation with events is considerably limited.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a topic acquisition method, a terminal, and a computer-readable storage medium, which can improve the correlation between topics and events.
In a first aspect, an embodiment of the present invention provides a topic acquisition method, including:
inputting a target text;
obtaining a first topic set of the target text according to a preset topic model, wherein the first topic set comprises at least one topic word;
analyzing the target text to obtain a first event element set of the target text, wherein the first event element set at least comprises one event element, and the event element refers to event information corresponding to the target text;
obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set;
calculating the correlation between the second topic set and the words in the target file, and calculating the similarity between the first event element set and the words in the target file;
and optimizing the second topic set according to the correlation and the similarity to obtain a target topic set.
Wherein the obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set comprises:
and calculating to obtain semantic similarity according to the first topic set and the first event element set, and filtering to obtain a second topic set meeting topic correlation conditions according to the semantic similarity.
The calculating according to the first topic set and the first event element set to obtain semantic similarity, and filtering according to the semantic similarity to obtain a second topic set meeting topic correlation conditions, including:
embedding the first topic set and the first event element set into a vector space for semantic representation;
calculating semantic similarity corresponding to each topic word in the first topic set through the semantic representation;
and selecting the topic words with the semantic similarity meeting the topic correlation condition from the first topic set as a second topic set.
Wherein the calculating the correlation between the second topic set and the words in the target file and the calculating the similarity between the first event element set and the words in the target file comprises:
calculating the correlation between the second topic set and the words in the target file according to mutual information;
embedding the words in the target file and the first event element set into a word-level vector space representation by using word embedding, so as to calculate the similarity between the first event element set and the words in the target file.
Wherein the optimizing the second topic set according to the correlation and the similarity to obtain a target topic set comprises:
presetting a first weight value of the correlation and a second weight value of the similarity;
weighting the correlation and the similarity according to the first weight value and the second weight value to obtain the weight value corresponding to each topic word in the second topic set;
and sorting the weighted values in descending order, and selecting the topic words corresponding to the first N weighted values as the target topic set according to the sorting result.
The preset topic model comprises a word pair topic model BTM.
In a second aspect, the embodiment of the present invention also provides a topic acquisition apparatus, which includes a module for executing the method of the first aspect.
In a third aspect, an embodiment of the present invention provides another terminal, which includes a processor, a communication interface, a display screen, and a memory, where the processor, the communication interface, the display screen, and the memory are connected to each other, where the memory is used to store a computer program that supports the terminal to execute the foregoing method, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the foregoing method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions, which, when executed by a processor, cause the processor to perform the method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer program product, which when run on a computer, causes the computer to perform the method of the first aspect.
The embodiments of the invention have the following beneficial effects. A target text is input, and a first topic set of the target text is obtained according to a preset topic model. The target text is analyzed to obtain a first event element set, which comprises at least one event element, an event element being event information corresponding to the target text. A second topic set meeting topic correlation conditions is obtained according to the first topic set and the first event element set. The correlation between the second topic set and the words in the target file, and the similarity between the first event element set and the words in the target file, are then calculated, and the second topic set is optimized according to the correlation and the similarity to obtain a target topic set. Performing topic filtering and topic optimization on the target text in this way improves the correlation between topics and events and thereby effectively improves the quality of the obtained topics.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a topic acquisition method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a BTM model according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another topic acquisition method provided by an embodiment of the present invention;
Fig. 4 is a schematic view of a scenario for topic acquisition according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a topic acquisition device provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the execution subject of the embodiment of the present invention may be various types of terminals, and the terminal may be, for example, a computer, a smart phone, a tablet computer, a wearable Device, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and other terminals capable of performing text processing, which is not limited in this respect.
Referring to fig. 1, which is a schematic flow chart of a topic acquisition method according to an embodiment of the present invention, the topic acquisition method shown in fig. 1 may include:
S101, inputting a target text.
In a possible implementation manner, the target text may be a short text from social software, such as a microblog post or a WeChat Moments post, or may be text from a blog, a journal, or the like, which is not limited in this application.
In some possible embodiments, the target text may include at least one text, that is, two or more texts may be input at the same time to perform steps S102 to S106.
S102, obtaining a first topic set of the target text according to a preset topic model.
In some possible implementations, the preset topic model may include a word-pair topic model BTM.
For example, please refer to fig. 2, which is a schematic view of a BTM model according to an embodiment of the present invention, wherein z represents a topic-word distribution, wi and wj represent the two words in a word pair, and |B| represents the number of word pairs in the corpus.
The generation of the first topic set of the target text from the BTM model can be described as follows:
For each topic k ∈ {1, 2, …, K}, a word-topic distribution β is generated.
For each text m ∈ {1, 2, …, M}, a topic-text distribution is generated.
For each word pair b = (bi, bj), where b denotes a word pair:
a topic z is drawn from the corpus-level topic distribution, z ~ Multi(·);
given the drawn topic z, the two words bi and bj are drawn simultaneously, under the basic assumption that each word is generated independently from that topic: bi, bj ~ Multi(·).
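The generative steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the names `theta` (corpus-level topic distribution), `phi` (per-topic word distribution) and `sample_biterms`, as well as the toy probabilities, are assumptions introduced here and do not appear in the original text.

```python
import random

def sample_biterms(theta, phi, num_biterms, seed=0):
    """Draw word pairs following the generative steps described above.

    theta: list of topic probabilities (assumed corpus-level topic distribution).
    phi:   list of dicts, phi[k][word] = probability of word under topic k.
    """
    rng = random.Random(seed)
    topics = list(range(len(theta)))
    biterms = []
    for _ in range(num_biterms):
        # Draw one topic z for the whole word pair.
        z = rng.choices(topics, weights=theta, k=1)[0]
        words = list(phi[z].keys())
        probs = list(phi[z].values())
        # Draw both words independently from the same topic z.
        b_i, b_j = rng.choices(words, weights=probs, k=2)
        biterms.append((z, b_i, b_j))
    return biterms

# Toy parameters, purely hypothetical.
theta = [0.7, 0.3]
phi = [
    {"earthquake": 0.6, "rescue": 0.4},
    {"concert": 0.5, "ticket": 0.5},
]
sampled = sample_biterms(theta, phi, num_biterms=5)
```

Each sampled tuple holds the drawn topic and the two words of the pair; both words always come from the vocabulary of the same topic, which mirrors the basic assumption stated above.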
The BTM topic model models all word pairs in the target text rather than each text itself, so the topics of a text cannot be derived directly. Assuming that the probability of a topic in a text equals the expectation of the topics of the word pairs generated in that text, the topics can finally be expressed as:
Figure BDA0002385316890000051
By this method, a first topic set corresponding to the target text can be obtained, and the first topic set may comprise at least one topic word.
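The expression itself survives only as a figure reference in this text. In the commonly used BTM formulation, the expectation described above is written as P(z|d) = Σ_b P(z|b) · P(b|d), with P(z|b) proportional to P(z) · P(wi|z) · P(wj|z) and P(b|d) taken as the relative frequency of word pair b in text d; whether the figure uses exactly this notation is an assumption. A minimal sketch under that assumption, with hypothetical names `theta`, `phi` and `doc_topic_distribution`:

```python
from collections import Counter
from itertools import combinations

def doc_topic_distribution(doc_words, theta, phi):
    """Estimate P(z|d) = sum_b P(z|b) * P(b|d) for one short text."""
    # Every unordered pair of words in the short text is one word pair.
    biterm_counts = Counter(
        tuple(sorted(pair)) for pair in combinations(doc_words, 2)
    )
    total = sum(biterm_counts.values())
    num_topics = len(theta)
    p_z_d = [0.0] * num_topics
    if total == 0:
        return p_z_d  # fewer than two words: no word pairs
    for (w_i, w_j), count in biterm_counts.items():
        # P(z|b) is proportional to P(z) * P(w_i|z) * P(w_j|z).
        joint = [
            theta[z] * phi[z].get(w_i, 0.0) * phi[z].get(w_j, 0.0)
            for z in range(num_topics)
        ]
        norm = sum(joint)
        if norm == 0.0:
            continue  # the pair is impossible under every topic
        p_b_d = count / total  # empirical P(b|d)
        for z in range(num_topics):
            p_z_d[z] += (joint[z] / norm) * p_b_d
    return p_z_d

# Toy parameters, purely hypothetical.
theta = [0.5, 0.5]
phi = [
    {"earthquake": 0.45, "rescue": 0.45, "concert": 0.05, "ticket": 0.05},
    {"earthquake": 0.05, "rescue": 0.05, "concert": 0.45, "ticket": 0.45},
]
dist = doc_topic_distribution(["earthquake", "rescue", "ticket"], theta, phi)
```

With these toy values the first topic receives the larger share, since two of the three words are far more likely under it.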
S103, analyzing the target text to obtain a first event element set of the target text.
The first event element set includes at least one event element, and an event element refers to event information corresponding to the target text.
For example, the event may be a trending event in the society, and the event information may include the time, place, people, process, rating, etc. of the event.
In some possible embodiments, words in the target text may be matched against recent trending events to find the trending event corresponding to the target text, and the event elements of that trending event may then be looked up to obtain an event element set, which may serve as the first event element set of the target text.
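One simple way to picture this matching step is a lookup against a small catalog of trending events. The sketch below is only an assumption about how such a lookup could work: the catalog structure, the word-overlap criterion, and the names `match_event_elements` and `event_catalog` are all hypothetical and are not described in the original text.

```python
def match_event_elements(text_words, event_catalog):
    """Return the element set of the catalog event sharing the most words with the text."""
    text_set = set(text_words)
    best_name, best_overlap = None, 0
    for name, record in event_catalog.items():
        # Count how many of the event's keywords occur in the text.
        overlap = len(text_set & set(record["keywords"]))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    if best_name is None:
        return set()  # no trending event matched the text
    return set(event_catalog[best_name]["elements"])

# Hypothetical catalog of recent trending events.
event_catalog = {
    "earthquake": {
        "keywords": ["earthquake", "rescue", "aftershock"],
        "elements": ["earthquake", "epicenter", "rescue team", "casualties"],
    },
    "concert": {
        "keywords": ["concert", "ticket", "stadium"],
        "elements": ["concert", "singer", "stadium"],
    },
}
elements = match_event_elements(["rescue", "team", "earthquake", "news"], event_catalog)
```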
S104, obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set.
It should be noted that, after the first topic set and the first event element set are obtained from the target text, the first topic set may be filtered using the first event element set, so as to obtain a second topic set with a higher degree of correlation with the event.
In some possible embodiments, obtaining a second topic set satisfying the topic correlation condition according to the first topic set and the first event element set includes: and calculating semantic similarity according to the first topic set and the first event element set, and filtering according to the semantic similarity to obtain a second topic set meeting topic correlation conditions.
In some possible embodiments, calculating a semantic similarity according to the first topic set and the first event element set, and filtering a second topic set meeting topic correlation conditions according to the semantic similarity, includes: embedding the first topic set and the first event element set into a vector space for semantic representation; calculating semantic similarity corresponding to each topic word in the first topic set through the semantic representation; and selecting the topic words with the semantic similarity meeting the topic correlation condition from the first topic set as a second topic set.
The vector space may be, for example, a word-level vector space. The first topic set and the first event element set can be embedded into a word-level vector space by a word embedding method for semantic representation, the semantic similarity of the first topic set and the first event element set is calculated through the semantic representation, topics irrelevant to an event can be filtered according to the semantic similarity, and the topic set with high relevance to the event is obtained and used as a second topic set.
The topic correlation condition may be that the semantic similarity is greater than a preset threshold. For example, selecting from the first topic set the topic words whose semantic similarity meets the topic correlation condition may be: selecting from the first topic set the topic words whose semantic similarity is greater than the preset threshold, and using them as the second topic set.
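The threshold filter can be sketched as follows. The text does not say which similarity measure is used or how the per-element similarities are aggregated, so cosine similarity and taking the maximum over event elements are assumptions made here for illustration; the toy two-dimensional vectors and the names `cosine` and `filter_topic_words` are likewise hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def filter_topic_words(topic_words, event_elements, vectors, threshold):
    """Keep topic words whose similarity to the event elements exceeds the threshold."""
    if not event_elements:
        return []
    kept = []
    for word in topic_words:
        # Assumed aggregation: highest cosine similarity to any event element.
        score = max(cosine(vectors[word], vectors[e]) for e in event_elements)
        if score > threshold:
            kept.append(word)
    return kept

# Toy word embeddings, purely hypothetical.
vectors = {
    "rescue": [0.9, 0.1],
    "concert": [0.0, 1.0],
    "earthquake": [1.0, 0.0],
    "casualties": [0.95, 0.05],
}
second_topic_set = filter_topic_words(
    ["rescue", "concert"], ["earthquake", "casualties"], vectors, threshold=0.5
)
```

In practice the vectors would come from a trained word embedding model; the filter logic stays the same.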
S105, calculating the correlation between the second topic set and the words in the target file, and calculating the similarity between the first event element set and the words in the target file.
It should be noted that, after the second topic set is obtained, it may be further optimized so that the resulting topic set is closely associated with the event and characterizes it well.
S106, optimizing the second topic set according to the correlation and the similarity to obtain a target topic set.
Specifically, the correlation between the extracted second topic set and the words in the target file may be calculated, together with the similarity between the first event element set and the words in the target file. The topics in the second topic set are then optimized using the correlation and the similarity to obtain the target topic set.
Therefore, according to the embodiment of the invention, event-related topics can be screened, in combination with the event elements, from the first topic set extracted by the preset topic model, and the event-related topic words can be optimized, so that topic words with a higher degree of correlation with the event are obtained, the event is well represented, and the quality of the topic words is improved.
Please refer to fig. 3, which is a schematic flowchart of another topic acquisition method according to an embodiment of the present invention. The topic acquisition method shown in fig. 3 may include:
S301, inputting a target text.
S302, obtaining a first topic set of the target text according to a preset topic model, wherein the first topic set comprises at least one topic word.
S303, analyzing the target text to obtain a first event element set of the target text.
The first event element set at least comprises one event element, and the event element refers to event information corresponding to the target text.
S304, obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set.
It should be noted that, the steps shown in S301 to S304 in the embodiment of the present invention may refer to the steps corresponding to S101 to S104 in the foregoing embodiment, which are not described herein again.
S305, calculating the correlation between the second topic set and the words in the target file according to mutual information.
For example, the correlation between the second topic set and the words in the target file may be calculated with a mutual information method by using the KL distance (Kullback-Leibler divergence).
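Mutual information can be written as a KL divergence between a joint distribution and the product of its marginals, I(A;B) = KL(P(A,B) || P(A)P(B)). The sketch below applies this identity to binary occurrence indicators of two words over a collection of texts. How the patent estimates the underlying probabilities is not stated, so the co-occurrence counting used here, and the name `mutual_information`, are assumptions.

```python
import math

def mutual_information(docs, word_a, word_b):
    """I(A;B) = KL(P(A,B) || P(A)P(B)) over binary occurrence indicators."""
    n = len(docs)
    if n == 0:
        return 0.0
    # Count the four joint outcomes (a present?, b present?) over all texts.
    joint = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for doc in docs:
        words = set(doc)
        joint[(int(word_a in words), int(word_b in words))] += 1
    mi = 0.0
    for (a, b), count in joint.items():
        if count == 0:
            continue  # 0 * log(0) is treated as 0
        p_ab = count / n
        p_a = sum(joint[(a, y)] for y in (0, 1)) / n
        p_b = sum(joint[(x, b)] for x in (0, 1)) / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

# Toy collection of tokenized texts, purely hypothetical.
docs = [
    ["earthquake", "rescue", "news"],
    ["earthquake", "rescue"],
    ["concert", "news"],
    ["concert"],
]
mi_related = mutual_information(docs, "earthquake", "rescue")
mi_unrelated = mutual_information(docs, "earthquake", "news")
```

Note that mutual information measures dependence in either direction: two words that never co-occur also score high, so a practical system may need an additional check on the sign of the association.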
S306, embedding the words in the target file and the first event element set into word-level vector space representation by word embedding, so as to calculate the similarity between the first event element set and the words in the target file.
S307, presetting a first weight value of the correlation and a second weight value of the similarity.
For example, a weight value corresponding to the correlation and a weight value corresponding to the similarity may be preset according to the empirical parameters. In the debugging process, the first weight value and the second weight value can be changed, so that the finally selected target topic set is higher in association degree with the event.
S308, weighting the correlation and the similarity according to the first weight value and the second weight value to obtain the weight value corresponding to each topic word in the second topic set.
For example, if the first weight value is 0.4, the second weight value is 0.6, the correlation is denoted s and the similarity is denoted t, the weighting can be expressed as 0.4 × s + 0.6 × t, and the result is the weighted value corresponding to each topic word in the second topic set.
S309, sorting the weighted values in descending order, and selecting topic words corresponding to the first N weighted values as a target topic set according to sorting results.
For example, suppose the weighted value of topic word a is 3, that of topic word b is 4, that of topic word c is 5, that of topic word d is 6, and N is 2. Sorting the weighted values in descending order gives topic word d, topic word c, topic word b and topic word a; the top 2 topic words, d and c, are then selected as the target topic set.
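Steps S307 to S309 amount to a weighted sum followed by a top-N selection, which can be sketched as below. The function name `select_target_topics` is hypothetical. To reproduce the weighted values 3, 4, 5 and 6 from the example, the sketch assumes that each topic word has equal correlation and similarity scores, which is an illustrative choice and not something the text states.

```python
def select_target_topics(correlation, similarity, w_corr=0.4, w_sim=0.6, top_n=2):
    """Weight correlation and similarity per topic word and keep the top N words."""
    scores = {
        word: w_corr * correlation[word] + w_sim * similarity[word]
        for word in correlation
    }
    # Sort topic words by weighted value, largest first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores

# Assumed inputs: s == t for every word, so 0.4 * s + 0.6 * t equals s.
correlation = {"a": 3, "b": 4, "c": 5, "d": 6}
similarity = {"a": 3, "b": 4, "c": 5, "d": 6}
target_topic_set, weighted = select_target_topics(correlation, similarity)
```

Changing `w_corr` and `w_sim` corresponds to the tuning of the first and second weight values described for the debugging process.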
Please refer to fig. 4, which is a schematic view of a scenario for topic retrieval according to an embodiment of the present invention. For better illustration, the target text shown in fig. 4 selects the microblog text as the processing object, but it should be understood that in other embodiments, the target text may also select other texts, and the embodiment of the present invention does not limit this.
The topic acquisition shown in fig. 4 is divided into two parts: filtering of event-related topics, and optimization of the words of the event-related topics. Event-related topic filtering may include: inputting a microblog text and obtaining the topics D of the text through BTM topic model training; and analyzing the event text to obtain event element information C, wherein Ti represents a topic in the text, Ci represents an element constituting the event, m represents the length of the event description, and E is the set of event elements. The topics and the elements are embedded into a word-level vector space representation through word embedding, and the semantic similarity between the topics and the elements is calculated from this representation, so that topics irrelevant to the event are filtered out and the event-related topics are obtained. Optimization of the event topic words may include: calculating the correlation between the topics and the words in the document with a mutual information method by using the KL distance; embedding the words in the document and the elements into a word-level vector space representation by using word embedding, and calculating the similarity between the words in the document and the event elements; and finally weighting the correlation and the similarity and selecting the top k words as the event topic words.
Therefore, according to the embodiment of the invention, a target text is input, a first topic set of the target text is obtained according to a preset topic model, and a first event element set of the target text is obtained through analysis. A second topic set with a higher degree of correlation with the event is obtained by using the first topic set and the first event element set. The correlation between the second topic set and the words in the target file, and the similarity between the first event element set and the words in the target file, are then calculated by the mutual information and word embedding methods. Finally, the target topic set is obtained by weighting the correlation and the similarity. Noise topics are thereby filtered out, the obtained topics are more closely related to the event and express the event topic better, and the quality of topic acquisition is improved.
The following are apparatus embodiments of the present invention, which are used to implement the methods of the first and second method embodiments. For convenience of description, only the portions relevant to the embodiments of the present invention are shown; for specific details not disclosed here, refer to the first and second method embodiments.
Please refer to fig. 5, which is a schematic structural diagram of a topic acquisition apparatus according to an embodiment of the present invention. The apparatus shown in fig. 5 may include:
An input module 501, configured to input a target text.
An obtaining module 502, configured to obtain a first topic set of the target text according to a preset topic model, where the first topic set includes at least one topic word.
The obtaining module 502 is further configured to analyze the target text to obtain a first event element set of the target text, where the first event element set includes at least one event element, and an event element refers to event information corresponding to the target text.
The obtaining module 502 is further configured to obtain a second topic set meeting topic correlation conditions according to the first topic set and the first event element set.
A calculating module 503, configured to calculate a correlation between the second topic set and the words in the target file, and calculate a similarity between the first event element set and the words in the target file.
An optimizing module 504, configured to optimize the second topic set according to the correlation and the similarity to obtain a target topic set.
In a possible implementation manner, the obtaining module 502 is specifically configured to calculate a semantic similarity according to the first topic set and the first event element set, and filter according to the semantic similarity to obtain a second topic set meeting topic correlation conditions.
In a possible implementation, the obtaining module 502 is specifically configured to embed the first topic set and the first event element set into a vector space for semantic representation; calculating semantic similarity corresponding to each topic word in the first topic set through the semantic representation; and selecting the topic words with the semantic similarity meeting the topic correlation condition from the first topic set as a second topic set.
In a possible implementation manner, the calculating module 503 is specifically configured to calculate the correlation between the second topic set and the words in the target file according to mutual information; and to embed the words in the target file and the first event element set into a word-level vector space representation by using word embedding, so as to calculate the similarity between the first event element set and the words in the target file.
In a possible implementation manner, the optimizing module 504 is specifically configured to preset a first weight value of the correlation and a second weight value of the similarity; weight the correlation and the similarity according to the first weight value and the second weight value to obtain a weighted value corresponding to each topic word in the second topic set; and sort the weighted values in descending order and select the topic words corresponding to the first N weighted values as the target topic set according to the sorting result.
In one possible implementation, the preset topic model includes a word pair topic model BTM.
Therefore, according to the embodiment of the invention, the topic acquisition device can be used for carrying out screening of the topic related to the event and optimization of the topic words related to the event on the first topic set extracted from the preset topic model by combining the event elements, so that the topic words with higher degree of correlation with the event are obtained, the event can be well represented, and the quality of the topic words is improved.
Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
As shown in fig. 6, the topic acquisition device in the embodiment of the present invention includes: at least one input device 1000; at least one processor 2000, such as a CPU; at least one memory 3000; and at least one output device 4000. The input device 1000, the processor 2000, the memory 3000 and the output device 4000 are connected through a bus, which enables connection and communication among these components. The input device 1000 and the output device 4000 may be wired transmission ports, or may be wireless devices, for example including an antenna apparatus, configured to perform signaling or data communication with other node devices.
The processor 2000 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 2000 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
The memory 3000 may include a volatile memory, such as a random-access memory (RAM); the memory 3000 may also include a non-volatile memory, such as a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 3000 may also include a combination of the above kinds of memory.
Optionally, the memory 3000 is also used for storing program instructions. The processor 2000 may call the program instructions stored in the memory 3000 to implement the methods according to the first and second embodiments of the present invention.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Specifically, the processor 2000 is configured to: receive a target text as input; obtain a first topic set of the target text according to a preset topic model, wherein the first topic set comprises at least one topic word; analyze the target text to obtain a first event element set of the target text, wherein the first event element set comprises at least one event element, and the event element refers to event information corresponding to the target text; obtain a second topic set meeting topic correlation conditions according to the first topic set and the first event element set; calculate the relevance of the words in the second topic set and the target file, and calculate the similarity of the first event element set and the words in the target file; and optimize the second topic set according to the correlation and the similarity to obtain a target topic set.
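The step of obtaining the second topic set from the first topic set and the first event element set can be sketched as a similarity filter over word vectors. This is an illustration under stated assumptions: the two-dimensional toy vectors, the threshold of 0.5, the use of cosine similarity, the choice of the maximum over event elements as the score of a topic word, and the function names are all hypothetical and do not come from the embodiment.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_topic_words(
    topic_words: List[str],
    event_elements: List[str],
    vectors: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # hypothetical topic correlation condition
) -> List[str]:
    """Keep topic words that are similar enough to some event element."""
    kept = []
    for word in topic_words:
        # Score a topic word by its best match among the event elements
        # (taking the maximum is an assumption; a mean would also work).
        best = max(cosine(vectors[word], vectors[e]) for e in event_elements)
        if best >= threshold:
            kept.append(word)
    return kept


# Toy embeddings standing in for a trained word-embedding model.
vectors = {
    "court": [1.0, 0.0],
    "verdict": [0.8, 0.6],
    "weather": [0.0, 1.0],
    "trial": [1.0, 0.0],
}
first_topic_set = ["court", "verdict", "weather"]
first_event_elements = ["trial"]
second_topic_set = filter_topic_words(first_topic_set, first_event_elements, vectors)
```

The same `cosine` helper could also serve for the later similarity between event elements and words of the target text, since both are described as comparisons in a word-level vector space.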
In the embodiments shown in fig. 1 to fig. 4, the method flows of the steps may be implemented based on the structure of the terminal.
In the embodiment shown in fig. 5, the functions of the modules may be implemented based on the structure of the terminal.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any one of the topic acquisition methods described in the above method embodiments.
Embodiments of the present invention also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the topic acquisition methods described in the above method embodiments. The computer program product may be a software installation package.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division into modules and units is only a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or modules through some interfaces, and may be in an electrical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a form of hardware or a form of a software program module.
The integrated modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random-access memory (RAM), a removable hard disk, a magnetic or optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable memory, which may include: a flash memory disk, a read-only memory (ROM), a random-access memory (RAM), a magnetic or optical disk, and the like.
The embodiments of the present invention have been described in detail above, and specific examples have been used to explain the principle and the implementation of the present invention. The above description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. Meanwhile, a person skilled in the art may, according to the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as a limitation of the present invention.

Claims (10)

1. A topic acquisition method, comprising:
inputting a target text;
obtaining a first topic set of the target text according to a preset topic model, wherein the first topic set comprises at least one topic word;
analyzing the target text to obtain a first event element set of the target text, wherein the first event element set at least comprises one event element, and the event element refers to event information corresponding to the target text;
obtaining a second topic set meeting topic correlation conditions according to the first topic set and the first event element set;
calculating the relevance of the words in the second topic set and the target file, and calculating the similarity of the first event element set and the words in the target file;
and optimizing the second topic set according to the correlation and the similarity to obtain a target topic set.
2. The topic acquisition method as claimed in claim 1 wherein the deriving a second topic set satisfying a topic correlation condition from the first topic set and the first event element set comprises:
and calculating to obtain semantic similarity according to the first topic set and the first event element set, and filtering to obtain a second topic set meeting topic correlation conditions according to the semantic similarity.
3. The topic acquisition method as claimed in claim 2, wherein the calculating a semantic similarity according to the first topic set and the first event element set, and filtering according to the semantic similarity to obtain a second topic set satisfying topic correlation conditions comprises:
embedding the first topic set and the first event element set into a vector space for semantic representation;
calculating semantic similarity corresponding to each topic word in the first topic set through the semantic representation;
and selecting the topic words with the semantic similarity meeting the topic correlation condition from the first topic set as a second topic set.
4. The topic acquisition method of any one of claims 1 to 3, wherein the calculating the relevance of the words in the second topic set and the target file and the calculating the similarity of the first event element set and the words in the target file comprises:
calculating the relevance of the words in the second topic set and the target file according to mutual information;
embedding the words in the target file and the first event element set into word-level vector space representation by using word embedding so as to calculate the similarity between the first event element set and the words in the target file.
5. The topic acquisition method of claim 1, wherein the optimizing the second topic set according to the relevance and the similarity to obtain a target topic set comprises:
presetting a first weight value of the correlation and a second weight value of the similarity;
weighting the correlation and the similarity according to the first weight value and the second weight value to obtain a weight value corresponding to each topic word in the second topic set;
and sorting the weighted values in descending order, and selecting the topic words corresponding to the first N weighted values as the target topic set according to the sorting result.
6. The topic acquisition method of claim 1 wherein the preset topic model comprises a word-pair topic model (BTM).
7. A topic acquisition apparatus characterized by comprising means for performing the method of any one of claims 1 to 6.
8. A terminal, characterized in that it comprises a processor, a communication interface, a display screen and a memory, which are interconnected, wherein the memory is used to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to execute the method according to any one of claims 1-6.
9. A computer-readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to any of claims 1-6.
10. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-6.
CN202010096076.4A 2020-02-17 2020-02-17 Topic acquisition method, terminal and computer readable storage medium Active CN111324725B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010096076.4A CN111324725B (en) 2020-02-17 2020-02-17 Topic acquisition method, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111324725A true CN111324725A (en) 2020-06-23
CN111324725B CN111324725B (en) 2023-05-16

Family

ID=71163500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010096076.4A Active CN111324725B (en) 2020-02-17 2020-02-17 Topic acquisition method, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111324725B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002108917A (en) * 2000-10-02 2002-04-12 Nippon Hoso Kyokai <Nhk> Device for tracking topics of news, device for extracting and presenting components of topics of news and broadcasting service method
CN104715014A (en) * 2015-01-26 2015-06-17 中山大学 Online news topic detection method
CN106610931A (en) * 2015-10-23 2017-05-03 北京国双科技有限公司 Extraction method and device for topic names
CN106709052A (en) * 2017-01-06 2017-05-24 电子科技大学 Keyword based topic-focused web crawler design method
CN109284507A (en) * 2018-11-29 2019-01-29 中山大学 A kind of method of filtering spam user and extraction short text topic
CN110134787A (en) * 2019-05-15 2019-08-16 北京信息科技大学 A kind of news topic detection method
CN110245355A (en) * 2019-06-24 2019-09-17 深圳市腾讯网域计算机网络有限公司 Text topic detecting method, device, server and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
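Hongbin Wang et al.: "Fusion News Elements of News Text Similarity Calculation", ICMIR 2018 *
Yuan Wei et al.: "Design and Implementation of Ontology-Based Russian News Topic Detection", Journal of Shandong University (Natural Science) *
Peng Renjie et al.: "Case Topic Optimization Based on Case Elements", Journal of Chinese Computer Systems *
Li Weijiang et al.: "Microblog Topic Detection Based on BTM and K-means", Computer Science *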

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357278A (en) * 2020-09-28 2022-04-15 腾讯科技(深圳)有限公司 Topic recommendation method, device and equipment
CN114357278B (en) * 2020-09-28 2024-03-19 腾讯科技(深圳)有限公司 Topic recommendation method, device and equipment
CN113590774A (en) * 2021-06-22 2021-11-02 北京百度网讯科技有限公司 Event query method, device and storage medium
CN113590774B (en) * 2021-06-22 2023-09-29 北京百度网讯科技有限公司 Event query method, device and storage medium

Also Published As

Publication number Publication date
CN111324725B (en) 2023-05-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant