CN107491417B - Document generation method based on specific division under topic model - Google Patents
Document generation method based on specific division under topic model
- Publication number: CN107491417B (application CN201710548431.5A)
- Authority
- CN
- China
- Prior art keywords
- distribution
- elbo
- topic
- variation
- subset
- Prior art date: 2017-07-06
- Legal status: Active
Classifications
- G06F16/3331 Information retrieval of unstructured textual data; querying; query processing
- G06F16/313 Indexing; selection or weighting of terms for indexing
- G06F16/35 Clustering; classification
- G06F17/16 Complex mathematical operations; matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The invention belongs to the technical field of data mining and specifically relates to a document generation method based on a topic model under a specific division of a text database. Given a way of dividing a text database, the invention introduces the concept of a subset: in some text databases, such as news databases, the topic distributions of texts within the same time segment show a certain similarity, especially for texts from different news channels reporting the same event, so the time-segment attribute can be used to divide the database into subsets. Accordingly, the invention proposes a new topic model over a text database, denoted DbLDA. In DbLDA, each document is generated by the following specific steps: generate the topic matrix; generate an average topic distribution for each subset; generate a topic distribution for each article in the subset; then, for each word, select a topic and select a word. The method can be applied to text databases with structured attributes.
Description
Technical Field
The invention belongs to the technical field of data mining and specifically relates to a document generation method based on a topic model under a specific division, applied to text databases with structured attributes.
Background
Using topic models to process and analyze text data is now widespread in the field of data mining, and LDA (Latent Dirichlet Allocation) has attracted broad attention as a simple, easy-to-use topic model. LDA assumes that each text originates from an independent generation process and therefore ignores the connections between texts, which can degrade model quality.
A large amount of text data carries not only structured attributes, such as time and place, but also the unstructured attribute of the text content itself. According to the structured attributes, the text data can be organized into a number of subsets and stored in a text database, which forms a specific division of that database. Texts assigned to the same subset share some commonality, i.e., there are links between them. For example, in a news database, news from the same time or the same place may focus on an important event, such as the spread of a virus or the track of a typhoon. Based on this phenomenon, the entire news database can be divided by its time or location attributes into subsets whose members share commonality. To analyze document collections that admit such a subset division, the invention designs a new topic model based on LDA from the perspective of the generative model, aiming to fully exploit the commonality within each subset under the specific division of the text database and thereby obtain a topic model with better performance.
Disclosure of Invention
The invention aims to provide a document generation method, based on a topic model under a specific division, that achieves better performance.
The invention is a new model constructed from the LDA model. The text generation process in LDA is: first, a distribution is drawn from a Dirichlet distribution as the topic distribution of an article; the topics of the article are drawn from this topic distribution; and a word is then drawn from each corresponding topic, yielding the words of the document [3]. The topic distributions of the articles are mutually independent.
The topic model based on a specific division proposed by the invention is designed on the basis of LDA and is applied to text databases with structured attributes, such as a news database with time or place tags (i.e., a text database that can be divided according to these structured attributes). The invention introduces the concept of a subset according to a given way of dividing the text database: in some text databases, such as news databases, the topic distributions of texts within the same time segment show a certain similarity, especially for texts from different news channels reporting the same event, so the time-segment attribute can be used to divide the database into subsets. The invention therefore proposes a new topic model over a text database (LDA over Text Database, latent Dirichlet allocation on a text database), denoted DbLDA.
In DbLDA, the specific steps for generating each document are as follows:
Step (1), generate the topic matrix: φ_k ~ Dir(β);
Step (2), generate the average topic distribution of each subset s: θ_s ~ Dir(α);
Step (3), generate the topic distribution of each document d in subset s: θ′_{s,d} ~ N(θ̃(θ_s), Σ_s);
Step (4), for each word:
(a) select a topic: z_{s,d,n} ~ Mult(π(θ′_{s,d}));
(b) select a word: w_{s,d,n} | z_{s,d,n} ~ Mult(φ_{z_{s,d,n}}).
Here θ̃ is the mapping from a multinomial distribution parameter vector to a natural parameter vector, θ̃(θ)_k = log θ_k + c, where c is a constant, so that each multinomial distribution parameter vector corresponds to a family of natural parameter vectors; π is the mapping from a natural parameter vector back to the multinomial distribution parameter vector, π(θ′)_k = exp(θ′_k) / Σ_j exp(θ′_j).
the probability distribution of the random variables to each other in the graph model of fig. 1 corresponds to this generation process.
The key to the effectiveness of a topic model is whether it truly reflects the real distribution of the text data, i.e., whether the topic distributions have a corresponding physical meaning. From the perspective of the generation process, step (2) amounts to assigning a main topic distribution (or average topic distribution) to each subset, and step (3) adds Gaussian noise to this average topic distribution to generate a topic distribution for each text in the subset; in effect, the topic distributions of the articles in a subset scatter around a common main topic distribution. The hope is that, by modeling the commonality of the texts within a subset in this way, the inferred model parameters (namely the topics, i.e., multinomial distributions over the vocabulary) reflect the characteristics of the text data more truthfully, giving better model performance. The covariance matrix Σ of the Gaussian distribution can be regarded as the density of the topic distributions within the subset.
Compared with LDA, the method adds a probabilistic step when generating the topic distribution, namely one extra layer of logistic Gaussian (logistic normal) distribution. In the field of data analysis, the logistic normal distribution is often viewed as a "Gaussian distribution over the simplex"; here, this distribution is used to model the commonality within each subset.
As FIG. 2 shows, on the topic simplex the topic distributions of texts in the same subset are more concentrated under the DbLDA model; in a real text database this corresponds to similar topic distributions, for example several news texts describing the same event at the same time.
Given a new topic model, a key task is inference. Topic models are a class of generative models that produce documents according to a designed graphical model, LDA (latent Dirichlet allocation) being an example. The graphical model contains hidden variables and observed variables, and the main work in analyzing it is to estimate the hidden variables from the observed ones, i.e., to solve for the posterior distribution of the hidden variables. However, most topic models are complex and the posterior generally has no closed-form expression, so it must be computed approximately; this is the model inference problem. Among the many approximate inference methods, the invention adopts collapsed variational Bayes, where collapsing means removing some hidden variables from the posterior distribution by marginal integration. There are two reasons. First, the DbLDA model is complex and not all hidden variables can be collapsed, so a sampling method would have many distributions to sample, might converge too slowly, and its convergence is hard to judge. Second, each DbLDA iteration is computationally heavy; a variational method converges fast and markedly improves program performance.
As for the variational method: since the true posterior is unavailable, the problem is transformed into a maximization problem by the following identity. The log probability of the evidence (the observed text) equals the KL divergence (Kullback-Leibler divergence, also known as relative entropy) plus the evidence lower bound (ELBO), so minimizing the KL divergence is achieved by maximizing the ELBO.
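Writing the identity out, with w the observed text, h the hidden variables, and q the variational distribution (a standard derivation, included here for completeness):

```latex
\log p(w)
 = \mathbb{E}_{q(h)}\!\left[\log\frac{p(w,h)}{q(h)}\right]
 + \mathbb{E}_{q(h)}\!\left[\log\frac{q(h)}{p(h\mid w)}\right]
 = \mathrm{ELBO} + \mathrm{KL}\!\big(q(h)\,\Vert\,p(h\mid w)\big).
```

Since log p(w) does not depend on q, raising the ELBO necessarily lowers the KL divergence.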
Following the idea of collapsed variational Bayes, the dependencies between hidden variables are modeled explicitly; but since θ′ in DbLDA (the topic distribution of an article in a subset) is difficult to remove by marginal integration, only θ and φ (the average topic distributions of the subsets and the topic matrix, respectively) are collapsed. This approach is referred to as "partially collapsed variational Bayes". Thus, the variational posterior distribution has the following form:
q(θ′, z) = ∏_{s,d} q(θ′_{s,d}) ∏_{s,d,n} q(z_{s,d,n}),
in which θ′ obeys variational Gaussian distributions, θ′_{s,d} ~ N(μ_{s,d}, diag(σ²_{s,d})), and z obeys variational multinomial distributions, z_{s,d,n} ~ Mult(γ_{s,d,n}).
Thus, the ELBO becomes
ELBO = E_q[log p(w, z, θ′ | α, β, Σ)] + H[q],
where H[q] is the entropy of the variational distribution.
First, the ELBO is maximized with respect to the variational distributions of θ and φ. Since there is no restriction on the form of these two variational distributions, the maximum is attained when the variational posterior equals the true posterior, i.e., when q(θ, φ | θ′, z) equals p(θ, φ | θ′, z, w). After this simplification, the ELBO is expanded according to the graphical model of DbLDA.
Because the Dirichlet distribution and the logistic normal distribution are not conjugate, some expectation terms in the ELBO are difficult to compute directly. To simplify the computation while satisfying the modeling goal, each dimension of the K-dimensional random variable θ′ is assumed to follow an independent univariate Gaussian distribution; that is, the covariance matrix is restricted to a diagonal matrix diag(σ²). Meanwhile, the mapping from the multinomial distribution parameter vector to the natural parameter vector is chosen with c = 1. With these simplifications, the terms can be calculated, where D_s is the number of texts in subset s.
Following the inference method for Gaussian random variables in the CTM (Correlated Topic Model), and because the variational expectation of the log normalization factor of θ′ is difficult to calculate, the invention uses a Taylor expansion to find an upper bound for this log normalization factor, in order to preserve the lower-bound property of the ELBO [1]:
E_q[log Σ_k exp(θ′_k)] ≤ ζ^{-1} Σ_k exp(μ_k + σ²_k/2) − 1 + log ζ.
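The bound follows from the first-order Taylor expansion of the concave function log x around a point ζ > 0, whose tangent line lies above the function, combined with the log-normal moment of the diagonal Gaussian variational posterior; a sketch, using the variational parameters μ and σ² introduced above:

```latex
\log\sum_k e^{\theta'_k}
 \;\le\; \log\zeta + \zeta^{-1}\Big(\sum_k e^{\theta'_k} - \zeta\Big),
\qquad
\mathbb{E}_q\!\left[e^{\theta'_k}\right] = e^{\mu_k + \sigma_k^2/2},
```

so that taking expectations of the tangent-line bound gives E_q[log Σ_k exp(θ′_k)] ≤ ζ^{-1} Σ_k exp(μ_k + σ²_k/2) − 1 + log ζ.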
Then the ELBO is maximized with respect to the variational parameters γ, ζ, μ and σ², cycling through each parameter in turn for updating. Specifically, the update for the variational multinomial parameters γ is derived first; simplifying it by cancelling the factors that appear in both numerator and denominator yields the γ update expression.
However, the variational expectation term in this expression is too costly to compute exactly, so it is estimated with the Gaussian approximation used in the original paper on collapsed variational Bayesian inference, keeping only the zeroth-order Taylor expansion as a further approximation to improve computational performance (the CVB0 approximation).
Using this approximation for the variational expectation, the update equation for γ is obtained.
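To illustrate the zeroth-order trick, the following sketch shows the CVB0 token update for plain LDA (the comparison model used in the experiments below); the DbLDA update differs in its details, and all names here are assumptions of the example:

```python
import numpy as np

def cvb0_token_update(n_wk, n_dk, n_k, alpha, beta, V):
    """CVB0 update for one token's variational multinomial: exact
    variational expectations over count statistics are replaced by the
    expected counts themselves (zeroth-order Taylor expansion).
    n_wk, n_dk, n_k are expected counts with the current token's own
    responsibility already subtracted out."""
    scores = (n_wk + beta) / (n_k + V * beta) * (n_dk + alpha)
    return scores / scores.sum()

# Toy usage with K = 3 topics and a vocabulary of V = 10 words.
gamma = cvb0_token_update(n_wk=np.array([1.0, 0.2, 0.5]),
                          n_dk=np.array([3.0, 1.0, 2.0]),
                          n_k=np.array([40.0, 25.0, 30.0]),
                          alpha=1.01, beta=0.01, V=10)
```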
Second, the ELBO is maximized with respect to ζ. Taking the derivative of the ζ-related terms in the ELBO and setting it to zero yields the ζ update equation:
ζ_{s,d} = Σ_k exp(μ_{s,d,k} + σ²_{s,d,k}/2).
Third, the ELBO is maximized with respect to the variational means μ. Setting the corresponding derivative to zero has no analytical solution, so Newton's method is used to solve this maximization.
Finally, the ELBO is maximized with respect to the variational variances σ², under the constraint σ² > 0. As above, this maximization problem has no analytical solution either, so it is likewise solved with Newton's method after taking the derivative of the ELBO with respect to σ².
Combining the above conclusions, each iteration updates the variational parameters in turn, yielding a coordinate-ascent algorithm for the ELBO.
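A self-contained sketch of the two kinds of update the coordinate-ascent loop alternates between: the closed-form ζ update and a generic Newton iteration for the parameters whose zero-derivative equations have no analytical solution. The quadratic objective at the end is an assumed stand-in, since the patent's actual derivative expressions are not reproduced here:

```python
import numpy as np

def update_zeta(mu, sigma2):
    # Closed form: zeta = sum_k E_q[exp(theta'_k)] = sum_k exp(mu_k + sigma2_k / 2).
    return np.exp(mu + 0.5 * sigma2).sum()

def newton_maximize(x, grad, hess, iters=10):
    # Elementwise Newton iterations on a concave objective; in the algorithm
    # above, 5 to 10 such inner iterations run per outer iteration.
    for _ in range(iters):
        x = x - grad(x) / hess(x)
    return x

# Stand-in example: maximize f(x) = -0.5 * (x - 3)^2 elementwise.
grad = lambda x: -(x - 3.0)
hess = lambda x: -np.ones_like(x)
mu = newton_maximize(np.zeros(3), grad, hess)   # converges to [3, 3, 3]
zeta = update_zeta(mu, np.ones(3))
```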
Drawings
FIG. 1 is a DbLDA graphical model.
FIG. 2 shows sampled topic distributions of DbLDA on the topic simplex (3 topics, 2 subsets, 1000 articles; each red dot is the topic distribution of one article, each blue dot is the main topic distribution of one subset, and the pink triangle is the topic simplex).
FIG. 3 shows predictive perplexity (measuring predictive capability) test results for DbLDA, CVB0_LDA (LDA inferred with collapsed variational Bayes), and CGS_LDA (LDA inferred with collapsed Gibbs sampling) on one month of Reuters news data (3942 news texts, 16379 words), with α = 1.01, β = 0.01 and K = 50 given for all models, and additionally Σ = 1.0 given for DbLDA.
FIG. 4 shows predictive perplexity results for DbLDA and CVB0_LDA under different settings: text databases of different sizes (1, 2, and 6 months of news) and different subset divisions (15 days of news per subset versus 30 days of news per subset).
Detailed Description
The tests of the DbLDA topic model are divided into a comparison experiment against LDA and model experiments under different parameters. The comparison measures are predictive perplexity and running time under the different models; the varied parameters are the subset length and the size of the text database. The aim is to test the text prediction capability of the DbLDA topic model in several ways.
(1) Experimental Environment and data set introduction
All experimental programs run under Ubuntu 16.04 on a machine with an i5-3470 CPU and 12 GB of memory, using code written in Java 8. The DbLDA and LDA experiments below all use the given parameters α = 1.01, β = 0.01 and K = 50, with additionally Σ = 1.0 given for DbLDA.
English news data collected from Reuters serves as the test corpus for evaluating the performance of DbLDA. The news texts are divided into subsets by time, with the day as the minimum division unit. The corpus is tokenized, punctuation is removed, words are stemmed, and all words occurring only once are removed. After this processing, six months of Reuters news data (January through June: 22723 news texts, 36639 words) are obtained. The test set is 10% the size of the training set.
(2) Comparative experiment with LDA
The measure of the comparison experiment is predictive perplexity, a standard often used to gauge the predictive capability of a language model; lower predictive perplexity indicates a model with better predictive capability.
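For reference, the usual definition of predictive perplexity over a held-out set of M documents, where N_d is the number of words in document d (a standard formula consistent with its use here):

```latex
\mathrm{Perplexity}
 = \exp\!\left(-\,\frac{\sum_{d=1}^{M}\log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d}\right)
```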
For DbLDA, the generation probability of test-set text is computed by using the variational posterior in place of the true posterior and estimating the model parameters by their variational expectations.
the subject model compared to DbLDA is an implicit dirichlet distribution using a systolic variational bayesian inference (CVB) (while approximating the desired variational, again using a zero-order taylor expansion approximation, denoted CVB 0). As the theme distribution of each text in the test text set needs to be obtained, phi is obtained by training in the training text set, and then for each article in the test text set, the theta of each article is obtained by training 50% of the first text. The algorithm for training the test text set is consistent with the previous method except that phi is a fixed value and is fixed as a result obtained in the training text set. And obtaining the theme distribution of the test text set, namely substituting the expression into the last 50% of texts of each test text, and calculating the prediction chaos.
In the comparison experiment, one month of the Reuters data set (April: 3942 news texts, 16379 words) is used, and for DbLDA each subset contains 7 days of news texts. Each program is run for 500 iterations; after every iteration, φ is estimated from the training set and the predictive perplexity is computed as described above. Note that LDA inferred with collapsed Gibbs sampling is also included in the experiment for comparison, to illustrate that different model inference methods lead to different model test results.
The experimental results are shown in FIG. 3. They show that the predictive capability of DbLDA on the test set is stronger than that of LDA inferred with collapsed Gibbs sampling; owing to its slower convergence, LDA with collapsed Gibbs sampling does not obtain a good model within 500 iterations, which also illustrates the influence of the approximate inference method on topic-model performance. The convergence of DbLDA is slower than that of LDA inferred with collapsed Gibbs sampling, and the time required to reach the same predictive perplexity is also longer, because DbLDA has more variables to update per iteration, and because some variables are updated with Newton's method, adding 5 to 10 extra inner iterations per iteration to obtain an approximate solution.
Table 2 compares the time efficiency of DbLDA and LDA inferred with collapsed Gibbs sampling, recording the time required for both algorithms to reach the same perplexity threshold; the data set and parameter settings are as above.
TABLE 2. Iteration time comparison (time required to reach the same perplexity, in seconds)
Claims (4)
1. A document generation method based on a topic model under a specific division, wherein the topic model is a latent Dirichlet allocation over a text database, denoted DbLDA; in DbLDA, the specific steps of generating each document are as follows:
step (1), generate the topic matrix: φ_k ~ Dir(β);
step (2), generate the average topic distribution of each subset s: θ_s ~ Dir(α);
step (3), generate the topic distribution of each document d in subset s: θ′_{s,d} ~ N(θ̃(θ_s), Σ_s);
step (4), for each word:
(a) select a topic: z_{s,d,n} ~ Mult(π(θ′_{s,d}));
(b) select a word: w_{s,d,n} | z_{s,d,n} ~ Mult(φ_{z_{s,d,n}});
where θ̃ is the mapping from a multinomial distribution parameter vector to a natural parameter vector, θ̃(θ)_k = log θ_k + c; each multinomial distribution parameter vector corresponds to a family of natural parameter vectors, and c is a constant; π is the mapping from a natural parameter vector back to the multinomial distribution parameter vector, π(θ′)_k = exp(θ′_k) / Σ_j exp(θ′_j);
wherein the parameters and symbols used are as follows:
s represents the number of subsets;
α represents a hyper-parameter of Dirichlet priors of the distribution of topics in the subset;
beta represents a hyper-parameter of Dirichlet prior of the distribution of each word frequency in each topic;
∑srepresenting the distribution density of the topics in the subset s;
θsrepresenting the mean topic distribution of the subset s;
θ′sda topic distribution representing the d text in the subset s;
phi denotes a topic matrix;
zsdna topic representing the nth word of the d text in the subset s;
wsdnthe nth word representing the mth text in the subset s.
2. The document generation method under a topic model based on a specific division according to claim 1, wherein the topic model is approximated using collapsed variational Bayes; collapsing means removing some hidden variables of the topic model from the posterior distribution by marginal integration;
for the variational Bayes method, the log probability of the evidence equals the KL divergence plus the evidence lower bound, denoted ELBO, so minimizing the KL divergence is achieved by maximizing the ELBO:
log p(w) = KL(q || p(· | w)) + ELBO;
since the topic distribution θ′ of an article in a subset of DbLDA is difficult to remove by marginal integration, only the average topic distributions θ of the subsets and the topic matrix φ are collapsed; this practice is called partially collapsed variational Bayes; thus, the variational posterior distribution has the following form:
q(θ′, z) = ∏_{s,d} q(θ′_{s,d}) ∏_{s,d,n} q(z_{s,d,n}),
in which θ′ obeys variational Gaussian distributions θ′_{s,d} ~ N(μ_{s,d}, diag(σ²_{s,d})) and z obeys variational multinomial distributions z_{s,d,n} ~ Mult(γ_{s,d,n});
thus, the ELBO becomes
ELBO = E_q[log p(w, z, θ′ | α, β, Σ)] + H[q],
where H[q] is the entropy of the variational distribution.
3. The document generation method under a topic model based on a specific division according to claim 2, wherein the specific steps of maximizing the ELBO are as follows:
first, the ELBO is maximized with respect to the variational distributions of θ and φ; since there is no restriction on the form of these two variational distributions, the maximum is attained when the variational posterior equals the true posterior; after this simplification, the ELBO is expanded according to the graphical model of DbLDA;
to simplify the calculation, each dimension of θ′ is assumed to follow an independent univariate Gaussian distribution, i.e., the covariance matrix is restricted to a diagonal matrix diag(σ²); meanwhile, the mapping from the multinomial distribution parameter vector to the natural parameter vector is chosen with c = 1; through these simplifications, the above expression can be calculated;
wherein D_s is the number of texts in subset s and K represents the number of topics;
an upper bound is found for the log normalization factor of θ′ using a Taylor expansion:
E_q[log Σ_k exp(θ′_k)] ≤ ζ^{-1} Σ_k exp(μ_k + σ²_k/2) − 1 + log ζ;
then, the ELBO is maximized with respect to the variational parameters, cycling through each variational parameter in turn for updating;
ζ represents the variational parameter required in bounding the log normalization factor of θ′.
4. The method of claim 3, wherein maximizing the ELBO with respect to the variational parameters, cycling through each variational parameter in turn for updating, comprises the following specific steps:
first, the update for the variational multinomial parameters γ is simplified by cancelling the factors that appear in both numerator and denominator; because the variational expectation term in this update is too costly to compute exactly, it is estimated with a Gaussian approximation, and only the zeroth-order Taylor expansion is kept as a further approximation to improve computational performance;
second, the ELBO is maximized with respect to ζ: taking the derivative of the ζ-related terms in the ELBO and setting it to zero yields the ζ update equation;
third, setting the derivative with respect to the variational means to zero has no analytical solution; for this purpose, Newton's method is used to solve that maximization problem;
finally, the ELBO is maximized with respect to the variational variances under the constraint that they remain positive; as above, this maximization problem has no analytical solution either, so it is solved using Newton's method, taking the derivative of the ELBO with respect to the variances;
combining the above steps, each iteration updates the variational parameters in turn, yielding a coordinate-ascent algorithm for the ELBO;
V represents the vocabulary size.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710548431.5A CN107491417B (en) | 2017-07-06 | 2017-07-06 | Document generation method based on specific division under topic model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107491417A CN107491417A (en) | 2017-12-19 |
CN107491417B true CN107491417B (en) | 2021-06-22 |
Family
ID=60644370
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710548431.5A Active CN107491417B (en) | 2017-07-06 | 2017-07-06 | Document generation method based on specific division under topic model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107491417B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110110331B (en) * | 2019-04-30 | 2021-02-26 | 清华大学 | Text generation method, device, medium and computing equipment |
CN110738242B (en) * | 2019-09-25 | 2021-08-10 | 清华大学 | Bayes structure learning method and device of deep neural network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101587493A (en) * | 2009-06-29 | 2009-11-25 | 中国科学技术大学 | Text classification method |
CN102591917A (en) * | 2011-12-16 | 2012-07-18 | 华为技术有限公司 | Data processing method and system and related device |
CN105183833A (en) * | 2015-08-31 | 2015-12-23 | 天津大学 | User model based microblogging text recommendation method and recommendation apparatus thereof |
CN105740354A (en) * | 2016-01-26 | 2016-07-06 | 中国人民解放军国防科学技术大学 | Adaptive potential Dirichlet model selection method and apparatus |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8825648B2 (en) * | 2010-04-15 | 2014-09-02 | Microsoft Corporation | Mining multilingual topics |
US20120278353A1 (en) * | 2011-04-28 | 2012-11-01 | International Business Machines | Searching with topic maps of a model for canonical model based integration |
Non-Patent Citations (3)
Title |
---|
Shibin Zhou et al., "Text Categorization Based on Topic Model", International Journal of Computational Intelligence Systems, vol. 2, no. 4, pp. 398-409, 2009-12-04 * |
Yu Tao et al., "A Fast Randomized Community Mining Algorithm for Large-Scale Networks", Proceedings of the 26th China Database Conference (Part B), pp. 406-412, 2009-09-15 * |
Xu Ge et al., "The Development of Topic Models in Natural Language Processing", Chinese Journal of Computers, no. 8, pp. 1423-1436, 2011-08-31 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |