CN111428472A - Article automatic generation system and method based on natural language processing and image algorithm
- Publication number
- CN111428472A
- Authority
- CN
- China
- Prior art keywords
- article
- image
- screening module
- module
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/5866—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a system and method for automatically generating articles based on natural language processing and image algorithms, relating to the field of artificial intelligence. The system comprises: an operation terminal for inputting product information and user information and outputting the final tweet; a basic label extraction system that extracts keywords from the input product information and user information using a BERT pre-training model and establishes a plurality of labels; a content generation module that generates, according to the label information, a plurality of corresponding titles, a plurality of article contents and a plurality of pictures consistent with the article contents; an intelligent screening module, comprising an article screening module and an image screening module, that screens the combinations of article contents and corresponding images to obtain one set of article content and corresponding image that meets the requirements; and an intelligent typesetting module that typesets the finally selected article and the matching picture to obtain the final pushed article.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to an article automatic generation system and method based on natural language processing and image algorithm.
Background
At present, enterprises recommend more and more products to users, and the most common way to make advertisements attract users' attention is article recommendation on the major platforms. The traditional manual way of writing short texts has the following disadvantages: first, the number of short texts to be written is large and the writing process is tedious, so completing them takes a great deal of time and the generation rate is low; second, short texts are usually written from the enterprise's point of view, so they have little appeal to users and the recommended short texts are poorly suited to them; third, to attract more attention, an article needs an illustration that fits its theme, yet the current technology lacks research on image generation.
Disclosure of Invention
The invention aims to provide an automatic article generation system and method based on natural language processing and image algorithm, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme: an article automatic generation system based on natural language processing and image algorithm comprises:
the operation terminal is used for inputting the product information query and the user information and outputting a final tweet;
the basic label extraction system is used for extracting keywords by using a BERT pre-training model according to input product information and user information and establishing a plurality of labels;
the content generation module comprises a title generation model, a short text generation model and an image generation model, and generates a plurality of corresponding titles, a plurality of article contents d and a plurality of pictures which are consistent with the article contents according to the label information;
the intelligent screening module comprises an article screening module and an image screening module, and is used for screening the combination of a plurality of article contents and corresponding images by using the article screening module and the image screening module to obtain a group of article contents and corresponding images which meet the requirements;
and the intelligent typesetting module is used for typesetting the finally selected article and the matched picture to obtain the final pushed article.
By inputting simple product information, the tweet and the corresponding picture can be generated, a large amount of manpower and time are saved, and the working efficiency is improved.
As a preferred technical scheme of the invention, the intelligent typesetting module comprises a text database, and a model for automatically typesetting the articles and the pictures is trained through a deep learning BERT algorithm.
As a preferred technical scheme of the invention, the content generation module comprises a collected title database and different types of short text databases; respectively obtaining a title generation model and a short text generation model through BERT pre-training model training; and drawing a picture conforming to the short text by an image generation model-StackGAN algorithm.
As a preferred technical scheme of the invention, the image screening module calculates the degree of fit between each generated picture and the article content according to the image generation model (the StackGAN algorithm), sets a threshold value, and selects the picture with the highest degree of fit.
As a preferred technical solution of the present invention, the article screening module calculates a relevance score between the product information query and each document d through a BERT pre-training model, where the query includes words q1, q2, ..., qn, and the calculation formula of the relevance score is:
$$\mathrm{Score}(query, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
wherein R(qi, d) is the relevance value between each word qi in the query and the document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
wherein N is the total number of documents and n(qi) is the number of documents containing the word qi;
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left( 1 - b + b \cdot \frac{dl}{avgdl} \right)$$
wherein k1, k2 and b are adjustment factors, qfi is the frequency of occurrence of the word qi in the query, fi is the frequency of occurrence of qi in document d, dl is the length of document d, and avgdl is the average length of all documents;
calculating the relevance score of each document d and the query, setting a threshold value, comparing the threshold value with each calculated relevance score, and selecting a conforming article with proper relevance;
an article automatic generation method based on natural language processing and image algorithm comprises the following steps:
S1, inputting product information and user information through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, screening out the short text and the matching picture with the highest degree of fit through the intelligent screening module;
and S5, typesetting the short texts and the corresponding pictures through the intelligent typesetting module to obtain final text pushing, and outputting the text pushing through the operation terminal.
Compared with the prior art: according to input product information and user information, firstly, extracting keywords by using a natural language processing algorithm and establishing a plurality of labels; then, a short text and a title which introduce the product and accord with the reading habit of the user are generated according to the label information; then, drawing an image which accords with the idea of the article aiming at the newly generated short text by using a StackGAN algorithm; and finally, screening the generated short texts by using a calculation formula of the relevance scores to obtain more appropriate contents, thereby greatly improving the writing efficiency of the articles.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic structural diagram of an article generation module according to the present invention.
FIG. 3 is a schematic structural diagram of the intelligent screening module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an article automatic generation system based on natural language processing and image algorithm, comprising:
The operation terminal is used for inputting the product information query and the user information and for outputting the final tweet. It provides an operating platform for the operator, which may be a mobile phone or a computer, and the output articles can be directly edited and shared.
The basic label extraction system is used for extracting keywords by using a BERT pre-training model according to input product information and user information and establishing a plurality of labels;
The content generation module comprises a title generation module, a short text generation module and a picture generation module, and generates, according to the label information, a plurality of corresponding titles, a plurality of article contents and a plurality of pictures consistent with the article contents. The content generation module comprises a collected title database and different types of short text databases; a title generation model is obtained by training with a natural language processing algorithm; a short text generation model is obtained by training with a natural language processing algorithm; and a picture consistent with the short text produced by the short text generation model is drawn by the StackGAN algorithm.
The intelligent screening module comprises an article screening module and an image screening module. The image screening module calculates the degree of fit between each generated picture and the article content according to an image algorithm, sets a threshold value, and selects the picture with the highest degree of fit.
The article screening module calculates a relevance score between the product information query and each document d through a BERT pre-training model, wherein the query comprises words q1, q2, ..., qn, and the calculation formula of the relevance score is as follows:
$$\mathrm{Score}(query, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
where R(qi, d) is the relevance value between each word qi in the query and the document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
where N is the total number of documents, n(qi) is the number of documents containing the word qi, and 0.5 is an adjustment coefficient that avoids the case where n(qi) is 0; the log function makes the IDF value vary more smoothly with N and n(qi). The meaning of the IDF value is evident from the formula: the larger the total number of documents and the fewer the documents that contain the word qi, the larger the IDF value of qi. For example, suppose we have 1 million documents: the words "basketball" and "Kobe Bryant" appear almost exclusively in sports-related documents, so the IDF values of these two words are relatively large, whereas the words "is", "are" and "what" appear in almost all documents, so their IDF values are very small. The relevance value is:
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left( 1 - b + b \cdot \frac{dl}{avgdl} \right)$$
where k1, k2 and b are adjustment factors, generally k1 = 1, k2 = 1 and b = 0.75; qfi is the frequency of occurrence of the word qi in the query; fi is the frequency of occurrence of qi in document d; dl is the length of document d; and avgdl is the average length of all documents. Since in most cases the word qi appears only once in a short query, qfi = 1. For example, suppose one query is: where did Zhuge Liang die?
The content of document1 is: Zhuge Liang fell ill at the Wuzhang Plains and eventually died there;
the contents of document2 are: exemplarily speaking with the radix violae to give a cut in five husband years;
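and the content of document3 is an entire book on the history of China.
It is clear that document3 contains a large number of occurrences of the words [Zhuge Liang], [where] and [die]; however, because document3 is very long, K is very large, and the relevance R(qi, d) to each word qi in the query is therefore very small. Combining the above (taking qfi = 1 and K = k1(1 - b + b·dl/avgdl)), the complete formula is as follows:
$$\mathrm{Score}(query, d) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f_i \, (k_1 + 1)}{f_i + k_1 \left( 1 - b + b \cdot \frac{dl}{avgdl} \right)}$$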
and calculating the relevance score of each document d and the query, setting a threshold value, comparing the threshold value with each calculated relevance score, and selecting a conforming article with proper relevance.
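The relevance scoring and threshold selection described above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the patent's implementation: documents are assumed to be already tokenised into word lists, the typical values k1 = 1, k2 = 1 and b = 0.75 from the text are used as defaults, K is taken as k1(1 - b + b·dl/avgdl) as in the standard BM25 ranking function, each distinct query word is counted once, and the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """W_i = log((N - n(q_i) + 0.5) / (n(q_i) + 0.5))."""
    n_total = len(docs)
    n_containing = sum(1 for doc in docs if term in doc)
    return math.log((n_total - n_containing + 0.5) / (n_containing + 0.5))


def bm25_score(query, doc, docs, k1=1.0, k2=1.0, b=0.75):
    """Sum of W_i * R(q_i, d) over the distinct words of the query."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    # K grows with the document length dl relative to the average length.
    big_k = k1 * (1 - b + b * len(doc) / avgdl)
    doc_tf = Counter(doc)
    query_tf = Counter(query)
    score = 0.0
    for term, qf in query_tf.items():
        f = doc_tf[term]
        if f == 0:
            continue  # a word absent from the document contributes nothing
        r = (f * (k1 + 1) / (f + big_k)) * (qf * (k2 + 1) / (qf + k2))
        score += idf(term, docs) * r
    return score


def select_articles(query, docs, threshold):
    """Keep the documents whose relevance score reaches the threshold."""
    return [d for d in docs if bm25_score(query, d, docs) >= threshold]


if __name__ == "__main__":
    corpus = [
        ["zhuge", "liang", "died", "at", "wuzhang", "plains"],
        ["basketball", "season", "starts", "today"],
        ["the", "weather", "is", "fine"],
    ]
    query = ["where", "zhuge", "liang", "died"]
    for doc in corpus:
        print(round(bm25_score(query, doc, corpus), 3), doc)
```

Note that with this IDF form a word occurring in more than half of the documents receives a negative weight; whether such words should be clipped to zero is a design choice the text does not address.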
The intelligent typesetting module comprises a text database and trains a model for automatically typesetting the articles and the pictures through a machine deep learning algorithm. And typesetting the finally selected article and the corresponding picture to obtain the final tweet.
An article automatic generation method based on natural language processing and image algorithm comprises the following steps:
S1, inputting product information and user information through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, screening out the short text and the matching picture with the highest degree of fit through the intelligent screening module; if the selection is not approved by the operator, the process restarts from step S2 until a satisfactory short text and the matching picture with the highest degree of fit are selected;
and S5, typesetting the short texts and the corresponding pictures through the intelligent typesetting module to obtain final text pushing, and outputting the text pushing through the operation terminal.
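The control flow of steps S1 to S5, including the restart from S2 when the operator does not approve the selection, can be sketched as follows. This is a hedged outline only: every callable passed in (`extract_labels`, `generate_candidates`, `screen`, `typeset`, `approve`) is a hypothetical placeholder for the corresponding module, and the `max_rounds` limit is an added assumption so that the loop terminates, whereas the text repeats until a satisfactory result is found.

```python
def generate_tweet(product_info, user_info, extract_labels, generate_candidates,
                   screen, typeset, approve, max_rounds=5):
    """Run S2-S4 repeatedly until the operator approves, then typeset (S5).

    Returns the typeset result, or None if no round was approved.
    """
    # S1: product_info and user_info are the operator's inputs.
    for _ in range(max_rounds):
        labels = extract_labels(product_info, user_info)   # S2: build labels
        candidates = generate_candidates(labels)           # S3: titles, texts, pictures
        selection = screen(product_info, candidates)       # S4: (text, picture) or None
        if selection is not None and approve(selection):
            text, picture = selection
            return typeset(text, picture)                  # S5: final layout
        # Not approved, or nothing passed screening: restart from S2.
    return None
```

Passing the modules in as functions keeps the sketch independent of any particular model; a real system would substitute the trained label extraction, generation, screening and typesetting components.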
In step S4, the generated short texts and titles are first screened against the word-count requirement and the established sensitive-word lexicon to obtain short texts and titles that meet the requirements; if no satisfactory short text and title remain after screening, the method returns to step S2 to regenerate titles, articles and pictures until satisfactory content is obtained. Then, for the series of pictures generated from the short text, the degree of fit between each picture and the short text is calculated according to an image algorithm; for example, if the threshold is set to 80%, the picture with the highest value is selected from the pictures whose degree of fit is higher than 80%.
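The two screening stages of step S4 can be illustrated with a short sketch. It assumes that the degree of fit of each picture has already been computed by some image algorithm and is supplied as a number between 0 and 1, that word count is measured by whitespace splitting (a Chinese text would more likely be counted in characters), and that the names `Candidate`, `passes_text_rules` and `pick_best_image` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    text: str


def passes_text_rules(candidate, min_words, max_words, sensitive_words):
    """Check the word-count requirement and the sensitive-word lexicon."""
    word_count = len(candidate.text.split())
    if not (min_words <= word_count <= max_words):
        return False
    lowered = (candidate.title + " " + candidate.text).lower()
    return not any(word.lower() in lowered for word in sensitive_words)


def pick_best_image(fit_scores, threshold=0.80):
    """Return the picture id with the highest fit strictly above the threshold.

    fit_scores maps a picture id to its degree of fit; None is returned when
    no picture exceeds the threshold, which signals a return to step S2.
    """
    eligible = {pid: fit for pid, fit in fit_scores.items() if fit > threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

Returning `None` rather than the best picture below the threshold mirrors the text, which regenerates content instead of accepting an unsatisfactory result.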
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. An automatic article generation system based on natural language processing and image algorithm is characterized by comprising:
the operation terminal is used for inputting the product information query and the user information and outputting a final tweet;
the basic label extraction system is used for extracting keywords by using a BERT pre-training model according to input product information and user information and establishing a plurality of labels;
the content generation module comprises a title generation model, a short text generation model and an image generation model, and generates a plurality of corresponding titles, a plurality of article contents d and a plurality of pictures which are consistent with the article contents according to the label information; the intelligent screening module comprises an article screening module and an image screening module, and is used for screening the combination of a plurality of article contents and corresponding images by using the article screening module and the image screening module to obtain a group of article contents and corresponding images which meet the requirements;
and the intelligent typesetting module is used for typesetting the finally selected article and the matched picture to obtain the final pushed article.
2. The system of claim 1 for automatically generating articles based on natural language processing and image algorithms, wherein: the intelligent typesetting module comprises a text database, and a model for automatically typesetting the articles and the pictures is trained through a deep learning BERT algorithm.
3. The system of claim 1 for automatically generating articles based on natural language processing and image algorithms, wherein: the content generation module comprises a collected title database and different types of short text databases; respectively obtaining a title generation model and a short text generation model through BERT pre-training model training; and drawing a picture conforming to the short text by an image generation model-StackGAN algorithm.
4. The system of claim 3 for automatically generating articles based on natural language processing and image algorithms, wherein: and the picture screening module calculates the conformity of the conforming pictures and the article contents according to an image generation model-StackGAN algorithm, sets a threshold value and selects the conforming picture with the highest conformity.
5. The system of claim 3 for automatically generating articles based on natural language processing and image algorithms, wherein: the article screening module calculates a relevance score between the product information query and each document d through a BERT pre-training model, wherein the query comprises words q1, q2, ..., qn, and the calculation formula of the relevance score is as follows:
$$\mathrm{Score}(query, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
where R(qi, d) is the relevance value between each word qi in the query and the document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
wherein N is the total number of documents and n(qi) is the number of documents containing the word qi;
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left( 1 - b + b \cdot \frac{dl}{avgdl} \right)$$
wherein k1, k2 and b are adjustment factors, qfi is the frequency of occurrence of the word qi in the query, fi is the frequency of occurrence of qi in document d, dl is the length of document d, and avgdl is the average length of all documents;
and calculating the relevance score of each document d and the query, setting a threshold value, comparing the threshold value with each calculated relevance score, and selecting a conforming article with proper relevance.
6. A method for automatically generating an article based on natural language processing and image algorithm according to claims 1-5, characterized by comprising the following steps:
S1, inputting product information and user information through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, screening out the short text and the matching picture with the highest degree of fit through the intelligent screening module; if the selection is not approved by the operator, the process restarts from step S2 until a satisfactory short text and the matching picture with the highest degree of fit are selected;
and S5, typesetting the short texts and the corresponding pictures through the intelligent typesetting module to obtain final text pushing, and outputting the text pushing through the operation terminal.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010176734.0A CN111428472A (en) | 2020-03-13 | 2020-03-13 | Article automatic generation system and method based on natural language processing and image algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010176734.0A CN111428472A (en) | 2020-03-13 | 2020-03-13 | Article automatic generation system and method based on natural language processing and image algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111428472A true CN111428472A (en) | 2020-07-17 |
Family
ID=71547906
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010176734.0A Pending CN111428472A (en) | 2020-03-13 | 2020-03-13 | Article automatic generation system and method based on natural language processing and image algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111428472A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113220825A (en) * | 2021-03-23 | 2021-08-06 | 上海交通大学 | Modeling method and system of topic emotion tendency prediction model for personal tweet |
CN115204118A (en) * | 2022-07-12 | 2022-10-18 | 平安科技(深圳)有限公司 | Article generation method and device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407168A (en) * | 2016-09-06 | 2017-02-15 | 首都师范大学 | Automatic generation method for practical writing |
CN106777193A (en) * | 2016-12-23 | 2017-05-31 | 李鹏 | A kind of method for writing specific contribution automatically |
CN106970898A (en) * | 2017-03-31 | 2017-07-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating article |
CN107992542A (en) * | 2017-11-27 | 2018-05-04 | 中山大学 | A kind of similar article based on topic model recommends method |
CN109858028A (en) * | 2019-01-30 | 2019-06-07 | 神思电子技术股份有限公司 | A kind of short text similarity calculating method based on probabilistic model |
-
2020
- 2020-03-13 CN CN202010176734.0A patent/CN111428472A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407168A (en) * | 2016-09-06 | 2017-02-15 | 首都师范大学 | Automatic generation method for practical writing |
CN106777193A (en) * | 2016-12-23 | 2017-05-31 | 李鹏 | A kind of method for writing specific contribution automatically |
CN106970898A (en) * | 2017-03-31 | 2017-07-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating article |
CN107992542A (en) * | 2017-11-27 | 2018-05-04 | 中山大学 | A kind of similar article based on topic model recommends method |
CN109858028A (en) * | 2019-01-30 | 2019-06-07 | 神思电子技术股份有限公司 | A kind of short text similarity calculating method based on probabilistic model |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113220825A (en) * | 2021-03-23 | 2021-08-06 | 上海交通大学 | Modeling method and system of topic emotion tendency prediction model for personal tweet |
CN115204118A (en) * | 2022-07-12 | 2022-10-18 | 平安科技(深圳)有限公司 | Article generation method and device, computer equipment and storage medium |
CN115204118B (en) * | 2022-07-12 | 2023-06-27 | 平安科技(深圳)有限公司 | Article generation method, apparatus, computer device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108287922B (en) | Text data viewpoint abstract mining method fusing topic attributes and emotional information | |
CN106776711B (en) | Chinese medical knowledge map construction method based on deep learning | |
US9336299B2 (en) | Acquisition of semantic class lexicons for query tagging | |
CN109726298B (en) | Knowledge graph construction method, system, terminal and medium suitable for scientific and technical literature | |
CN111881307A (en) | Demonstration manuscript generation method and device, computer equipment and storage medium | |
CN100595760C (en) | Method for gaining oral vocabulary entry, device and input method system thereof | |
CN108363725B (en) | Method for extracting user comment opinions and generating opinion labels | |
US20110060734A1 (en) | Method and Apparatus of Knowledge Base Building | |
CN108268668B (en) | Topic diversity-based text data viewpoint abstract mining method | |
CN117056471A (en) | Knowledge base construction method and question-answer dialogue method and system based on generation type large language model | |
WO2021217772A1 (en) | Ai-based interview corpus classification method and apparatus, computer device and medium | |
CN110008309B (en) | Phrase mining method and device | |
CN101556596B (en) | Input method system and intelligent word making method | |
CN113886604A (en) | Job knowledge map generation method and system | |
CN112989208B (en) | Information recommendation method and device, electronic equipment and storage medium | |
US20220207483A1 (en) | Automatic document classification | |
CN113627797B (en) | Method, device, computer equipment and storage medium for generating staff member portrait | |
CN111428472A (en) | Article automatic generation system and method based on natural language processing and image algorithm | |
CN111930895A (en) | Document data retrieval method, device, equipment and storage medium based on MRC | |
CN110889292B (en) | Text data viewpoint abstract generating method and system based on sentence meaning structure model | |
CN113761114A (en) | Phrase generation method and device and computer-readable storage medium | |
CN111259223B (en) | News recommendation and text classification method based on emotion analysis model | |
CN111966899A (en) | Search ranking method, system and computer readable storage medium | |
CN114416914B (en) | Processing method based on picture question and answer | |
CN113127627B (en) | Poetry recommendation method based on LDA theme model and poetry knowledge map |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200717 |