CN111428472A - Article automatic generation system and method based on natural language processing and image algorithm - Google Patents


Info

Publication number
CN111428472A
Authority
CN
China
Prior art keywords
article
image
screening module
module
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010176734.0A
Other languages
Chinese (zh)
Inventor
孟宪坤
边树森
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Huakun Daowei Data Technology Co ltd
Original Assignee
Zhejiang Huakun Daowei Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Huakun Daowei Data Technology Co ltd
Priority to CN202010176734.0A
Publication of CN111428472A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an automatic article generation system and method based on natural language processing and an image algorithm, relating to the field of artificial intelligence. The system comprises: an operation terminal, used for inputting product information and user information and outputting the final tweet; a basic label extraction system, used for extracting keywords from the input product information and user information with a BERT pre-trained model and establishing a plurality of labels; a content generation module, which generates a plurality of titles, a plurality of article contents, and a plurality of pictures matching the article contents according to the label information; an intelligent screening module, comprising an article screening module and an image screening module, which screens the combinations of article contents and corresponding images to obtain one set of article content and corresponding images that meets the requirements; and an intelligent typesetting module, used for typesetting the finally selected article and its matching picture to obtain the final tweet.

Description

Article automatic generation system and method based on natural language processing and image algorithm
Technical Field
The invention relates to the field of artificial intelligence, in particular to an article automatic generation system and method based on natural language processing and image algorithm.
Background
At present, enterprises recommend more and more products to users, and the most common way to make advertisements attract users' attention is to publish recommendation articles on the major platforms. Writing these short texts manually has the following disadvantages: first, a large number of short texts must be written, the writing process is tedious and time-consuming, and the rate of short text production is low; second, the short texts are usually written from the enterprise's point of view, so they hold little appeal for users and the recommended texts are poorly targeted; third, an illustration that fits the theme of the article is indispensable for attracting attention, yet current technology lacks research on image generation.
Disclosure of Invention
The invention aims to provide an automatic article generation system and method based on natural language processing and image algorithm, so as to solve the problems in the background technology.
To achieve this purpose, the invention provides the following technical solution: an automatic article generation system based on natural language processing and an image algorithm, comprising:
an operation terminal, used for inputting the product information query and the user information and outputting the final tweet;
a basic label extraction system, used for extracting keywords from the input product information and user information with a BERT pre-trained model and establishing a plurality of labels;
a content generation module, comprising a title generation model, a short text generation model and an image generation model, which generates a plurality of titles, a plurality of article contents d and a plurality of pictures matching the article contents according to the label information;
an intelligent screening module, comprising an article screening module and an image screening module, which screens the combinations of article contents and corresponding images to obtain one set of article content and corresponding images that meets the requirements;
and an intelligent typesetting module, used for typesetting the finally selected article and its matching picture to obtain the final tweet.
From simple product information as input, the tweet and the corresponding picture can be generated, which saves a large amount of manpower and time and improves working efficiency.
As a preferred technical solution of the invention, the intelligent typesetting module comprises a text database, and a model for automatically typesetting articles and pictures is trained through the deep learning BERT algorithm.
As a preferred technical solution of the invention, the content generation module comprises a collected title database and short text databases of different types; the title generation model and the short text generation model are each obtained by training with a BERT pre-trained model; and a picture matching the short text is drawn by the image generation model, the StackGAN algorithm.
As a preferred technical solution of the invention, the image screening module calculates the degree of fit between each candidate picture and the article content according to the image generation model, the StackGAN algorithm, sets a threshold, and selects the picture with the highest degree of fit.
As a preferred technical solution of the invention, the article screening module calculates a relevance score between the product information query and each document d through a BERT pre-trained model, where the query consists of the words q1, q2, ..., qn, and the relevance score is calculated as:
$$\mathrm{Score}(\mathrm{query}, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
where R(qi, d) is the relevance of each word qi in the query to document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
where N is the total number of documents and n(qi) is the number of documents containing the word qi;
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
where k1, k2 and b are adjustment factors, qfi is the frequency of the word qi in the query, fi is the frequency of qi in document d, dl is the length of document d, and avgdl is the average length of all documents;
the relevance score between each document d and the query is calculated, a threshold is set and compared with each calculated score, and the articles whose relevance meets the threshold are selected;
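As a worked illustration of the IDF weight, the following assumes the usual smoothed form of the weight (with the 0.5 coefficient mentioned in the detailed description), a natural logarithm, and hypothetical document counts chosen only for this example. A rare word receives a large weight, while a word that appears in most documents receives a small or even negative one:

$$W_i = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$

$$N = 1\,000\,000,\; n(q_i) = 1\,000: \quad W_i = \ln \frac{999\,000.5}{1\,000.5} \approx \ln 998.5 \approx 6.91$$

$$N = 1\,000\,000,\; n(q_i) = 900\,000: \quad W_i = \ln \frac{100\,000.5}{900\,000.5} \approx \ln 0.111 \approx -2.20$$

The second case shows that, in this form, a word contained in more than half of the documents lowers the score instead of raising it, which is worth keeping in mind when a fixed threshold is applied.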
An automatic article generation method based on natural language processing and an image algorithm comprises the following steps:
S1, product information and user information are input through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, the intelligent screening module screens out one short text and the matching picture with the highest degree of fit;
S5, the intelligent typesetting module typesets the short text and the corresponding picture to obtain the final tweet, which is output through the operation terminal.
Compared with the prior art: based on the input product information and user information, keywords are first extracted with a natural language processing algorithm and a plurality of labels are established; a short text and a title that introduce the product and suit the user's reading habits are then generated from the label information; next, the StackGAN algorithm draws an image that matches the idea of the newly generated short text; finally, the generated short texts are screened with the relevance score formula to obtain the most suitable content, which greatly improves the efficiency of article writing.
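The text does not say how the BERT pre-trained model is used to pick keywords. One common approach with sentence encoders is to rank candidate words by the cosine similarity between their embedding and the embedding of the whole input. The sketch below assumes that approach; `extract_labels`, `cosine` and `toy_embed` are hypothetical names, and `toy_embed` is a letter-count placeholder standing in for a real encoder, not a BERT model.

```python
import math
from typing import Callable, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_labels(text: str, candidates: List[str],
                   embed: Callable[[str], Sequence[float]],
                   top_k: int = 3) -> List[str]:
    """Rank candidate keywords by similarity to the whole input text."""
    text_vec = embed(text)
    ranked = sorted(candidates,
                    key=lambda c: cosine(embed(c), text_vec),
                    reverse=True)
    return ranked[:top_k]


def toy_embed(s: str) -> List[float]:
    """Placeholder embedding (letter counts); assumption only, NOT BERT."""
    s = s.lower()
    return [float(s.count(ch)) for ch in "abcdefghijklmnopqrstuvwxyz"]


labels = extract_labels("thermos cup", ["zzz", "thermos cup", "qqq"],
                        embed=toy_embed, top_k=1)
```

In practice `embed` would be replaced by a call into whichever encoder is actually deployed; the ranking logic stays the same.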
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic structural diagram of an article generation module according to the present invention.
Fig. 3 is a schematic structural diagram of the intelligent screening module of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides an article automatic generation system based on natural language processing and image algorithm, comprising:
an operation terminal, used for inputting the product information query and user information and outputting the final tweet. It provides an operating platform for the operator, which can be a mobile phone or a computer, and the output article can be edited and shared directly.
A basic label extraction system, used for extracting keywords from the input product information and user information with a BERT pre-trained model and establishing a plurality of labels.
A content generation module, comprising a title generation module, a short text generation module and a picture generation module, which generates a plurality of titles, a plurality of article contents and a plurality of pictures matching the article contents according to the label information. The content generation module comprises a collected title database and short text databases of different types; the title generation model is obtained by training with a natural language processing algorithm; the short text generation model is obtained by training with a natural language processing algorithm; and the StackGAN algorithm draws a matching picture from the short text produced by the short text generation model.
An intelligent screening module, comprising an article screening module and an image screening module. The image screening module calculates the degree of fit between each candidate picture and the article content according to an image algorithm, sets a threshold, and selects the picture with the highest degree of fit.
The article screening module calculates a relevance score between the product information query and each document d through a BERT pre-trained model, where the query consists of the words q1, q2, ..., qn, and the relevance score is calculated as:
$$\mathrm{Score}(\mathrm{query}, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
where R(qi, d) is the relevance of each word qi in the query to document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
where N is the total number of documents and n(qi) is the number of documents containing the word qi. The value 0.5 is a smoothing coefficient that avoids the case where n(qi) is 0, and the log function makes the IDF value respond more smoothly to N and n(qi). The meaning of the IDF value is evident from the formula: the larger the total number of documents and the smaller the number of documents containing the word qi, the larger the IDF value of qi. For example, in a collection of 1 million documents, the words "basketball" and "Kobe Bryant" appear almost exclusively in sports-related documents, so the IDF values of these two words are relatively large, whereas the words "is", "are" and "what" appear in almost all documents, so their IDF values are very small. The relevance R(qi, d) is given by:
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
where k1, k2 and b are adjustment factors, generally k1 = 1, k2 = 1 and b = 0.75; qfi is the frequency of the word qi in the query, fi is the frequency of qi in document d, dl is the length of document d, and avgdl is the average length of all documents. Since in most cases a word qi appears only once in a short query, qfi = 1. For example, suppose the query is: Where did Zhuge Liang die?
The content of document1 is: Zhuge Liang fell ill at Wuzhangyuan and finally died;
the content of document2 is: [...] clashed at Wuzhangyuan;
and the content of document3 is an entire book of Chinese history.
Document3 clearly contains many occurrences of the words [Zhuge Liang], [where] and [died], but because document3 is very long, K is very large, and the relevance R(qi, d) of each word qi in the query is therefore very small. With qfi = 1 the second factor of R(qi, d) equals 1, and the combined formula becomes:
$$\mathrm{Score}(\mathrm{query}, d) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f_i \, (k_1 + 1)}{f_i + k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)}$$
The relevance score between each document d and the query is calculated, a threshold is set and compared with each calculated score, and the articles whose relevance meets the threshold are selected.
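The scoring and threshold step described above has the same form as the Okapi BM25 ranking function. The following is a minimal sketch under stated assumptions: documents are already tokenized into word lists, qfi = 1 for every query word (so the k2 factor drops out), the logarithm is natural, and the function names `bm25_scores` and `select_articles`, the sample corpus and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.0, b=0.75):
    """Score each tokenized document against the query (qf_i assumed to be 1)."""
    n_docs = len(documents)
    avgdl = sum(len(doc) for doc in documents) / n_docs
    # n(q_i): number of documents that contain each query term
    doc_freq = {t: sum(1 for doc in documents if t in doc)
                for t in set(query_terms)}
    scores = []
    for doc in documents:
        tf = Counter(doc)
        # K grows with document length, damping scores of long documents
        K = k1 * (1 - b + b * len(doc) / avgdl)
        score = 0.0
        for t in query_terms:
            idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            f = tf[t]
            score += idf * f * (k1 + 1) / (f + K)
        scores.append(score)
    return scores


def select_articles(query_terms, documents, threshold):
    """Return (index, score) pairs that reach the threshold, best first."""
    scored = enumerate(bm25_scores(query_terms, documents))
    kept = [(i, s) for i, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


docs = [
    ["zhuge", "liang", "died", "wuzhangyuan"],
    ["weather", "sunny"],
    ["stock", "market", "news"],
    ["basketball", "game"],
    ["recipe", "soup"],
]
scores = bm25_scores(["zhuge", "died"], docs)
selected = select_articles(["zhuge", "died"], docs, threshold=0.5)
```

Only the first document contains the query words, so it is the only one that survives the threshold. The threshold itself has no natural scale and would need to be tuned on real data.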
The intelligent typesetting module comprises a text database and trains a model for automatically typesetting the articles and the pictures through a machine deep learning algorithm. And typesetting the finally selected article and the corresponding picture to obtain the final tweet.
An automatic article generation method based on natural language processing and an image algorithm comprises the following steps:
S1, product information and user information are input through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, the intelligent screening module screens out one short text and the matching picture with the highest degree of fit; if the operator does not approve the selection, the process restarts from step S2 until a satisfactory short text and the matching picture with the highest degree of fit are selected;
S5, the intelligent typesetting module typesets the short text and the corresponding picture to obtain the final tweet, which is output through the operation terminal.
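The control flow of steps S1 to S5, including the restart from S2 when the result is rejected, can be sketched as a small orchestration function. This is only an illustration: every callable passed in is a hypothetical stand-in for the corresponding module, and `max_rounds` is an added assumption so the loop terminates (the text itself repeats until the operator is satisfied).

```python
from typing import Callable, List, Optional, Tuple

# (title, short text, picture reference); a hypothetical candidate layout
Candidate = Tuple[str, str, str]


def generate_tweet(
    product_info: str,
    user_info: str,
    extract_labels: Callable[[str, str], List[str]],
    generate_candidates: Callable[[List[str]], List[Candidate]],
    screen: Callable[[str, List[Candidate]], Optional[Candidate]],
    approve: Callable[[Candidate], bool],
    typeset: Callable[[Candidate], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Run S1-S5; restart from S2 when screening or the operator rejects."""
    for _ in range(max_rounds):
        labels = extract_labels(product_info, user_info)   # S2
        candidates = generate_candidates(labels)           # S3
        best = screen(product_info, candidates)            # S4
        if best is not None and approve(best):
            return typeset(best)                           # S5
    return None


# Stub modules: the operator rejects the first proposal and accepts the second.
calls = {"n": 0}


def _approve(candidate: Candidate) -> bool:
    calls["n"] += 1
    return calls["n"] >= 2


result = generate_tweet(
    "thermos cup", "commuters",
    extract_labels=lambda p, u: [p, u],
    generate_candidates=lambda labels: [("T", "text about " + labels[0], "img.png")],
    screen=lambda q, cands: cands[0] if cands else None,
    approve=_approve,
    typeset=lambda c: c[0] + "\n" + c[1] + "\n[" + c[2] + "]",
)
```

Passing the modules in as callables keeps the loop independent of how labels, texts and pictures are actually produced.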
In step S4, the generated short texts and titles are first screened against the word-count requirement and a sensitive-word lexicon to obtain short texts and titles that meet the requirements; if no satisfactory short text and title remain after screening, the method returns to the second step (S2) and regenerates titles, articles and pictures until satisfactory content is obtained. Then, for the series of pictures generated from the short text, the degree of fit between each picture and the short text is calculated according to an image algorithm; for example, if the threshold is set to 80%, the picture with the highest value is selected from the pictures whose degree of fit is higher than 80%.
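The two screening rules in step S4 can be sketched as follows. The sketch assumes that word count is measured by whitespace splitting (for Chinese text a character count would be more natural), that sensitive words are matched as case-insensitive substrings, and that the degree of fit of each picture has already been computed by some image algorithm and is supplied as a number between 0 and 1. The function names and sample data are hypothetical.

```python
from typing import Dict, List, Optional


def filter_texts(texts: List[str], min_words: int, max_words: int,
                 sensitive_words: List[str]) -> List[str]:
    """Keep texts within the word-count range that contain no sensitive word."""
    kept = []
    for text in texts:
        n_words = len(text.split())
        if not (min_words <= n_words <= max_words):
            continue
        lowered = text.lower()
        if any(w.lower() in lowered for w in sensitive_words):
            continue
        kept.append(text)
    return kept


def pick_picture(fit_scores: Dict[str, float],
                 threshold: float = 0.8) -> Optional[str]:
    """Return the picture with the highest fit above the threshold, else None."""
    above = {name: s for name, s in fit_scores.items() if s > threshold}
    if not above:
        return None
    return max(above, key=above.get)


texts = [
    "A light thermos cup that keeps tea hot all day",
    "Too short",
    "This miracle cure keeps tea hot all day long",
]
kept = filter_texts(texts, 5, 20, ["miracle cure"])
best = pick_picture({"a.png": 0.75, "b.png": 0.86, "c.png": 0.91})
```

When `filter_texts` returns an empty list or `pick_picture` returns `None`, the caller would go back to S2 and regenerate, as the paragraph above describes.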
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An automatic article generation system based on natural language processing and an image algorithm, characterized by comprising:
an operation terminal, used for inputting the product information query and the user information and outputting the final tweet;
a basic label extraction system, used for extracting keywords from the input product information and user information with a BERT pre-trained model and establishing a plurality of labels;
a content generation module, comprising a title generation model, a short text generation model and an image generation model, which generates a plurality of titles, a plurality of article contents d and a plurality of pictures matching the article contents according to the label information;
an intelligent screening module, comprising an article screening module and an image screening module, which screens the combinations of article contents and corresponding images to obtain one set of article content and corresponding images that meets the requirements;
and an intelligent typesetting module, used for typesetting the finally selected article and its matching picture to obtain the final tweet.
2. The system of claim 1 for automatically generating articles based on natural language processing and image algorithms, wherein: the intelligent typesetting module comprises a text database, and a model for automatically typesetting the articles and the pictures is trained through a deep learning BERT algorithm.
3. The system of claim 1 for automatically generating articles based on natural language processing and image algorithms, wherein: the content generation module comprises a collected title database and different types of short text databases; respectively obtaining a title generation model and a short text generation model through BERT pre-training model training; and drawing a picture conforming to the short text by an image generation model-StackGAN algorithm.
4. The system of claim 3 for automatically generating articles based on natural language processing and image algorithms, wherein: and the picture screening module calculates the conformity of the conforming pictures and the article contents according to an image generation model-StackGAN algorithm, sets a threshold value and selects the conforming picture with the highest conformity.
5. The system of claim 3 for automatically generating articles based on natural language processing and image algorithms, wherein: the article screening module calculates a relevance score between the product information query and each document d through a BERT pre-trained model, where the query consists of the words q1, q2, ..., qn, and the relevance score is calculated as:
$$\mathrm{Score}(\mathrm{query}, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d)$$
where R(qi, d) is the relevance of each word qi in the query to document d, and Wi is the inverse document frequency (IDF) of the word qi:
$$W_i = \mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}$$
where N is the total number of documents and n(qi) is the number of documents containing the word qi;
$$R(q_i, d) = \frac{f_i \, (k_1 + 1)}{f_i + K} \cdot \frac{qf_i \, (k_2 + 1)}{qf_i + k_2}, \qquad K = k_1 \left(1 - b + b \cdot \frac{dl}{avgdl}\right)$$
where k1, k2 and b are adjustment factors, qfi is the frequency of the word qi in the query, fi is the frequency of qi in document d, dl is the length of document d, and avgdl is the average length of all documents;
and the relevance score between each document d and the query is calculated, a threshold is set and compared with each calculated score, and the articles whose relevance meets the threshold are selected.
6. A method for automatically generating an article based on natural language processing and image algorithm according to claims 1-5, characterized by comprising the following steps:
S1, product information and user information are input through the operation terminal;
S2, the basic label extraction system extracts keywords from the input content and establishes a plurality of labels;
S3, the content generation module generates a plurality of titles, a plurality of article contents and a plurality of pictures corresponding to the article contents according to the label information generated in step S2;
S4, the intelligent screening module screens out one short text and the matching picture with the highest degree of fit; if the operator does not approve the selection, the process restarts from step S2 until a satisfactory short text and the matching picture with the highest degree of fit are selected;
S5, the intelligent typesetting module typesets the short text and the corresponding picture to obtain the final tweet, which is output through the operation terminal.
CN202010176734.0A 2020-03-13 2020-03-13 Article automatic generation system and method based on natural language processing and image algorithm Pending CN111428472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010176734.0A CN111428472A (en) 2020-03-13 2020-03-13 Article automatic generation system and method based on natural language processing and image algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010176734.0A CN111428472A (en) 2020-03-13 2020-03-13 Article automatic generation system and method based on natural language processing and image algorithm

Publications (1)

Publication Number Publication Date
CN111428472A true CN111428472A (en) 2020-07-17

Family

ID=71547906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010176734.0A Pending CN111428472A (en) 2020-03-13 2020-03-13 Article automatic generation system and method based on natural language processing and image algorithm

Country Status (1)

Country Link
CN (1) CN111428472A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220825A (en) * 2021-03-23 2021-08-06 上海交通大学 Modeling method and system of topic emotion tendency prediction model for personal tweet
CN115204118A (en) * 2022-07-12 2022-10-18 平安科技(深圳)有限公司 Article generation method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407168A (en) * 2016-09-06 2017-02-15 首都师范大学 Automatic generation method for practical writing
CN106777193A (en) * 2016-12-23 2017-05-31 李鹏 A kind of method for writing specific contribution automatically
CN106970898A (en) * 2017-03-31 2017-07-21 百度在线网络技术(北京)有限公司 Method and apparatus for generating article
CN107992542A (en) * 2017-11-27 2018-05-04 中山大学 A kind of similar article based on topic model recommends method
CN109858028A (en) * 2019-01-30 2019-06-07 神思电子技术股份有限公司 A kind of short text similarity calculating method based on probabilistic model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220825A (en) * 2021-03-23 2021-08-06 上海交通大学 Modeling method and system of topic emotion tendency prediction model for personal tweet
CN115204118A (en) * 2022-07-12 2022-10-18 平安科技(深圳)有限公司 Article generation method and device, computer equipment and storage medium
CN115204118B (en) * 2022-07-12 2023-06-27 平安科技(深圳)有限公司 Article generation method, apparatus, computer device and storage medium

Similar Documents

Publication Publication Date Title
CN108287922B (en) Text data viewpoint abstract mining method fusing topic attributes and emotional information
CN106776711B (en) Chinese medical knowledge map construction method based on deep learning
US9336299B2 (en) Acquisition of semantic class lexicons for query tagging
CN109726298B (en) Knowledge graph construction method, system, terminal and medium suitable for scientific and technical literature
CN111881307A (en) Demonstration manuscript generation method and device, computer equipment and storage medium
CN100595760C (en) Method for gaining oral vocabulary entry, device and input method system thereof
CN108363725B (en) Method for extracting user comment opinions and generating opinion labels
US20110060734A1 (en) Method and Apparatus of Knowledge Base Building
CN108268668B (en) Topic diversity-based text data viewpoint abstract mining method
CN117056471A (en) Knowledge base construction method and question-answer dialogue method and system based on generation type large language model
WO2021217772A1 (en) Ai-based interview corpus classification method and apparatus, computer device and medium
CN110008309B (en) Phrase mining method and device
CN101556596B (en) Input method system and intelligent word making method
CN113886604A (en) Job knowledge map generation method and system
CN112989208B (en) Information recommendation method and device, electronic equipment and storage medium
US20220207483A1 (en) Automatic document classification
CN113627797B (en) Method, device, computer equipment and storage medium for generating staff member portrait
CN111428472A (en) Article automatic generation system and method based on natural language processing and image algorithm
CN111930895A (en) Document data retrieval method, device, equipment and storage medium based on MRC
CN110889292B (en) Text data viewpoint abstract generating method and system based on sentence meaning structure model
CN113761114A (en) Phrase generation method and device and computer-readable storage medium
CN111259223B (en) News recommendation and text classification method based on emotion analysis model
CN111966899A (en) Search ranking method, system and computer readable storage medium
CN114416914B (en) Processing method based on picture question and answer
CN113127627B (en) Poetry recommendation method based on LDA theme model and poetry knowledge map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200717)