CN110245350A - Copy rewriting and updating method, apparatus, and device - Google Patents

Copy rewriting and updating method, apparatus, and device

Info

Publication number
CN110245350A
Authority
CN
China
Prior art keywords
correspondence
official documents
knowledge base
text
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910455177.3A
Other languages
Chinese (zh)
Other versions
CN110245350B (en)
Inventor
熊军
孙梦姝
陈若田
刘弘一
李若鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Nova Technology Singapore Holdings Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910455177.3A
Publication of CN110245350A
Application granted
Publication of CN110245350B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of this specification disclose a copy rewriting and updating method, apparatus, and device. In the copy rewriting scheme, word segmentation is performed on the text content of the obtained copy to be rewritten, and the resulting word segments are rewritten using specified terms in a domain knowledge base, so that rewriting different word segments yields different candidate copies.

Description

Copy rewriting and updating method, apparatus, and device
Technical field
This specification relates to the field of computer technology, and in particular to a copy rewriting and updating method, apparatus, and device.
Background
In promotional marketing copy, for example advertisements shown in ad display units on web pages or in apps (application programs), users easily grow tired of the advertising content. The content of promotional copy therefore needs to be updated in a timely manner, and the updated copy also needs to be novel, diverse, and in step with current hot topics. Only then can it attract users' attention so that they browse or even click on the promoted copy, which improves the user click-through rate and conversion rate.
In copy, the important content often comes from the text in the copy banner. At present, updating the text content of copy usually involves manually redesigning and editing new text content, forming new copy from it, and finally uploading the new copy to replace the old. This is costly and inefficient, and manually produced copy is small in quantity and not rich enough in content, so it struggles to attract users' attention.
Summary of the invention
In view of this, embodiments of this specification provide a copy rewriting method, apparatus, and device for automatically rewriting an initial copy (that is, the copy to be rewritten) to form candidate copies, so that the candidate copies are both numerous and rich and varied in content. Embodiments of this specification further provide a copy updating method, apparatus, and device for automatically rewriting an initial copy (that is, the copy to be updated) to form candidate copies, scoring the candidate copies, and automatically updating the copy.
Embodiments of this specification adopt the following technical solutions:
Embodiments of this specification provide a copy rewriting method, comprising:
Obtaining the text content of a copy to be rewritten;
Performing word segmentation on the text content to obtain several word segments;
Rewriting the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generating several candidate copies corresponding to the copy to be rewritten from the text content obtained by rewriting different word segments.
Embodiments of this specification also provide a copy updating method, comprising:
Obtaining the text content of a copy to be updated;
Performing word segmentation on the text content to obtain several word segments;
Rewriting the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generating several candidate copies corresponding to the copy to be updated from the text content obtained by rewriting different word segments;
Scoring the candidate copies based on a language model;
Updating the copy to be updated with the scored candidate copies according to a preset update policy.
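The scoring and update steps can be illustrated with a minimal sketch. It assumes an add-one-smoothed bigram language model trained on a tiny toy corpus and a "keep the top-scoring candidates" update policy; this passage does not specify either, and the names `train_bigram`, `score`, and `pick_update` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigram contexts and bigrams over a list of sentences (assumed toy corpus)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def score(candidate, model):
    """Average add-one-smoothed log-probability per bigram; higher means more fluent."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + candidate.split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    log_prob = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in pairs
    )
    return log_prob / len(pairs)


def pick_update(candidates, model, top_k=1):
    """Assumed update policy: keep the top_k highest-scoring candidate copies."""
    return sorted(candidates, key=lambda c: score(c, model), reverse=True)[:top_k]


model = train_bigram(["invest weekly for retirement", "invest weekly and act now"])
candidates = ["invest weekly act now", "now act weekly invest"]
print(pick_update(candidates, model))  # ['invest weekly act now']
```

Averaging the log-probability over the number of bigrams keeps shorter candidates from being favored merely for their length; other normalizations or other language models could be substituted.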
Embodiments of this specification also provide a copy rewriting apparatus, including an obtaining module, a word segmentation module, a rewriting module, and a generation module;
The obtaining module is used to obtain the text content of a copy to be rewritten;
The word segmentation module is used to perform word segmentation on the text content to obtain several word segments;
The rewriting module is used to rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
The generation module is used to generate several candidate copies corresponding to the copy to be rewritten from the text content obtained by rewriting different word segments.
Embodiments of this specification also provide a copy updating apparatus, including an obtaining module, a word segmentation module, a rewriting module, a generation module, a scoring module, and an update module;
The obtaining module is used to obtain the text content of a copy to be updated;
The word segmentation module is used to perform word segmentation on the text content to obtain several word segments;
The rewriting module is used to rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
The generation module is used to generate several candidate copies corresponding to the copy to be updated from the text content obtained by rewriting different word segments;
The scoring module is used to score the candidate copies based on a language model;
The update module is used to update the copy to be updated with the scored candidate copies according to a preset update policy.
Embodiments of this specification also provide an electronic device for rewriting copy, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can:
Obtain the text content of a copy to be rewritten;
Perform word segmentation on the text content to obtain several word segments;
Rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generate several candidate copies corresponding to the copy to be rewritten from the text content obtained by rewriting different word segments.
Embodiments of this specification also provide an electronic device for updating copy, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can:
Obtain the text content of a copy to be updated;
Perform word segmentation on the text content to obtain several word segments;
Rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generate several candidate copies corresponding to the copy to be updated from the text content obtained by rewriting different word segments;
Score the candidate copies based on a language model;
Update the copy to be updated with the scored candidate copies according to a preset update policy.
At least one of the above technical solutions adopted by embodiments of this specification can achieve the following beneficial effects. The text content of the copy to be rewritten is segmented into a number of word segments, and specified terms from the domain knowledge base (that is, words corresponding to entity knowledge) are used to rewrite different word segments, automatically generating several candidate copies for the copy to be rewritten. Because the domain knowledge base is established in advance from the text content of a corpus, it contains a large number of specified terms, each being a word from the corpus text that has a specific meaning (that is, business-scenario meaning) in the field of the copy, such as a term characterizing a business scenario. The candidate copies obtained this way are therefore not only numerous but also richer and more varied in wording. This meets the need for copy to be novel, diverse, and in step with current hot topics, so that when this rich and varied copy is promoted to users it attracts their attention more easily, which helps improve the user click-through rate and conversion rate.
Brief description of the drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some of the embodiments recorded in this specification, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a copy rewriting method provided by an embodiment of this specification.
Fig. 2 is a schematic diagram of retirement-investment marketing copy in a copy rewriting method provided by an embodiment of this specification.
Fig. 3 is a schematic diagram of the rewriting result of the retirement-investment marketing copy in a copy rewriting method provided by an embodiment of this specification.
Fig. 4 is a flowchart of establishing the domain knowledge base in a copy rewriting method provided by an embodiment of this specification.
Fig. 5 is a schematic functional block diagram of establishing the domain knowledge base in a copy rewriting method provided by an embodiment of this specification.
Fig. 6 is a schematic structural diagram of a copy rewriting apparatus provided by an embodiment of this specification.
Fig. 7 is a flowchart of a copy updating method provided by an embodiment of this specification.
Fig. 8 is a flowchart of training the language model in a copy updating method provided by an embodiment of this specification.
Fig. 9 is a flowchart of rewriting and updating marketing copy in a copy updating method provided by an embodiment of this specification.
Fig. 10 is a schematic structural diagram of a copy updating apparatus provided by an embodiment of this specification.
Detailed description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this specification without creative effort shall fall within the scope of protection of this application.
As analyzed above, the important content of copy often comes from the text in the copy banner, and updating the text content of copy currently requires manually producing new text content. This is costly and inefficient, and manually produced copy is small in quantity and not rich enough in content, so it struggles to attract users' attention.
On this basis, embodiments of this specification provide a copy rewriting and updating method, apparatus, and device that rewrite copy to automate the rewriting of its text content, and that use the rewritten copy as candidate copy to update the copy automatically. In copy rewriting, word segmentation is performed on the text content of the obtained copy to be rewritten to obtain several word segments, and the word segments are rewritten according to the specified terms in the domain knowledge base, for example by replacing a word segment with a specified term from the domain knowledge base, or by inserting a specified term at a position adjacent to a word segment. New copy is then generated from the rewritten text content and serves as candidate copy. The candidate copy obtained by rewriting is both numerous and rich in content, so when it is used as updated copy for promotional marketing it attracts users' attention more easily, which helps improve the user click-through rate and conversion rate.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a copy rewriting method provided by an embodiment of this specification.
As shown in Fig. 1, the copy rewriting method may include the following steps:
Step S102: obtain the text content of the copy to be rewritten.
In specific implementation, the copy to be rewritten may be copy that is about to be promoted or copy that is already being promoted. For example, when promoting a feature of an app (application program), the text content of the initial copy usually comes from the app details that the developer filled in about the app's features, and these details emphasize the app's core functions. The copy containing the app details is then placed in ad slots in media such as website pages and app pages for promotional marketing, so that the displayed text content attracts users to notice and read it, or even to click the copy's link and enter a specific marketing page.
It should be noted that the text content can also be obtained from the textual content contained in the copy, which is not elaborated here.
Step S104: perform word segmentation on the text content to obtain several word segments.
In specific implementation, word segmentation may be performed using a dictionary. The dictionary may be built manually and may collect the vocabulary of the field to which the copy to be rewritten belongs. With such a dictionary, the text content of the copy to be rewritten can be matched quickly and accurately and divided into several word segments, and the word segments obtained this way are also highly relevant to the field of the copy.
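Dictionary-based segmentation of this kind can be sketched as follows. The sketch assumes forward maximum matching over whitespace-separated words, which the specification does not prescribe; the function `segment`, the `LEXICON` contents, and the `max_len` parameter are hypothetical.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position take the longest phrase
    (up to max_len words) found in the lexicon, else fall back to one word."""
    words = text.split()
    segments = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if n == 1 or phrase in lexicon:
                segments.append(phrase)
                i += n
                break
    return segments


# Hypothetical domain dictionary for retirement-investment copy.
LEXICON = {"regular investment", "100 yuan", "every week", "until retirement"}

print(segment("regular investment of 100 yuan every week until retirement", LEXICON))
# ['regular investment', 'of', '100 yuan', 'every week', 'until retirement']
```

For Chinese text, the same idea would typically run over characters rather than whitespace-separated words.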
In some embodiments, head and tail identifiers may also be added to a word segment. The identifiers reflect the position of the word segment in the sentence and its relationship to the preceding and following word segments, which facilitates the subsequent rewriting operations.
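One possible reading of head and tail identifiers is sketched below. It assumes sentence-boundary markers `<s>` and `</s>` and records each word segment's index together with its neighbors; the specification does not define the identifier format, so the markers, the record layout, and the function `with_context` are all hypothetical.

```python
BOS, EOS = "<s>", "</s>"  # assumed head and tail markers


def with_context(segments):
    """Attach position and neighboring segments to every word segment."""
    padded = [BOS] + list(segments) + [EOS]
    return [
        {
            "index": i - 1,          # position of the segment in the sentence
            "prev": padded[i - 1],   # BOS for the first segment
            "segment": padded[i],
            "next": padded[i + 1],   # EOS for the last segment
        }
        for i in range(1, len(padded) - 1)
    ]


for record in with_context(["travel deals", "book today"]):
    print(record)
```

A record whose `prev` is the head marker is sentence-initial, and one whose `next` is the tail marker is sentence-final, which a later rewriting step can use to decide where an insertion is allowed.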
Step S106: rewrite the several word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs.
Here, the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy.
It should be noted that a term with business-scenario meaning in the field of the copy is also commonly called a term corresponding to entity knowledge of that field, where entity knowledge is knowledge about entities that have specific meanings in the field. A specified term can therefore be a word corresponding to entity knowledge in the field. In addition, one or more word segments may be rewritten; when multiple word segments are rewritten, the corresponding multiple specified terms are used to rewrite them.
For example, in the marketing field, the specified terms corresponding to entity knowledge may include holidays (such as National Day and the Mid-Autumn Festival), popular internet slang (such as "that hits home, old pal" and "setting a flag"), and marketing hot words (such as "why not act now" and "slow hands miss out").
In some embodiments, the word segments obtained from the text content of the copy to be rewritten can be rewritten by replacement. That is, a term in the domain knowledge base that corresponds to the entity knowledge of an obtained word segment (a specified term) directly replaces that word segment, yielding new text content. This preserves the meaning of the original text content while expressing a similar core meaning through the new text content, which enriches the ways the content is expressed.
For example, when the word segment is "act quickly", it can be replaced with a marketing hot word of similar meaning, such as "why not act now", "quick hands get it", or "slow hands miss out".
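The replacement mode can be sketched as follows, assuming the knowledge base is exposed as a mapping from a word segment to specified terms of similar meaning. The mapping `SIMILAR_TERMS` and the function `replace_candidates` are hypothetical and only illustrate the idea.

```python
# Hypothetical knowledge-base slice: word segment -> specified terms of similar meaning.
SIMILAR_TERMS = {
    "act quickly": ["why not act now", "quick hands get it", "slow hands miss out"],
}


def replace_candidates(segments, similar_terms):
    """Return one rewritten segment list per (segment, specified term) substitution."""
    candidates = []
    for i, seg in enumerate(segments):
        for term in similar_terms.get(seg, []):
            candidates.append(segments[:i] + [term] + segments[i + 1:])
    return candidates


for cand in replace_candidates(["book your trip", "act quickly"], SIMILAR_TERMS):
    print(" ".join(cand))
```

Each substitution yields a separate candidate, so a single word segment with three matching specified terms produces three candidate texts.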
In some embodiments, the word segments obtained from the text content of the copy to be rewritten can also be rewritten by insertion. That is, a term in the domain knowledge base that corresponds to entity knowledge and is closely related to an obtained word segment is inserted before or after that word segment, yielding new text content.
For example, if the current date is close to National Day and travel marketing copy is to run a promotional sale before the holiday, terms corresponding to entity knowledge such as National Day and promotional activities can be inserted into the text content of the travel copy, specifically at positions adjacent to the corresponding word segments.
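A minimal sketch of the insertion mode follows. The function `insert_term`, its `before` flag, and the sample word segments are hypothetical; how the related term and the anchor word segment are chosen is left to the knowledge base and is not modeled here.

```python
def insert_term(segments, anchor, term, before=True):
    """Insert a specified term immediately before or after the anchor word segment.
    Returns an unchanged copy if the anchor is not present."""
    if anchor not in segments:
        return list(segments)
    i = segments.index(anchor)
    pos = i if before else i + 1
    return segments[:pos] + [term] + segments[pos:]


segments = ["travel deals", "book today"]
print(insert_term(segments, "travel deals", "National Day"))
# ['National Day', 'travel deals', 'book today']
print(insert_term(segments, "travel deals", "special offer", before=False))
# ['travel deals', 'special offer', 'book today']
```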
In some embodiments, the word segments obtained from the text content of the copy to be rewritten can also be rewritten by deletion. That is, according to the specified terms in the domain knowledge base, words with low relevance (such as low similarity) to the entity knowledge are deleted from the word segments, so that only word segments closely related to the entity knowledge remain.
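The deletion mode can be sketched as follows. The sketch uses word-overlap Jaccard similarity as a stand-in relevance measure and a threshold of 0.3; the specification only says "relevance (such as similarity)", so the measure, the threshold, and the names `jaccard` and `prune` are assumptions.

```python
def jaccard(a, b):
    """Word-overlap Jaccard similarity between two phrases (stand-in relevance measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def prune(segments, terms, threshold=0.3):
    """Keep only word segments whose best similarity to any specified term reaches the threshold."""
    return [
        seg for seg in segments
        if max((jaccard(seg, t) for t in terms), default=0.0) >= threshold
    ]


print(prune(["pension fund", "of", "weekly investment"], ["pension", "investment"]))
# ['pension fund', 'weekly investment']
```

In practice an embedding-based or knowledge-base-specific similarity would likely replace the word-overlap measure.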
In some embodiments, the terms corresponding to entity knowledge can also be extracted from the text content of the copy to be rewritten according to the entity-knowledge terms in the domain knowledge base, and the extracted terms are then used as the word segments. This both ensures that the word segments are closely related to the copy and keeps the number of word segments within a suitable range.
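Extracting knowledge-base terms directly from the text can be sketched with a longest-first pattern match. The function `extract_terms` and the sample term set are hypothetical, and the sketch matches plain substrings without handling word boundaries.

```python
import re


def extract_terms(text, kb_terms):
    """Return knowledge-base terms in order of appearance, preferring longer terms."""
    if not kb_terms:
        return []
    # Longer terms come first in the alternation so "100 yuan" wins over "yuan".
    pattern = "|".join(re.escape(t) for t in sorted(kb_terms, key=len, reverse=True))
    return re.findall(pattern, text)


KB_TERMS = {"100 yuan", "yuan", "weekly", "retirement"}
print(extract_terms("invest 100 yuan weekly until retirement", KB_TERMS))
# ['100 yuan', 'weekly', 'retirement']
```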
In some embodiments, a word segment may also be rewritten according to its head and tail identifiers. The specific implementation may use rewriting modes such as replacement and insertion, which are not described again here.
Step S108: generate several candidate copies corresponding to the copy to be rewritten from the text content obtained by rewriting different word segments.
Through steps S102 to S108 above, word segmentation is performed on the text content of the obtained copy to be rewritten to form a number of word segments, and the word segments are rewritten using at least one specified term from the domain knowledge base. The different text content obtained by rewriting different word segments then automatically yields several candidate copies for the copy to be rewritten. Because the domain knowledge base is established in advance from the text content of a corpus, it contains a large number of terms corresponding to entity knowledge in the field of the copy, where entity knowledge is knowledge about entities with specific meanings in that field, and the specified terms come from the corpus text. The candidate copies obtained this way are therefore not only numerous but also richer and more varied in wording. This meets the need for copy to be novel, diverse, and in step with current hot topics, so that when this rich and varied copy is promoted to users it attracts their attention more easily, which helps improve the user click-through rate and conversion rate.
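How rewriting different word segments multiplies into many candidate copies can be sketched as follows. The sketch considers replacement only, enumerates combinations with `itertools.product`, and caps the output with a `limit` parameter; the function `generate_candidates`, the mapping `kb`, and the cap are hypothetical.

```python
from itertools import product


def generate_candidates(segments, kb, limit=50):
    """Combine, per word segment, the original segment and its replacement terms,
    and return every distinct text other than the original (up to limit)."""
    options = [[seg] + kb.get(seg, []) for seg in segments]
    original = " ".join(segments)
    candidates = []
    for combo in product(*options):
        text = " ".join(combo)
        if text != original and text not in candidates:
            candidates.append(text)
        if len(candidates) >= limit:
            break
    return candidates


kb = {"every week": ["weekly"], "act quickly": ["act now", "don't miss out"]}
cands = generate_candidates(["invest", "every week", "act quickly"], kb)
print(len(cands))  # 5
for c in cands:
    print(c)
```

With one alternative for the second word segment and two for the third, there are 1 × 2 × 3 = 6 combinations, one of which is the original text, leaving 5 candidates.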
For ease of understanding, the following takes as an example marketing copy for small-amount regular investment in a pension fund, displayed in the waistband ad slot of a mobile app.
Suppose the marketing copy for small-amount pension investment (i.e., the copy to be rewritten) is as shown in Fig. 2. Its text content mainly comprises a theme and an example of small-amount regular investment and its return. The theme is "Invest for retirement: make your future life better". The example includes "Policyholder: Mr. Wang, 30 years old, monthly income 10,000 yuan, retiring at 60", "Regular investment plan: invest 100 yuan in the pension fund every week until retirement", "Estimated pension fund return: about 564,000 yuan can be accumulated", "Principal: 144,000 yuan", and "Return: 420,000 yuan".
Assume that the specified terms corresponding to entity types in the domain knowledge base include vocabulary related to small-amount regular pension investment, for example the regular investment amount (such as how many yuan), the regular investment frequency (such as daily, weekly, monthly, or yearly), the regular investment duration (such as 10 years, 20 years, or 30 years), return descriptions (such as how many yuan at maturity, how many yuan accumulated, or how many yuan after retirement), and marketing words (such as "go take a look" or "take a look now"). The words corresponding to these entity types can then be extracted from the text content as tokens: "invest", "pension", "regular investment", "30 years old", "retiring at 60", "weekly", "100 yuan", "retirement", "can be accumulated", "pension fund", "564,000 yuan", and so on.
However, the waistband ad slot of a mobile app (the position of the dashed box in Fig. 3) usually requires the displayed copy to be short, concise, and attention-grabbing. When the tokens are rewritten, several rewriting methods such as replacement, insertion, and deletion can therefore be used together. For example, the tokens "invest", "pension", and "pension fund" are deleted; "retirement" is replaced with "after 30 years", "can be accumulated" with "can become", and "564,000 yuan" with "how much"; and "go take a look" is inserted at the corresponding token position (here, the last position). This generates one candidate copy, whose display in the app's waistband ad slot may look like the part in the dashed box in Fig. 3. After rewriting, the text content is shorter and its core message is more prominent, so it attracts the user's attention more easily. After clicking, the user can enter the details page of the copy, for example with the copy in Fig. 2 above as part of the details page.
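The three rewriting operations in this example (deletion, replacement, and insertion at the last position) can be sketched as follows. This is only a minimal illustration, not the implementation of the embodiment: the function name `rewrite_tokens`, its parameters, and the English token strings standing in for the tokens of the example are all assumptions.

```python
from typing import Dict, List, Optional, Set


def rewrite_tokens(
    tokens: List[str],
    delete: Set[str],
    replace: Dict[str, str],
    append: Optional[List[str]] = None,
) -> List[str]:
    """Apply deletion, replacement, and end-of-sequence insertion to tokens.

    Hypothetical helper: the rule sets would come from the domain
    knowledge base in the described method.
    """
    rewritten: List[str] = []
    for token in tokens:
        if token in delete:
            continue  # deletion: drop the token entirely
        # replacement: swap in the knowledge-base term, else keep the token
        rewritten.append(replace.get(token, token))
    # insertion: here the new terms are placed at the last position
    rewritten.extend(append or [])
    return rewritten


# English stand-ins for some of the tokens in the pension-fund example.
tokens = ["invest", "pension", "weekly", "regular investment", "100 yuan",
          "retirement", "can be accumulated", "pension fund", "564,000 yuan"]

candidate = rewrite_tokens(
    tokens,
    delete={"invest", "pension", "pension fund"},
    replace={
        "retirement": "after 30 years",
        "can be accumulated": "can become",
        "564,000 yuan": "how much",
    },
    append=["go take a look"],
)
print(candidate)
```

Applying different combinations of rule sets to the same token list yields different candidate copies, which is how one input copy can produce several candidates.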
Fig. 4 is a flowchart of establishing the domain knowledge base from the text content of a corpus in a copy rewriting method provided by an embodiment of this specification.
As shown in Fig. 4, in the copy rewriting method, the step of establishing the domain knowledge base in advance from the text content of the corpus may include:
Step S202: obtain text data from the corpus.
The corpus may be a database made up of several pieces of text data. Each piece of text data may be a plain text, and the text data may be collected by big data means (for example, web crawling) or collected manually.
Specifically, the corpus may include an internal corpus and/or an external corpus. The internal corpus serves as a dedicated corpus, which may include a corpus built from text data accumulated in business scenarios; in this case the text data includes, for example, data from each application business scenario related to the field of the copy to be rewritten. The external corpus serves as a general corpus, which may include a corpus built from crawled text data; for example, the text data may include data collected from external channels such as Baidu Baike, Chinese Wikipedia, news corpora, and search engines.
Step S204: extract from the text data the specified terms that have business-scenario meaning in the copy field.
In a specific implementation, a named entity recognition algorithm and/or regular expressions may be used to extract from the text data the specified terms that have business-scenario meaning in the copy field; that is, the terms corresponding to entity types in the field (which may also be called dedicated words) are extracted as the specified terms.
Step S206: build a knowledge base from the extracted specified terms and use it as the domain knowledge base.
The domain knowledge base is thus established in advance from the entity types found in the text data of the corpus. Once it exists, the copy can be rewritten directly: different tokens are rewritten with the terms corresponding to entity types in the domain knowledge base, which yields different candidate copies.
In some embodiments, the knowledge base built from the extracted specified terms may be further screened and then fused with a knowledge base collected and maintained manually before being used as the domain knowledge base. Specifically, step S206 of building a knowledge base from the extracted specified terms and using it as the domain knowledge base may further include: building a knowledge base from the extracted specified terms; screening the built knowledge base; and fusing the screened knowledge base with a manually collected knowledge base, the result being used as the domain knowledge base.
It should be noted that the screening here may be performed according to preset rules, or may consist of preliminary rule-based screening followed by further manual screening; the fusion may include deduplication processing such as deleting and merging entries.
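Steps S202 to S206, together with the optional screening and fusion, can be sketched as below using regular expressions only (the named entity recognition route is not shown). Everything named here is an assumption made for illustration: the `PATTERNS` table, the entity type names, the English sample sentences, and the length-based screening rule are hypothetical and do not come from the specification.

```python
import re
from typing import Dict, Iterable, Set

# Hypothetical regular expressions, one per entity type.
PATTERNS = {
    "amount": re.compile(r"\d+(?:,\d{3})* yuan"),
    "frequency": re.compile(r"\b(?:daily|weekly|monthly|yearly)\b"),
    "duration": re.compile(r"\d+ years\b"),
}


def extract_terms(texts: Iterable[str]) -> Dict[str, Set[str]]:
    """S204: extract the terms of each entity type from the text data."""
    kb: Dict[str, Set[str]] = {name: set() for name in PATTERNS}
    for text in texts:
        for name, pattern in PATTERNS.items():
            kb[name].update(pattern.findall(text))
    return kb


def screen(kb: Dict[str, Set[str]], min_length: int = 3) -> Dict[str, Set[str]]:
    """Rule-based screening; the minimum-length rule is only an example."""
    return {name: {t for t in terms if len(t) >= min_length}
            for name, terms in kb.items()}


def fuse(extracted: Dict[str, Set[str]],
         manual: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Fuse with a manually collected knowledge base; sets deduplicate."""
    return {name: extracted.get(name, set()) | manual.get(name, set())
            for name in extracted.keys() | manual.keys()}


# S202: a tiny stand-in corpus.
corpus = ["Invest 100 yuan weekly for 30 years.",
          "Put aside 1,000 yuan monthly."]
manual_kb = {"marketing": {"go take a look"}, "frequency": {"weekly"}}

# S206 with screening and fusion.
domain_kb = fuse(screen(extract_terms(corpus)), manual_kb)
print(domain_kb)
```

Keeping each entity type as a set makes the deduplication in the fusion step a plain set union; a production knowledge base would likely need richer metadata per term.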
Fig. 5 is a schematic functional block diagram of establishing the domain knowledge base.
As shown in Fig. 5, starting from the internal and external text corpora, a named entity extraction algorithm extracts the terms corresponding to the entity types of the field to which the copy belongs, and a knowledge base built from these terms is used as the domain knowledge base, so that when a token is rewritten the corresponding specified terms can be obtained from the domain knowledge base quickly and accurately.
Based on the same inventive concept, the embodiments of this specification also provide an apparatus for rewriting copy, an electronic device, and a non-volatile computer storage medium.
Because the copy rewriting method has been described in detail in the foregoing embodiments, the corresponding content concerning the apparatus, the device, and the non-volatile computer storage medium is not repeated in the following embodiments.
Fig. 6 is a schematic structural diagram of a copy rewriting apparatus provided by this specification, in which dashed boxes indicate optional modules.
As shown in Fig. 6, the copy rewriting apparatus 10 includes an obtaining module 11, a word segmentation module 12, a rewriting module 13, and a generating module 14. The obtaining module 11 obtains the text content of the copy to be rewritten. The word segmentation module 12 performs word segmentation on the text content to obtain several tokens. The rewriting module 13 rewrites the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field. The generating module 14 generates, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
Optionally, the rewriting module 13 replaces the token with at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs;
and/or the rewriting module 13 inserts at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs at a position adjacent to the token.
Optionally, the copy rewriting apparatus 10 further includes a domain knowledge base module 15, which establishes the domain knowledge base by performing the following steps:
obtaining text data from a corpus;
extracting from the text data the specified terms that have business-scenario meaning in the copy field;
building a knowledge base from the extracted specified terms and using it as the domain knowledge base.
Optionally, extracting from the text data the specified terms that have business-scenario meaning in the copy field includes: extracting those specified terms from the text data using a named entity extraction algorithm and/or regular expressions.
Optionally, building a knowledge base from the extracted specified terms and using it as the domain knowledge base includes: building a knowledge base from the extracted specified terms; screening the built knowledge base; and fusing the screened knowledge base with a manually collected knowledge base, the result being used as the domain knowledge base.
Optionally, the word segmentation module 12 extracts several specified terms from the text content and uses them as the several tokens.
Optionally, the corpus includes a dedicated corpus and/or a general corpus, where the dedicated corpus includes a corpus built from text data accumulated in business scenarios and the general corpus includes a corpus built from crawled text data.
Optionally, after obtaining the several tokens, the word segmentation module 12 also adds head and tail identifiers to the tokens.
The embodiments of this specification also provide an electronic device for rewriting copy, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain the text content of the copy to be rewritten;
perform word segmentation on the text content to obtain several tokens;
rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field;
generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
The embodiments of this specification also provide a non-volatile computer storage medium for rewriting copy, storing computer-executable instructions configured to:
obtain the text content of the copy to be rewritten;
perform word segmentation on the text content to obtain several tokens;
rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field;
generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
Based on the same inventive concept, the embodiments of this specification also provide a copy update method, apparatus, and device.
Fig. 7 is a flowchart of a copy update method provided by an embodiment of this specification.
As shown in Fig. 7, the copy update method may include the following steps:
Step S302: obtain the text content of the copy to be updated.
Step S304: perform word segmentation on the text content to obtain several tokens.
Step S306: rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs.
The domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copy field.
Step S308: generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated.
Step S310: score the candidate copies based on a language model.
The language model is a model for determining the probability of a sentence, where the sentence probability can be expressed as $P(W_1, W_2, \ldots, W_k)$ and $W_1, W_2, \ldots, W_k$ are the word segments (tokens) in the sentence.
The language model thus makes it possible to determine how likely each word segment is to occur in the copy and how fluent the sentence as a whole is when the segment occurs in it; from the fluency of its sentences, the fluency of the entire candidate copy can then be determined.
Specifically, each candidate copy can be input into the language model, the language model computes the sentence probability, and the sentence probability is used as the scoring result. A higher score indicates that the sentence is more likely to actually occur and that the candidate copy better conforms to natural-language expression. In other words, the higher the sentence score, the more fluent and grammatically correct the sentence, and the higher the quality of the candidate copy made up of such sentences.
Step S312: update the copy to be updated with the scored candidate copies according to a preset update policy.
In a specific implementation, the update policy can be preset according to actual needs; for example, the update policy may include an update frequency policy and a candidate copy selection policy.
In a specific implementation, the one or several candidate copies with the higher scores (i.e., higher quality) can be output as alternative copies for the update. Specifically, according to the scoring results, a higher-scoring rewritten candidate copy can be chosen as the final result of the algorithm by $S^* = \arg\max_{i \in [1, T]} S_i$, and the candidate copies $S_i$ can be ordered by score so that alternative copies can be selected according to the update policy.
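Ordering the candidates by score and taking the top one (the arg max) or the top few can be sketched as follows. The function names, the `top_k` parameter, and the sample scores are hypothetical; the scores are assumed to be log-probabilities from a language model, so higher (less negative) is better.

```python
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order candidate copies from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def select_for_update(scores: Dict[str, float], top_k: int = 1) -> List[str]:
    """Return the top_k candidates; top_k=1 corresponds to the arg max."""
    return [candidate for candidate, _ in rank_candidates(scores)[:top_k]]


# Made-up scores for three candidate copies.
scores = {"candidate A": -7.2, "candidate B": -3.1, "candidate C": -5.0}
best = select_for_update(scores)
alternatives = select_for_update(scores, top_k=2)
print(best, alternatives)
```

Returning several alternatives rather than a single winner leaves room for an update policy (for example, rotating copies at a given frequency) to choose among them.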
It should be noted that the content of steps S302 to S308 above is similar to that of steps S102 to S108 of the copy rewriting method in the foregoing embodiments and is not repeated here.
Through steps S302 to S312 above, the language model determines the sentence probability of each candidate marketing copy generated as a word sequence, and the candidate copies are scored according to that probability. A higher score indicates that the marketing copy better conforms to natural-language expression and is of higher quality. Updating with the higher-quality copy attracts users' attention more easily and helps improve the click-through rate and conversion rate.
In the embodiments of this specification, the language model may be an n-gram language model or a neural network language model (NNLM), trained using the text data in the corpus.
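As a brief illustration of the n-gram option, and assuming the standard textbook formulation rather than anything specific to this specification, the sentence probability factorizes by the chain rule, and an n-gram model approximates each factor by conditioning only on the previous n-1 word segments:

```latex
P(W_1, W_2, \ldots, W_k)
  = \prod_{i=1}^{k} P(W_i \mid W_1, \ldots, W_{i-1})
  \approx \prod_{i=1}^{k} P(W_i \mid W_{i-n+1}, \ldots, W_{i-1})
```

For n = 2 (a bigram model), each factor reduces to $P(W_i \mid W_{i-1})$, which can be estimated from counts of adjacent word segments in the corpus.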
In some embodiments, as shown in Fig. 8, the language model may be trained using the following steps:
Step S402: obtain text data from the corpus.
In a specific implementation, the corpus may be the general corpus only. Because the text data of the general corpus is collected by big data crawling, the language information it contains is richer, so a language model built and trained on this text data predicts the scores of candidate copies more accurately.
Step S404: perform word segmentation on the text data.
For the word segmentation here, refer to the preceding description; it is not repeated.
Step S406: add head and tail identifiers to the tokens.
The head and tail identifiers delimit the term (lexical item) sequence of each text, so that the language model can be trained on the term sequences.
Step S408: use the tokens with the head and tail identifiers added to train the language model.
According to steps S402 to S408 above, head and tail identifiers are added to the tokens of a large amount of text data from the general corpus to obtain the term sequences, and these term sequences are used to train the language model. Because the text data of the general corpus is obtained by big data crawling, it better conforms to natural-language expression and is also richer, so a language model trained on it is better suited to predicting the scores of candidate copies.
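Steps S402 to S408 and the subsequent scoring can be sketched with a small bigram model. This is only an illustrative sketch under stated assumptions: the class name `BigramModel`, the `<s>` and `</s>` markers chosen as head and tail identifiers, the add-one smoothing, and the English training tokens are all hypothetical and not taken from the specification.

```python
import math
from collections import Counter
from typing import List

BOS, EOS = "<s>", "</s>"  # hypothetical head and tail identifiers


class BigramModel:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab = {BOS, EOS}
        for tokens in sentences:
            # S406: add head and tail identifiers to form the term sequence
            seq = [BOS] + tokens + [EOS]
            self.vocab.update(tokens)
            # S408: count contexts and adjacent pairs for training
            self.context_counts.update(seq[:-1])
            self.bigram_counts.update(zip(seq, seq[1:]))

    def log_prob(self, tokens: List[str]) -> float:
        """Log sentence probability: sum of smoothed bigram log-probabilities."""
        seq = [BOS] + tokens + [EOS]
        vocab_size = len(self.vocab)
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


# S402/S404: a tiny, already segmented stand-in corpus.
training = [["invest", "100 yuan", "weekly"],
            ["invest", "100 yuan", "monthly"]]
model = BigramModel(training)

fluent = model.log_prob(["invest", "100 yuan", "weekly"])
scrambled = model.log_prob(["weekly", "100 yuan", "invest"])
print(fluent > scrambled)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and a word order seen in training receives a higher score than a scrambled one, which is the property the scoring step relies on.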
For ease of understanding, the principle of applying the copy update method to marketing copy is explained below.
As shown in Fig. 9, the domain knowledge base is first constructed by steps S1 to S3 and the language model is trained by steps S4 to S6, after which the rewriting and updating of the marketing copy can be completed by steps S7 to S11.
Specifically, steps S1 to S11 are briefly described as follows:
S1. Obtain text data from the internal and external text corpora;
S2. Extract from the text data the knowledge corresponding to each entity in the field, forming the specified terms corresponding to the entity types;
S3. Build a knowledge base from the extracted specified terms and use it as the domain knowledge base;
S4. Obtain text data from the external text corpus;
S5. Train the language model with the text data, where the language model is an n-gram language model; before training, the text data is segmented into term sequences, which are then used to train the n-gram language model;
S6. Obtain the language model;
S7. Obtain the text content from the input initial marketing copy (i.e., the copy to be updated);
S8. Perform word segmentation on the text content, for example splitting it into words or term sequences and adding head and tail identifiers;
S9. Rewrite the tokens based on the terms corresponding to entity types in the domain knowledge base, thereby generating different candidate copies;
S10. Score the candidate copies based on the language model;
When a candidate copy is scored, the score can be determined from the sentence probability, which by the chain rule can be written as $P(S) = P(W_1, W_2, \ldots, W_T) = \prod_{i=1}^{T} P(W_i \mid W_1, \ldots, W_{i-1})$, where $P(S)$ denotes the probability of the sentence, $T$ denotes the length of the term sequence, and $W_i$ denotes the $i$-th term in the sequence.
S11. Based on the scoring results, select the higher-quality (i.e., higher-scoring) candidate copies according to the update policy and output them as the updated copy.
Because the copy update method has been described in detail in the foregoing embodiments, the corresponding content concerning the copy update apparatus, the device, and the non-volatile computer storage medium is not repeated in the following embodiments.
Fig. 10 is a schematic structural diagram of a copy update apparatus provided by an embodiment of this specification, in which dashed boxes indicate optional modules.
As shown in Fig. 10, the copy update apparatus 20 includes an obtaining module 21, a word segmentation module 22, a rewriting module 23, a generating module 24, a scoring module 25, and an update module 26. The obtaining module 21 obtains the text content of the copy to be updated. The word segmentation module 22 performs word segmentation on the text content to obtain several tokens. The rewriting module 23 rewrites the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field. The generating module 24 generates, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated. The scoring module 25 scores the candidate copies based on a language model. The update module 26 updates the copy to be updated with the scored candidate copies according to a preset update policy.
Optionally, the copy update apparatus 20 further includes a training module 17, which trains the language model by performing the following steps:
obtaining text data from a corpus;
performing word segmentation on the text data;
adding head and tail identifiers to the tokens;
using the tokens with the head and tail identifiers added to train the language model.
The embodiments of this specification also provide an electronic device for updating copy, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
obtain the text content of the copy to be updated;
perform word segmentation on the text content to obtain several tokens;
rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field;
generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated;
score the candidate copies based on a language model;
update the copy to be updated with the scored candidate copies according to a preset update policy.
The embodiments of this specification also provide a non-volatile computer storage medium for updating copy, storing computer-executable instructions configured to:
obtain the text content of the copy to be updated;
perform word segmentation on the text content to obtain several tokens;
rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus and a specified term is a term that has business-scenario meaning in the copy field;
generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated;
score the candidate copies based on a language model;
update the copy to be updated with the scored candidate copies according to a preset update policy.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired result. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus, device, and non-volatile computer storage medium embodiments are substantially similar to the method embodiments, they are described relatively briefly; for the relevant parts, refer to the description of the method embodiments.
The apparatus, device, and non-volatile computer storage medium provided by the embodiments of this specification correspond to the method, and therefore they also have beneficial technical effects similar to those of the corresponding method. Because the beneficial technical effects of the method have been described in detail above, those of the corresponding apparatus, device, and non-volatile computer storage medium are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can be obtained easily, simply by logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the devices it contains for implementing various functions can also be regarded as structures within the hardware component. Indeed, the devices for implementing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.
The systems, apparatuses, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when this application is implemented, the functions of the units can be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. Identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The above is only an example of this application and is not intended to limit it. For those skilled in the art, various modifications and changes to this application are possible. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the scope of its claims.

Claims (20)

1. A copy rewriting method, comprising:
obtaining the text content of a copy to be rewritten;
performing word segmentation on the text content to obtain several segments;
rewriting the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be rewritten belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
generating several candidate copies corresponding to the copy to be rewritten, according to the text content obtained by rewriting different segments.
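The flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the whitespace segmenter stands in for a real word segmenter, and the knowledge-base entries and the names `KNOWLEDGE_BASE`, `segment`, and `generate_candidates` are hypothetical.

```python
from typing import Dict, List

# Hypothetical domain knowledge base: maps a segment to specified terms that
# may rewrite it. The entries are illustrative and not taken from the patent.
KNOWLEDGE_BASE: Dict[str, List[str]] = {
    "discount": ["cashback", "red envelope"],
    "pay": ["scan to pay"],
}


def segment(text: str) -> List[str]:
    """Stand-in for word segmentation: split on whitespace."""
    return text.split()


def generate_candidates(text: str, kb: Dict[str, List[str]]) -> List[str]:
    """Rewrite one segment at a time and collect the resulting candidate texts."""
    tokens = segment(text)
    candidates: List[str] = []
    for i, token in enumerate(tokens):
        for term in kb.get(token, []):
            # Each (segment, term) pair yields one candidate text.
            rewritten = tokens[:i] + [term] + tokens[i + 1:]
            candidates.append(" ".join(rewritten))
    return candidates
```

Rewriting a different segment in each pass is what yields several distinct candidates from a single input text.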
2. The copy rewriting method of claim 1, wherein rewriting the segments according to at least one specified term in the domain knowledge base of the domain to which the copy to be rewritten belongs comprises:
replacing a segment with at least one specified term in the domain knowledge base of the domain to which the copy to be rewritten belongs;
and/or inserting at least one specified term in the domain knowledge base of the domain to which the copy to be rewritten belongs at a position adjacent to the segment.
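The two rewriting operations named in claim 2, replacement and insertion at an adjacent position, can be sketched as list operations on the segment sequence. The function names and the `before` flag are assumptions made for illustration; the claim does not say on which side of the segment the term is inserted.

```python
from typing import List


def replace_segment(tokens: List[str], index: int, term: str) -> List[str]:
    """Return a copy of tokens with the segment at index replaced by term."""
    return tokens[:index] + [term] + tokens[index + 1:]


def insert_adjacent(tokens: List[str], index: int, term: str,
                    before: bool = False) -> List[str]:
    """Return a copy of tokens with term inserted next to the segment at index.

    before=False inserts after the segment, before=True inserts in front of it.
    """
    position = index if before else index + 1
    return tokens[:position] + [term] + tokens[position:]
```

Both helpers return new lists, so one segmented input can be reused to build many candidates.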
3. The copy rewriting method of claim 1, wherein the step of pre-establishing the domain knowledge base from specified terms in the text data of a corpus comprises:
obtaining text data from the corpus;
extracting, from the text data, specified terms that have business-scenario meaning in the copywriting field;
establishing a knowledge base from the extracted specified terms, as the domain knowledge base.
4. The copy rewriting method of claim 3, wherein extracting, from the text data, specified terms that have business-scenario meaning in the copywriting field comprises: using a named entity extraction algorithm and/or regular expressions to extract, from the text data, specified terms that have business-scenario meaning in the copywriting field.
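As a rough illustration of the regular-expression branch of claim 4, the sketch below scans corpus text for terms and counts how often each occurs. The patterns in `TERM_PATTERNS` are invented examples of terms with business meaning, not patterns disclosed in the patent, and the named entity extraction branch is not shown.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical patterns for terms with business-scenario meaning.
TERM_PATTERNS = [
    re.compile(r"\b\d+(?:\.\d+)?% off\b"),
    re.compile(r"\b(?:cashback|coupon|red envelope)s?\b", re.IGNORECASE),
]


def extract_terms(texts: Iterable[str]) -> Counter:
    """Count every pattern match across the corpus, normalized to lower case."""
    counts: Counter = Counter()
    for text in texts:
        for pattern in TERM_PATTERNS:
            for match in pattern.findall(text):
                counts[match.lower()] += 1
    return counts
```

The counts give a later screening step something to filter on.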
5. The copy rewriting method of claim 3, wherein the corpus comprises a dedicated corpus and/or a general corpus, the dedicated corpus comprising a corpus built from text data accumulated in business scenarios, and the general corpus comprising a corpus built from crawled text data.
6. The copy rewriting method of claim 3, wherein the step of establishing a knowledge base from the extracted specified terms, as the domain knowledge base, comprises:
establishing a knowledge base from the extracted specified terms;
screening the established knowledge base;
merging the screened knowledge base with a manually collected knowledge base, the result serving as the domain knowledge base.
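The screen-then-merge step of claim 6 might look like the following sketch. The claim does not state the screening criterion; the minimum-frequency threshold `min_count` used here is an assumption chosen only to make the example concrete, and both function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def screen_terms(term_counts: Dict[str, int], min_count: int = 2) -> Set[str]:
    """Keep only extracted terms seen at least min_count times (assumed criterion)."""
    return {term for term, count in term_counts.items() if count >= min_count}


def build_domain_kb(term_counts: Dict[str, int],
                    manual_terms: Iterable[str],
                    min_count: int = 2) -> Set[str]:
    """Merge the screened automatic terms with the manually collected terms."""
    return screen_terms(term_counts, min_count) | set(manual_terms)
```

Manually collected terms bypass the screening step here, on the assumption that they have already been vetted.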
7. The copy rewriting method of claim 1, wherein performing word segmentation on the text content to obtain several segments comprises:
extracting several specified terms from the text content;
using the several specified terms as the several segments.
8. The copy rewriting method of claim 1, wherein, after the several segments are obtained, the copy rewriting method further comprises: adding head and tail identifiers to the segments.
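Claims 7 and 8 can be sketched together: the specified terms found in the text serve as the segments, and the sequence is then wrapped in head and tail identifiers. The greedy longest-match scan and the marker strings `<s>` and `</s>` are assumptions; the claims name neither a matching strategy nor concrete identifiers.

```python
from typing import Iterable, List

BOS, EOS = "<s>", "</s>"  # hypothetical head and tail identifiers


def extract_specified_terms(text: str, kb_terms: Iterable[str]) -> List[str]:
    """Greedy longest-match scan that keeps only knowledge-base terms, in text order."""
    # Longest terms first, so that "scan to pay" wins over "pay"; empty terms
    # are dropped because they would never advance the scan position.
    terms = sorted((t for t in kb_terms if t), key=len, reverse=True)
    found: List[str] = []
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                found.append(term)
                i += len(term)
                break
        else:
            i += 1  # no term starts here, move one character forward
    return found


def add_boundary_identifiers(segments: List[str]) -> List[str]:
    """Wrap the segment sequence in head and tail identifiers."""
    return [BOS, *segments, EOS]
```

Boundary identifiers let a sequence model distinguish a term at the start or end of a text from the same term in the middle.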
9. A copy updating method, comprising:
obtaining the text content of a copy to be updated;
performing word segmentation on the text content to obtain several segments;
rewriting the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be updated belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
generating several candidate copies corresponding to the copy to be updated, according to the text content obtained by rewriting different segments;
scoring the candidate copies based on a language model;
updating the copy to be updated with the scored candidate copies according to a preset update strategy.
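The last two steps of claim 9, scoring and updating, can be sketched with a pluggable scoring function. The update strategy shown (adopt the best candidate only if it beats the original by more than `min_gain`) is one assumed example of a "preset update strategy"; the claim does not define the strategy, and `update_text` is a hypothetical name.

```python
from typing import Callable, List


def update_text(original: str,
                candidates: List[str],
                score: Callable[[str], float],
                min_gain: float = 0.0) -> str:
    """Return the top-scoring candidate if it improves on the original, else the original."""
    # Sort (score, text) pairs so that the best candidate comes first.
    scored = sorted(((score(c), c) for c in candidates), reverse=True)
    if scored and scored[0][0] - score(original) > min_gain:
        return scored[0][1]
    return original
```

Any callable that maps a text to a number can be passed as `score`, including a language-model score.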
10. The copy updating method of claim 9, wherein the language model is trained by the following steps:
obtaining text data from a corpus;
performing word segmentation on the text data;
adding a first identifier to the segments;
using the segments to which the first identifier has been added to train the language model.
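The claim does not say what kind of language model is trained. Purely as an illustration, the sketch below trains a bigram model with add-one smoothing on segmented sentences and scores a text by its average log-probability. Treating the added identifier as sentence-boundary markers (`<s>`, `</s>`) is an assumption, as is the `BigramModel` class itself.

```python
import math
from collections import Counter
from typing import List

BOS, EOS = "<s>", "</s>"  # assumed form of the identifiers added before training


class BigramModel:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, sentences: List[List[str]]) -> None:
        for tokens in sentences:
            marked = [BOS, *tokens, EOS]
            self.vocab.update(marked)
            for prev, cur in zip(marked, marked[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def score(self, tokens: List[str]) -> float:
        """Average smoothed log-probability per bigram; higher means more fluent."""
        marked = [BOS, *tokens, EOS]
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, cur in zip(marked, marked[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total / (len(marked) - 1)
```

Averaging over the number of bigrams keeps scores comparable between candidates of different lengths.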
11. A copy rewriting apparatus, comprising an acquisition module, a word segmentation module, a rewriting module, and a generation module, wherein:
the acquisition module is configured to obtain the text content of a copy to be rewritten;
the word segmentation module is configured to perform word segmentation on the text content to obtain several segments;
the rewriting module is configured to rewrite the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be rewritten belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
the generation module is configured to generate several candidate copies corresponding to the copy to be rewritten, according to the text content obtained by rewriting different segments.
12. The copy rewriting apparatus of claim 11, wherein the rewriting module is configured to replace a segment with at least one specified term in the domain knowledge base of the domain to which the copy to be rewritten belongs;
and/or the rewriting module is configured to insert at least one specified term in the domain knowledge base of the domain to which the copy to be rewritten belongs at a position adjacent to the segment.
13. The copy rewriting apparatus of claim 11, further comprising a domain knowledge base module, the domain knowledge base module being configured to establish the domain knowledge base by performing the following steps:
obtaining text data from a corpus;
extracting, from the text data, specified terms that have business-scenario meaning in the copywriting field;
establishing a knowledge base from the extracted specified terms, as the domain knowledge base.
14. The copy rewriting apparatus of claim 13, wherein extracting, from the text data, specified terms that have business-scenario meaning in the copywriting field comprises: using a named entity extraction algorithm and/or regular expressions to extract, from the text data, specified terms that have business-scenario meaning in the copywriting field.
15. The copy rewriting apparatus of claim 13, wherein establishing a knowledge base from the extracted specified terms, as the domain knowledge base, comprises:
establishing a knowledge base from the extracted specified terms;
screening the established knowledge base;
merging the screened knowledge base with a manually collected knowledge base, the result serving as the domain knowledge base.
16. The copy rewriting apparatus of claim 11, wherein the word segmentation module is configured to extract several specified terms from the text content and to use the several specified terms as the several segments.
17. A copy updating apparatus, comprising an acquisition module, a word segmentation module, a rewriting module, a generation module, a scoring module, and an update module, wherein:
the acquisition module is configured to obtain the text content of a copy to be updated;
the word segmentation module is configured to perform word segmentation on the text content to obtain several segments;
the rewriting module is configured to rewrite the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be updated belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
the generation module is configured to generate several candidate copies corresponding to the copy to be updated, according to the text content obtained by rewriting different segments;
the scoring module is configured to score the candidate copies based on a language model;
the update module is configured to update the copy to be updated with the scored candidate copies according to a preset update strategy.
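The six-module structure of claim 17 can be pictured as a pipeline of interchangeable callables. The class name `TextUpdateDevice`, the field names, and the callable signatures are hypothetical; the sketch only shows how the modules hand results to one another.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TextUpdateDevice:
    """Hypothetical composition of the six modules as plain callables."""
    acquire: Callable[[], str]
    segment: Callable[[str], List[str]]
    rewrite: Callable[[List[str]], List[List[str]]]
    generate: Callable[[List[List[str]]], List[str]]
    score: Callable[[str], float]
    update: Callable[[str, List[str], Callable[[str], float]], str]

    def run(self) -> str:
        text = self.acquire()                 # acquisition module
        segments = self.segment(text)         # word segmentation module
        variants = self.rewrite(segments)     # rewriting module
        candidates = self.generate(variants)  # generation module
        # The update module applies the strategy, consulting the scoring module.
        return self.update(text, candidates, self.score)
```

Keeping each module behind a callable means a segmenter, knowledge base, or scorer can be swapped without touching the rest of the pipeline.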
18. The copy updating apparatus of claim 17, further comprising a training module, the training module being configured to train the language model by performing the following steps:
obtaining text data from a corpus;
performing word segmentation on the text data;
adding a first identifier to the segments;
using the segments to which the first identifier has been added to train the language model.
19. An electronic device for rewriting copy, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain the text content of a copy to be rewritten;
perform word segmentation on the text content to obtain several segments;
rewrite the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be rewritten belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
generate several candidate copies corresponding to the copy to be rewritten, according to the text content obtained by rewriting different segments.
20. An electronic device for updating copy, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain the text content of a copy to be updated;
perform word segmentation on the text content to obtain several segments;
rewrite the segments according to at least one specified term in a domain knowledge base of the domain to which the copy to be updated belongs, wherein the domain knowledge base is a knowledge base pre-established from specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the copywriting field;
generate several candidate copies corresponding to the copy to be updated, according to the text content obtained by rewriting different segments;
score the candidate copies based on a language model;
update the copy to be updated with the scored candidate copies according to a preset update strategy.
CN201910455177.3A 2019-05-29 2019-05-29 Method, device and equipment for rewriting and updating file Active CN110245350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910455177.3A CN110245350B (en) 2019-05-29 2019-05-29 Method, device and equipment for rewriting and updating file

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910455177.3A CN110245350B (en) 2019-05-29 2019-05-29 Method, device and equipment for rewriting and updating file

Publications (2)

Publication Number Publication Date
CN110245350A true CN110245350A (en) 2019-09-17
CN110245350B CN110245350B (en) 2023-04-07

Family

ID=67885375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910455177.3A Active CN110245350B (en) 2019-05-29 2019-05-29 Method, device and equipment for rewriting and updating file

Country Status (1)

Country Link
CN (1) CN110245350B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476003A (en) * 2020-03-12 2020-07-31 支付宝(杭州)信息技术有限公司 Lyric rewriting method and device
CN112487151A (en) * 2020-12-14 2021-03-12 深圳市欢太科技有限公司 File generation method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
CN106446162A (en) * 2016-09-26 2017-02-22 浙江大学 Orient field self body intelligence library article search method
US20180018576A1 (en) * 2016-07-12 2018-01-18 International Business Machines Corporation Text Classifier Training
CN108228665A (en) * 2016-12-22 2018-06-29 阿里巴巴集团控股有限公司 Determine object tag, the method and device for establishing tab indexes, object search
US20180373691A1 (en) * 2017-06-26 2018-12-27 International Business Machines Corporation Identifying linguistic replacements to improve textual message effectiveness
CN109766537A (en) * 2019-01-16 2019-05-17 北京未名复众科技有限公司 Study abroad document methodology of composition, device and electronic equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332217A1 (en) * 2009-06-29 2010-12-30 Shalom Wintner Method for text improvement via linguistic abstractions
US20180018576A1 (en) * 2016-07-12 2018-01-18 International Business Machines Corporation Text Classifier Training
CN106446162A (en) * 2016-09-26 2017-02-22 浙江大学 Orient field self body intelligence library article search method
CN108228665A (en) * 2016-12-22 2018-06-29 阿里巴巴集团控股有限公司 Determine object tag, the method and device for establishing tab indexes, object search
US20180373691A1 (en) * 2017-06-26 2018-12-27 International Business Machines Corporation Identifying linguistic replacements to improve textual message effectiveness
CN109766537A (en) * 2019-01-16 2019-05-17 北京未名复众科技有限公司 Study abroad document methodology of composition, device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐艳华等: "基于LDA模型的HSK作文生成", 《数据分析与知识发现》 *

Also Published As

Publication number Publication date
CN110245350B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN112417880B (en) Automatic case information extraction method for court electronic files
CN108108426B (en) Understanding method and device for natural language question and electronic equipment
CN104572616B (en) The definite method and apparatus of Text Orientation
CN107622050A (en) Text sequence labeling system and method based on Bi LSTM and CRF
CN109858010A (en) Field new word identification method, device, computer equipment and storage medium
CN111582241A (en) Video subtitle recognition method, device, equipment and storage medium
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN104933039A (en) Entity link system for language lacking resources
CN106886580A (en) A kind of picture feeling polarities analysis method based on deep learning
CN105005616B (en) Method and system are illustrated based on the text that textual image feature interaction expands
CN106897559A (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN109992771A (en) A kind of method and device of text generation
CN110245350A (en) Official documents and correspondence is rewritten and update method, device and equipment
Li et al. HWOBC-a handwriting oracle bone character recognition database
CN107844476A (en) A kind of part-of-speech tagging method of enhancing
CN107807958A (en) A kind of article list personalized recommendation method, electronic equipment and storage medium
CN113705733A (en) Medical bill image processing method and device, electronic device and storage medium
CN110263151A (en) A kind of enigmatic language justice learning method towards multi-angle of view multi-tag data
CN110119401A (en) Processing method, device, server and the storage medium of user's portrait
CN108804472A (en) A kind of webpage content extraction method, device and server
CN110046231A (en) A kind of customer service information processing method, server and system
CN106227770A (en) A kind of intelligentized news web page information extraction method
CN108875743A (en) A kind of text recognition method and device
CN107273546A (en) Counterfeit application detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240228

Address after: 128 Meizhi Road, Guohao Times City # 20-01, Singapore 189773

Patentee after: Advanced Nova Technology (Singapore) Holdings Ltd.

Country or region after: Singapore

Address before: 27 Hospital Road, George Town, Grand Cayman ky1-9008

Patentee before: Innovative advanced technology Co.,Ltd.

Country or region before: Cayman Islands
