Copy rewriting and updating method, apparatus and device
Technical field
This specification relates to the field of computer technology, and in particular to a method, apparatus and device for rewriting and updating copy (promotional text).
Background
In promotional marketing copy, for example advertisements shown in ad slots on web pages or in APPs (applications), users tire of the same ad content easily. The content of promotional copy therefore needs to be updated promptly, and the updated copy needs to be novel, diverse, and in step with current hot topics. Only then can it catch users' attention so that they browse or even click the promoted copy, which raises the click-through rate and the conversion rate.
The important content of a piece of copy usually appears in the text of its banner. At present, updating the text of copy typically means that new text is first designed and edited by hand, new copy is then produced from it, and the new copy is finally uploaded to replace the old. This is costly and inefficient, and manually produced copy is small in quantity and not rich enough in content, so it struggles to attract users' attention.
Summary of the invention
In view of this, the embodiments of this specification provide a copy rewriting method, apparatus and device for automatically rewriting initial copy (i.e., the copy to be rewritten) into candidate copy, so that the candidate copy is both plentiful and varied in content. The embodiments also provide a copy updating method, apparatus and device for automatically rewriting initial copy (i.e., the copy to be updated) into candidate copy, scoring the candidates, and automatically updating the copy.
The embodiments of this specification adopt the following technical solutions:
The embodiments of this specification provide a copy rewriting method, comprising:
Obtaining the text content of the copy to be rewritten;
Performing word segmentation on the text content to obtain several tokens;
Rewriting the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generating, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
The embodiments of this specification also provide a copy updating method, comprising:
Obtaining the text content of the copy to be updated;
Performing word segmentation on the text content to obtain several tokens;
Rewriting the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generating, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated;
Scoring the candidate copies based on a language model;
Updating the copy to be updated with the scored candidate copies according to a preset update policy.
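The scoring and updating steps can be illustrated with a minimal sketch. It assumes an add-one smoothed bigram language model trained on a tiny whitespace-separated corpus and a "pick the highest score" update policy; the specification does not fix the type of language model or the policy, and the names `train_bigram`, `score` and `pick_update` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigram contexts and bigrams over whitespace-separated sentences."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words)
        unigrams.update(words[:-1])                 # every word that starts a bigram
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams, len(vocab)


def score(sentence, model):
    """Average log-probability per bigram, with add-one smoothing."""
    unigrams, bigrams, vocab_size = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(words[:-1], words[1:]))
    logp = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in pairs
    )
    return logp / len(pairs)


def pick_update(candidates, model):
    """Assumed update policy: replace the old copy with the best-scoring candidate."""
    return max(candidates, key=lambda c: score(c, model))


model = train_bigram([
    "invest weekly for retirement",
    "invest monthly for retirement",
])
candidates = ["invest weekly for retirement", "retirement for weekly invest"]
best = pick_update(candidates, model)
```

Averaging the log-probability over the number of bigrams keeps longer candidates from being penalized merely for their length; other policies, such as rotating through the top few candidates, would fit the same interface.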
The embodiments of this specification also provide a copy rewriting apparatus, comprising an acquisition module, a word segmentation module, a rewriting module and a generation module;
The acquisition module is configured to obtain the text content of the copy to be rewritten;
The word segmentation module is configured to perform word segmentation on the text content to obtain several tokens;
The rewriting module is configured to rewrite the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
The generation module is configured to generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
The embodiments of this specification also provide a copy updating apparatus, comprising an acquisition module, a word segmentation module, a rewriting module, a generation module, a scoring module and an update module;
The acquisition module is configured to obtain the text content of the copy to be updated;
The word segmentation module is configured to perform word segmentation on the text content to obtain several tokens;
The rewriting module is configured to rewrite the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
The generation module is configured to generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated;
The scoring module is configured to score the candidate copies based on a language model;
The update module is configured to update the copy to be updated with the scored candidate copies according to a preset update policy.
The embodiments of this specification also provide an electronic device for rewriting copy, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
Obtain the text content of the copy to be rewritten;
Perform word segmentation on the text content to obtain several tokens;
Rewrite the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
The embodiments of this specification also provide an electronic device for updating copy, comprising:
At least one processor; and
A memory communicatively connected to the at least one processor; wherein
The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
Obtain the text content of the copy to be updated;
Perform word segmentation on the text content to obtain several tokens;
Rewrite the tokens according to at least one specified term in a domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy;
Generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be updated;
Score the candidate copies based on a language model;
Update the copy to be updated with the scored candidate copies according to a preset update policy.
At least one of the above technical solutions adopted by the embodiments of this specification can achieve the following beneficial effects. The text content of the copy to be rewritten is segmented into a number of tokens, and specified terms from the domain knowledge base (i.e., words corresponding to entity knowledge) are used to rewrite different tokens, so that several candidate copies corresponding to the copy to be rewritten are generated automatically. The domain knowledge base is established in advance from the text content of a corpus and contains a large number of specified terms, each of which is a word from the corpus text that carries a specific meaning (i.e., business-scenario meaning) in the field of the copy, such as a term that characterizes a business scenario. The candidate copies obtained in this way are not only numerous but also richer and more varied in wording, which satisfies the requirement that copy be novel, diverse, and in step with current hot topics. When such content-rich copy is promoted to users, it attracts their attention more easily, which helps to raise the click-through rate and the conversion rate.
Brief description of the drawings
To describe the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in this specification; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a copy rewriting method provided by an embodiment of this specification.
Fig. 2 is a schematic diagram of pension-investment marketing copy in a copy rewriting method provided by an embodiment of this specification.
Fig. 3 is a schematic diagram of the rewriting result for the pension-investment marketing copy in a copy rewriting method provided by an embodiment of this specification.
Fig. 4 is a flow chart of establishing the domain knowledge base in a copy rewriting method provided by an embodiment of this specification.
Fig. 5 is a schematic functional block diagram of establishing the domain knowledge base in a copy rewriting method provided by an embodiment of this specification.
Fig. 6 is a schematic structural diagram of a copy rewriting apparatus provided by an embodiment of this specification.
Fig. 7 is a flow chart of a copy updating method provided by an embodiment of this specification.
Fig. 8 is a flow chart of training the language model in a copy updating method provided by an embodiment of this specification.
Fig. 9 is a flow chart of rewriting and updating marketing copy in a copy updating method provided by an embodiment of this specification.
Fig. 10 is a schematic structural diagram of a copy updating apparatus provided by an embodiment of this specification.
Detailed description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are merely some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments of this specification without creative effort shall fall within the scope of protection of this application.
As analyzed above, the important content of a piece of copy usually appears in the text of its banner, and updating the text of copy currently requires new text to be produced by hand. This is costly and inefficient, and manually produced copy is small in quantity and not rich enough in content, so it struggles to attract users' attention.
On this basis, the embodiments of this specification provide a copy rewriting and updating method, apparatus and device, which rewrite the text content of copy automatically and, by using the rewritten copy as candidate copy, update the copy automatically. In copy rewriting, the text content of the obtained copy to be rewritten is segmented into several tokens, and the tokens are rewritten according to the specified terms in the domain knowledge base. For example, a token may be replaced with a specified term from the domain knowledge base, or a specified term may be inserted next to a token. New copy is then generated from the rewritten text content and used as candidate copy. The candidate copies obtained by rewriting are not only numerous but also richer in content, so that when they are promoted to users as updated copy, they attract users' attention more easily, which helps to raise the click-through rate and the conversion rate.
The technical solutions provided by the embodiments of this application are described in detail below with reference to the drawings.
Fig. 1 is a flow chart of a copy rewriting method provided by an embodiment of this specification.
As shown in Fig. 1, the copy rewriting method may include the following steps:
Step S102: obtain the text content of the copy to be rewritten.
In a specific implementation, the copy to be rewritten may be copy that is yet to be promoted or copy that is already being promoted. For example, when a function of an APP (application) is promoted, the text content of the initial copy usually comes from the APP details that the developer or development company fills in to describe the APP's functional features, and these details emphasize the APP's core functions. The copy containing the APP details is then placed in ad slots in media such as website pages and APP pages for promotion, so that the displayed text attracts users' attention and prompts them to read it, or even to click the copy's link and enter a specific marketing description page.
It should be noted that the text content may also be obtained from the written content included in the copy, which is not elaborated here.
Step S104: perform word segmentation on the text content to obtain several tokens.
In a specific implementation, word segmentation is performed with a dictionary. The dictionary may be built manually and may collect the vocabulary of the field to which the copy to be rewritten belongs. Matching the text content of the copy against such a dictionary divides it into tokens quickly and accurately, and the resulting tokens are also highly relevant to the field of the copy.
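A minimal sketch of dictionary-based segmentation follows. It assumes forward maximum matching, one common dictionary-matching strategy that the specification does not itself name, and uses an English string without spaces as a stand-in for text written without word delimiters. The function name `segment` and the sample dictionary are hypothetical.

```python
def segment(text, lexicon):
    """Forward maximum matching: at each position take the longest dictionary entry."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character when nothing matches
        # Try the longest window first and shrink it down to two characters.
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical domain dictionary for pension-investment copy.
lexicon = {"regular", "investment", "pension", "pensionfund", "fund"}
tokens = segment("investmentpensionfund", lexicon)
```

Preferring the longest match keeps multi-character domain terms such as "pensionfund" intact instead of splitting them into shorter dictionary entries.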
In some embodiments, head and tail identifiers may also be added to each token. These identifiers reflect the position of the token in the sentence as well as its relationship to the preceding and following tokens, which facilitates the subsequent rewriting operations.
Step S106: rewrite the tokens according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs.
Here, the domain knowledge base is a knowledge base established in advance from the specified terms in the text data of a corpus, and a specified term is a term that has business-scenario meaning in the field of the copy.
It should be noted that a term with business-scenario meaning in the field of the copy is also commonly called a term corresponding to entity knowledge of that field, where entity knowledge is knowledge about entities that carry a specific meaning in the field. A specified term may therefore be a word corresponding to entity knowledge in the field. In addition, one or more tokens may be rewritten; when multiple tokens are rewritten, the corresponding multiple specified terms are used to rewrite them.
For example, in the marketing field, the specified terms corresponding to entity knowledge may include holidays (e.g., National Day, Mid-Autumn Festival), popular internet slang (e.g., "that hits home, buddy", "setting a flag"), and marketing buzzwords (e.g., "why not act now", "slow hands miss out").
In some embodiments, the tokens obtained from the text content of the copy to be rewritten may be rewritten by replacement. That is, a term in the domain knowledge base that expresses the entity knowledge corresponding to an obtained token (i.e., a specified term) directly replaces that token, producing new text content. This preserves the meaning of the original text while expressing a similar core meaning in new wording, which enriches the ways the content is expressed.
For example, when a token is "act quickly", it can be replaced with a marketing buzzword of similar meaning, such as "why not act now", "quick hands get it", or "slow hands miss out".
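Replacement-based rewriting can be sketched as below. The sketch assumes the domain knowledge base is exposed as a plain mapping from a token to the specified terms of similar meaning; the function name `rewrite_by_replacement` and the sample data are hypothetical.

```python
def rewrite_by_replacement(tokens, similar_terms):
    """Return one candidate token list for each (token, replacement term) pair."""
    candidates = []
    for i, token in enumerate(tokens):
        for term in similar_terms.get(token, []):
            # Swap only the i-th token; every other token is left untouched.
            candidates.append(tokens[:i] + [term] + tokens[i + 1:])
    return candidates


# Hypothetical knowledge-base entries: token -> marketing buzzwords of similar meaning.
similar_terms = {"act quickly": ["why not act now", "slow hands miss out"]}
candidates = rewrite_by_replacement(["act quickly", "and invest"], similar_terms)
```

Each replacement yields a separate candidate, so a single token with several similar terms already produces several candidate copies.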
In some embodiments, the tokens obtained from the text content of the copy to be rewritten may also be rewritten by insertion. That is, a term in the domain knowledge base whose entity knowledge is closely connected with an obtained token is inserted immediately before or after that token, producing new text content.
For example, suppose National Day is approaching and travel marketing copy is meant to promote a discount campaign before the holiday. Terms corresponding to entity knowledge such as National Day and the discount campaign can then be inserted into the text content of the travel copy, specifically next to the corresponding tokens.
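Insertion-based rewriting can be sketched in the same style. The sketch assumes the knowledge base maps a token to closely related specified terms and generates one candidate for each insertion position (before and after the token); the function name `rewrite_by_insertion` and the sample data are hypothetical.

```python
def rewrite_by_insertion(tokens, related_terms):
    """Insert each related term directly before and directly after its anchor token."""
    candidates = []
    for i, token in enumerate(tokens):
        for term in related_terms.get(token, []):
            candidates.append(tokens[:i] + [term] + tokens[i:])          # before the token
            candidates.append(tokens[:i + 1] + [term] + tokens[i + 1:])  # after the token
    return candidates


# Hypothetical knowledge-base entry: the token "travel" is closely tied to a holiday term.
related_terms = {"travel": ["National Day"]}
candidates = rewrite_by_insertion(["travel", "deals"], related_terms)
```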
In some embodiments, the tokens obtained from the text content of the copy to be rewritten may also be rewritten by deletion. That is, according to the specified terms in the domain knowledge base, words that have low relevance (e.g., low similarity) to the entity knowledge are deleted from the tokens, so that only tokens closely connected with the entity knowledge remain.
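Deletion-based rewriting can be sketched as a similarity filter. The specification does not define how relevance is measured; the sketch below assumes a character-level string similarity from Python's standard `difflib` module as a stand-in, and the threshold value and the name `prune_tokens` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in relevance measure: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def prune_tokens(tokens, kb_terms, threshold=0.6):
    """Keep only tokens whose best similarity to any knowledge-base term reaches the threshold."""
    kept = []
    for token in tokens:
        best = max((similarity(token, term) for term in kb_terms), default=0.0)
        if best >= threshold:
            kept.append(token)
    return kept


kept = prune_tokens(["pension", "zzz", "retirement"], ["pension", "retirement"])
```

In practice a semantic measure, such as similarity between word vectors, would likely be a better fit than string similarity; only the filtering structure is the point here.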
In some embodiments, the terms corresponding to entity knowledge in the text content of the copy to be rewritten may also be extracted according to the entity-knowledge terms in the domain knowledge base, and the extracted terms are then used as the tokens. This ensures that the tokens are closely related to the copy and keeps the number of tokens within a suitable range.
In some embodiments, the tokens may also be rewritten according to their head and tail identifiers. The specific implementation may use rewriting approaches such as replacement and insertion, which are not repeated here.
Step S108: generate, from the text content obtained by rewriting different tokens, several candidate copies corresponding to the copy to be rewritten.
According to steps S102 to S108 above, the text content of the obtained copy to be rewritten is segmented into a number of tokens, the tokens are rewritten with at least one specified term from the domain knowledge base, and the different text contents obtained by rewriting different tokens are used to automatically generate several candidate copies corresponding to the copy to be rewritten. The domain knowledge base is established in advance from the text content of a corpus and contains a large number of terms corresponding to entity knowledge in the field of the copy, where entity knowledge is knowledge about entities that carry a specific meaning in that field and the specified terms come from the corpus text. The candidate copies obtained in this way are not only numerous but also richer and more varied in wording, which satisfies the requirement that copy be novel, diverse, and in step with current hot topics. When such content-rich copy is promoted to users, it attracts their attention more easily, which helps to raise the click-through rate and the conversion rate.
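How rewriting different tokens multiplies into many candidates can be sketched with a Cartesian product over per-token alternatives. The sketch assumes each token maps to a list of alternative rewrites, with an empty string standing for deletion; the function name `generate_candidates` and the sample data are hypothetical.

```python
from itertools import product


def generate_candidates(tokens, alternatives, sep=" "):
    """Combine the original token and its alternatives at every position."""
    per_token = [[token] + list(alternatives.get(token, [])) for token in tokens]
    original = sep.join(tokens)
    seen, candidates = set(), []
    for combo in product(*per_token):
        text = sep.join(part for part in combo if part)  # "" means the token was deleted
        if text != original and text not in seen:        # skip the unchanged copy and duplicates
            seen.add(text)
            candidates.append(text)
    return candidates


alternatives = {"invest": ["save"], "now": ["today", ""]}
candidates = generate_candidates(["invest", "now"], alternatives)
```

With two options for the first token and three for the second, six combinations arise, of which five differ from the original text. The count grows multiplicatively with the number of rewritable tokens, which is why even a modest knowledge base yields many candidates.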
For ease of understanding, the method is illustrated here with marketing copy for small-amount regular investment in a pension fund, displayed in the belly-band (mid-page banner) ad slot of a mobile APP.
Suppose the marketing copy for small-amount pension investment (i.e., the copy to be rewritten) is as shown in Fig. 2. Its text content mainly includes a theme and a worked example of small-amount regular investment and its returns. The theme is "Pension investment: make your future life better". The example content includes "Investor: Mr. Wang, 30 years old, monthly income 10,000 yuan, retiring at 60", "Regular investment plan: invest 100 yuan in the pension fund every week until retirement", "Estimated pension return: about 564,000 yuan can be accumulated", "Principal: 144,000 yuan", and "Return: 420,000 yuan".
Assume that the specified terms corresponding to entity knowledge in the domain knowledge base include vocabulary related to small-amount regular pension investment, such as the investment amount (e.g., how many yuan), the investment frequency (e.g., daily, weekly, monthly, yearly), the investment duration (e.g., 10 years, 20 years, 30 years), return descriptions (e.g., how many yuan at maturity, how many yuan accumulated, how many yuan available after retirement), and marketing phrases (e.g., "go take a look", "hurry and take a look"). The following words corresponding to entity knowledge can then be extracted from the text content as tokens: "investment", "pension", "regular investment", "30 years old", "retire at 60", "100 yuan", "retirement", "weekly", "can accumulate", "pension fund", "564,000 yuan", and so on.
However, the belly-band ad slot of a mobile APP (the position of the dashed box in Fig. 3) usually requires the displayed text to be short, concise, and eye-catching. When the tokens are rewritten, several rewriting approaches such as replacement, insertion and deletion can therefore be used together. For example, the tokens "investment", "pension" and "pension fund" are deleted; "retirement" is replaced with "after 30 years", "can accumulate" with "can become", and "564,000 yuan" with "how much"; and "go take a look" is inserted at the corresponding position (here, at the end). This generates one candidate copy, and the result of displaying it in the belly-band ad slot of the APP is shown in the dashed box in Fig. 3. After rewriting, the text is shorter and its core meaning is more prominent, so it attracts users' attention more easily. After clicking, users can enter the details page of the copy, for example a details page that includes the copy of Fig. 2.
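The combined rewrite in this example can be traced with a short sketch that applies the deletions, replacements and the trailing insertion listed above to a token list. The order of the tokens and the helper name `apply_rewrites` are assumptions, and the output is only the resulting token sequence, not the exact wording shown in Fig. 3.

```python
def apply_rewrites(tokens, delete, replace, append):
    """Drop deleted tokens, substitute replaced ones, then add terms at the end."""
    kept = [replace.get(token, token) for token in tokens if token not in delete]
    return kept + list(append)


# Assumed token order for the regular-investment plan and its estimated return.
tokens = [
    "investment", "pension", "regular investment", "weekly", "100 yuan",
    "pension fund", "retirement", "can accumulate", "564,000 yuan",
]
delete = {"investment", "pension", "pension fund"}
replace = {
    "retirement": "after 30 years",
    "can accumulate": "can become",
    "564,000 yuan": "how much",
}
candidate = apply_rewrites(tokens, delete, replace, append=["go take a look"])
```

The nine original tokens shrink to seven, which matches the stated goal of short, concise banner text.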
Fig. 4 is a flow chart of establishing the domain knowledge base from the text content of the copy in a corpus, in a copy rewriting method provided by an embodiment of this specification.
As shown in Fig. 4, in the copy rewriting method, the step of establishing the domain knowledge base in advance from the text content of the copy in the corpus may include:
Step S202: obtain text data from the corpus.
Here, the corpus may be a database made up of a number of text data items, each of which may be a plain-text file. The text data may be collected through big-data means (e.g., web crawling) or gathered manually.
Specifically, the corpus may include an internal corpus and/or an external corpus. The internal corpus serves as a dedicated corpus, which may be a corpus built from text data accumulated in business scenarios; in this case the text data includes, for example, data from each application business scenario related to the field of the copy to be rewritten. The external corpus serves as a general corpus, which may be a corpus built from crawled text data; in this case the text data may include, for example, data collected from external channels such as Baidu Baike, Chinese Wikipedia, news corpora, and search engines.
Step S204: extract from the text data the specified terms that have business-scenario meaning in the field of the copy.
In a specific implementation, a named entity recognition algorithm and/or regular expressions may be used to extract from the text data the specified terms that have business-scenario meaning in the field of the copy. That is, the terms corresponding to entity knowledge in the field (which may also be called dedicated words) are extracted as the specified terms.
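The regular-expression route can be sketched with Python's standard `re` module. The patterns below, for amounts, durations and investment frequencies in English text, are illustrative assumptions rather than patterns taken from the specification; the named entity recognition route would require a trained model and is not shown.

```python
import re

# Hypothetical patterns for terms with business-scenario meaning in pension-investment copy.
PATTERNS = {
    "amount": re.compile(r"\d+(?:\.\d+)?\s*yuan"),
    "duration": re.compile(r"\d+\s*years?"),
    "frequency": re.compile(r"\b(?:daily|weekly|monthly|yearly)\b"),
}


def extract_terms(text):
    """Return every match of each pattern, grouped by the pattern's label."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}


terms = extract_terms("Invest 100 yuan weekly for 30 years")
```

Regular expressions suit terms with a predictable surface form, such as amounts and dates, while open-ended terms such as slang or buzzwords are better handled by entity recognition or manual collection.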
Step S206: establish a knowledge base from the extracted specified terms and use it as the domain knowledge base.
The domain knowledge base is established in advance from the entity knowledge in the text data of the corpus and can be stored on disk. When copy is rewritten, different tokens can then be rewritten directly with the terms corresponding to entity knowledge in the domain knowledge base, yielding different candidate copies.
In some embodiments, the knowledge base built from the extracted specified terms may be further screened and then merged with a manually collected and maintained knowledge base before being used as the domain knowledge base. Specifically, step S206 may include: building a knowledge base from the extracted specified terms; screening the built knowledge base; and merging the screened knowledge base with the manually collected knowledge base to obtain the domain knowledge base.
It should be noted that the screening here may be performed according to preset rules, or by preliminary rule-based screening followed by further manual screening; the merging may include de-duplication operations such as deleting and combining entries.
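The screening and merging just described can be illustrated with a minimal Python sketch. The screening rule (a minimum term length plus a banned list), the function names `screen` and `merge_knowledge_bases`, and the sample terms are all assumptions made for illustration; the specification does not fix any particular rule.

```python
def screen(terms, min_len=2, banned=frozenset()):
    """Rule-based screening (hypothetical rule): drop very short or banned terms."""
    return {t for t in terms if len(t) >= min_len and t not in banned}


def merge_knowledge_bases(extracted, manual, banned=frozenset()):
    """Screen the extracted terms, then merge them with manually collected
    terms per entity type; using sets removes duplicates."""
    merged = {}
    for entity_type in sorted(set(extracted) | set(manual)):
        kept = screen(extracted.get(entity_type, ()), banned=banned)
        merged[entity_type] = sorted(kept | set(manual.get(entity_type, ())))
    return merged


# Toy inputs: one automatically extracted knowledge base, one manual one.
extracted_kb = {"reward": ["coupon", "red envelope", "x"]}
manual_kb = {"reward": ["coupon", "cash back"], "channel": ["scan to pay"]}
merged_kb = merge_knowledge_bases(extracted_kb, manual_kb)
```

Here the noise term "x" is screened out and the duplicate "coupon" appears only once in the merged result.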
Fig. 5 is a functional block diagram for establishing the domain knowledge base.
As shown in Fig. 5, starting from the internal and external text corpora, a named entity extraction algorithm extracts the terms corresponding to the entity types of the copy's field, and a knowledge base built from these terms is used as the domain knowledge base. When a word segment is rewritten, the specified terms corresponding to that segment can then be retrieved quickly and accurately from the domain knowledge base.
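As a rough illustration of this flow, the following Python sketch builds a small knowledge base with regular expressions only (a named entity recognition model could be substituted). The entity types, the patterns in `ENTITY_PATTERNS`, and the sample corpus are hypothetical and are not taken from the specification.

```python
import re
from collections import defaultdict

# Hypothetical entity types, each with a regular expression for its terms.
ENTITY_PATTERNS = {
    "discount": re.compile(r"\d+% off"),
    "reward": re.compile(r"red envelope|coupon|cash back"),
}


def build_knowledge_base(texts):
    """Collect the specified terms found in `texts`, grouped by entity type."""
    kb = defaultdict(set)
    for text in texts:
        lowered = text.lower()
        for entity_type, pattern in ENTITY_PATTERNS.items():
            kb[entity_type].update(m.group(0) for m in pattern.finditer(lowered))
    # Sort the terms so the stored knowledge base is deterministic.
    return {entity_type: sorted(terms) for entity_type, terms in kb.items()}


corpus = [
    "Pay with the app and get a coupon.",
    "New users enjoy 20% off and a red envelope.",
    "Scan to pay: 50% off today, plus cash back.",
]
domain_kb = build_knowledge_base(corpus)
```

Grouping terms by entity type is what later allows a segment to be rewritten with another term of the same type.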
Based on the same inventive concept, the embodiments of this specification also provide an apparatus for rewriting copy, an electronic device, and a non-volatile computer storage medium.
Since the copy rewriting method has been described in detail in the foregoing embodiments, the corresponding content for the apparatus, device, and non-volatile computer storage medium is not repeated in the following embodiments.
Fig. 6 is a schematic structural diagram of a copy rewriting apparatus provided by this specification, in which dashed boxes indicate optional modules.
As shown in Fig. 6, the copy rewriting apparatus 10 includes an obtaining module 11, a word segmentation module 12, a rewriting module 13, and a generation module 14. The obtaining module 11 is configured to obtain the text content of the copy to be rewritten. The word segmentation module 12 is configured to perform word segmentation on the text content to obtain several word segments. The rewriting module 13 is configured to rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field. The generation module 14 is configured to generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be rewritten.
Optionally, the rewriting module 13 is configured to replace the word segment with at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs;
and/or the rewriting module 13 is configured to insert at least one specified term from that domain knowledge base at a position adjacent to the word segment.
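The two rewriting modes, replacement and adjacent insertion, can be sketched in Python as follows. This is only an illustrative sketch: the function name `rewrite_candidates`, the choice to insert directly after the segment, and the sample knowledge base are assumptions, not details given in the specification.

```python
def rewrite_candidates(segments, kb):
    """Generate candidate copies by replacing a segment with, or inserting
    next to it, another specified term of the same entity type."""
    # Map every specified term to its entity type for quick lookup.
    term_type = {t: etype for etype, terms in kb.items() for t in terms}
    candidates = []
    for i, seg in enumerate(segments):
        etype = term_type.get(seg)
        if etype is None:
            continue  # not a specified term, so it is left unchanged
        for term in kb[etype]:
            if term == seg:
                continue
            # Mode 1: replace the segment with the specified term.
            candidates.append(segments[:i] + [term] + segments[i + 1:])
            # Mode 2: insert the specified term right after the segment.
            candidates.append(segments[:i + 1] + [term] + segments[i + 1:])
    return [" ".join(c) for c in candidates]


kb = {"reward": ["coupon", "cash back"]}
candidates = rewrite_candidates(["pay", "now", "get", "coupon"], kb)
```

Insertion without a connective can read awkwardly, which is one reason a later scoring step is useful for filtering candidates.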
Optionally, the copy rewriting apparatus 10 further includes a domain knowledge base module 15, which is configured to establish the domain knowledge base by performing the following steps:
obtaining text data from a corpus;
extracting from the text data the specified terms that carry business-scenario meaning in the copy field;
building a knowledge base from the extracted specified terms and using it as the domain knowledge base.
Optionally, extracting from the text data the specified terms that carry business-scenario meaning in the copy field includes: using a named entity extraction algorithm and/or regular expressions to extract those specified terms from the text data.
Optionally, building a knowledge base from the extracted specified terms and using it as the domain knowledge base includes: building a knowledge base from the extracted specified terms; screening the built knowledge base; and merging the screened knowledge base with a manually collected knowledge base to obtain the domain knowledge base.
Optionally, the word segmentation module 12 is configured to extract several specified terms from the text content and use them as the several word segments.
Optionally, the corpus includes a dedicated corpus and/or a general corpus, where the dedicated corpus includes a corpus built from text data accumulated in business scenarios and the general corpus includes a corpus built from crawled text data.
Optionally, after obtaining the several word segments, the word segmentation module 12 further adds head and tail identifiers to the word segments.
The embodiments of this specification also provide an electronic device for rewriting copy, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to:
obtain the text content of the copy to be rewritten;
perform word segmentation on the text content to obtain several word segments;
rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field;
generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be rewritten.
The embodiments of this specification also provide a non-volatile computer storage medium for rewriting copy, storing computer-executable instructions configured to:
obtain the text content of the copy to be rewritten;
perform word segmentation on the text content to obtain several word segments;
rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be rewritten belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field;
generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be rewritten.
Based on the same inventive concept, the embodiments of this specification also provide a copy update method, apparatus, and device.
Fig. 7 is a flowchart of a copy update method provided by an embodiment of this specification.
As shown in Fig. 7, the copy update method may include the following steps:
Step S302, obtain the text content of the copy to be updated.
Step S304, perform word segmentation on the text content to obtain several word segments.
Step S306, rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs.
Here, the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field.
Step S308, generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be updated.
Step S310, score the candidate copies based on a language model.
Here, a language model is a model used to determine the probability of a sentence, which can be expressed as $P(W_1, W_2, \ldots, W_k)$, where $W_1, W_2, \ldots, W_k$ are the word segments of the sentence.
The language model thus indicates how likely each word segment is to appear in the copy and how fluent the sentence as a whole is when it does; the fluency of the sentences in turn determines the fluency of the entire candidate copy.
Specifically, each candidate copy can be input into the language model, which computes the sentence probability, and the sentence probability is used as the scoring result. A higher score means the sentence is more likely to occur in practice and the candidate copy conforms better to natural language expression. In other words, the higher the sentence score, the more fluent and grammatically correct the sentence, and the higher the quality of the candidate copy composed of such sentences.
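For reference, the sentence probability used as the score can be written out as follows. The chain-rule factorization is a general identity; the n-gram approximation and the log-domain form are shown as one common way to compute it and are an assumption here, since the specification does not state which form is used.

```latex
% Chain rule: exact factorization of the sentence probability
P(W_1, W_2, \ldots, W_k) = \prod_{i=1}^{k} P(W_i \mid W_1, \ldots, W_{i-1})

% n-gram approximation: condition only on the previous n-1 word segments
P(W_i \mid W_1, \ldots, W_{i-1}) \approx P(W_i \mid W_{i-n+1}, \ldots, W_{i-1})

% Log-domain score (assumed form): avoids numerical underflow on long sentences
\mathrm{score}(S) = \sum_{i=1}^{k} \log P(W_i \mid W_{i-n+1}, \ldots, W_{i-1})
```

Because the logarithm is monotonic, ranking candidates by the log-domain score gives the same order as ranking them by the probability itself.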
Step S312, update the copy to be updated with the scored candidate copies according to a preset update policy.
In a specific implementation, the update policy can be preset according to actual needs; for example, it may include an update-frequency policy and a candidate copy selection policy.
In a specific implementation, one or several candidate copies with higher scores (that is, higher quality) can be output as replacement copies for the update. Specifically, the higher-scoring rewritten candidate copies can be chosen as the final result of the algorithm according to the scoring results: the ordering of the candidate copies $S_i$ is determined by $S^* = \arg\max_{i \in [1, T]} S_i$, so that replacement copies can be selected according to the update policy.
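A minimal Python sketch of this selection step is shown below. The function name `select_copies`, the `top_k` parameter standing in for a candidate selection policy, and the score values are all hypothetical; real scores would come from the language model.

```python
def select_copies(scored, top_k=1):
    """Rank (copy, score) pairs from highest to lowest score and return
    the top_k copies as replacement candidates."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [copy for copy, _ in ranked[:top_k]]


# Hypothetical log-probability scores for three candidate copies.
scored = [
    ("pay now get cash back", -7.2),
    ("pay now get coupon cash back", -11.5),
    ("pay now get red envelope", -6.9),
]
best = max(scored, key=lambda pair: pair[1])[0]  # the arg max over candidates
shortlist = select_copies(scored, top_k=2)
```

Keeping a ranked shortlist rather than a single winner lets an update-frequency policy rotate through several good candidates over time.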
It should be noted that steps S302 to S308 above are similar in content to steps S102 to S108 of the copy rewriting method in the foregoing embodiments and are not described again here.
Through steps S302 to S312 above, the language model determines the sentence probability of each candidate marketing copy generated from the sentence sequence, and the candidate copy is scored according to that probability. The higher the score, the better the marketing copy conforms to natural language expression and the higher its quality. Updating with higher-quality copy attracts users' attention more easily, which helps improve the user click-through rate and conversion rate.
In the embodiments of this specification, the language model may be an n-gram language model or a neural network language model (NNLM), trained on the text data in the corpus.
In some embodiments, as shown in Fig. 8, the language model may be trained using the following steps:
Step S402, obtain text data from a corpus.
In a specific implementation, only the general corpus may be used. Because the text data of the general corpus is crawled and collected through big-data means and contains richer language information, a language model built and trained on this text data predicts the scores of candidate copies more accurately.
Step S404, perform word segmentation on the text data.
For the word segmentation here, refer to the preceding description; it is not repeated.
Step S406, add head and tail identifiers to the word segments.
The head and tail identifiers delimit the term sequence of each text, so that the language model can be trained on term sequences.
Step S408, use the word segments with the head and tail identifiers added to train the language model.
According to steps S402 to S408 above, head and tail identifiers are added to the word segments of a large amount of text data from the general corpus to obtain term sequences, and these term sequences are used to train the language model. Because the text data of the general corpus is obtained by big-data crawling, it conforms better to natural language expression and is also richer, so a language model trained on it is better suited to predicting the scores of candidate copies.
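Steps S402 to S408 can be illustrated with a small bigram model in Python. This is a sketch under several assumptions that the specification does not state: the marker symbols `<s>` and `</s>`, the choice of n = 2, add-one smoothing, whitespace segmentation, and the toy corpus.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # head and tail identifiers (symbols are assumed)


def add_markers(segments):
    """Wrap a list of word segments with the head and tail identifiers."""
    return [BOS] + list(segments) + [EOS]


def train_bigram(sentences):
    """Count context unigrams and bigrams over the marked term sequences."""
    unigrams, bigrams = Counter(), Counter()
    for segments in sentences:
        seq = add_markers(segments)
        unigrams.update(seq[:-1])          # every token that starts a bigram
        bigrams.update(zip(seq, seq[1:]))
    vocab = {w for s in sentences for w in s} | {BOS, EOS}
    return unigrams, bigrams, len(vocab)


def log_prob(segments, model):
    """Log sentence probability under the bigram model with add-one smoothing."""
    unigrams, bigrams, vocab_size = model
    seq = add_markers(segments)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(seq, seq[1:])
    )


# Toy corpus, segmented by whitespace for simplicity.
corpus = [s.split() for s in ["pay now get coupon", "pay now get cash back", "scan to pay"]]
model = train_bigram(corpus)
fluent = log_prob("pay now get coupon".split(), model)
scrambled = log_prob("coupon get now pay".split(), model)
```

On this toy data the word order seen in training scores higher than the scrambled order, which is the behaviour the scoring step relies on.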
For ease of understanding, the principle of applying the copy update method to marketing copy is explained below.
As shown in Fig. 9, the domain knowledge base is first constructed through steps S1 to S3 and the language model is trained through steps S4 to S6, after which the rewriting and updating of marketing copy can be completed through steps S7 to S11.
Specifically, steps S1 to S11 are briefly described as follows:
S1. obtain text data from the internal and external text corpora;
S2. extract from the text data the knowledge corresponding to each entity in the field, forming the specified terms corresponding to the entity types;
S3. build a knowledge base from the extracted specified terms and use it as the domain knowledge base;
S4. obtain text data from the external text corpus;
S5. train the language model on the text data, where the language model is an n-gram language model; before training, the text data is segmented into term sequences, which are then used to train the n-gram language model;
S6. obtain the language model;
S7. obtain the text content from the input initial marketing copy (that is, the copy to be updated);
S8. perform word segmentation on the text content, including processing such as word or term sequence segmentation and adding head and tail identifiers;
S9. rewrite the word segments based on the terms corresponding to the entity types in the domain knowledge base, thereby generating different candidate copies;
S10. score the candidate copies based on the language model;
When the candidate copies are scored, the sentence probability can be determined as $P(S) = \prod_{i=1}^{T} P(W_i \mid W_1, \ldots, W_{i-1})$,
where $P(S)$ denotes the probability of the sentence, $T$ denotes the length of the term sequence, and $W_i$ denotes the $i$-th term in the sequence.
S11. based on the scoring results, select the candidate copies of higher quality (that is, with larger scores) according to the update policy and output them as the updated copy.
Since the copy update method has been described in detail in the foregoing embodiments, the corresponding content for the copy update apparatus, device, and non-volatile computer storage medium is not repeated in the following embodiments.
Fig. 10 is a schematic structural diagram of a copy update apparatus provided by an embodiment of this specification, in which dashed boxes indicate optional modules.
As shown in Fig. 10, the copy update apparatus 20 includes an obtaining module 21, a word segmentation module 22, a rewriting module 23, a generation module 24, a scoring module 25, and an update module 26. The obtaining module 21 is configured to obtain the text content of the copy to be updated. The word segmentation module 22 is configured to perform word segmentation on the text content to obtain several word segments. The rewriting module 23 is configured to rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field. The generation module 24 is configured to generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be updated. The scoring module 25 is configured to score the candidate copies based on a language model. The update module 26 is configured to update the copy to be updated with the scored candidate copies according to a preset update policy.
Optionally, the copy update apparatus 20 further includes a training module 17, which is configured to train the language model by performing the following steps:
obtaining text data from a corpus;
performing word segmentation on the text data;
adding head and tail identifiers to the word segments;
using the word segments with the head and tail identifiers added to train the language model.
The embodiments of this specification also provide an electronic device for updating copy, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to:
obtain the text content of the copy to be updated;
perform word segmentation on the text content to obtain several word segments;
rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field;
generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be updated;
score the candidate copies based on a language model;
update the copy to be updated with the scored candidate copies according to a preset update policy.
The embodiments of this specification also provide a non-volatile computer storage medium for updating copy, storing computer-executable instructions configured to:
obtain the text content of the copy to be updated;
perform word segmentation on the text content to obtain several word segments;
rewrite the word segments according to at least one specified term in the domain knowledge base of the field to which the copy to be updated belongs, where the domain knowledge base is a knowledge base pre-established from the specified terms in the text data of a corpus, and a specified term is a term that carries business-scenario meaning in the copy field;
generate, from the text content as rewritten with the different word segments, several candidate copies corresponding to the copy to be updated;
score the candidate copies based on a language model;
update the copy to be updated with the scored candidate copies according to a preset update policy.
The foregoing describes specific embodiments of this specification. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The apparatus, device, and non-volatile computer storage medium provided by the embodiments of this specification correspond to the method and therefore have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, those of the corresponding apparatus, device, and non-volatile computer storage medium are not repeated here.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained by lightly programming the method flow in one of these hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementing device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described with its functions divided into various units. Of course, when this application is implemented, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Absent further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiment.
The above is merely an embodiment of the present application and is not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of its claims.