CN109977995A

CN109977995A - Text template recognition method, device, and computer-readable storage medium

Info

Publication number: CN109977995A
Application number: CN201910109887.0A
Authority: CN
Inventors: 刘轲
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-02-11
Filing date: 2019-02-11
Publication date: 2019-07-05
Also published as: WO2020164204A1

Abstract

The invention discloses a text template recognition method, comprising: obtaining a preset text template and a matched text; calculating a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or calculating a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matched text is a text template similar to the preset text template. The invention also proposes a text template recognition device and a computer-readable storage medium. The invention can improve the efficiency and accuracy of text template recognition.

Description

Text template recognition method, device, and computer-readable storage medium

Technical field

The present invention relates to the technical field of natural language processing, and in particular to a text template recognition method, a device, and a computer-readable storage medium.

Background art

With the development of Internet technology, people in all walks of life can freely publish and download information through network platforms, so the amount of information on the network keeps growing. Big data analysis processes this massive volume of network data to extract the required information. Such analysis sometimes relies on text templates, that is, text information containing certain specific characters. In general, identical or similar pieces of text information correspond to one text template. In the prior art, text templates are usually extracted from the various information by staff; this approach is time-consuming and laborious, because staff need a long time to identify and obtain each text template.

Summary of the invention

The present invention provides a text template recognition method, a device, and a computer-readable storage medium, whose main purpose is to improve the efficiency and accuracy of text template recognition.

To achieve the above object, the present invention provides a text template recognition method, the method comprising:

obtaining a preset text template and a matched text;

calculating a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or

calculating a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm;

when the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matched text is a text template similar to the preset text template.

Optionally, calculating the first similarity between the matched text and the preset text template according to the word-frequency-based text similarity algorithm and/or calculating the second similarity between the matched text and the preset text template according to the semantics-based text similarity algorithm comprises:

calculating the first similarity between the matched text and the preset text template using a vector space model;

calculating the second similarity between the matched text and the preset text template using an LDA (Latent Dirichlet Allocation) topic model;

and the first similarity and the second similarity satisfying the preset similarity condition comprises:

performing linear weighting on the first similarity and the second similarity to obtain a third similarity between the matched text and the preset text template;

judging whether the third similarity is greater than a third preset similarity;

if the third similarity is greater than the third preset similarity, determining that the first similarity and the second similarity satisfy the preset similarity condition.

Optionally, performing linear weighting on the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template comprises:

inputting the first similarity and the second similarity into a preset linear weighting formula, and outputting the third similarity between the matched text and the preset text template, the preset linear weighting formula being:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

where p and q are the matched text and the preset text template respectively, sim_TFIDF(p, q) is the first similarity, sim_LDA(p, q) is the second similarity, sim(p, q) is the third similarity, and α and β are preset weight values.

Optionally, the method further comprises:

obtaining the weight value used for the linear weighting, comprising:

assigning a first initial value to the weight value, and calculating the third similarity according to the first initial value;

judging, by a preset clustering algorithm, whether the matched text and the preset text template belong to the same category, to obtain a clustering result;

judging, from the clustering result, whether the third similarity calculated according to the first initial value is accurate;

if the third similarity calculated according to the first initial value is judged to be accurate, determining the first initial value as the weight value used for the linear weighting;

if the third similarity calculated according to the first initial value is judged to be inaccurate, adjusting the first initial value and performing again the operation of calculating the third similarity according to the first initial value.

Optionally, the first similarity or the second similarity satisfying the preset similarity condition comprises:

the first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.

In addition, to achieve the above object, the present invention also provides a text template recognition device. The device comprises a memory and a processor; the memory stores a text template recognition program executable on the processor, and the text template recognition program, when executed by the processor, implements the following steps:

obtaining a preset text template and a matched text;

The first similarity and the second similarity satisfying the preset similarity condition comprises:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

Optionally, the text template recognition program, when executed by the processor, further implements the following steps:

In addition, to achieve the above object, the present invention also provides a computer-readable storage medium. The computer-readable storage medium stores a text template recognition program, which can be executed by one or more processors to implement the steps of the text template recognition method described above.

The text template recognition method, text template recognition device, and computer-readable storage medium proposed by the present invention obtain a preset text template and a matched text; calculate a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or calculate a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determine that the matched text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly, without staff judging each text manually, which improves the efficiency of text template recognition. In addition, because text similarity is calculated by the word-frequency-based and/or the semantics-based text similarity algorithm, the accuracy of text template recognition can be improved.

Brief description of the drawings

Fig. 1 is a flow diagram of the text template recognition method provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of the internal structure of the text template recognition device provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the modules of the text template recognition program in the text template recognition device provided by an embodiment of the present invention.

The realization of the object, the functional features, and the advantages of the present invention are further described below with reference to the accompanying drawings and in conjunction with the embodiments.

Specific embodiments

It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it.

The present invention provides a text template recognition method. Fig. 1 is a flow diagram of the text template recognition method provided by the first embodiment of the present invention. The method may be executed by an electronic device.

In this embodiment, the text template recognition method includes:

Step S10: obtain a preset text template and a matched text.

The preset text template may be a text template stored in advance in a preset storage area (for example, in the electronic device). The preset text template may be obtained by a user and stored in the preset storage area; alternatively, the preset text template is obtained by analyzing several similar texts and extracting the keywords they share.

In one possible embodiment, the preset text template is any one text template in a text template set. The text templates in the set may all belong to the same class, or the set may include text templates of various classes. Obtaining the preset text template comprises: obtaining the text template set; and obtaining a text template from the text template set.

The matched text is the text that is to be judged as being, or not being, a similar text template. The matched text may consist of one or more sentences.

Step S20: calculate a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm, and/or calculate a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm.

The word-frequency-based text similarity algorithm calculates the similarity between two texts from the frequency of occurrence of words; the semantics-based text similarity algorithm calculates the similarity between two texts from their semantics.

The specific word-frequency-based and semantics-based text similarity algorithms can be obtained from the prior art and are not described again here.

Optionally, in another embodiment of the invention, calculating the first similarity between the matched text and the preset text template according to the word-frequency-based text similarity algorithm and/or calculating the second similarity between the matched text and the preset text template according to the semantics-based text similarity algorithm comprises:

calculating the second similarity between the matched text and the preset text template using an LDA (Latent Dirichlet Allocation) topic model.

In this embodiment, the first similarity between the matched text and the preset text template is calculated using a vector space model.

Calculating the first similarity between the matched text and the preset text template using the vector space model (Vector Space Model, VSM) comprises:

performing a preprocessing operation on the matched text and the preset text template, the preprocessing operation including but not limited to word segmentation and stop-word removal (stop words include words, symbols, punctuation, garbled characters, and the like that contribute little to the meaning of the text, such as "this"), to obtain the preprocessed matched text and the preprocessed preset text template;

determining a first keyword from the preprocessed matched text according to word frequency, and determining a second keyword from the preprocessed preset text template according to word frequency, where the first keyword and the second keyword may each include multiple words;

For example, the words whose frequency of occurrence in the preprocessed matched text is greater than a preset frequency are determined to be the first keyword.
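The preprocessing and keyword selection just described can be sketched as follows. This is only an illustration under stated assumptions: the regular-expression tokenizer stands in for a real word segmenter (Chinese text would need a dedicated one), and the stop-word list, the function names, and the threshold `preset_frequency` are hypothetical and not taken from the source.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "this", "and"}

def preprocess(text):
    """Split text into lowercase word tokens and drop stop words and punctuation."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def select_keywords(tokens, preset_frequency=1):
    """Return, in sorted order, the words that occur more than preset_frequency times."""
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if c > preset_frequency)

tokens = preprocess("Your order 1234 is shipped. Track the order: order status page.")
keywords = select_keywords(tokens)
```

Here occurrence counts serve as the word frequency; a relative frequency (count divided by text length) would work the same way with a fractional threshold.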

After the first keyword and the second keyword are determined, the inverse document frequency of the first keyword and the inverse document frequency of the second keyword are calculated, and a first vector representing the matched text and a second vector representing the preset text template are generated;

Inverse document frequency (IDF) is an index for measuring the weight of a keyword.

The inverse document frequency of a keyword can be calculated according to the formula IDF = log(D / D_w), where D is the total number of texts in the sample database and D_w is the number of texts in which the keyword occurs.
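A minimal sketch of the IDF formula above, assuming the sample database is available as a list of token lists and that the logarithm is the natural logarithm (the source does not state the base). The function name and the sample data are illustrative only.

```python
import math

def inverse_document_frequency(keyword, sample_texts):
    """Compute IDF = log(D / D_w) for one keyword.

    sample_texts is a list of token lists; D is their number and D_w is
    the number of texts that contain the keyword.
    """
    total = len(sample_texts)                                      # D
    containing = sum(1 for doc in sample_texts if keyword in doc)  # D_w
    if containing == 0:
        raise ValueError("keyword does not occur in the sample database")
    return math.log(total / containing)

sample_texts = [["order", "shipped"], ["order", "paid"], ["invoice"], ["hello"]]
idf_order = inverse_document_frequency("order", sample_texts)      # log(4 / 2)
idf_invoice = inverse_document_frequency("invoice", sample_texts)  # log(4 / 1)
```

A keyword that appears in fewer texts gets a larger weight, which is why "invoice" outweighs "order" in this toy database.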

In this embodiment, the first vector and the second vector are obtained according to the following formula:

D = D(T1, W1; T2, W2; ...; Tn, Wn)

where T1 is a keyword and W1 is the inverse document frequency of that keyword; T2 is another keyword and W2 is the inverse document frequency of that keyword; and so on, Tn is the n-th keyword and Wn is the inverse document frequency of that keyword.

In the vector space model, the content relevance Sim(D1, D2) between two texts is usually expressed by the cosine of the angle between their vectors. Therefore, after the first vector of the matched text and the second vector of the preset text template are obtained, the cosine of the angle between the first vector and the second vector is calculated to obtain the first similarity between the matched text and the preset text template. The formula for calculating the cosine can be obtained from the prior art and is not described again here.
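For reference, the standard cosine formula is cos θ = (D1 · D2) / (|D1| |D2|). The sketch below applies it to two keyword-weight vectors stored as dictionaries that map each keyword Ti to its weight Wi; the dictionary representation, the function name, and the sample weights are assumptions made for illustration.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors given as {keyword: weight}."""
    # Keywords missing from one vector contribute a weight of 0 to the dot product.
    dot = sum(w * vec_b.get(term, 0.0) for term, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

first_vector = {"order": 0.7, "shipped": 1.4}   # matched text
second_vector = {"order": 0.7, "paid": 1.4}     # preset text template
first_similarity = cosine_similarity(first_vector, second_vector)
```

Because the weights are non-negative, the result lies between 0 (no shared keywords) and 1 (same direction).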

In this embodiment, each text is reduced to an N-dimensional vector whose components are the weights of its feature items (keywords). This simplifies the complex relationships between keywords in the text and makes the model computable, so the first similarity between the matched text and the preset text template can be obtained quickly.

In this embodiment, the basic idea of the LDA (Latent Dirichlet Allocation) model is to describe a document as a probability distribution over topics and, further, to describe each topic as a probability distribution over terms. How to calculate the second similarity between the matched text and the preset text template according to the LDA topic model can be obtained from the prior art and is not described again here.
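The source leaves the LDA-based calculation to the prior art. One possible approach, shown here purely as an assumption, is to infer a topic distribution for each text with a trained LDA model and then compare the two distributions. The sketch covers only the comparison step, takes the topic distributions as given, and uses one minus the Hellinger distance as the similarity; that choice is made for this example and is not stated in the source.

```python
import math

def topic_similarity(topics_p, topics_q):
    """Similarity in [0, 1] between two topic distributions of equal length.

    Uses 1 - Hellinger distance; each input is a list of topic
    probabilities that sums to 1.
    """
    if len(topics_p) != len(topics_q):
        raise ValueError("topic distributions must have the same length")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2
                  for a, b in zip(topics_p, topics_q))
    return 1.0 - math.sqrt(0.5 * squared)

# Hypothetical topic distributions over three topics.
second_similarity = topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

Identical distributions give 1, and distributions with no topic in common give 0, so the value can be combined with the cosine-based first similarity on the same scale.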

Step S30: when the first similarity and/or the second similarity satisfies the preset similarity condition, determine that the matched text is a text template similar to the preset text template.

The preset similarity condition can be set in advance.

Optionally, in another embodiment of the invention, the first similarity or the second similarity satisfying the preset similarity condition comprises:

the first similarity being greater than a first preset similarity, or the second similarity being greater than a second preset similarity.

The first preset similarity and the second preset similarity can be preset as needed, and their values may be the same or different. For example, the first preset similarity is 85% and the second preset similarity is 90%; alternatively, the first preset similarity and the second preset similarity are both 90%.
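The either-threshold condition of this embodiment can be sketched with the example values of 85% and 90%. The function and parameter names are illustrative, and similarities are assumed to be expressed as fractions between 0 and 1.

```python
def meets_either_threshold(first_similarity, second_similarity,
                           first_preset=0.85, second_preset=0.90):
    """True when either similarity strictly exceeds its own preset threshold."""
    return first_similarity > first_preset or second_similarity > second_preset

is_similar_template = meets_either_threshold(0.86, 0.50)
```

The comparison is strict ("greater than"), so a similarity exactly equal to its threshold does not satisfy the condition.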

Optionally, in another embodiment of the invention, the first similarity and the second similarity satisfying the preset similarity condition comprises:

performing linear weighting on the first similarity and the second similarity to obtain a third similarity between the matched text and the preset text template;

judging whether the third similarity is greater than a third preset similarity;

if the third similarity is greater than the third preset similarity, determining that the first similarity and the second similarity satisfy the preset similarity condition.

Linear weighting assigns a weight value to each of the first similarity and the second similarity and then adds the weighted values to obtain the third similarity.

The third preset similarity can be set in advance.

Optionally, in another embodiment of the invention, performing linear weighting on the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template comprises:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q)

In this embodiment, 0 ≤ α ≤ 1, 0 ≤ β ≤ 1, and α + β = 1.
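The weighted formula with the constraint α + β = 1 can be sketched as below. Only α is passed in and β is derived from it; the function name and the sample values are assumptions for illustration.

```python
def third_similarity(sim_tfidf, sim_lda, alpha):
    """Linear weighting: alpha * sim_LDA + beta * sim_TFIDF, with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * sim_lda + beta * sim_tfidf

# First similarity 0.8 (word frequency), second similarity 0.6 (LDA), alpha 0.25.
combined = third_similarity(sim_tfidf=0.8, sim_lda=0.6, alpha=0.25)
```

With α = 0 the result reduces to the word-frequency similarity alone, and with α = 1 to the LDA similarity alone.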

Optionally, in another embodiment of the invention, the method further comprises: obtaining the weight value used for the linear weighting. Obtaining the weight value used for the linear weighting comprises:

if the third similarity calculated according to the first initial value is judged to be inaccurate, adjusting the first initial value and performing again the operation of calculating the third similarity according to the first initial value.

The above steps are used to obtain the value of α or β.

The clustering result is either that the matched text and the preset text template belong to the same category, or that they do not belong to the same category.

The first initial value may be 0.1, and each adjustment may increase it by 0.1. For example, if the weight being obtained is α, α is initially set to 0.1, so β is 0.9. The third similarity between the matched text and the preset text template is calculated according to the preset linear weighting formula, and the clustering algorithm judges whether the matched text and the preset text template belong to the same category. If the third similarity is less than 50% and the clustering algorithm judges that the matched text and the preset text template are not in the same category, the third similarity calculated according to the first initial value is determined to be inaccurate. Then α = α + 0.1 is applied, so α is 0.2 and β is 0.8; the third similarity is calculated again according to the preset linear weighting formula, and the clustering algorithm again judges whether the matched text and the preset text template belong to the same category. If the result is still inaccurate, α = α + 0.1 is applied again, so α is 0.3 and β is 0.7, and the calculation is repeated, and so on, until the optimal values of α and β are found.
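The stepwise search for α can be sketched as follows. The clustering result is taken as an input because the source does not name the clustering algorithm, and the accuracy test is passed in as a function because the source does not define it in a form that can be coded directly. The criterion `agrees_with_cluster` used in the example (a similarity above 50% should coincide with "same category") is a hypothetical stand-in, as are all names.

```python
def search_alpha(sim_tfidf, sim_lda, same_category, is_accurate,
                 start=0.1, step=0.1):
    """Try alpha = start, start + step, ... up to 1.0 and return the first
    value whose third similarity is judged accurate, or None if none is."""
    n_steps = int(round((1.0 - start) / step))
    for i in range(n_steps + 1):
        alpha = round(start + i * step, 10)   # avoid floating-point drift
        third = alpha * sim_lda + (1.0 - alpha) * sim_tfidf
        if is_accurate(third, same_category):
            return alpha
    return None

def agrees_with_cluster(third, same_category, threshold=0.5):
    """Hypothetical accuracy criterion: the threshold decision matches the cluster result."""
    return (third > threshold) == same_category

alpha = search_alpha(sim_tfidf=0.2, sim_lda=0.9, same_category=True,
                     is_accurate=agrees_with_cluster)
beta = None if alpha is None else 1.0 - alpha
```

In this example the third similarity is 0.2 + 0.7α, which first exceeds 50% at α = 0.5, so the search stops there with β = 0.5.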

In this embodiment, when the matched text is determined to be a text template similar to the preset text template, the matched text can be added to the template set of the preset text template. In this way, multiple text template sets can be obtained, each containing similar text templates.
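Adding a matched text to the template set of the preset text template it resembles can be sketched as below. The dictionary layout (preset template mapped to its list of similar texts) and the toy similarity test `shares_first_word` are assumptions for illustration; in practice the similarity test would be the condition from step S30.

```python
def assign_to_template_set(matched_text, template_sets, is_similar):
    """Append matched_text to the set of the first preset template it is
    similar to and return that template; return None if none matches."""
    for template, members in template_sets.items():
        if is_similar(matched_text, template):
            members.append(matched_text)
            return template
    return None

def shares_first_word(text, template):
    """Toy stand-in for the similarity condition of step S30."""
    return text.split()[:1] == template.split()[:1]

template_sets = {"order shipped": [], "invoice due": []}
chosen = assign_to_template_set("order delayed", template_sets, shares_first_word)
```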

The text template recognition method proposed in this embodiment obtains a preset text template and a matched text; calculates a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or calculates a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determines that the matched text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly, without staff judging each text manually, which improves the efficiency of text template recognition. In addition, because text similarity is calculated by the word-frequency-based and/or the semantics-based text similarity algorithm, the accuracy of text template recognition can be improved.

The present invention also provides a text template recognition device. Fig. 2 is a schematic diagram of the internal structure of the text template recognition device provided by an embodiment of the present invention.

In this embodiment, the text template recognition device 1 may be a personal computer (PC), or a terminal device such as a smartphone, a tablet computer, or a portable computer. The text template recognition device 1 includes at least a memory 11, a processor 12, a network interface 13, and a communication bus 14.

The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments the memory 11 may be an internal storage unit of the text template recognition device 1, such as its hard disk. In other embodiments the memory 11 may be an external storage device of the text template recognition device 1, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the device. Further, the memory 11 may include both an internal storage unit and an external storage device of the text template recognition device 1. The memory 11 can be used not only to store the application software installed on the text template recognition device 1 and various kinds of data, such as the code of the text template recognition program 01, but also to temporarily store data that has been output or is about to be output.

In some embodiments the processor 12 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example to execute the text template recognition program 01.

The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the device 1 and other electronic devices.

The communication bus 14 is used to implement connection and communication between these components.

Optionally, the text template recognition device 1 may also include a user interface. The user interface may include a display and an input unit such as a keyboard, and optionally may also include a standard wired interface and a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be called a display screen or display unit where appropriate, and is used to display the information processed in the text template recognition device 1 and to display a visual user interface.

Fig. 2 shows only the text template recognition device 1 with components 11-14 and the text template recognition program 01. Those skilled in the art will understand that the structure shown in Fig. 2 does not limit the text template recognition device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.

In the embodiment of the text template recognition device 1 shown in Fig. 2, the text template recognition program 01 is stored in the memory 11; the processor 12 implements the following steps when executing the text template recognition program 01 stored in the memory 11:

Obtain a preset text template and a matched text.

Calculate a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm, and/or calculate a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm.

Optionally, in another embodiment of the invention, calculating the first similarity between the matched text and the preset text template according to the word-frequency-based text similarity algorithm and calculating the second similarity between the matched text and the preset text template according to the semantics-based text similarity algorithm comprises:

D = D(T1, W1; T2, W2; ...; Tn, Wn)

The preset similarity condition can be set in advance.

The third preset similarity can be set in advance.

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q)

Optionally, in another embodiment of the invention, the text template recognition program, when executed by the processor, further implements the following steps:

Obtain the weight value used for the linear weighting.

Obtaining the weight value used for the linear weighting comprises:

The above steps are used to obtain the value of α or β.

The text template recognition device proposed in this embodiment obtains a preset text template and a matched text; calculates a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or calculates a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm; and, when the first similarity and/or the second similarity satisfies a preset similarity condition, determines that the matched text is a text template similar to the preset text template. Text templates similar to the preset text template can thus be obtained quickly, without staff judging each text manually, which improves the efficiency of text template recognition. In addition, because text similarity is calculated by the word-frequency-based and/or the semantics-based text similarity algorithm, the accuracy of text template recognition can be improved.

Optionally, in other embodiments, the text template recognition program may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to carry out the present invention. A module in the sense of the present invention is a series of computer program instruction segments capable of performing a specific function, and is used to describe the execution of the text template recognition program in the text template recognition device.

For example, Fig. 3 is a schematic diagram of the program modules of the text template recognition program 01 in an embodiment of the text template recognition device of the present invention. In this embodiment, the text template recognition program may be divided into an acquisition module 10, a calculation module 20, and a determination module 30. Illustratively:

The acquisition module 10 is used to: obtain a preset text template and a matched text;

The calculation module 20 is used to: calculate a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or calculate a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm;

The determination module 30 is used to: when the first similarity and/or the second similarity satisfies a preset similarity condition, determine that the matched text is a text template similar to the preset text template.

The functions or operation steps implemented when the acquisition module 10, the calculation module 20, the determination module 30, and the other program modules are executed are substantially the same as those of the above embodiments and are not described again here.
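The three-module division can be pictured with the sketch below. It is an illustration only: the class and method names, the dictionary used as a template store, and the toy `word_overlap` similarity are all assumptions, and the source does not prescribe any particular code structure for modules 10, 20, and 30.

```python
class TemplateRecognizer:
    """Mirrors the acquisition (10), calculation (20), and determination (30) modules."""

    def __init__(self, first_similarity, second_similarity, condition):
        self.first_similarity = first_similarity      # word-frequency-based algorithm
        self.second_similarity = second_similarity    # semantics-based algorithm
        self.condition = condition                    # preset similarity condition

    def acquire(self, template_store, template_id, matched_text):
        """Module 10: obtain the preset text template and the matched text."""
        return template_store[template_id], matched_text

    def compute(self, matched_text, template):
        """Module 20: calculate the first and second similarity."""
        return (self.first_similarity(matched_text, template),
                self.second_similarity(matched_text, template))

    def determine(self, first_sim, second_sim):
        """Module 30: decide whether the matched text is a similar template."""
        return self.condition(first_sim, second_sim)

    def recognize(self, template_store, template_id, matched_text):
        template, text = self.acquire(template_store, template_id, matched_text)
        return self.determine(*self.compute(text, template))

def word_overlap(a, b):
    """Toy stand-in similarity: shared words divided by all distinct words."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

recognizer = TemplateRecognizer(word_overlap, word_overlap,
                                lambda f, s: f > 0.5 or s > 0.5)
store = {"shipping": "your order has shipped"}
result = recognizer.recognize(store, "shipping", "your order has arrived")
```

Passing the two similarity algorithms and the condition in as functions keeps the modules independent of any one algorithm choice.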

In addition, an embodiment of the present invention also proposes a computer-readable storage medium. The computer-readable storage medium stores a text template recognition program, which can be executed by one or more processors to implement the following operations:

Obtain a preset text template and a matched text;

The specific embodiments of the computer-readable storage medium of the present invention are essentially the same as the embodiments of the text template recognition device and method described above, and are not repeated here.

It should be noted that the serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments. The terms "include", "comprise", or any other variant thereof herein are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A text template recognition method, characterized in that the method comprises:

obtaining a preset text template and a matched text;

calculating a first similarity between the matched text and the preset text template according to a word-frequency-based text similarity algorithm; and/or

calculating a second similarity between the matched text and the preset text template according to a semantics-based text similarity algorithm;

when the first similarity and/or the second similarity satisfies a preset similarity condition, determining that the matched text is a text template similar to the preset text template.

2. The text template recognition method as claimed in claim 1, characterized in that the calculating of the first similarity between the matched text and the preset text template according to the word-frequency-based text similarity algorithm and/or the calculating of the second similarity between the matched text and the preset text template according to the semantic-based text similarity algorithm comprises:

calculating the second similarity between the matched text and the preset text template using an LDA document topic generation model;

the first similarity and the second similarity meeting the preset similarity condition comprises:

performing linear weighting on the first similarity and the second similarity to obtain a third similarity between the matched text and the preset text template;

if the third similarity is greater than the preset similarity, determining that the first similarity and the second similarity meet the preset similarity condition.

3. The text template recognition method as claimed in claim 2, characterized in that the performing of linear weighting on the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template comprises:

inputting the first similarity and the second similarity into a preset linear weighting formula, and outputting the third similarity between the matched text and the preset text template, the preset linear weighting formula being:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

where p and q are the matched text and the preset text template respectively, sim_TFIDF(p, q) is the first similarity, sim_LDA(p, q) is the second similarity, sim(p, q) is the third similarity, and α and β are preset weight values.
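
The linear weighting formula of claim 3 can be illustrated with a short sketch. The function name `weighted_similarity` and the numeric inputs are hypothetical; the claim itself only states that α and β are preset weight values and does not require that they sum to 1.

```python
def weighted_similarity(sim_lda: float, sim_tfidf: float,
                        alpha: float, beta: float) -> float:
    """Third similarity: sim(p, q) = alpha * sim_LDA(p, q) + beta * sim_TFIDF(p, q)."""
    return alpha * sim_lda + beta * sim_tfidf


# Hypothetical inputs: a semantic (LDA) similarity of 0.9 and a
# word-frequency (TF-IDF) similarity of 0.6, weighted equally.
third_similarity = weighted_similarity(sim_lda=0.9, sim_tfidf=0.6,
                                       alpha=0.5, beta=0.5)
```

With these illustrative values the third similarity is 0.5 × 0.9 + 0.5 × 0.6 = 0.75, which would then be compared against the preset similarity as described in claim 2.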

4. The text template recognition method as claimed in claim 2 or 3, characterized in that the method further comprises:

judging, by a preset clustering algorithm, whether the matched text and the preset text template belong to the same category, to obtain a cluster result;

judging, by the cluster result, whether the third similarity calculated according to the first initial value is accurate;

if it is determined that the third similarity calculated according to the first initial value is accurate, determining the first initial value as the weight value used for the linear weighting;

if it is determined that the third similarity calculated according to the first initial value is inaccurate, adjusting the first initial value and performing again the operation of calculating the third similarity according to the first initial value.
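
One possible reading of the weight adjustment in claim 4 is sketched below. The claim does not define "accurate", the adjustment rule, or the relationship between the weights, so this sketch makes several assumptions: the first initial value is taken to be α, β is taken to be 1 − α, the third similarity counts as accurate when its thresholded result agrees with the cluster result, and adjustment is a simple step search around the initial value. The function name `calibrate_alpha` and all default values are hypothetical.

```python
from typing import Optional


def calibrate_alpha(sim_lda: float, sim_tfidf: float, same_cluster: bool,
                    threshold: float = 0.7, initial_alpha: float = 0.5,
                    step: float = 0.1) -> Optional[float]:
    """Search for an alpha whose thresholded third similarity agrees with the cluster result.

    Assumptions (not stated in the claim): beta = 1 - alpha, and "accurate"
    means (third similarity > threshold) matches same_cluster.
    Returns None if no candidate in [0, 1] agrees.
    """
    # Try the initial value first, then values progressively further from it.
    max_steps = int(round(1.0 / step))
    candidates = [initial_alpha]
    for k in range(1, max_steps + 1):
        candidates.append(initial_alpha + k * step)
        candidates.append(initial_alpha - k * step)

    for raw_alpha in candidates:
        alpha = round(raw_alpha, 10)  # suppress floating-point drift
        if not 0.0 <= alpha <= 1.0:
            continue
        third = alpha * sim_lda + (1.0 - alpha) * sim_tfidf
        if (third > threshold) == same_cluster:
            return alpha  # accurate: keep this value as the weight
    return None
```

In practice such a search would presumably be run over many text pairs rather than a single pair; the single-pair form is used here only to keep the control flow of the claim visible.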

5. The text template recognition method as claimed in claim 1, characterized in that the first similarity or the second similarity meeting the preset similarity condition comprises:

6. A text template identification device, characterized in that the device comprises a memory and a processor, the memory storing a text template recognition program executable on the processor, and the text template recognition program, when executed by the processor, implementing the following steps:

obtaining a preset text template and a matched text;

7. The text template identification device as claimed in claim 6, characterized in that the calculating of the first similarity between the matched text and the preset text template according to the word-frequency-based text similarity algorithm and/or the calculating of the second similarity between the matched text and the preset text template according to the semantic-based text similarity algorithm comprises:

the first similarity and the second similarity meeting the preset similarity condition comprises:

8. The text template identification device as claimed in claim 7, characterized in that the performing of linear weighting on the first similarity and the second similarity to obtain the third similarity between the matched text and the preset text template comprises:

sim(p, q) = α·sim_LDA(p, q) + β·sim_TFIDF(p, q),

9. The text template identification device as claimed in claim 7 or 8, characterized in that the text template recognition program, when executed by the processor, further implements the following steps:

10. A computer-readable storage medium, characterized in that a text template recognition program is stored on the computer-readable storage medium, the text template recognition program being executable by one or more processors to implement the steps of the text template recognition method as claimed in any one of claims 1 to 5.