CN109033082A - The learning training method, apparatus and computer readable storage medium of semantic model - Google Patents
- Publication number: CN109033082A
- Application number: CN201810800318.6A
- Authority
- CN
- China
- Prior art keywords
- word
- segmentation result
- word segmentation
- total amount
- semantic model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a learning and training method for a semantic model, comprising: based on a preset core word bank, performing word segmentation on a training text using a bidirectional maximum matching method to obtain a matching word-segmentation result corresponding to the training text, the matching word-segmentation result comprising single characters and/or core words; when a core word is present, determining the dictionary type corresponding to the core word; and, according to a preset semantic model, obtaining the index codes corresponding to the single characters and/or the dictionary type, so as to represent the training text by those index codes, wherein, in the semantic model, each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code. The invention also discloses a learning and training apparatus for a semantic model and a computer-readable storage medium. The invention improves the learning and training efficiency of the semantic model.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a learning and training method and apparatus for a semantic model, and to a computer-readable storage medium.
Background art
A semantic model is a new class of data model that adds entirely new data constructors and data-processing primitives on top of the relational model in order to express complex structure and rich semantics. At present, each word is usually represented by a unique index code. Since the size of a dictionary is often on the order of hundreds of thousands of entries, the index codes span the same order of magnitude. This leads to the curse of dimensionality: the semantic model becomes very large, and its learning and training are slow and inefficient.
Summary of the invention
The main object of the present invention is to provide a learning and training method and apparatus for a semantic model, and a computer-readable storage medium, intended to solve the problem that the learning and training of existing semantic models is inefficient.
To achieve the above object, the present invention proposes a learning and training method for a semantic model, the method comprising the following steps:
based on a preset core word bank, performing word segmentation on a training text using a bidirectional maximum matching method to obtain a matching word-segmentation result corresponding to the training text, the matching word-segmentation result comprising single characters and/or core words;
when a core word is present, determining the dictionary type corresponding to the core word;
according to a preset semantic model, obtaining the index codes corresponding to the single characters and/or the dictionary type, so as to represent the training text by the index codes, wherein, in the semantic model, each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code.
Preferably, the step of performing word segmentation on the training text using the bidirectional maximum matching method based on the preset core word bank to obtain the matching word-segmentation result corresponding to the training text comprises:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text;
when a word is present, judging, based on the core word bank, whether the word is a core word;
if not, splitting the word into single characters to obtain the matching word-segmentation result corresponding to the training text.
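The filtering step above — keep single characters and core words, break every other word back into its characters — can be sketched as follows. This is a minimal illustration, not the patented implementation; the names (`CORE_WORD_BANK`, `split_non_core`) and the bank's contents are assumptions.

```python
# Words absent from the core word bank are split back into single
# characters, so the final matching result contains only single
# characters and core words.
CORE_WORD_BANK = {"电影", "张学友"}  # illustrative core words

def split_non_core(tokens):
    result = []
    for token in tokens:
        if len(token) > 1 and token not in CORE_WORD_BANK:
            result.extend(token)   # split a non-core word into characters
        else:
            result.append(token)   # keep single characters and core words
    return result
```

For example, `split_non_core(["你好", "电影"])` yields `["你", "好", "电影"]`: the ordinary word is split while the core word survives intact.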
Preferably, the step of judging, based on the core word bank, whether the word is a core word when a word is present comprises:
if there are multiple words, judging, based on the core word bank, whether each word is a core word respectively;
and the step of splitting the word into single characters, if it is not a core word, to obtain the matching word-segmentation result corresponding to the training text comprises:
if any word is not a core word, splitting that word into single characters to obtain the matching word-segmentation result corresponding to the training text.
Preferably, the step of performing word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text comprises:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain a bidirectional word-segmentation result corresponding to the training text;
performing word-frequency analysis on the bidirectional word-segmentation result according to pre-counted word-frequency data, and determining the single characters and/or words corresponding to the training text.
Preferably, the bidirectional word-segmentation result includes a forward word-segmentation result and a reverse word-segmentation result, and the step of performing word-frequency analysis on the bidirectional word-segmentation result according to the pre-counted word-frequency data to determine the single characters and/or words corresponding to the training text comprises:
performing word-frequency analysis on the forward word-segmentation result and the reverse word-segmentation result respectively according to the word-frequency data, and obtaining a first word-frequency total corresponding to the forward word-segmentation result and a second word-frequency total corresponding to the reverse word-segmentation result;
determining, according to the first word-frequency total and the second word-frequency total, that the forward word-segmentation result or the reverse word-segmentation result is the matching word-segmentation result;
determining the single characters and/or words included in the matching word-segmentation result as the single characters and/or words corresponding to the training text.
Preferably, the step of determining, according to the first word-frequency total and the second word-frequency total, that the forward word-segmentation result or the reverse word-segmentation result is the matching word-segmentation result comprises:
when the ratio of the first word-frequency total to the second word-frequency total is greater than a preset ratio, determining that the forward word-segmentation result is the matching word-segmentation result;
when the ratio of the first word-frequency total to the second word-frequency total is less than the preset ratio, determining that the reverse word-segmentation result is the matching word-segmentation result.
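The ratio-based selection just described can be sketched in a few lines. The default preset ratio of 1.0 is an assumption for illustration, and the text does not specify what happens when the ratio exactly equals the preset ratio; here the reverse result is chosen in that case.

```python
def choose_matching_result(total_fwd, total_rev, preset_ratio=1.0):
    # Forward wins when the ratio of the first (forward) total to the
    # second (reverse) total exceeds the preset ratio; otherwise the
    # reverse segmentation becomes the matching result.
    return "forward" if total_fwd / total_rev > preset_ratio else "reverse"
```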
Preferably, the step of determining, according to the first word-frequency total and the second word-frequency total, that the forward word-segmentation result or the reverse word-segmentation result is the matching word-segmentation result comprises:
when the difference between the first word-frequency total and the second word-frequency total is greater than a preset difference, determining that the forward word-segmentation result is the matching word-segmentation result;
when the difference between the first word-frequency total and the second word-frequency total is less than the preset difference, determining that the reverse word-segmentation result is the matching word-segmentation result.
Preferably, the step of performing word-frequency analysis on the forward word-segmentation result and the reverse word-segmentation result respectively according to the word-frequency data to obtain the first word-frequency total corresponding to the forward word-segmentation result and the second word-frequency total corresponding to the reverse word-segmentation result comprises:
according to the word-frequency data, accumulating the word-frequency counts corresponding to the single characters and/or words included in the forward word-segmentation result to obtain the first word-frequency total corresponding to the forward word-segmentation result;
according to the word-frequency data, accumulating the word-frequency counts corresponding to the single characters and/or words included in the reverse word-segmentation result to obtain the second word-frequency total corresponding to the reverse word-segmentation result.
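The accumulation step is a plain sum of pre-counted frequencies over each segmentation. A minimal sketch follows; the frequency values are invented for illustration, and treating tokens absent from the data as zero is an assumption the text does not address.

```python
def word_frequency_total(segmentation, word_freq):
    # Sum the pre-counted frequency of every single character / word in
    # one segmentation result; unseen tokens count as zero (assumption).
    return sum(word_freq.get(token, 0) for token in segmentation)

FREQ = {"我": 50, "电影": 30, "电": 5, "影": 4}  # illustrative counts
forward_total = word_frequency_total(["我", "电影"], FREQ)        # 80
reverse_total = word_frequency_total(["我", "电", "影"], FREQ)    # 59
```

Comparing the two totals (here the forward segmentation scores higher) then drives the selection described above.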
In addition, to achieve the above object, the present invention also proposes a learning and training apparatus for a semantic model, the apparatus comprising a memory, a processor, and a semantic model learning and training program stored on the memory and executable on the processor; when the semantic model learning and training program is executed by the processor, the steps of the learning and training method for a semantic model described above are realized.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which a semantic model learning and training program is stored; when the semantic model learning and training program is executed by a processor, the steps of the learning and training method for a semantic model described above are realized.
In the technical solution of the present invention, when learning and training the semantic model, word segmentation is performed on a training text using a bidirectional maximum matching method based on a preset core word bank to obtain the matching word-segmentation result corresponding to the training text (comprising single characters and/or core words). When a core word is present, the dictionary type corresponding to the core word is determined, and, according to a preset semantic model, the index codes corresponding to the single characters and/or the dictionary type are obtained so as to represent the training text by the obtained index codes. Since, in the semantic model, each single character and each dictionary type corresponds to a unique index code, rather than each word corresponding to a unique index code, and the number of single characters is far smaller than the number of words, the order of magnitude of the semantic model is reduced, which improves the learning and training efficiency of the semantic model.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a learning and training apparatus for a semantic model in a hardware operating environment according to an embodiment of the present invention;
Fig. 2 is the flow diagram of the learning training method first embodiment of semantic model of the invention;
Fig. 3 is the flow diagram of the learning training method second embodiment of semantic model of the invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit the present invention.
The solution of the embodiments of the present invention is mainly as follows: when learning and training the semantic model, word segmentation is performed on a training text using a bidirectional maximum matching method based on a preset core word bank to obtain the matching word-segmentation result corresponding to the training text (comprising single characters and/or core words). When a core word is present, the dictionary type corresponding to the core word is determined, and, according to a preset semantic model, the index codes corresponding to the single characters and/or the dictionary type are obtained so as to represent the training text by the obtained index codes. Since, in the semantic model, each single character and each dictionary type corresponds to a unique index code, rather than each word corresponding to a unique index code, and the number of single characters is far smaller than the number of words, the order of magnitude of the semantic model is substantially reduced, which greatly improves the learning and training efficiency of the semantic model. The technical solution of the embodiments of the present invention thereby solves the problem of the inefficient learning and training of semantic models.
The embodiment of the present invention proposes a kind of learning training device of semantic model.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a learning and training apparatus for a semantic model in a hardware operating environment according to an embodiment of the present invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are only intended to facilitate the explanation of the present invention and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
As shown in Fig. 1, the learning and training apparatus for a semantic model may include a processor 1001, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory, or a stable non-volatile memory such as a magnetic disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure of the learning and training apparatus for a semantic model shown in Fig. 1 does not constitute a limitation on the apparatus, which may include more or fewer components than illustrated, a combination of certain components, or a different arrangement of components.
As shown in Fig. 1, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, and a semantic model learning and training program.
In the present invention, in the learning and training apparatus for a semantic model, the processor 1001 calls the semantic model learning and training program stored in the memory 1005 and performs the following operations:
based on a preset core word bank, performing word segmentation on a training text using a bidirectional maximum matching method to obtain the matching word-segmentation result corresponding to the training text, the matching word-segmentation result comprising single characters and/or core words;
when a core word is present, determining the dictionary type corresponding to the core word;
according to a preset semantic model, obtaining the index codes corresponding to the single characters and/or the dictionary type, so as to represent the training text by the index codes, wherein, in the semantic model, each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text;
when a word is present, judging, based on the core word bank, whether the word is a core word;
if not, splitting the word into single characters to obtain the matching word-segmentation result corresponding to the training text.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
if there are multiple words, judging, based on the core word bank, whether each word is a core word respectively;
if any word is not a core word, splitting that word into single characters to obtain the matching word-segmentation result corresponding to the training text.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain the bidirectional word-segmentation result corresponding to the training text;
performing word-frequency analysis on the bidirectional word-segmentation result according to the pre-counted word-frequency data, and determining the single characters and/or words corresponding to the training text.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
performing word-frequency analysis on the forward word-segmentation result and the reverse word-segmentation result respectively according to the word-frequency data, and obtaining the first word-frequency total corresponding to the forward word-segmentation result and the second word-frequency total corresponding to the reverse word-segmentation result;
determining, according to the first word-frequency total and the second word-frequency total, that the forward word-segmentation result or the reverse word-segmentation result is the matching word-segmentation result;
determining the single characters and/or words included in the matching word-segmentation result as the single characters and/or words corresponding to the training text.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
when the ratio of the first word-frequency total to the second word-frequency total is greater than the preset ratio, determining that the forward word-segmentation result is the matching word-segmentation result;
when the ratio of the first word-frequency total to the second word-frequency total is less than the preset ratio, determining that the reverse word-segmentation result is the matching word-segmentation result.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
when the difference between the first word-frequency total and the second word-frequency total is greater than the preset difference, determining that the forward word-segmentation result is the matching word-segmentation result;
when the difference between the first word-frequency total and the second word-frequency total is less than the preset difference, determining that the reverse word-segmentation result is the matching word-segmentation result.
Further, the processor 1001 may call the semantic model learning and training program stored in the memory 1005 and further perform the following operations:
according to the word-frequency data, accumulating the word-frequency counts corresponding to the single characters and/or words included in the forward word-segmentation result to obtain the first word-frequency total corresponding to the forward word-segmentation result;
according to the word-frequency data, accumulating the word-frequency counts corresponding to the single characters and/or words included in the reverse word-segmentation result to obtain the second word-frequency total corresponding to the reverse word-segmentation result.
Through the above scheme, in this embodiment, when learning and training the semantic model, word segmentation is performed on the training text using a bidirectional maximum matching method based on a preset core word bank to obtain the matching word-segmentation result corresponding to the training text (comprising single characters and/or core words). When a core word is present, the dictionary type corresponding to the core word is determined, and, according to a preset semantic model, the index codes corresponding to the single characters and/or the dictionary type are obtained so as to represent the training text by the obtained index codes. Since, in the semantic model, each single character and each dictionary type corresponds to a unique index code, rather than each word, and the number of single characters is far smaller than the number of words, the order of magnitude of the semantic model is substantially reduced, which greatly improves the learning and training efficiency of the semantic model.
Based on the above hardware structure, embodiments of the learning and training method for a semantic model of the present invention are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the learning and training method for a semantic model of the present invention.
In the first embodiment, the learning and training method for a semantic model comprises the following steps:
Step S10: based on a preset core word bank, perform word segmentation on a training text using a bidirectional maximum matching method to obtain the matching word-segmentation result corresponding to the training text, the matching word-segmentation result comprising single characters and/or core words;
Step S20: when a core word is present, determine the dictionary type corresponding to the core word;
Step S30: according to a preset semantic model, obtain the index codes corresponding to the single characters and/or the dictionary type, so as to represent the training text by the index codes, wherein, in the semantic model, each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code.
At present, each word is usually represented by a unique index code. Since the size of a dictionary is often on the order of hundreds of thousands of entries, the index codes span the same order of magnitude, which leads to the curse of dimensionality: the semantic model becomes very large, and its learning and training are slow and inefficient.
To improve the learning and training efficiency of the semantic model, the present invention proposes a learning and training method for a semantic model, applicable to the learning and training apparatus of the above embodiment. In this embodiment, the dimensionality of the index coding of the semantic model is reduced in order to increase the learning and training speed of the semantic model.
Specifically, core words are defined in advance and a core word bank corresponding to the core words is established. Core words include movie titles, actor names, director names, and so on; correspondingly, the core word bank includes a movie-title bank, an actor bank, a director bank, a film-and-television verb bank, and so on, wherein a unique index code is preset for the dictionary type of each dictionary and for each single character. For example, the unique index code corresponding to the single character "我" ("I") is preset to "39", the unique index code corresponding to the single character "一" ("one") is preset to "345", and the unique index code corresponding to the dictionary type "movie" is "9". The semantic model is established based on the unique index code corresponding to each dictionary type and the unique index code corresponding to each single character.
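The paragraph above can be pictured as two small lookup tables plus a routing function. The sketch below uses the index codes quoted in the text ("我" → 39, "一" → 345, dictionary type "movie" → 9); all other entries and the names themselves are assumptions for illustration.

```python
# Illustrative code tables of the semantic model: single characters and
# dictionary types each get one unique index code; core words route
# through the code of their dictionary type.
CHAR_CODES = {"我": 39, "一": 345}
DICT_TYPE_CODES = {"movie": 9, "actor": 2}
CORE_WORD_DICT_TYPE = {"电影": "movie", "张学友": "actor"}

def index_code(token):
    if token in CORE_WORD_DICT_TYPE:
        # a core word maps to its dictionary type's code, not its own
        return DICT_TYPE_CODES[CORE_WORD_DICT_TYPE[token]]
    return CHAR_CODES[token]  # a single character has its own unique code
```

Note the design point the text emphasizes: every core word in the actor bank shares the single code of the actor dictionary type, so adding core words never grows the code space.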
Optionally, in order to further enhance the real-time performance of the semantic model, the core word bank is updated dynamically, for example through the movie database of Honeybee Video.
Excluding traditional forms, there are a little more than 3,000 Chinese characters in common use and, including rare ones, a little more than 8,000 in total, so there are at most some 8,000-odd single Chinese characters. Adding the core words, the order of magnitude of single characters plus core words is substantially no more than about 9,000. The dimensionality is thus reduced from an order of hundreds of thousands to an order of about 9,000, and the size of the semantic model easily drops from several GB to around 5 MB. Moreover, since one dictionary corresponds to one unique index code — that is, in the actor bank, core words such as "刘德华" (Andy Lau), "张学友" (Jacky Cheung), and "周星驰" (Stephen Chow) all correspond to the same index code — newly added core words and dictionaries do not affect the final order of magnitude, and the order of magnitude of the semantic model remains stable at about 9,000. Since the order of magnitude of the semantic model is substantially reduced, its learning and training speed is greatly improved.
When learning and training the semantic model, for a training text, word segmentation is first performed on the training text using the bidirectional maximum matching method based on the preset core word bank to obtain the matching word-segmentation result corresponding to the training text, wherein the matching word-segmentation result comprises the single characters and/or core words obtained by splitting the training text.
For example, take the training text "我要看一个那个谁谁谁张学友主演的电影" ("I want to watch that movie starring what's-his-name Zhang Xueyou"). Based on the preset core word bank, word segmentation is performed on the training text using the bidirectional maximum matching method, and the matching word-segmentation result corresponding to the training text is obtained as:
我 / 要 / 看 / 一 / 个 / 那 / 个 / 谁 / 谁 / 谁 / 张学友 / 主演 / 的 / 电影
That is, it includes the single characters "我", "要", "看", "一", "个", "那", "个", "谁", "谁", "谁", and "的", as well as the core words "张学友" (the actor Jacky Cheung), "主演" ("starring"), and "电影" ("movie").
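The bidirectional maximum matching used above can be sketched as two greedy passes, one left-to-right and one right-to-left, each preferring the longest dictionary match. This is a minimal illustration, not the patented implementation; the vocabulary and the `max_len` window are assumptions.

```python
# Greedy longest-match segmentation in both directions.
VOCAB = {"张学友", "主演", "电影"}  # illustrative dictionary incl. core words

def forward_max_match(text, vocab=VOCAB, max_len=4):
    i, out = 0, []
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in vocab:  # fall back to one char
                out.append(text[i:i + n])
                i += n
                break
    return out

def backward_max_match(text, vocab=VOCAB, max_len=4):
    j, out = len(text), []
    while j > 0:
        for n in range(min(max_len, j), 0, -1):
            if n == 1 or text[j - n:j] in vocab:
                out.insert(0, text[j - n:j])
                j -= n
                break
    return out
```

On the shortened text "我要看张学友主演的电影", both passes here yield 我 / 要 / 看 / 张学友 / 主演 / 的 / 电影; when the two passes disagree, the word-frequency comparison described earlier decides which result becomes the matching one.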
If the matching word-segmentation result includes single characters, then for each single character, the unique index code corresponding to that single character is determined based on the semantic model.
If the matching word-segmentation result includes core words, then for each core word, the dictionary type corresponding to the core word is first determined. Continuing the example above, the matching word-segmentation result includes the core words "张学友", "主演", and "电影": the dictionary type corresponding to "张学友" is the actor bank, the dictionary type corresponding to "主演" is the film-and-television verb bank, and the dictionary type corresponding to "电影" is the movie-title bank. After the dictionary type corresponding to a core word has been determined, the unique index code corresponding to that dictionary type is determined based on the semantic model.
Afterwards, the training text is represented according to the obtained index codes corresponding to the single characters and/or dictionary types, realizing the learning and training of the semantic model. Continuing the example above, for the training text "我要看一个那个谁谁谁张学友主演的电影", the matching word-segmentation result obtained by word segmentation includes the single characters "我", "要", "看", "一", "个", "那", "个", "谁", "谁", "谁", and "的", and the core words "张学友", "主演", and "电影". Based on the semantic model, the unique index codes are determined: "39" for "我", "8230" for "要", "8228" for "看", "345" for "一", "238" for "个", "467" for "那", "2490" for "谁", and "4611" for "的", together with the unique index code "2" of the dictionary type to which "张学友" belongs, "8235" of the dictionary type to which "主演" belongs, and "9" of the dictionary type to which "电影" belongs. The training text is then represented by the index codes as: [39, 8230, 8228, 345, 238, 467, 238, 2490, 2490, 2490, 2, 8235, 4611, 9].
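The final encoding step can be re-derived mechanically from the quoted codes. The two lookup tables below are assumptions assembled from the values given in the worked example.

```python
# Encode the segmented example text into its index-code sequence: core
# words take their dictionary type's code, single characters their own.
CHAR_CODE = {"我": 39, "要": 8230, "看": 8228, "一": 345, "个": 238,
             "那": 467, "谁": 2490, "的": 4611}
DICT_TYPE_CODE = {"张学友": 2, "主演": 8235, "电影": 9}

tokens = ["我", "要", "看", "一", "个", "那", "个", "谁", "谁", "谁",
          "张学友", "主演", "的", "电影"]
codes = [DICT_TYPE_CODE.get(t, CHAR_CODE.get(t)) for t in tokens]
# codes == [39, 8230, 8228, 345, 238, 467, 238, 2490, 2490, 2490, 2, 8235, 4611, 9]
```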
In the scheme provided by this embodiment, when learning and training the semantic model, word segmentation is performed on the training text using a bidirectional maximum matching method based on a preset core word bank to obtain the matching word-segmentation result corresponding to the training text (comprising single characters and/or core words). When a core word is present, the dictionary type corresponding to the core word is determined, and, according to a preset semantic model, the index codes corresponding to the single characters and/or the dictionary type are obtained so as to represent the training text by the obtained index codes. Since, in the semantic model, each single character and each dictionary type corresponds to a unique index code, rather than each word, and the number of single characters is far smaller than the number of words, the order of magnitude of the semantic model is substantially reduced, which greatly improves the learning and training efficiency of the semantic model.
Further, a second embodiment of the learning and training method for a semantic model of the present invention is proposed based on the first embodiment. In this embodiment, as shown in Fig. 3, step S10 comprises:
Step S11: perform word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text;
Step S12: when a word is present, judge, based on the core word bank, whether the word is a core word; if so, execute step S20; if not, execute step S13;
Step S13: split the word into single characters to obtain the matching word-segmentation result corresponding to the training text.
After word segmentation is performed on the training text using the bidirectional maximum matching method, the single characters and/or words corresponding to the training text are obtained. If a word is included, the word may be a core word in the core word bank, or it may be a non-core word. In this embodiment, when the matching word segmentation result corresponding to the training text includes a word, whether the word is a core word is judged based on the preset core word bank.
For a word that is a core word, the operation described in the first embodiment is performed: the dictionary type corresponding to the core word is determined so as to obtain the corresponding index code.
For a word that is not a core word, the word is split into single characters; in this way, the matching word segmentation result obtained for the training text contains only single characters and core words.
In one case, word segmentation is performed on the training text using the bidirectional maximum matching method and the matching word segmentation result corresponding to the training text includes multiple words; at this time, based on the core word bank, whether each word is a core word is judged respectively. Each of the multiple words that is not a core word is split into single characters; for each of the multiple words that is a core word, the corresponding dictionary type is determined so as to obtain the corresponding index code.
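The keep-or-split rule of steps S12-S13 can be sketched as follows; the core word bank here is an illustrative stand-in, and the sample tokens assume the Chinese example used later in this embodiment.

```python
# Illustrative core word bank (a stand-in for the preset bank).
CORE_WORD_BANK = {"海洋", "电影"}   # e.g. "ocean", "film"

def to_matching_result(tokens):
    """Return a matching segmentation result containing only single
    characters and core words (steps S12-S13)."""
    result = []
    for tok in tokens:
        if len(tok) == 1 or tok in CORE_WORD_BANK:
            result.append(tok)            # single character or core word: keep
        else:
            result.extend(list(tok))      # non-core word: split into characters
    return result

print(to_matching_result(["关于", "海洋", "电影"]))
# -> ['关', '于', '海洋', '电影']
```

"关于" ("about") is not in the bank, so it is split into the characters "关" and "于", while the core words "海洋" and "电影" survive intact.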
Optionally, step S11 includes:
Step a: performing word segmentation on the training text using the bidirectional maximum matching method to obtain the bidirectional word segmentation result corresponding to the training text;
Step b: performing word frequency analysis on the bidirectional word segmentation result according to pre-computed word frequency data, and determining the single characters and/or words corresponding to the training text.
Performing word segmentation on the training text using the bidirectional maximum matching method means performing word segmentation on the training text using the forward maximum matching method and the reverse maximum matching method respectively, obtaining a corresponding bidirectional word segmentation result that includes a forward word segmentation result and a reverse word segmentation result.
For example, taking the training text "关于海洋电影" ("a film about the ocean") as an example, performing word segmentation using the forward maximum matching method yields the forward word segmentation result: 关于 (about), 海洋 (ocean), 电影 (film); performing word segmentation using the reverse maximum matching method yields the reverse word segmentation result: 关, 于海洋, 电影.
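Forward and reverse maximum matching can be sketched as below. This assumes the underlying Chinese of the translated example is "关于海洋电影" (reconstructed from the source's "about / ocean / film" and "关 / 于海洋 / 电影" results), and uses a tiny illustrative dictionary and maximum word length; a real implementation would use the core word bank plus a general lexicon.

```python
# Tiny illustrative dictionary and maximum word length (assumptions).
DICT = {"关于", "海洋", "电影", "于海洋"}
MAX_LEN = 3

def forward_max_match(text):
    """Greedily match the longest dictionary word from the left."""
    out, i = [], 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            if length == 1 or text[i:i + length] in DICT:
                out.append(text[i:i + length])
                i += length
                break
    return out

def reverse_max_match(text):
    """Greedily match the longest dictionary word from the right."""
    out, j = [], len(text)
    while j > 0:
        for length in range(min(MAX_LEN, j), 0, -1):
            if length == 1 or text[j - length:j] in DICT:
                out.insert(0, text[j - length:j])
                j -= length
                break
    return out

print(forward_max_match("关于海洋电影"))   # -> ['关于', '海洋', '电影']
print(reverse_max_match("关于海洋电影"))   # -> ['关', '于海洋', '电影']
```

The two directions disagree exactly as in the example: scanning from the right, "于海洋" matches before "关于" ever gets a chance, which is why the next step compares the two results by word frequency.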
In this embodiment, word frequency data corresponding to each word is also computed in advance. For example, word frequency statistics are performed on the Sogou corpus to obtain the corresponding word frequency data.
After word segmentation is performed on the training text using the forward maximum matching method and the reverse maximum matching method respectively, and the corresponding forward word segmentation result and reverse word segmentation result are obtained, word frequency analysis is performed on the forward word segmentation result and the reverse word segmentation result according to the pre-computed word frequency data, and it is determined whether the matching word segmentation result corresponding to the training text is the forward word segmentation result or the reverse word segmentation result, namely the single characters and/or words corresponding to the training text are determined.
Optionally, step b includes:
Step b1: performing word frequency analysis on the forward word segmentation result and the reverse word segmentation result respectively according to the word frequency data, to obtain a first word frequency total amount corresponding to the forward word segmentation result and a second word frequency total amount corresponding to the reverse word segmentation result;
Step b2: determining, according to the first word frequency total amount and the second word frequency total amount, whether the forward word segmentation result or the reverse word segmentation result is the matching word segmentation result;
Step b3: determining the single characters and/or words included in the matching word segmentation result as the single characters and/or words corresponding to the training text.
Optionally, after word segmentation is performed on the training text using the forward maximum matching method and the reverse maximum matching method respectively and the corresponding forward word segmentation result and reverse word segmentation result are obtained, word frequency analysis is performed on the forward word segmentation result and the reverse word segmentation result respectively according to the pre-computed word frequency data, yielding the word frequency total amount corresponding to the forward word segmentation result and the word frequency total amount corresponding to the reverse word segmentation result. For ease of description, the word frequency total amount corresponding to the forward word segmentation result is hereinafter referred to as the first word frequency total amount, and the word frequency total amount corresponding to the reverse word segmentation result as the second word frequency total amount.
Afterwards, the first word frequency total amount is compared with the second word frequency total amount, whether the matching word segmentation result corresponding to the training text is the forward word segmentation result or the reverse word segmentation result is determined according to the comparison result, and the single characters and/or words included in the matching word segmentation result are determined as the single characters and/or words obtained by segmenting the training text.
For example, still taking the training text "关于海洋电影" ("a film about the ocean") enumerated above as an example, after the forward word segmentation result and the reverse word segmentation result are obtained, the pre-computed word frequency data shows that the word frequency of "关于 + 海洋" ("about + ocean") is much greater than that of "关 + 于海洋". It is therefore determined that the forward word segmentation result is the matching word segmentation result corresponding to the training text, namely the training text is segmented as: 关于 (about), 海洋 (ocean), 电影 (film).
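The frequency comparison of steps b1-b2 can be sketched as below. The frequency table is an invented stand-in for corpus statistics (e.g. from the Sogou corpus), and the tokens assume the Chinese example reconstructed above.

```python
# Illustrative pre-computed word frequencies (invented values).
WORD_FREQ = {"关于": 120_000, "海洋": 45_000, "电影": 300_000,
             "关": 9_000, "于海洋": 30}

def freq_total(tokens):
    """Step b1: accumulate the frequency of every character/word in a
    segmentation result; unseen tokens get a floor frequency of 1."""
    return sum(WORD_FREQ.get(tok, 1) for tok in tokens)

forward = ["关于", "海洋", "电影"]
reverse = ["关", "于海洋", "电影"]

t1, t2 = freq_total(forward), freq_total(reverse)   # first/second totals
# Step b2 (simplest form): keep the result with the larger total.
matching = forward if t1 >= t2 else reverse
print(matching)   # -> ['关于', '海洋', '电影']
```

With these stand-in numbers the forward total (465 000) far exceeds the reverse total (309 030), so the forward result is selected, matching the example in the text.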
Optionally, step b1 includes:
Step b11: accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the forward word segmentation result, to obtain the first word frequency total amount corresponding to the forward word segmentation result;
Step b12: accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the reverse word segmentation result, to obtain the second word frequency total amount corresponding to the reverse word segmentation result.
Optionally, after the corresponding forward word segmentation result and reverse word segmentation result are obtained, the word frequency amount corresponding to each single character and/or word included in the forward word segmentation result is obtained according to the pre-computed word frequency data, and these word frequency amounts are accumulated to obtain the first word frequency total amount corresponding to the forward word segmentation result. Likewise, the word frequency amount corresponding to each single character and/or word included in the reverse word segmentation result is obtained according to the pre-computed word frequency data, and these word frequency amounts are accumulated to obtain the second word frequency total amount corresponding to the reverse word segmentation result.
Optionally, in one implementation example, step b2 includes:
when the ratio of the first word frequency total amount to the second word frequency total amount is greater than a preset ratio, determining that the forward word segmentation result is the matching word segmentation result; when the ratio of the first word frequency total amount to the second word frequency total amount is less than the preset ratio, determining that the reverse word segmentation result is the matching word segmentation result.
Optionally, after the first word frequency total amount corresponding to the forward word segmentation result and the second word frequency total amount corresponding to the reverse word segmentation result are obtained, the ratio of the first word frequency total amount to the second word frequency total amount is calculated and compared with the preset ratio. If the ratio is greater than the preset ratio, that is, the first word frequency total amount corresponding to the forward word segmentation result is much greater than the second word frequency total amount corresponding to the reverse word segmentation result, the forward word segmentation result is determined to be the matching word segmentation result corresponding to the training text. Conversely, if the ratio is less than the preset ratio, that is, the first word frequency total amount is less than the second word frequency total amount, the reverse word segmentation result is determined to be the matching word segmentation result corresponding to the training text.
Optionally, in another implementation example, step b2 includes:
when the difference between the first word frequency total amount and the second word frequency total amount is greater than a preset difference value, determining that the forward word segmentation result is the matching word segmentation result; when the difference between the first word frequency total amount and the second word frequency total amount is less than the preset difference value, determining that the reverse word segmentation result is the matching word segmentation result.
Optionally, in this implementation example, after the first word frequency total amount corresponding to the forward word segmentation result and the second word frequency total amount corresponding to the reverse word segmentation result are obtained, the difference between the first word frequency total amount and the second word frequency total amount is calculated and compared with the preset difference value. If the difference is greater than the preset difference value, that is, the first word frequency total amount corresponding to the forward word segmentation result is much greater than the second word frequency total amount corresponding to the reverse word segmentation result, the forward word segmentation result is determined to be the matching word segmentation result corresponding to the training text. Conversely, if the difference is less than the preset difference value, the reverse word segmentation result is determined to be the matching word segmentation result corresponding to the training text.
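The two alternative decision rules for step b2 (ratio threshold and difference threshold) can be sketched together as below; the threshold values and frequency totals are invented for illustration.

```python
# Step b2, variant 1: compare totals by ratio against a preset ratio.
def pick_by_ratio(fwd, rev, t1, t2, preset_ratio=1.0):
    """Choose the forward result when t1/t2 exceeds the preset ratio,
    otherwise the reverse result."""
    return fwd if t1 / t2 > preset_ratio else rev

# Step b2, variant 2: compare totals by difference against a preset value.
def pick_by_difference(fwd, rev, t1, t2, preset_diff=0):
    """Choose the forward result when t1 - t2 exceeds the preset
    difference value, otherwise the reverse result."""
    return fwd if t1 - t2 > preset_diff else rev

fwd = ["关于", "海洋", "电影"]
rev = ["关", "于海洋", "电影"]
# Invented totals: first (forward) 465 000 vs second (reverse) 309 030.
print(pick_by_ratio(fwd, rev, 465_000, 309_030))       # -> ['关于', '海洋', '电影']
print(pick_by_difference(fwd, rev, 465_000, 309_030))  # -> ['关于', '海洋', '电影']
```

Both rules are monotone in the same comparison; the ratio form is scale-invariant across texts of different lengths, while the difference form is cheaper and keeps the preset threshold in absolute frequency units.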
In the scheme provided by this embodiment, after word segmentation is performed on the training text using the bidirectional maximum matching method and the single characters and/or words corresponding to the training text are obtained, if a word exists therein, whether the word is a core word is judged based on the core word bank; if it is not a core word, the word is split into single characters, so that the matching word segmentation result corresponding to the training text contains only single characters and core words. Training and learning on the enormous number of non-core words is thereby avoided, which further improves the learning training efficiency of the semantic model.
The present invention also provides a computer readable storage medium. The computer readable storage medium stores a semantic model learning training program, and the semantic model learning training program can be executed by one or more processors for:
based on a preset core word bank, performing word segmentation on a training text using the bidirectional maximum matching method to obtain a matching word segmentation result corresponding to the training text, the matching word segmentation result including single characters and/or core words;
when a core word exists, determining the dictionary type corresponding to the core word;
according to a preset semantic model, obtaining the index codes corresponding to the single characters and/or the dictionary types, so as to represent the training text according to the index codes, wherein in the semantic model each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text;
when a word exists, judging, based on the core word bank, whether the word is a core word;
if not, splitting the word into single characters to obtain the matching word segmentation result corresponding to the training text.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
if the number of words is multiple, judging, based on the core word bank, whether each word is a core word respectively;
if a word exists that is not a core word, splitting the word into single characters to obtain the matching word segmentation result corresponding to the training text.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain the bidirectional word segmentation result corresponding to the training text;
performing word frequency analysis on the bidirectional word segmentation result according to pre-computed word frequency data to determine the single characters and/or words corresponding to the training text.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
performing word frequency analysis on the forward word segmentation result and the reverse word segmentation result respectively according to the word frequency data, to obtain the first word frequency total amount corresponding to the forward word segmentation result and the second word frequency total amount corresponding to the reverse word segmentation result;
determining, according to the first word frequency total amount and the second word frequency total amount, whether the forward word segmentation result or the reverse word segmentation result is the matching word segmentation result;
determining the single characters and/or words included in the matching word segmentation result as the single characters and/or words corresponding to the training text.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
when the ratio of the first word frequency total amount to the second word frequency total amount is greater than a preset ratio, determining that the forward word segmentation result is the matching word segmentation result;
when the ratio of the first word frequency total amount to the second word frequency total amount is less than the preset ratio, determining that the reverse word segmentation result is the matching word segmentation result.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
when the difference between the first word frequency total amount and the second word frequency total amount is greater than a preset difference value, determining that the forward word segmentation result is the matching word segmentation result;
when the difference between the first word frequency total amount and the second word frequency total amount is less than the preset difference value, determining that the reverse word segmentation result is the matching word segmentation result.
Further, when the semantic model learning training program is executed by the processor, the following operations are also implemented:
accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the forward word segmentation result, to obtain the first word frequency total amount corresponding to the forward word segmentation result;
accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the reverse word segmentation result, to obtain the second word frequency total amount corresponding to the reverse word segmentation result.
The specific embodiments of the computer readable storage medium of the present invention are substantially the same as the embodiments of the learning training method of the semantic model described above, and details are not repeated here.
Through the above scheme, in this embodiment, when learning training of the semantic model is performed, word segmentation is performed on the training text using the bidirectional maximum matching method based on the preset core word bank, yielding the matching word segmentation result corresponding to the training text (including single characters and/or core words); when a core word exists, the dictionary type corresponding to the core word is determined, and the index codes corresponding to the single characters and/or dictionary types are obtained according to the preset semantic model, so that the training text is represented according to the obtained index codes. Because each single character and each dictionary type in the semantic model corresponds to a unique index code, rather than each word corresponding to a unique index code, and the number of single characters is far smaller than the number of words, the order of magnitude of the semantic model is substantially reduced, which greatly improves the learning training efficiency of the semantic model.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and naturally also by hardware, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium as described above (such as a ROM/RAM, magnetic disk or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, computer, server or network device, etc.) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.
Claims (10)
1. A learning training method of a semantic model, characterized in that the learning training method of the semantic model comprises the following steps:
based on a preset core word bank, performing word segmentation on a training text using a bidirectional maximum matching method to obtain a matching word segmentation result corresponding to the training text, the matching word segmentation result including single characters and/or core words;
when a core word exists, determining a dictionary type corresponding to the core word;
according to a preset semantic model, obtaining index codes corresponding to the single characters and/or the dictionary type, so as to represent the training text according to the index codes, wherein in the semantic model each single character corresponds to a unique index code and each dictionary type corresponds to a unique index code.
2. The learning training method of the semantic model according to claim 1, characterized in that the step of performing word segmentation on the training text using the bidirectional maximum matching method based on the preset core word bank to obtain the matching word segmentation result corresponding to the training text comprises:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain single characters and/or words corresponding to the training text;
when a word exists, judging, based on the core word bank, whether the word is a core word;
if not, splitting the word into single characters to obtain the matching word segmentation result corresponding to the training text.
3. The learning training method of the semantic model according to claim 2, characterized in that the step of judging, based on the core word bank, whether the word is a core word when a word exists comprises:
if the number of words is multiple, judging, based on the core word bank, whether each word is a core word respectively;
and the step of splitting the word into single characters if not, to obtain the matching word segmentation result corresponding to the training text, comprises:
if a word exists that is not a core word, splitting the word into single characters to obtain the matching word segmentation result corresponding to the training text.
4. The learning training method of the semantic model according to claim 2 or 3, characterized in that the step of performing word segmentation on the training text using the bidirectional maximum matching method to obtain the single characters and/or words corresponding to the training text comprises:
performing word segmentation on the training text using the bidirectional maximum matching method to obtain a bidirectional word segmentation result corresponding to the training text;
performing word frequency analysis on the bidirectional word segmentation result according to pre-computed word frequency data to determine the single characters and/or words corresponding to the training text.
5. The learning training method of the semantic model according to claim 4, characterized in that the bidirectional word segmentation result includes a forward word segmentation result and a reverse word segmentation result, and the step of performing word frequency analysis on the bidirectional word segmentation result according to the pre-computed word frequency data to determine the single characters and/or words corresponding to the training text comprises:
performing word frequency analysis on the forward word segmentation result and the reverse word segmentation result respectively according to the word frequency data, to obtain a first word frequency total amount corresponding to the forward word segmentation result and a second word frequency total amount corresponding to the reverse word segmentation result;
determining, according to the first word frequency total amount and the second word frequency total amount, that the forward word segmentation result or the reverse word segmentation result is the matching word segmentation result;
determining the single characters and/or words included in the matching word segmentation result as the single characters and/or words corresponding to the training text.
6. The learning training method of the semantic model according to claim 5, characterized in that the step of determining, according to the first word frequency total amount and the second word frequency total amount, that the forward word segmentation result or the reverse word segmentation result is the matching word segmentation result comprises:
when the ratio of the first word frequency total amount to the second word frequency total amount is greater than a preset ratio, determining that the forward word segmentation result is the matching word segmentation result;
when the ratio of the first word frequency total amount to the second word frequency total amount is less than the preset ratio, determining that the reverse word segmentation result is the matching word segmentation result.
7. The learning training method of the semantic model according to claim 5, characterized in that the step of determining, according to the first word frequency total amount and the second word frequency total amount, that the forward word segmentation result or the reverse word segmentation result is the matching word segmentation result comprises:
when the difference between the first word frequency total amount and the second word frequency total amount is greater than a preset difference value, determining that the forward word segmentation result is the matching word segmentation result;
when the difference between the first word frequency total amount and the second word frequency total amount is less than the preset difference value, determining that the reverse word segmentation result is the matching word segmentation result.
8. The learning training method of the semantic model according to claim 5, characterized in that the step of performing word frequency analysis on the forward word segmentation result and the reverse word segmentation result respectively according to the word frequency data, to obtain the first word frequency total amount corresponding to the forward word segmentation result and the second word frequency total amount corresponding to the reverse word segmentation result, comprises:
accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the forward word segmentation result, to obtain the first word frequency total amount corresponding to the forward word segmentation result;
accumulating, according to the word frequency data, the word frequency amounts corresponding to the single characters and/or words included in the reverse word segmentation result, to obtain the second word frequency total amount corresponding to the reverse word segmentation result.
9. A learning training device of a semantic model, characterized in that the learning training device of the semantic model comprises: a memory, a processor, and a semantic model learning training program stored on the memory and executable on the processor; when the semantic model learning training program is executed by the processor, the steps of the learning training method of the semantic model according to any one of claims 1-8 are implemented.
10. A computer readable storage medium, characterized in that a semantic model learning training program is stored on the computer readable storage medium, and when the semantic model learning training program is executed by a processor, the steps of the learning training method of the semantic model according to any one of claims 1-8 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810800318.6A CN109033082B (en) | 2018-07-19 | 2018-07-19 | Learning training method and device of semantic model and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109033082A true CN109033082A (en) | 2018-12-18 |
CN109033082B CN109033082B (en) | 2022-06-10 |
Family
ID=64643694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810800318.6A Active CN109033082B (en) | 2018-07-19 | 2018-07-19 | Learning training method and device of semantic model and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109033082B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052424A (en) * | 2020-10-12 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Content auditing method and device |
CN113065360A (en) * | 2021-04-16 | 2021-07-02 | 平安国际智慧城市科技股份有限公司 | Word semantic model construction method and device, computer equipment and storage medium |
CN113392651A (en) * | 2020-11-09 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Training word weight model, and method, device, equipment and medium for extracting core words |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104424177A (en) * | 2013-08-26 | 2015-03-18 | 高德软件有限公司 | Method and device for extracting core words |
CN105096942A (en) * | 2014-05-21 | 2015-11-25 | 清华大学 | Semantic analysis method and semantic analysis device |
CN105426539A (en) * | 2015-12-23 | 2016-03-23 | 成都电科心通捷信科技有限公司 | Dictionary-based lucene Chinese word segmentation method |
CN105893353A (en) * | 2016-04-20 | 2016-08-24 | 广东万丈金数信息技术股份有限公司 | Word segmentation method and word segmentation system |
CN106055560A (en) * | 2016-05-18 | 2016-10-26 | 上海申腾信息技术有限公司 | Method for collecting data of word segmentation dictionary based on statistical machine learning method |
CN106407235A (en) * | 2015-08-03 | 2017-02-15 | 北京众荟信息技术有限公司 | A semantic dictionary establishing method based on comment data |
CN106502994A (en) * | 2016-11-29 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | A kind of method and apparatus of the keyword extraction of text |
CN106776562A (en) * | 2016-12-20 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | A kind of keyword extracting method and extraction system |
CN106933799A (en) * | 2015-12-31 | 2017-07-07 | 北京四维图新科技股份有限公司 | A kind of Chinese word cutting method and device of point of interest POI titles |
CN106991085A (en) * | 2017-04-01 | 2017-07-28 | 中国工商银行股份有限公司 | The abbreviation generation method and device of a kind of entity |
CN107204184A (en) * | 2017-05-10 | 2017-09-26 | 平安科技(深圳)有限公司 | Audio recognition method and system |
CN108009228A (en) * | 2017-11-27 | 2018-05-08 | 咪咕互动娱乐有限公司 | A kind of method to set up of content tab, device and storage medium |
2018-07-19: Application CN201810800318.6A filed; granted as CN109033082B (status: Active)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112052424A (en) * | 2020-10-12 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Content auditing method and device |
CN112052424B (en) * | 2020-10-12 | 2024-05-28 | 腾讯科技(深圳)有限公司 | Content auditing method and device |
CN113392651A (en) * | 2020-11-09 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Training word weight model, and method, device, equipment and medium for extracting core words |
CN113392651B (en) * | 2020-11-09 | 2024-05-14 | 腾讯科技(深圳)有限公司 | Method, device, equipment and medium for training word weight model and extracting core words |
CN113065360A (en) * | 2021-04-16 | 2021-07-02 | 平安国际智慧城市科技股份有限公司 | Word semantic model construction method and device, computer equipment and storage medium |
CN113065360B (en) * | 2021-04-16 | 2023-02-07 | 平安国际智慧城市科技股份有限公司 | Word semantic model construction method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109033082B (en) | 2022-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109657054B (en) | Abstract generation method, device, server and storage medium | |
CN107204184B (en) | Audio recognition method and system | |
CN106886567B (en) | Microblogging incident detection method and device based on semantic extension | |
CN109902216A (en) | A kind of data collection and analysis method based on social networks | |
CN105512182B (en) | Sound control method and smart television | |
CN109033082A (en) | The learning training method, apparatus and computer readable storage medium of semantic model | |
CN114492831B (en) | Method and device for generating federal learning model | |
CN111209363B (en) | Corpus data processing method, corpus data processing device, server and storage medium | |
CN112631436B (en) | Method and device for filtering sensitive words of input method | |
CN111767393A (en) | Text core content extraction method and device | |
US20230342667A1 (en) | Classification model training method, semantic classification method, device and medium | |
CN114254158B (en) | Video generation method and device, and neural network training method and device | |
CN111858905B (en) | Model training method, information identification device, electronic equipment and storage medium | |
CN106445915A (en) | New word discovery method and device | |
US20230195998A1 (en) | Sample generation method, model training method, trajectory recognition method, device, and medium | |
JP2022116231A (en) | Training method of organism detection model, device, electronic apparatus and storage medium | |
CN106649732A (en) | Information pushing method and device | |
CN113139043B (en) | Question-answer sample generation method and device, electronic equipment and storage medium | |
CN108268443B (en) | Method and device for determining topic point transfer and acquiring reply text | |
CN114218951A (en) | Entity recognition model training method, entity recognition method and device | |
CN114120166A (en) | Video question and answer method and device, electronic equipment and storage medium | |
CN112860995A (en) | Interaction method, device, client, server and storage medium | |
CN110442696B (en) | Query processing method and device | |
CN111382258A (en) | Method and device for determining electronic reading object chapter | |
CN114490969B (en) | Question and answer method and device based on table and electronic equipment |
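The abstract describes segmenting training text with a bidirectional (two-way) maximum matching method against a preset core word bank, yielding a mix of core words and single characters. The sketch below illustrates that segmentation step only; the word-bank contents, function names, and the tie-breaking heuristic (prefer fewer tokens, then fewer single characters) are illustrative assumptions, not details taken from the patent itself.

```python
# Illustrative sketch of bidirectional maximum matching segmentation.
# CORE_WORDS stands in for the patent's "preset core word bank".

CORE_WORDS = {"语义", "模型", "学习", "训练"}  # hypothetical core word bank
MAX_LEN = max(len(w) for w in CORE_WORDS)      # longest core word length

def forward_match(text):
    """Greedy left-to-right longest match; unmatched characters stay as singles."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in CORE_WORDS:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens

def backward_match(text):
    """Greedy right-to-left longest match, then restore left-to-right order."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(MAX_LEN, j), 0, -1):
            if size == 1 or text[j - size:j] in CORE_WORDS:
                tokens.append(text[j - size:j])
                j -= size
                break
    return tokens[::-1]

def bidirectional_match(text):
    """Run both passes and keep the segmentation that looks better."""
    fwd, bwd = forward_match(text), backward_match(text)
    if len(fwd) != len(bwd):
        return min(fwd, bwd, key=len)  # fewer tokens wins
    singles = lambda seg: sum(len(t) == 1 for t in seg)
    return fwd if singles(fwd) <= singles(bwd) else bwd

print(bidirectional_match("语义模型学习训练"))  # -> ['语义', '模型', '学习', '训练']
```

The resulting token list is the "matching word segmentation result" of the abstract: core words where the word bank matched, single characters everywhere else.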
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
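The abstract's second step represents the training text through index codes: each single character maps to its own unique index, while each core word maps to the unique index of its dictionary type. A minimal sketch of that encoding follows; the dictionary-type labels and function names are invented for illustration and are not from the patent.

```python
# Illustrative sketch of the index-coding step: single characters get their
# own codes, core words collapse to the code of their dictionary type.

DICT_TYPES = {"语义": "TERM", "模型": "TERM", "训练": "ACTION"}  # hypothetical mapping

def encode(tokens, dict_types, index=None):
    """Map segmentation tokens to index codes, growing the index as needed."""
    index = {} if index is None else index
    codes = []
    for tok in tokens:
        key = dict_types.get(tok, tok)  # core word -> its type; else the character
        if key not in index:
            index[key] = len(index)     # assign a fresh unique index code
        codes.append(index[key])
    return codes, index

tokens = ["语义", "模型", "的", "训练"]
codes, index = encode(tokens, DICT_TYPES)
print(codes)  # -> [0, 0, 1, 2]
```

Because both core words of type `TERM` share one code, the model sees dictionary types rather than individual core words, which is what the abstract credits for the improved learning-training efficiency.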