CN106844356A - Method for improving English-Chinese machine translation quality based on data selection - Google Patents
Method for improving English-Chinese machine translation quality based on data selection
- Publication number
- CN106844356A (application CN201710031264.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- words
- word
- field
- final
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for improving English-Chinese machine translation quality based on data selection. The method comprises: re-expressing the data in bag-of-words form; measuring the distance between sentences by their cosine values, and deriving a final score for each sentence pair from these cosine values; and ranking the general data by score and selecting the most relevant data to train the machine translation system. On the one hand, the invention reduces the time cost and storage cost of training a statistical machine translation system, because it uses less training data than a system trained on all of the multi-domain general data. On the other hand, because the selected data come from the same domain as the data to be tested and are close to them in content, a system trained on the selected data can in theory outperform a machine translation system trained on all of the data.
Description
Technical field
The invention belongs to the technical field of data selection, and in particular relates to a method for improving English-Chinese machine translation quality based on data selection.
Background
Since IBM proposed its statistical models, statistical machine translation has gradually replaced rule-based translation and become the mainstream machine translation approach at the present stage. Its basic idea is to use statistical methods to learn translation knowledge automatically from large-scale bilingual corpora and to build a translation model from it.
In traditional statistical machine translation, the quality of the corpus directly determines the quality of the resulting translation system. In this era of information explosion, the amount of information on the Internet grows exponentially, which also provides machine translation with large amounts of monolingual and bilingual corpora.
In theory, the quality of a translation system keeps improving as the amount of training data increases. However, experiments show that once the training data reach a certain order of magnitude, adding more training data yields only a small improvement in translation results and sometimes even lowers translation quality. The quality of a translation system therefore does not depend on the quantity of training data alone. Data on the Internet come from complex sources and often belong to different domains, including politics, economics, tourism, entertainment, and so on. When the test data and the training data belong to the same domain, the system usually performs better than a system trained on multi-domain data or on data from a single other domain. For example, if the test set comes from the political domain, an English-Chinese translation system trained on 5 million items of political-domain data will in theory perform better than one trained on 5 million items of entertainment-domain data.
Combining these two points: more training data is not always better, and an English word may have different Chinese translations in different domains, so increasing the amount of data can sometimes make a translation system perform worse. Domain adaptation methods based on data selection were proposed to solve this problem. Their core idea is to select, from a multi-domain data set, the training data related to the test data, train a translation system on the selected data, and then use that system to translate the data to be translated.
In summary, most machine translation systems at the present stage are trained on tens of millions or even hundreds of millions of items of bilingual data. The whole training process takes a large amount of time and requires a large amount of disk space to store the data and the models. Moreover, a translation system trained on a large amount of multi-domain data cannot achieve the best results for a specific domain; in fact, the best results can be obtained by using only part of the data or by giving specific data a higher weight.
Summary of the invention
The object of the invention is to provide a method for improving English-Chinese machine translation quality based on data selection. It aims to solve the problems that most current machine translation systems need a large amount of training time and a large amount of disk space to store data and models, and that a translation system trained on a large amount of multi-domain data cannot achieve the best translation results for a specific domain.
The invention is realized as follows.
A method for improving English-Chinese machine translation quality based on data selection comprises:
Step 1: re-express the data in bag-of-words form;
Step 2: measure the distance between sentences by their cosine values, and derive a final score for each sentence pair from these cosine values;
Step 3: rank the general data by score, and select the most relevant data to train the machine translation system.
Further, three kinds of data need to be prepared before the data are converted into bag-of-words form. The first is general data covering all domains. The second is in-domain data, that is, data related to the data to be tested or data from the specific domain. The third is out-of-domain data, that is, data unrelated to the test set or entirely unrelated to the specific domain.
Further, in Step 1, re-expressing the data in bag-of-words form specifically comprises:
All three kinds of data are converted into bag-of-words form. Each sentence is represented as a matrix with one row and N columns, where N equals the number of distinct words in the whole data set.
Suppose a sentence contains n words, and let $S_i$ denote its i-th word, with $i \in \{1, 2, 3, \ldots, n\}$. Suppose the bag of words has m columns (m = N above), let $V_j$ denote the word represented by the j-th column, and let $V_{jc} \in \{0, 1\}$ denote the final value of that column. The final value of the j-th of the m columns is then given by:
$$V_{jc} = \begin{cases} 1, & \text{if } S_i = V_j \text{ for some } i \in \{1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases}$$
That is, for each sentence, a column has the value 1 if the sentence contains the word corresponding to that column, and 0 otherwise.
For example, if a data set contains the two sentences "I am a boy" and "I am a girl", it contains five distinct words in total (I, am, a, boy, girl), so N is 5. Assume the five columns correspond to the words I, am, a, boy and girl, respectively. The bag-of-words form of the first sentence is then (1,1,1,1,0), and that of the second sentence is (1,1,1,0,1).
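The conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names `build_vocabulary` and `to_bag_of_words` are hypothetical, and splitting on whitespace is an assumption that only suits the English side (Chinese text would first need a word segmenter).

```python
def build_vocabulary(sentences):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():  # assumption: whitespace tokenization
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_bag_of_words(sentence, vocab):
    """Return a binary vector: 1 if the column's word occurs in the sentence, else 0."""
    vector = [0] * len(vocab)
    for word in sentence.split():
        if word in vocab:
            vector[vocab[word]] = 1
    return vector


corpus = ["I am a boy", "I am a girl"]
vocab = build_vocabulary(corpus)  # columns: I, am, a, boy, girl
vectors = [to_bag_of_words(s, vocab) for s in corpus]
print(vectors)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```

The vectors are binary, so a word that occurs twice in a sentence still contributes a single 1, which matches the 0/1 column values in the text.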
Further, in Step 2, measuring the distance between sentences by their cosine values and deriving a final score for each sentence pair specifically comprises:
Relevance is compared using cosine values. The cosine value of any two sentences is calculated with the following formula:
$$\cos(S, T) = \frac{\sum_{i=1}^{N} S_i T_i}{\sqrt{\sum_{i=1}^{N} S_i^2}\,\sqrt{\sum_{i=1}^{N} T_i^2}}$$
where S and T are the bag-of-words vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector. For each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are summed and averaged:
$$P_{IS} = \frac{1}{m} \sum_{j=1}^{m} C_j$$
where $C_j$ is the cosine value between the sentence and the j-th sentence of the in-domain data, and m is the number of sentences in the in-domain data. The same operation is carried out against the source-language sentences of the out-of-domain data to obtain $P_{OS}$, and the same operations are carried out for the target language to obtain $P_{IT}$ and $P_{OT}$. The final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT}$$
where $P_{IS}$ represents the probability that the sentence belongs to the in-domain data on the source-language side, $P_{OS}$ represents the probability that it belongs to the out-of-domain data on the source-language side, and $P_{IT}$ and $P_{OT}$ represent the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
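The scoring in Step 2 can be illustrated with the following Python sketch. It assumes each sentence is already a bag-of-words vector, with source-side vectors built over one vocabulary and target-side vectors over another; the helper names `cosine`, `mean_cosine` and `pair_score` are hypothetical, and returning 0.0 for an all-zero vector is a choice made here, not something the text specifies.

```python
import math


def cosine(s, t):
    """Cosine of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(s, t))
    norm_s = math.sqrt(sum(a * a for a in s))
    norm_t = math.sqrt(sum(b * b for b in t))
    if norm_s == 0 or norm_t == 0:
        return 0.0
    return dot / (norm_s * norm_t)


def mean_cosine(vector, reference_vectors):
    """Average cosine between one sentence vector and every reference sentence."""
    total = sum(cosine(vector, ref) for ref in reference_vectors)
    return total / len(reference_vectors)


def pair_score(src, tgt, in_src, out_src, in_tgt, out_tgt):
    """P = P_IS - P_OS + P_IT - P_OT for one sentence pair of the general data."""
    p_is = mean_cosine(src, in_src)    # source side vs. in-domain data
    p_os = mean_cosine(src, out_src)   # source side vs. out-of-domain data
    p_it = mean_cosine(tgt, in_tgt)    # target side vs. in-domain data
    p_ot = mean_cosine(tgt, out_tgt)   # target side vs. out-of-domain data
    return p_is - p_os + p_it - p_ot


# The two example sentences "I am a boy" and "I am a girl" share 3 of 4 words.
boy = [1, 1, 1, 1, 0]
girl = [1, 1, 1, 0, 1]
print(cosine(boy, girl))  # 0.75
```

Each general-data sentence is compared with every in-domain and out-of-domain sentence, so the cost grows with the product of the data set sizes; a practical implementation would likely use sparse vectors or batched matrix operations.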
Further, in Step 3, ranking the general data by score and selecting the most relevant data to train the machine translation system specifically comprises:
Selecting the specific data. After Step 2, every sentence pair in the general data has a final score. When cosine values are used to represent the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specified portion of the data is finally selected as the final training data. The final training data may be a specified top N sentence pairs or a specified percentage of the data. The translation model is obtained by extracting the word alignments of each sentence pair in the training data and the corresponding probabilities; the language model is trained by counting n-gram frequencies in the target-language monolingual data; and the reordering model is trained on the reorderings of phrases or words that may arise during phrase extraction.
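The ranking and selection in Step 3 amounts to a sort followed by a cut-off. The sketch below assumes the scores from Step 2 are already available as `(score, source, target)` triples; the function name `select_training_data` and its `top_n` and `fraction` parameters are hypothetical names chosen for illustration.

```python
def select_training_data(scored_pairs, top_n=None, fraction=None):
    """Sort (score, source, target) triples from high to low score and keep
    either the first top_n pairs or the given fraction of the data."""
    if (top_n is None) == (fraction is None):
        raise ValueError("specify exactly one of top_n or fraction")
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    if top_n is None:
        top_n = int(len(ranked) * fraction)
    return [(src, tgt) for _, src, tgt in ranked[:top_n]]


# Hypothetical scores for four sentence pairs of the general data.
scored = [
    (0.10, "src_a", "tgt_a"),
    (0.80, "src_b", "tgt_b"),
    (-0.30, "src_c", "tgt_c"),
    (0.50, "src_d", "tgt_d"),
]
print(select_training_data(scored, top_n=2))
# [('src_b', 'tgt_b'), ('src_d', 'tgt_d')]
```

Calling `select_training_data(scored, fraction=0.5)` keeps the same two pairs. The selected pairs would then be passed to whatever toolkit trains the translation, language and reordering models; that training step is outside the scope of this sketch.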
The invention provides a domain adaptation method based on data selection. It can select relatively effective training data and train a translation system on that part of the data, so that the performance of the English-Chinese translation system is improved.
On the one hand, this domain adaptation method saves time and space; on the other hand, it allows the trained translation system to outperform a translation system trained on all of the data.
The invention reduces the time cost and storage cost of training a statistical machine translation system, because it uses less training data than a system trained on all of the multi-domain general data. In addition, because the selected data come from the same domain as the data to be tested and are close to them in content, a system trained on the selected data can in theory outperform a machine translation system trained on all of the data. In terms of time cost, training a translation system on 20 million general sentence pairs takes about 24 hours, whereas with this method about 5 million sentence pairs can be selected to train a specific-domain translation system that performs better, and training on 5 million sentence pairs with the same configuration and training parameters takes only about 4 hours. In terms of storage cost, the translation model, language model and reordering model trained on the 20 million sentence pairs occupy 37 GB in total, whereas the three models produced from the 5 million sentence pairs occupy about 9 GB in total. In the news domain, with the above data and data selection method, the translation results on the test set improve by 1 to 2 BLEU points.
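Taking the approximate figures reported above at face value, the relative savings work out as follows (a simple arithmetic illustration, not an additional measurement):

$$\frac{5\ \text{million}}{20\ \text{million}} = 25\%\ \text{of the data}, \qquad \frac{24\ \text{h}}{4\ \text{h}} = 6\times\ \text{less training time}, \qquad \frac{37\ \text{GB}}{9\ \text{GB}} \approx 4.1\times\ \text{less model storage}$$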
Brief description of the drawings
Fig. 1 is a flow chart of the method for improving English-Chinese machine translation quality based on data selection provided by an embodiment of the invention.
Detailed description
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
The application principle of the invention is described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the method for improving English-Chinese machine translation quality based on data selection provided by an embodiment of the invention comprises:
S101: Re-express the data in bag-of-words form.
S102: Measure the distance between sentences by their cosine values, and derive a final score for each sentence pair from these cosine values.
S103: Rank the general data by score, and select the most relevant data to train the machine translation system.
The application principle of the invention is further described below with reference to specific embodiments.
Statistical machine translation is a data-driven method, so in theory the larger the amount of data, the better the machine translation system performs. The training data of most commercial systems on the market have reached the order of tens of millions or even hundreds of millions of items. Such a large amount of data takes up a great deal of storage space and also incurs a large time cost. In practice, however, the quality of a translation system depends not only on the quantity of training data but also heavily on its quality. When the data to be tested and the training data are closely related in content, the test results are very likely to be good. Domain adaptation methods based on data selection were proposed to solve this kind of problem. Their main idea is to pick out, from a large multi-domain data set, the data of a specific domain or the data related to the data to be tested, and to train the translation system on them. The method proposed by the invention is one such method.
Method proposed by the present invention can substantially be divided into three steps,
The first step is the form that all data are converted to bag of words;
Second step be each sentence in more common data with field in and field extraneous data correlation;
3rd step is the final profit to being ranked up according to the relevance score of previous step to each English-Chinese sentence of conventional data
With the machine translation system of the data training need elected.
The application principle of the invention is further described below with reference to data conversion.
Before data selection is carried out, three kinds of data need to be prepared. The first is the general data, which covers all domains; this part of the data is very large, and the final training data are selected from it. The second is the in-domain data, that is, data related to the data to be tested or data from the specific domain. The last is the out-of-domain data, that is, data unrelated to the test set or entirely unrelated to the specific domain.
In this step, all three kinds of data are converted into bag-of-words form. The bag of words is a matrix representation: each sentence is represented as a matrix with one row and N columns, where N equals the number of distinct words in the whole data set. For each sentence, a column has the value 1 if the sentence contains the word corresponding to that column, and 0 otherwise. For example, if a data set contains the two sentences "I am a boy" and "I am a girl", it contains five distinct words in total (I, am, a, boy, girl), so N is 5. Assume the five columns correspond to the words I, am, a, boy and girl, respectively. The bag-of-words form of the first sentence is then (1,1,1,1,0), and that of the second sentence is (1,1,1,0,1).
The application principle of the invention is further described below with reference to relevance comparison.
This step compares the relevance of every sentence in the general data with the in-domain data and with the out-of-domain data.
Relevance is calculated mainly from cosine values. The cosine value of any two sentences can be calculated with the following formula:
$$\cos(S, T) = \frac{\sum_{i=1}^{N} S_i T_i}{\sqrt{\sum_{i=1}^{N} S_i^2}\,\sqrt{\sum_{i=1}^{N} T_i^2}}$$
where S and T are the bag-of-words vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector. For each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are summed and averaged:
$$P_{IS} = \frac{1}{m} \sum_{j=1}^{m} C_j$$
where $C_j$ is the cosine value between the sentence and the j-th sentence of the in-domain data, and m is the number of sentences in the in-domain data. The same operation is carried out against the source-language sentences of the out-of-domain data to obtain $P_{OS}$, and the same operations are carried out for the target language to obtain $P_{IT}$ and $P_{OT}$. The final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT}$$
where $P_{IS}$ represents the probability that the sentence belongs to the in-domain data on the source-language side, $P_{OS}$ represents the probability that it belongs to the out-of-domain data on the source-language side, and $P_{IT}$ and $P_{OT}$ represent the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
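As a worked illustration, take the two example sentences "I am a boy" and "I am a girl" with the bag-of-words vectors S = (1,1,1,1,0) and T = (1,1,1,0,1). They share three words and each contains four, so:

$$\cos(S, T) = \frac{1 + 1 + 1 + 0 + 0}{\sqrt{4}\,\sqrt{4}} = \frac{3}{4} = 0.75$$

For the final score, suppose (purely hypothetical values, chosen only to show the arithmetic) that a sentence pair has $P_{IS} = 0.60$, $P_{OS} = 0.20$, $P_{IT} = 0.55$ and $P_{OT} = 0.25$. Then:

$$P = 0.60 - 0.20 + 0.55 - 0.25 = 0.70$$

A pair that is closer to the in-domain data than to the out-of-domain data on both language sides therefore receives a positive score, and a pair that is closer to the out-of-domain data receives a negative one.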
The application principle of the invention is further described below with reference to data selection.
This step selects the specific data. After the second step, every sentence pair in the general data has a final score. When cosine values are used to represent the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specified portion of the data is finally selected as the final training data; this may be a specified top N sentence pairs or a specified percentage of the data. The selected data are then used to train the translation model, language model and reordering model of the machine translation system.
The invention re-expresses the data in bag-of-words form, measures the distance between sentences by their cosine values, derives a final score for each sentence pair from these cosine values, ranks the general data by score, and finally selects the most relevant data to train the machine translation system.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within its scope of protection.
Claims (5)
1. A method for improving English-Chinese machine translation quality based on data selection, characterized in that the method comprises:
Step 1: re-expressing the data in bag-of-words form;
Step 2: measuring the distance between sentences by their cosine values, and deriving a final score for each sentence pair from these cosine values;
Step 3: ranking the general data by score, and selecting the most relevant data to train the machine translation system.
2. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that three kinds of data need to be prepared before the data are converted into bag-of-words form: the first is general data covering all domains; the second is in-domain data, that is, data related to the data to be tested or data from the specific domain; the third is out-of-domain data, that is, data unrelated to the test set or entirely unrelated to the specific domain.
3. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that, in Step 1, re-expressing the data in bag-of-words form specifically comprises:
converting all three kinds of data into bag-of-words form, wherein each sentence is represented as a matrix with one row and N columns, and N equals the number of distinct words in the whole data set;
supposing a sentence contains n words, $S_i$ denotes its i-th word, with $i \in \{1, 2, 3, \ldots, n\}$; supposing the bag of words has m columns, $V_j$ denotes the word represented by the j-th column and $V_{jc} \in \{0, 1\}$ denotes the final value of that column; the final value of the j-th of the m columns is given by:
$$V_{jc} = \begin{cases} 1, & \text{if } S_i = V_j \text{ for some } i \in \{1, \ldots, n\} \\ 0, & \text{otherwise} \end{cases}$$
that is, for each sentence, a column has the value 1 if the sentence contains the word corresponding to that column, and 0 otherwise.
4. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that, in Step 2, measuring the distance between sentences by their cosine values and deriving a final score for each sentence pair specifically comprises:
comparing relevance using cosine values, the cosine value of any two sentences being calculated with the following formula:
$$\cos(S, T) = \frac{\sum_{i=1}^{N} S_i T_i}{\sqrt{\sum_{i=1}^{N} S_i^2}\,\sqrt{\sum_{i=1}^{N} T_i^2}}$$
where S and T are the bag-of-words vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector; for each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are summed and averaged:
$$P_{IS} = \frac{1}{m} \sum_{j=1}^{m} C_j$$
where $C_j$ is the cosine value between the sentence and the j-th sentence of the in-domain data, and m is the number of sentences in the in-domain data; the same operation is carried out against the source-language sentences of the out-of-domain data to obtain $P_{OS}$, and the same operations are carried out for the target language to obtain $P_{IT}$ and $P_{OT}$; the final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT}$$
where $P_{IS}$ represents the probability that the sentence belongs to the in-domain data on the source-language side, $P_{OS}$ represents the probability that it belongs to the out-of-domain data on the source-language side, and $P_{IT}$ and $P_{OT}$ represent the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
5. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that, in Step 3, ranking the general data by score and selecting the most relevant data to train the machine translation system specifically comprises:
selecting the specific data: after Step 2, every sentence pair in the general data has a final score; when cosine values are used to represent the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specified portion of the data is finally selected as the final training data; the final training data are a specified top N sentence pairs or a specified percentage of the data; the translation model is obtained by extracting the word alignments of each sentence pair in the training data and the corresponding probabilities, the language model is trained by counting n-gram frequencies in the target-language monolingual data, and the reordering model is trained on the reorderings of phrases or words that may arise during phrase extraction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710031264.7A CN106844356B (en) | 2017-01-17 | 2017-01-17 | Method for improving English-Chinese machine translation quality based on data selection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710031264.7A CN106844356B (en) | 2017-01-17 | 2017-01-17 | Method for improving English-Chinese machine translation quality based on data selection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106844356A (en) | 2017-06-13 |
CN106844356B (en) | 2020-04-14 |
Family
ID=59123537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710031264.7A Active CN106844356B (en) | 2017-01-17 | 2017-01-17 | Method for improving English-Chinese machine translation quality based on data selection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106844356B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402919A (en) * | 2017-08-07 | 2017-11-28 | 中译语通科技(北京)有限公司 | Machine translation data selecting method and machine translation data selection system based on figure |
CN108257534A (en) * | 2018-01-11 | 2018-07-06 | 六盘水市人民医院 | A kind of cardiothoracic surgery health education control system |
CN109388808A (en) * | 2017-08-10 | 2019-02-26 | 陈虎 | It is a kind of for establishing the training data method of sampling of word translation model |
CN109740143A (en) * | 2018-11-28 | 2019-05-10 | 平安科技(深圳)有限公司 | Based on the sentence of machine learning apart from mapping method, device and computer equipment |
CN110889295A (en) * | 2019-09-12 | 2020-03-17 | 华为技术有限公司 | Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150286632A1 (en) * | 2014-04-03 | 2015-10-08 | Xerox Corporation | Predicting the quality of automatic translation of an entire document |
CN105701089A (en) * | 2015-12-31 | 2016-06-22 | 成都数联铭品科技有限公司 | Post-editing processing method for correction of wrong words in machine translation |
Non-Patent Citations (2)
Title |
---|
LILIN ZHANG ET AL.: "Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation", 《PROCEEDINGS OF THE FIRST CONFERENCE ON MACHINE TRANSLATION》 * |
陈燕 等: "《数据挖掘与聚类分析》", 30 November 2012, 大连海事大学出版社 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402919A (en) * | 2017-08-07 | 2017-11-28 | 中译语通科技(北京)有限公司 | Machine translation data selecting method and machine translation data selection system based on figure |
CN107402919B (en) * | 2017-08-07 | 2021-02-09 | 中译语通科技股份有限公司 | Machine translation data selection method and machine translation data selection system based on graph |
CN109388808A (en) * | 2017-08-10 | 2019-02-26 | 陈虎 | It is a kind of for establishing the training data method of sampling of word translation model |
CN109388808B (en) * | 2017-08-10 | 2024-03-08 | 陈虎 | Training data sampling method for establishing word translation model |
CN108257534A (en) * | 2018-01-11 | 2018-07-06 | 六盘水市人民医院 | A kind of cardiothoracic surgery health education control system |
CN109740143A (en) * | 2018-11-28 | 2019-05-10 | 平安科技(深圳)有限公司 | Based on the sentence of machine learning apart from mapping method, device and computer equipment |
CN109740143B (en) * | 2018-11-28 | 2022-08-23 | 平安科技(深圳)有限公司 | Sentence distance mapping method and device based on machine learning and computer equipment |
CN110889295A (en) * | 2019-09-12 | 2020-03-17 | 华为技术有限公司 | Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora |
CN110889295B (en) * | 2019-09-12 | 2021-10-01 | 华为技术有限公司 | Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora |
Also Published As
Publication number | Publication date |
---|---|
CN106844356B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107563498B (en) | Image description method and system based on visual and semantic attention combined strategy | |
CN101079026B (en) | Text similarity, acceptation similarity calculating method and system and application system | |
CN106844356A (en) | A kind of method that English-Chinese mechanical translation quality is improved based on data selection | |
CN107463607B (en) | Method for acquiring and organizing upper and lower relations of domain entities by combining word vectors and bootstrap learning | |
CN102662931B (en) | Semantic role labeling method based on synergetic neural network | |
CN104199857B (en) | A kind of tax document hierarchy classification method based on multi-tag classification | |
US20210342371A1 (en) | Method and Apparatus for Processing Knowledge Graph | |
CN106844658A (en) | A kind of Chinese text knowledge mapping method for auto constructing and system | |
CN107895000B (en) | Cross-domain semantic information retrieval method based on convolutional neural network | |
US11775594B2 (en) | Method for disambiguating between authors with same name on basis of network representation and semantic representation | |
CN108846000A (en) | A kind of common sense semanteme map construction method and device based on supernode and the common sense complementing method based on connection prediction | |
CN107544958B (en) | Term extraction method and device | |
CN109508460A (en) | Unsupervised composition based on Subject Clustering is digressed from the subject detection method and system | |
CN108549718A (en) | A kind of general theme incorporation model joint training method | |
CN108021682A (en) | Open information extracts a kind of Entity Semantics method based on wikipedia under background | |
CN104537280B (en) | Protein interactive relation recognition methods based on text relation similitude | |
CN109815478A (en) | Medicine entity recognition method and system based on convolutional neural networks | |
CN104008301B (en) | A kind of field concept hierarchical structure method for auto constructing | |
CN108491399A (en) | Chinese to English machine translation method based on context iterative analysis | |
CN110991193A (en) | Translation matrix model selection system based on OpenKiwi | |
Firoozi et al. | The effect of fine-tuned word embedding techniques on the accuracy of automated essay scoring systems using neural networks | |
Kocoń et al. | Context-sensitive sentiment propagation in wordnet | |
CN115795018B (en) | Multi-strategy intelligent search question-answering method and system for power grid field | |
Lin et al. | Implanting rational knowledge into distributed representation at morpheme level | |
CN106250367B (en) | Method based on the improved Nivre algorithm building interdependent treebank of Vietnamese |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Address after: 100040 Shijingshan District railway building, Beijing, the 16 floor; Applicant after: Chinese translation language through Polytron Technologies Inc; Address before: 100040 Shijingshan District railway building, Beijing, the 16 floor; Applicant before: Mandarin Technology (Beijing) Co., Ltd. |
GR01 | Patent grant | ||