CN106844356A - Method for improving English-Chinese machine translation quality based on data selection - Google Patents

Method for improving English-Chinese machine translation quality based on data selection

Info

Publication number
CN106844356A
Authority
CN
China
Prior art keywords
data
words
word
field
final
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710031264.7A
Other languages
Chinese (zh)
Other versions
CN106844356B (en)
Inventor
程国艮
汪鸣
汪一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mandarin Technology (beijing) Co Ltd
Original Assignee
Mandarin Technology (beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mandarin Technology (beijing) Co Ltd filed Critical Mandarin Technology (beijing) Co Ltd
Priority to CN201710031264.7A priority Critical patent/CN106844356B/en
Publication of CN106844356A publication Critical patent/CN106844356A/en
Application granted granted Critical
Publication of CN106844356B publication Critical patent/CN106844356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for improving English-Chinese machine translation quality based on data selection. The method comprises: re-expressing the data in bag-of-words form; using cosine computation to express the distance between sentences, and obtaining a final score for each sentence pair from the cosine-based relevance calculations; ranking the general data by score, and finally selecting the relevant data to train the machine translation system. On the one hand, the invention can reduce the time cost and storage cost of training a statistical machine translation system, because the method reduces the amount of training data compared with a system trained on multi-domain general data. On the other hand, because the selected data all come from the same domain as the data to be tested and are closely related in content, a system trained on the data selected by this method can in theory outperform a machine translation system trained on all the data.

Description

Method for improving English-Chinese machine translation quality based on data selection
Technical field
The invention belongs to the technical field of data selection, and more particularly relates to a method for improving English-Chinese machine translation quality based on data selection.
Background
With the introduction of the IBM statistical models, statistical machine translation has gradually replaced rule-based translation as the mainstream machine translation method at the present stage. Its basic idea is to use statistical methods to learn translation knowledge automatically from large-scale bilingual corpora and build a translation model.
In traditional statistical machine translation, the quality of the corpus directly determines the quality of the final translation system. In this era of information explosion, the information on the Internet grows exponentially, which also provides machine translation with large amounts of monolingual and bilingual corpora.
In theory, the quality of a translation system improves as the amount of training data increases. However, experiments show that once the training data reaches a certain order of magnitude, further increasing the amount of training data yields only a very small improvement in translation results, and can sometimes even reduce translation quality. The quality of a translation system is therefore not related only to the quantity of training data. Data sources on the Internet are complex, and their content often belongs to different domains, including politics, economics, tourism, entertainment and so on. When the test data and the training data belong to the same domain, the results are often better than those of a translation system trained on multiple domains or on some other single domain. For example, if the test set comes from the political domain, an English-Chinese translation system trained on 5 million political-domain sentence pairs should in theory perform better than one trained on 5 million entertainment-domain sentence pairs.
To sum up these two points, more training data is not always better, and an English word may have different Chinese translations in different domains, so that increasing the quantity of data sometimes makes the translation system perform worse. Domain adaptation methods based on data selection have been proposed to solve this problem. The core idea is to select, from a multi-domain data set, the training data related to the test data, train a translation system on the selected data, and then translate the data to be translated with this system.
In summary, most machine translation systems at present are trained on tens of millions or even hundreds of millions of bilingual sentence pairs. The whole training process requires a large amount of training time and a huge amount of disk space to store the data and models. Moreover, a translation system trained on a large amount of multi-domain data cannot achieve the best translation results for a specific domain, whereas in fact the best results can be obtained by using only part of these data or by giving specific data a higher weight.
Summary of the invention
The object of the invention is to provide a method for improving English-Chinese machine translation quality based on data selection, aiming to solve the problems that most current machine translation systems need a large amount of training time and huge disk space to store data and models during training, and that a translation system trained on a large amount of multi-domain training data cannot achieve the best translation results for a specific domain.
The invention is realized as follows:
A method for improving English-Chinese machine translation quality based on data selection, the method comprising:
Step 1: re-express the data in bag-of-words form;
Step 2: use cosine computation to express the distance between sentences, then obtain the final score of each sentence pair from the cosine-based relevance calculations;
Step 3: rank the general data by score, and finally select the relevant data to train the machine translation system.
Further, before the data are converted to bag-of-words form, three kinds of data need to be prepared: the first is general data covering all domains; the second is in-domain data, i.e. data related to the data to be tested or data of a specific domain; the third is out-of-domain data, i.e. data unrelated to the test set data or completely unrelated to the specific domain.
Further, in Step 1, re-expressing the data in bag-of-words form specifically comprises:
All three kinds of data are converted to bag-of-words form. The bag of words is a matrix with one row and N columns, where N equals the total number of distinct words in the whole data set;
Suppose a sentence has n words and S_i denotes its i-th word, with i ∈ [1, 2, 3, ..., n]. If the bag of words has m columns in total, V_j denotes the word represented by the j-th column and V_jc denotes the final value of that column, with V_jc ∈ [0, 1]. The final value of the j-th column among the m columns is then given by the following formula:
$$V_{jc} = \begin{cases} 1, & S_i = V_j \\ 0, & \text{otherwise} \end{cases}$$
For each sentence, if it contains the word corresponding to a given column, the value of that column is 1; otherwise it is 0.
If a data set contains the two sentences "I am a boy" and "I am a girl", it contains five distinct words in total, namely I, am, a, boy and girl, so N is 5. Assume the five columns correspond to the words I, am, a, boy and girl respectively. Then the bag-of-words representation of the first sentence is (1,1,1,1,0), and the second sentence can be expressed as (1,1,1,0,1).
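The bag-of-words conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `build_vocabulary` and `to_bag_of_words` are hypothetical, and splitting on whitespace is an assumption that suits the English example (Chinese text would first need word segmentation, which the patent does not detail).

```python
def build_vocabulary(sentences):
    """Map each distinct word to a column index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():  # assumption: whitespace tokenization
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_bag_of_words(sentence, vocab):
    """Return a binary vector: column j is 1 if the sentence contains word j."""
    vector = [0] * len(vocab)
    for word in sentence.split():
        if word in vocab:
            vector[vocab[word]] = 1
    return vector


sentences = ["I am a boy", "I am a girl"]
vocab = build_vocabulary(sentences)
vectors = [to_bag_of_words(s, vocab) for s in sentences]
print(vectors)  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```

With the two example sentences, the vocabulary has N = 5 columns and the vectors match the ones given in the text.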
Further, in Step 2, using cosine computation to express the distance between sentences and then obtaining the final score of each sentence pair from the cosine-based relevance calculations specifically comprises:
Relevance is compared using the cosine value; the cosine value of any two sentences is calculated with the following formula:
$$C = \frac{\sum_{i=1}^{n} S_i T_i}{\sqrt{\sum_{i=1}^{n} S_i^2}\,\sqrt{\sum_{i=1}^{n} T_i^2}}$$
where S and T are the vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector. For each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are then summed and averaged:
$$P_{IS} = \frac{\sum_{j=1}^{m} C_j}{m}$$
where C_j denotes the cosine value between this sentence and the j-th sentence of the in-domain data, and m denotes the number of sentences in the in-domain data. The same operation is then performed against every source-language sentence of the out-of-domain data to obtain P_OS, and the same operations are performed for the target language to obtain P_IT and P_OT. The final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT},$$
In the formula, P_IS denotes the probability that the sentence belongs to the in-domain data on the source-language side, P_OS denotes the probability that it belongs to the out-of-domain data on the source-language side, and P_IT and P_OT denote the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
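The scoring in Step 2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `cosine`, `mean_cosine` and `pair_score` are hypothetical, the cosine uses the standard square-root normalization, and returning 0.0 for an all-zero vector is an added convention that the source does not specify.

```python
import math


def cosine(s, t):
    """Cosine value between two bag-of-words vectors of equal length."""
    dot = sum(a * b for a, b in zip(s, t))
    norm_s = math.sqrt(sum(a * a for a in s))
    norm_t = math.sqrt(sum(b * b for b in t))
    if norm_s == 0 or norm_t == 0:
        return 0.0  # assumption: an empty vector is treated as unrelated
    return dot / (norm_s * norm_t)


def mean_cosine(vector, reference_vectors):
    """Average cosine between one sentence and every reference sentence."""
    total = sum(cosine(vector, ref) for ref in reference_vectors)
    return total / len(reference_vectors)


def pair_score(src_vec, tgt_vec, in_src, out_src, in_tgt, out_tgt):
    """P = P_IS - P_OS + P_IT - P_OT for one general-data sentence pair."""
    p_is = mean_cosine(src_vec, in_src)    # source side, in-domain
    p_os = mean_cosine(src_vec, out_src)   # source side, out-of-domain
    p_it = mean_cosine(tgt_vec, in_tgt)    # target side, in-domain
    p_ot = mean_cosine(tgt_vec, out_tgt)   # target side, out-of-domain
    return p_is - p_os + p_it - p_ot


# The two example sentences share 3 of their 4 words each.
similarity = cosine([1, 1, 1, 1, 0], [1, 1, 1, 0, 1])
print(similarity)  # 0.75

score = pair_score(
    [1, 1, 1, 1, 0], [1, 0],
    in_src=[[1, 1, 1, 1, 0]], out_src=[[0, 0, 0, 0, 1]],
    in_tgt=[[1, 0]], out_tgt=[[0, 1]],
)
```

Source and target vectors come from separate vocabularies, so they may have different lengths; only vectors from the same language are ever compared. Note that each general-data sentence is compared with every in-domain and out-of-domain sentence, so the cost grows with the product of the data set sizes.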
Further, in Step 3, ranking the general data by score and finally selecting the relevant data to train the machine translation system specifically comprises:
Selecting the specific data: after Step 2, every sentence pair in the general data has a final score. When the cosine value is used to express the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specific proportion of the data is finally chosen as the final training data. The final training data may be a specific top N sentence pairs, or a specific percentage of the data. The translation model is obtained by extracting the word alignments and corresponding probabilities of each sentence pair in the training data; the language model is trained by counting n-gram frequencies in the target-language monolingual data; and the reordering model is trained from the possible rearrangements of phrases or words produced during phrase extraction.
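The ranking and selection part of Step 3 can be sketched as below. The function name `select_top` and its parameters `top_n` and `fraction` are hypothetical; the sketch covers only choosing the data, not training the translation, language or reordering models.

```python
def select_top(scored_pairs, top_n=None, fraction=None):
    """Sort (score, pair) tuples from high to low and keep the best ones.

    Exactly one of top_n (a count) or fraction (0..1) must be given.
    """
    if (top_n is None) == (fraction is None):
        raise ValueError("specify exactly one of top_n or fraction")
    ranked = sorted(scored_pairs, key=lambda item: item[0], reverse=True)
    if fraction is not None:
        top_n = int(len(ranked) * fraction)
    return [pair for _, pair in ranked[:top_n]]


# Toy scores; the sentence pairs are placeholders.
scored = [
    (0.2, ("sentence a", "句子甲")),
    (1.5, ("sentence b", "句子乙")),
    (-0.3, ("sentence c", "句子丙")),
    (0.9, ("sentence d", "句子丁")),
]
by_count = select_top(scored, top_n=2)
by_fraction = select_top(scored, fraction=0.5)
print(by_count)
```

Both calls keep the pairs scored 1.5 and 0.9, since half of four pairs is two.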
The domain adaptation method based on data selection provided by the invention can select relatively effective training data and use this part of the data to train the translation system, so that the performance of the English-Chinese translation system is improved.
On the one hand, this domain adaptation method can save time and space costs; on the other hand, it can make the trained translation system perform better than a translation system trained on all the data.
On the one hand, the invention can reduce the time cost and storage cost of training a statistical machine translation system, because the method reduces the amount of training data compared with a system trained on multi-domain general data. On the other hand, because the selected data all come from the same domain as the data to be tested and are closely related in content, a system trained on the data selected by this method can in theory outperform a machine translation system trained on all the data. In terms of time cost, training a translation system on 20 million general sentence pairs takes about 24 hours, whereas this method selects about 5 million sentence pairs that can train a better-performing domain-specific translation system, and training on 5 million sentence pairs with the same configuration and training parameters takes only about 4 hours. In terms of storage cost, the translation model, language model and reordering model trained on 20 million sentence pairs occupy 37 GB in total, while the three models produced from 5 million sentence pairs total about 9 GB. In the news domain, using the above data and data selection method improves the translation results on the test set by 1 to 2 BLEU points.
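As a quick check on the figures quoted above, which the source itself gives only as approximate values, the relative savings work out as follows:

```latex
\begin{align*}
\text{data kept} &= \frac{5\ \text{million}}{20\ \text{million}} = 25\% \\
\text{training time} &= \frac{4\ \text{h}}{24\ \text{h}} \approx 16.7\% \quad (\text{about } 6\times \text{ faster}) \\
\text{model storage} &= \frac{9\ \text{GB}}{37\ \text{GB}} \approx 24.3\%
\end{align*}
```

Storage shrinks roughly in proportion to the data, while the reported training time shrinks somewhat faster than the data does.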
Brief description of the drawings
Fig. 1 is a flow chart of the method for improving English-Chinese machine translation quality based on data selection provided by an embodiment of the invention.
Detailed description of the embodiments
In order to make the purpose, technical solution and advantages of the invention clearer, the invention is further described below with reference to embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
The application principle of the invention is described in detail below with reference to the accompanying drawing.
As shown in Fig. 1, the method for improving English-Chinese machine translation quality based on data selection provided by an embodiment of the invention comprises:
S101: Re-express the data in bag-of-words form.
S102: Use cosine computation to express the distance between sentences, then obtain the final score of each sentence pair from the cosine-based relevance calculations.
S103: Rank the general data by score, and finally select the relevant data to train the machine translation system.
The application principle of the invention is further described below with reference to a specific embodiment.
Statistical machine translation is a data-driven method, so in theory the larger the amount of data, the better the performance of the machine translation system. The training data of most commercial systems on the market has reached the order of tens of millions or even hundreds of millions of sentence pairs. Such a huge amount of data takes up a large amount of storage space on the one hand and requires a huge time cost on the other. In practice, however, the quality of a translation system is related not only to the quantity of training data but also, to a large extent, to its quality. When the data to be tested and the training data are closely related in content, the test results are very likely to be good. Domain adaptation methods based on data selection were proposed to solve this kind of problem. Their main idea is to pick out, from a large multi-domain data set, the data of a specific domain or the data related to the data to be tested in order to train the translation system; the method proposed by the invention is one such method.
The method proposed by the invention can be roughly divided into three steps:
The first step is to convert all the data to bag-of-words form;
The second step is to compare the relevance of each sentence in the general data to the in-domain data and to the out-of-domain data;
The third step is to rank each English-Chinese sentence pair of the general data by the relevance score from the previous step, and finally to train the required machine translation system on the selected data.
The application principle of the invention is further described below with reference to data conversion.
Before the data selection method is carried out, three kinds of data need to be prepared. The first is general data, i.e. data covering all domains; this part of the data is very large, and the final training data are chosen from it. The second is in-domain data, i.e. data related to the data to be tested or data of a specific domain. The last is out-of-domain data, i.e. data unrelated to the test set data or completely unrelated to the specific domain.
In this step, all three kinds of data are converted to bag-of-words form. The bag of words is a matrix representation with one row and N columns, where N equals the total number of distinct words in the whole data set. For each sentence, if it contains the word corresponding to a given column, the value of that column is 1; otherwise it is 0. If a data set contains the two sentences "I am a boy" and "I am a girl", it contains five distinct words in total, namely I, am, a, boy and girl, so N is 5. Assume the five columns correspond to the words I, am, a, boy and girl respectively. Then the bag-of-words representation of the first sentence is (1,1,1,1,0), and the second sentence can be expressed as (1,1,1,0,1).
The application principle of the invention is further described below with reference to relevance comparison.
This step mainly compares the relevance of each sentence in the general data to the in-domain data and to the out-of-domain data.
Relevance is mainly computed using the cosine value; the cosine value of any two sentences can be calculated with the following formula:
$$C = \frac{\sum_{i=1}^{n} S_i T_i}{\sqrt{\sum_{i=1}^{n} S_i^2}\,\sqrt{\sum_{i=1}^{n} T_i^2}}$$
where S and T are the vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector. For each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are then summed and averaged:
$$P_{IS} = \frac{\sum_{j=1}^{m} C_j}{m}$$
where C_j denotes the cosine value between this sentence and the j-th sentence of the in-domain data, and m denotes the number of sentences in the in-domain data. The same operation is then performed against every source-language sentence of the out-of-domain data to obtain P_OS, and the same operations are performed for the target language to obtain P_IT and P_OT. The final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT}$$
In the formula, P_IS denotes the probability that the sentence belongs to the in-domain data on the source-language side, P_OS denotes the probability that it belongs to the out-of-domain data on the source-language side, and P_IT and P_OT denote the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
The application principle of the invention is further described below with reference to data selection.
This step mainly selects the specific data. After the second step, every sentence pair in the general data has a final score. When the cosine value is used to express the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specific proportion of the data is finally chosen as the final training data; this may be a specific top N sentence pairs or a specific percentage of the data. The selected data are then used to train the translation model, language model and reordering model of the machine translation system.
The invention re-expresses the data in bag-of-words form, uses cosine computation to express the distance between sentences, obtains the final score of each sentence pair from the cosine-based relevance calculations, ranks the general data by score, and finally selects the relevant data to train the machine translation system.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A method for improving English-Chinese machine translation quality based on data selection, characterized in that the method comprises:
Step 1: re-expressing the data in bag-of-words form;
Step 2: using cosine computation to express the distance between sentences, then obtaining the final score of each sentence pair from the cosine-based relevance calculations;
Step 3: ranking the general data by score, and finally selecting the relevant data to train the machine translation system.
2. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that before the data are converted to bag-of-words form, three kinds of data need to be prepared: the first is general data covering all domains; the second is in-domain data, i.e. data related to the data to be tested or data of a specific domain; the third is out-of-domain data, i.e. data unrelated to the test set data or completely unrelated to the specific domain.
3. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that in Step 1, re-expressing the data in bag-of-words form specifically comprises:
All three kinds of data are converted to bag-of-words form; the bag of words is a matrix with one row and N columns, where N equals the total number of distinct words in the whole data set;
Suppose a sentence has n words and S_i denotes its i-th word, with i ∈ [1, 2, 3, ..., n]. If the bag of words has m columns in total, V_j denotes the word represented by the j-th column and V_jc denotes the final value of that column, with V_jc ∈ [0, 1]. The final value of the j-th column among the m columns is then given by the following formula:
$$V_{jc} = \begin{cases} 1, & S_i = V_j \\ 0, & \text{otherwise} \end{cases}$$
For each sentence, if it contains the word corresponding to a given column, the value of that column is 1; otherwise it is 0.
4. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that in Step 2, using cosine computation to express the distance between sentences and then obtaining the final score of each sentence pair from the cosine-based relevance calculations specifically comprises:
Relevance is compared using the cosine value; the cosine value of any two sentences is calculated with the following formula:
$$C = \frac{\sum_{i=1}^{n} S_i T_i}{\sqrt{\sum_{i=1}^{n} S_i^2}\,\sqrt{\sum_{i=1}^{n} T_i^2}}$$
where S and T are the vectors of the two sentences and the subscript i denotes the value in the i-th column of the vector. For each source-language sentence in the general data, its cosine value with every source-language sentence of the in-domain data is calculated, and all the cosine values for that sentence are then summed and averaged:
$$P_{IS} = \frac{\sum_{j=1}^{m} C_j}{m}$$
where C_j denotes the cosine value between this sentence and the j-th sentence of the in-domain data, and m denotes the number of sentences in the in-domain data; the same operation is then performed against every source-language sentence of the out-of-domain data to obtain P_OS, and the same operations are performed for the target language to obtain P_IT and P_OT; the final score of the sentence pair is determined by the following formula:
$$P = P_{IS} - P_{OS} + P_{IT} - P_{OT},$$
In the formula, P_IS denotes the probability that the sentence belongs to the in-domain data on the source-language side, P_OS denotes the probability that it belongs to the out-of-domain data on the source-language side, and P_IT and P_OT denote the probabilities that it belongs to the in-domain and out-of-domain data, respectively, on the target-language side.
5. The method for improving English-Chinese machine translation quality based on data selection according to claim 1, characterized in that in Step 3, ranking the general data by score and finally selecting the relevant data to train the machine translation system specifically comprises:
Selecting the specific data: after Step 2, every sentence pair in the general data has a final score; when the cosine value is used to express the distance between two sentences, a larger value means the two sentences are more similar, so all sentence pairs in the data are sorted by this score from high to low, and a specific proportion of the data is finally chosen as the final training data; the final training data may be a specific top N sentence pairs, or a specific percentage of the data. The translation model is obtained by extracting the word alignments and corresponding probabilities of each sentence pair in the training data; the language model is trained by counting n-gram frequencies in the target-language monolingual data; and the reordering model is trained from the possible rearrangements of phrases or words produced during phrase extraction.
CN201710031264.7A 2017-01-17 2017-01-17 Method for improving English-Chinese machine translation quality based on data selection Active CN106844356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710031264.7A CN106844356B (en) 2017-01-17 2017-01-17 Method for improving English-Chinese machine translation quality based on data selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710031264.7A CN106844356B (en) 2017-01-17 2017-01-17 Method for improving English-Chinese machine translation quality based on data selection

Publications (2)

Publication Number Publication Date
CN106844356A true CN106844356A (en) 2017-06-13
CN106844356B CN106844356B (en) 2020-04-14

Family

ID=59123537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710031264.7A Active CN106844356B (en) 2017-01-17 2017-01-17 Method for improving English-Chinese machine translation quality based on data selection

Country Status (1)

Country Link
CN (1) CN106844356B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286632A1 (en) * 2014-04-03 2015-10-08 Xerox Corporation Predicting the quality of automatic translation of an entire document
CN105701089A (en) * 2015-12-31 2016-06-22 成都数联铭品科技有限公司 Post-editing processing method for correction of wrong words in machine translation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LILIN ZHANG ET AL.: "Extract Domain-specific Paraphrase from Monolingual Corpus for Automatic Evaluation of Machine Translation", 《PROCEEDINGS OF THE FIRST CONFERENCE ON MACHINE TRANSLATION》 *
陈燕 等: "《数据挖掘与聚类分析》", 30 November 2012, 大连海事大学出版社 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402919A (en) * 2017-08-07 2017-11-28 中译语通科技(北京)有限公司 Machine translation data selecting method and machine translation data selection system based on figure
CN107402919B (en) * 2017-08-07 2021-02-09 中译语通科技股份有限公司 Machine translation data selection method and machine translation data selection system based on graph
CN109388808A (en) * 2017-08-10 2019-02-26 陈虎 It is a kind of for establishing the training data method of sampling of word translation model
CN109388808B (en) * 2017-08-10 2024-03-08 陈虎 Training data sampling method for establishing word translation model
CN108257534A (en) * 2018-01-11 2018-07-06 六盘水市人民医院 A kind of cardiothoracic surgery health education control system
CN109740143A (en) * 2018-11-28 2019-05-10 平安科技(深圳)有限公司 Based on the sentence of machine learning apart from mapping method, device and computer equipment
CN109740143B (en) * 2018-11-28 2022-08-23 平安科技(深圳)有限公司 Sentence distance mapping method and device based on machine learning and computer equipment
CN110889295A (en) * 2019-09-12 2020-03-17 华为技术有限公司 Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora
CN110889295B (en) * 2019-09-12 2021-10-01 华为技术有限公司 Machine translation model, and method, system and equipment for determining pseudo-professional parallel corpora

Also Published As

Publication number Publication date
CN106844356B (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100040 Shijingshan District railway building, Beijing, the 16 floor
Applicant after: Chinese translation language through Polytron Technologies Inc
Address before: 100040 Shijingshan District railway building, Beijing, the 16 floor
Applicant before: Mandarin Technology (Beijing) Co., Ltd.

GR01 Patent grant