CN112861516A - Experimental method for verifying influence of common sub-words on XLM translation model effect - Google Patents
Experimental method for verifying influence of common sub-words on XLM translation model effect
- Publication number
- CN112861516A (application number CN202110079357.3A)
- Authority
- CN
- China
- Prior art keywords
- sub
- words
- common
- subwords
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/226—Validation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to an experimental method for verifying the influence of common sub-words on the effect of an XLM translation model. The method comprises two steps: preprocessing the corpus used to pre-train the XLM translation model; and verifying whether the performance of the XLM translation model degrades, by pre-training XLM on the preprocessed corpus, initializing the translation model with the pre-trained model, and observing the BLEU value of the new translation model. The preprocessing comprises: first, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words; then randomly separating the common sub-words according to a separation ratio; reading the vocabulary of all English-French sub-words and storing it in a dictionary for use in generating the separated sub-word file; initializing a dictionary from the generated separated sub-word file; and finally structuring the model corpus files using the initialized dictionary. The invention verifies the influence of common sub-words on the BLEU value and supports research on low-resource neural machine translation for non-homologous languages.
Description
Technical Field
The invention relates to an experimental method for verifying influence of common sub-words on XLM translation model effects, and belongs to the technical field of natural language processing.
Background
Machine translation is one of the tasks in the field of natural language processing. It is widely applied and has great research and commercial value, and its development has been greatly advanced by the emergence of neural machine translation. Neural machine translation requires a large amount of parallel corpora, so the development of low-resource neural machine translation is particularly important. Low-resource neural machine translation works well for homologous language pairs such as English-French and English-German, but it does not work well for Chinese-English, a non-homologous language pair. In order to analyze why the Chinese-English pair degrades on the translation model, and to gain a deeper understanding of low-resource neural machine translation for non-homologous language pairs, an experimental method for verifying the influence of common sub-words on the performance of an XLM translation model is provided.
Disclosure of Invention
The invention provides an experimental method for verifying the influence of common sub-words on the effect of an XLM translation model. The method is used to verify the influence of common sub-words on the BLEU value and thereby to analyze why the Chinese-English pair degrades on the translation model. This gives a deeper understanding of low-resource neural machine translation for non-homologous language pairs and helps in proposing solutions to the degradation problem for such pairs.
The technical scheme of the invention is as follows: an experimental method for verifying the influence of common sub-words on the effect of an XLM translation model, the method comprising:
Step1, preprocessing the corpus used to pre-train the XLM translation model;
Step2, verifying whether the performance of the XLM translation model degrades: pre-training XLM on the preprocessed corpus, initializing the translation model with the pre-trained model, and observing the BLEU value of the new translation model.
The preprocessing in Step1 comprises the following:
first, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words; then randomly separating the common sub-words according to the separation ratio; reading the vocabulary of all English-French sub-words and storing it in a dictionary for use in generating the separated sub-word file; initializing a dictionary from the generated separated sub-word file; and finally structuring the model corpus files using the initialized dictionary.
As a further scheme of the invention, the specific steps are as follows:
Step1.1, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words;
Step1.2, randomly separating the common sub-words according to the separation ratio to obtain the separated sub-words;
first, the total number of common sub-words is multiplied by the separation ratio to obtain the number of common sub-words to be separated; the common sub-words are then screened with a random function to obtain the common sub-words to be separated and those not to be separated, which are stored separately; the word frequencies of the common sub-words to be separated are then looked up in the English and French vocabularies respectively and stored;
Step1.3, reading the vocabulary containing all English-French sub-words and storing it in a dictionary; this vocabulary contains each sub-word and its word frequency;
Step1.4, generating the separated sub-word file;
first, the dictionary holding the vocabulary of all English-French sub-words is read, and each sub-word read is checked to determine whether it is a common sub-word; if it is a common sub-word, it is further checked to determine whether it is to be separated; if it is not a common sub-word, no separation check is needed; for a common sub-word that is separated, its word frequencies in English and in French are recorded individually, and for a common sub-word that is not separated, the total of its English and French word frequencies is recorded; finally, the different types of sub-words are stored in the same file, distinguished by different marks;
Step1.5, initializing a dictionary from the generated separated sub-word file;
the file generated in Step1.4 is read; each separated common sub-word is given suffixes to distinguish its two language-specific copies, which are represented by different id numbers, and the corresponding word frequencies are stored; for a common sub-word that is not separated, its id and word frequency are recorded directly; the members of the dictionary class are thereby initialized;
Step1.6, structuring the model corpus files using the initialized dictionary;
the sub-words in each line of the BPE-processed English and French corpus files are read and replaced by their id numbers according to the initialized dictionary; an end-of-sentence identifier is appended to each line, and the result is stored in an array; the start and end positions of each sentence are also saved in the binary file.
The invention has the beneficial effects that:
1. the invention verifies the influence of the common sub-words on the BLEU value;
2. the invention supports research on low-resource neural machine translation for non-homologous languages. In the XLM model, the source language and the target language share a vocabulary. While the encoder is trained, the common sub-words of homologous languages (such as English and French) are aligned in the semantic space, and the non-common sub-words of English and French can in turn be aligned better according to their positions relative to the common sub-words. In other words, the common sub-words of English and French provide alignment information in the semantic space for the English-French translation model and act as anchor points. Non-homologous languages (such as Chinese and English) contain essentially no common sub-words, which makes it difficult to align the source and target languages in the semantic space. Experiments show that the absence of common sub-word information has a large influence on translation between non-homologous languages. Based on this finding, methods such as adding a bilingual dictionary to supply alignment information can be proposed to improve machine translation performance for non-homologous languages.
Drawings
FIG. 1 is a flow chart of the present invention for generating and applying a separated common subword file;
FIG. 2 is a graph of the effect of separating different proportions of the common sub-words on the English-to-French BLEU value;
FIG. 3 is a graph of the effect of separating different proportions of the common sub-words on the French-to-English BLEU value.
Detailed Description
Example 1: As shown in FIGS. 1-3, an experimental method for verifying the influence of common sub-words on the effect of an XLM translation model comprises:
Step1, preprocessing the corpus used to pre-train the XLM translation model;
Step2, verifying whether the performance of the XLM translation model degrades: pre-training XLM on the preprocessed corpus, initializing the translation model with the pre-trained model, and observing the BLEU value of the new translation model.
The preprocessing in Step1 comprises the following:
first, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words; then randomly separating the common sub-words according to the separation ratio; reading the vocabulary of all English-French sub-words and storing it in a dictionary for use in generating the separated sub-word file; initializing a dictionary from the generated separated sub-word file; and finally structuring the model corpus files using the initialized dictionary.
As a further scheme of the invention, the specific steps are as follows:
Step1.1, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words;
English sub-words and their word frequencies, and French sub-words and their word frequencies, are stored respectively in the two vocabulary files (vocab.en, vocab.fr) generated by BPE processing. The English vocabulary is traversed and its sub-words are placed in a set. This set is copied into a newly created common sub-word set, which is then intersected with the set built from the French vocabulary. The intersection is the set of sub-words common to English and French. The vocabulary files are then searched, and the common sub-words and their corresponding word frequencies are stored in a dictionary.
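The set intersection in Step1.1 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each vocabulary line has the form "subword frequency" separated by whitespace, and the function names `load_vocab` and `shared_subwords` and the toy data are hypothetical.

```python
def load_vocab(lines):
    """Parse 'subword frequency' lines into a {subword: frequency} dict.

    The two-column layout is an assumption about vocab.en / vocab.fr.
    """
    vocab = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        subword, freq = parts
        vocab[subword] = int(freq)
    return vocab


def shared_subwords(vocab_en, vocab_fr):
    """Return {subword: (english_freq, french_freq)} for sub-words in both vocabularies."""
    common = set(vocab_en) & set(vocab_fr)  # the intersection described in Step1.1
    return {w: (vocab_en[w], vocab_fr[w]) for w in common}


# Toy stand-ins for the contents of vocab.en and vocab.fr (made-up values)
en_lines = ["the 120", "table 15", "na@@ 7", "tion 30"]
fr_lines = ["le 110", "table 12", "na@@ 9", "tion 25"]

common = shared_subwords(load_vocab(en_lines), load_vocab(fr_lines))
print(sorted(common.items()))
```

Keeping both frequencies per common sub-word at this stage avoids a second lookup when the separated sub-words later need their per-language frequencies.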
Step1.2, randomly separating the common sub-words according to the separation ratio to obtain the separated sub-words;
first, the total number of common sub-words is multiplied by the separation ratio to obtain the number of common sub-words to be separated; the common sub-words are then screened with the random function random.sample to obtain the common sub-words to be separated and those not to be separated, which are stored separately; the word frequencies of the common sub-words to be separated are then looked up in the English and French vocabularies respectively and stored;
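The random separation in Step1.2 might look like the sketch below. The source names `random.sample`; everything else is an assumption made for illustration: the function name `split_shared`, truncating the product to an integer with `int()`, and the optional seed used to make the draw repeatable.

```python
import random


def split_shared(common_subwords, ratio, seed=None):
    """Randomly pick round-down(len * ratio) common sub-words to separate.

    Returns (separated, kept) as two disjoint sets.
    """
    rng = random.Random(seed)          # seeded generator (an assumption, for reproducibility)
    ordered = sorted(common_subwords)  # fixed order so a given seed gives a given draw
    n_split = int(len(ordered) * ratio)  # truncation is an assumed rounding rule
    separated = set(rng.sample(ordered, n_split))
    kept = set(ordered) - separated
    return separated, kept


separated, kept = split_shared(["table", "na@@", "tion", "in@@"], ratio=0.5, seed=0)
print(len(separated), len(kept))
```

Sorting before sampling is a design choice: set iteration order is not stable across runs, so sampling from a sorted list keeps experiments at the same ratio comparable.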
Step1.3, reading the vocabulary containing all English-French sub-words and storing it in a dictionary; this vocabulary contains each sub-word and its word frequency;
BPE generates vocab.en-fr, which contains all English and French sub-words; the sub-words and their word frequencies are stored in a dictionary for easy lookup.
Step1.4, generating the separated sub-word file;
first, the dictionary holding the vocabulary of all English-French sub-words is read, and each sub-word read is checked to determine whether it is a common sub-word; if it is a common sub-word, it is further checked to determine whether it is to be separated; if it is not a common sub-word, no separation check is needed; for a common sub-word that is separated, its word frequencies in English and in French are recorded individually, and for a common sub-word that is not separated, the total of its English and French word frequencies is recorded; finally, the different types of sub-words are stored in the same file, distinguished by different marks;
In the marks, 1 represents true and 0 represents false. The final file format is shown in Table 1:
TABLE 1 The generated separated sub-word file
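The contents of Table 1 are not reproduced in this text, so the actual file layout is unknown. Purely to illustrate the marking logic of Step1.4, the sketch below assumes a hypothetical space-separated layout: `subword is_common is_separated` followed by either the English and French frequencies (separated sub-words) or a single total frequency (all other sub-words). The function name and sample data are also hypothetical.

```python
def build_split_lines(joint_vocab, vocab_en, vocab_fr, common, separated):
    """Produce one line per sub-word of the joint vocabulary.

    Assumed layout (NOT taken from Table 1):
      separated common sub-word:  subword 1 1 freq_en freq_fr
      any other sub-word:         subword is_common 0 total_freq
    """
    lines = []
    for subword, total in joint_vocab.items():
        is_common = int(subword in common)        # 1 = true, 0 = false
        is_separated = int(subword in separated)
        if is_separated:
            fields = [subword, is_common, is_separated,
                      vocab_en[subword], vocab_fr[subword]]
        else:
            fields = [subword, is_common, is_separated, total]
        lines.append(" ".join(str(f) for f in fields))
    return lines


# Made-up toy data standing in for vocab.en-fr, vocab.en and vocab.fr
joint_vocab = {"the": 120, "le": 110, "table": 27, "tion": 55}
vocab_en = {"the": 120, "table": 15, "tion": 30}
vocab_fr = {"le": 110, "table": 12, "tion": 25}

split_lines = build_split_lines(joint_vocab, vocab_en, vocab_fr,
                                common={"table", "tion"}, separated={"tion"})
print("\n".join(split_lines))
```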
Step1.5, initializing a dictionary from the generated separated sub-word file;
the file generated in Step1.4 is read; each separated common sub-word is given suffixes for distinction (_1 denotes the English sub-word and _2 denotes the French sub-word), and the two copies are represented by different ids (serial numbers); the corresponding word frequencies are stored; for a common sub-word that is not separated, its id and its word frequency are recorded directly; the class members of the split-dictionary class are thereby initialized: id2word (a dictionary mapping each id to its word), word2id (a dictionary mapping each word to its id), counts (a dictionary recording the word frequency of each sub-word) and split words (a set recording the separated common sub-words);
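A minimal sketch of the dictionary initialization in Step1.5 follows. The suffixes `_1` and `_2` and the members `id2word`, `word2id` and `counts` come from the description; the class name `SplitDictionary`, the member name `split_words`, the input line layout (`subword is_common is_separated freq...`) and the omission of any special tokens are assumptions made for illustration.

```python
class SplitDictionary:
    """Toy dictionary that gives separated common sub-words two language-specific ids."""

    def __init__(self, split_lines):
        self.id2word = {}         # id -> word
        self.word2id = {}         # word -> id
        self.counts = {}          # word -> frequency
        self.split_words = set()  # separated common sub-words (without suffix)
        for line in split_lines:
            fields = line.split()
            subword, is_separated = fields[0], fields[2] == "1"
            if is_separated:
                self.split_words.add(subword)
                self._add(subword + "_1", int(fields[3]))  # English copy
                self._add(subword + "_2", int(fields[4]))  # French copy
            else:
                self._add(subword, int(fields[3]))          # single shared entry

    def _add(self, word, count):
        idx = len(self.id2word)  # next free id
        self.id2word[idx] = word
        self.word2id[word] = idx
        self.counts[word] = count


# Hypothetical lines in the assumed layout
d = SplitDictionary(["the 0 0 120", "table 1 0 27", "tion 1 1 30 25"])
print(d.word2id)
```

Because a separated sub-word no longer shares one id across languages, the model sees it as two unrelated tokens, which is what removes its anchoring effect in the experiment.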
Step1.6, structuring the model corpus files using the initialized dictionary;
The train.en and train.fr files are structured using the members of the split dictionary (the index_data function in splitdictionary.py).
The sub-words in each line of the BPE-processed English and French corpus files are read and replaced by their id numbers according to the initialized dictionary; an end-of-sentence identifier is appended to each line, and the result is stored in an array; the start and end positions of each sentence are also saved in the binary file.
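The indexing in Step1.6 can be sketched as below. This is an in-memory illustration under stated assumptions rather than the `index_data` function itself: the function name `index_sentences`, the choice of suffix by corpus language, the handling of unknown sub-words through an `unk_id`, and the toy id values are all hypothetical, and writing the result to a binary file is left out.

```python
def index_sentences(lines, word2id, split_words, lang_suffix, eos_id, unk_id):
    """Replace sub-words by ids, append an end-of-sentence id per line,
    and record (start, end) positions of each sentence in the flat array."""
    ids = []
    positions = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        start = len(ids)
        for tok in tokens:
            # Separated common sub-words are looked up under their suffixed form
            key = tok + lang_suffix if tok in split_words else tok
            ids.append(word2id.get(key, unk_id))
        positions.append((start, len(ids)))  # end points at the EOS slot
        ids.append(eos_id)
    return ids, positions


# Made-up dictionary and English corpus lines
word2id = {"</s>": 0, "the": 1, "table": 2, "tion_1": 3, "tion_2": 4, "<unk>": 5}
ids, positions = index_sentences(["the table", "tion the"], word2id,
                                 split_words={"tion"}, lang_suffix="_1",
                                 eos_id=0, unk_id=5)
print(ids, positions)
```

For the French corpus the same call would be made with `lang_suffix="_2"`, so that the separated sub-word "tion" maps to a different id in each language.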
FIG. 2 and FIG. 3 show the translation results obtained with the invention. Evaluation uses the internationally accepted BLEU metric, for which a higher value is better. In the figures, the X-axis represents the separation ratio of the common sub-words, and the Y-axis represents the BLEU value. FIG. 2 shows the English-to-French translation results, and FIG. 3 shows the French-to-English translation results.
Tests show that the method effectively verifies the importance of common sub-words in the English-French translation model, and provides an effective analysis experiment for the degradation of the Chinese-English translation model.
Experiments show that the absence of common sub-word information has a large influence on translation between non-homologous languages. Based on this finding, methods such as adding a bilingual dictionary to supply alignment information can be proposed to improve machine translation performance for non-homologous languages. The invention therefore supports research on low-resource neural machine translation for non-homologous languages.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.
Claims (2)
1. An experimental method for verifying the influence of common sub-words on the effect of an XLM translation model, characterized in that the method comprises the following steps:
Step1, preprocessing the corpus used to pre-train the XLM translation model;
Step2, verifying whether the performance of the XLM translation model degrades: pre-training XLM on the preprocessed corpus, initializing the translation model with the pre-trained model, and observing the BLEU value of the new translation model;
wherein the preprocessing in Step1 comprises the following:
first, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words; then randomly separating the common sub-words according to the separation ratio; reading the vocabulary of all English-French sub-words and storing it in a dictionary for use in generating the separated sub-word file; initializing a dictionary from the generated separated sub-word file; and finally structuring the model corpus files using the initialized dictionary.
2. The experimental method for verifying the influence of common sub-words on the effect of an XLM translation model according to claim 1, characterized in that the specific steps are as follows:
Step1.1, obtaining the sub-words common to the English and French sub-word vocabularies, together with the word frequencies of all sub-words;
Step1.2, randomly separating the common sub-words according to the separation ratio to obtain the separated sub-words;
first, the total number of common sub-words is multiplied by the separation ratio to obtain the number of common sub-words to be separated; the common sub-words are then screened with a random function to obtain the common sub-words to be separated and those not to be separated, which are stored separately; the word frequencies of the common sub-words to be separated are then looked up in the English and French vocabularies respectively and stored;
Step1.3, reading the vocabulary containing all English-French sub-words and storing it in a dictionary; this vocabulary contains each sub-word and its word frequency;
Step1.4, generating the separated sub-word file;
first, the dictionary holding the vocabulary of all English-French sub-words is read, and each sub-word read is checked to determine whether it is a common sub-word; if it is a common sub-word, it is further checked to determine whether it is to be separated; if it is not a common sub-word, no separation check is needed; for a common sub-word that is separated, its word frequencies in English and in French are recorded individually, and for a common sub-word that is not separated, the total of its English and French word frequencies is recorded; finally, the different types of sub-words are stored in the same file, distinguished by different marks;
Step1.5, initializing a dictionary from the generated separated sub-word file;
the file generated in Step1.4 is read; each separated common sub-word is given suffixes to distinguish its two language-specific copies, which are represented by different id numbers, and the corresponding word frequencies are stored; for a common sub-word that is not separated, its id and word frequency are recorded directly; the members of the dictionary class are thereby initialized;
Step1.6, structuring the model corpus files using the initialized dictionary;
the sub-words in each line of the BPE-processed English and French corpus files are read and replaced by their id numbers according to the initialized dictionary; an end-of-sentence identifier is appended to each line, and the result is stored in an array; the start and end positions of each sentence are also saved in the binary file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110079357.3A CN112861516B (en) | 2021-01-21 | 2021-01-21 | Experimental method for verifying influence of common subword on XLM translation model effect |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110079357.3A CN112861516B (en) | 2021-01-21 | 2021-01-21 | Experimental method for verifying influence of common subword on XLM translation model effect |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112861516A (en) | 2021-05-28 |
CN112861516B (en) | 2023-05-16 |
Family
ID=76008519
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110079357.3A (Active), CN112861516B (en) | 2021-01-21 | 2021-01-21 | Experimental method for verifying influence of common subword on XLM translation model effect |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112861516B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108563640A (en) * | 2018-04-24 | 2018-09-21 | 中译语通科技股份有限公司 | A kind of multilingual pair of neural network machine interpretation method and system |
CN109033042A (en) * | 2018-06-28 | 2018-12-18 | 中译语通科技股份有限公司 | BPE coding method and system, machine translation system based on the sub- word cell of Chinese |
CN109815456A (en) * | 2019-02-13 | 2019-05-28 | 北京航空航天大学 | A method of it is compressed based on term vector memory space of the character to coding |
CN110413736A (en) * | 2019-07-25 | 2019-11-05 | 百度在线网络技术(北京)有限公司 | Across language text representation method and device |
CN110674646A (en) * | 2019-09-06 | 2020-01-10 | 内蒙古工业大学 | Mongolian Chinese machine translation system based on byte pair encoding technology |
CN110688862A (en) * | 2019-08-29 | 2020-01-14 | 内蒙古工业大学 | Mongolian-Chinese inter-translation method based on transfer learning |
CN110991192A (en) * | 2019-11-08 | 2020-04-10 | 昆明理工大学 | Method for constructing semi-supervised neural machine translation model based on word-to-word translation |
CN111414771A (en) * | 2020-03-03 | 2020-07-14 | 云知声智能科技股份有限公司 | Phrase-based neural machine translation method and system |
Non-Patent Citations (5)
Title |
---|
GUILLAUME LAMPLE 等: ""Cross-lingual Language Model Pretraining"", 《ARXIV》 * |
RICO SENNRICH 等: ""Neural Machine Translation of Rare Words with Subword Units"", 《ARXIV》 * |
RIOS ANNETTE 等: ""Subword segmentation and a single bridge language affect zero-shot neural machine translation"", 《5TH CONFERENCE ON MACHINE TRANSLATION》 * |
孙凌浩 等: ""基于跨语言迁移学习的实体关系抽取算法研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
徐毓 等: ""基于深度可分离卷积的汉越神经机器翻译"", 《厦门大学学报》 * |
Also Published As
Publication number | Publication date |
---|---|
CN112861516B (en) | 2023-05-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||