CN111723587A - Chinese-Thai entity alignment method oriented to cross-language knowledge graph - Google Patents


Info

Publication number
CN111723587A
CN111723587A (application CN202010578711.2A)
Authority
CN
China
Prior art keywords
chinese
entity
thai
model
bilingual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010578711.2A
Other languages
Chinese (zh)
Inventor
黄永忠
吴辉文
庄浩宇
徐鑫宇
张晨昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianyun Xin'an Technology Co ltd
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202010578711.2A priority Critical patent/CN111723587A/en
Publication of CN111723587A publication Critical patent/CN111723587A/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/367: Creation of semantic tools; ontology
    • G06F40/295: Named entity recognition
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/048: Neural networks; activation functions
    • G06N3/08: Neural networks; learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a Chinese-Thai entity alignment method oriented to cross-language knowledge graphs, characterized by comprising the following steps: 1) acquiring a bilingual data set; 2) constructing and training a machine translation model; 3) extracting entities; 4) translating and matching the entities. The method aligns bilingual entities more effectively and accurately, addressing the low entity-alignment accuracy encountered in existing cross-language knowledge graph construction.

Description

Chinese-Thai entity alignment method oriented to cross-language knowledge graph
Technical Field
The invention relates to the field of artificial intelligence, belongs to cross-language knowledge graph technology, and particularly relates to a Chinese-Thai entity alignment method oriented to cross-language knowledge graphs.
Background
With the continuous development of artificial intelligence, knowledge plays an increasingly important role across its various fields. In recent years, the construction of cross-language knowledge graphs has become a hot research area. Although bilingually aligned sentences are increasingly abundant on the internet, the accuracy of multi-language entity alignment remains unsatisfactory, and this low degree of entity alignment limits the construction of cross-language knowledge graphs.
Generally, the entity alignment method commonly used at present first performs entity recognition and then finds identical or similar entities across languages with corresponding techniques, thereby aligning entities in multiple languages. In aligned bilingual sentences, every entity in one sentence has a corresponding entity in the aligned sentence. If existing translation software such as Google Translate, Youdao Translate, or Baidu Translate is used directly, translation accuracy is high only for the small portion of well-known entities such as famous person and place names; for the large portion of less-known person, place, and organization names, such general-purpose software struggles to translate the entities accurately, so mistranslations occur easily and the alignment effect is poor.
To improve the accuracy of entity alignment in bilingual sentences for less-known person, place, and organization names, a feasible approach is to train a machine translation model on the existing bilingual sentences, extract the entities from sentences in one language with a suitable entity extraction method, and finally translate the extracted entities with the trained model so as to match the aligned entities in the other language, achieving bilingual entity alignment. Because every entity word to be aligned in the bilingual sentences is contained in the training data of the translation model, translation of such less-known entities is more accurate, which improves the entity alignment effect.
Disclosure of Invention
The invention aims to provide, for the cross-language knowledge graph construction process, a Chinese-Thai entity alignment method oriented to cross-language knowledge graphs, addressing the prior-art problem that less-known entities in bilingual sentences are aligned with low accuracy. The method aligns bilingual entities more effectively and accurately, solving the low entity-alignment accuracy encountered in existing cross-language knowledge graph construction.
The technical scheme for realizing the purpose of the invention is as follows:
A Chinese-Thai entity alignment method oriented to cross-language knowledge graphs comprises the following steps:
1) bilingual dataset acquisition: acquire Chinese-Thai aligned bilingual data from multi-language knowledge bases such as Wikidata and YAGO or from major Chinese-Thai bilingual websites; the data sets are aligned Chinese-Thai sentence pairs, so that every entity in a Chinese sentence has an aligned counterpart in the corresponding Thai sentence;
2) constructing and training a machine translation model: machine translation (MT) is the process of using a computer to convert one natural language (the source language) into another (the target language), taking a source-language sentence as input and outputting the corresponding target-language sentence. The bilingual data set acquired in step 1) is used to train the constructed machine translation model, yielding a trained Chinese-Thai translation model that later translates the entities extracted in step 3) for use in step 4). The process is as follows:
1-2) data preprocessing: preprocess the Chinese-Thai bilingual data set obtained in step 1) and convert it into the standard data format for training a machine translation model, splitting it into a Chinese sentence file Ch.txt and a Thai sentence file Th.txt, where the sentences in Ch.txt correspond line by line to the sentences in Th.txt;
2-2) word segmentation: the Chinese data set is segmented with the jieba word segmentation tool and the Thai data set with the cutkum tool, with words separated by spaces;
3-2) constructing a Transformer translation model: the Transformer model adopts the Encoder-Decoder architecture typical of Seq2Seq models, but unlike earlier Seq2Seq models, the Transformer encoder and decoder do not use recurrent neural network structures. The main structures of the encoder and decoder are as follows:
1-3-2) encoder: the encoding side of the Transformer model is a stack of identical layers, each consisting of two sub-layers: Multi-Head Attention and a fully-connected Feed-Forward network. Multi-head attention implements self-attention in the model; compared with an ordinary attention mechanism, it applies several linear transformations to the input in parallel, computes attention for each projection separately, concatenates all the results, and applies a final linear transformation to produce the output. The attention uses dot products; to avoid entering the saturation region of softmax when dot products become large, the scores are scaled after the dot product. The fully-connected feed-forward network performs the same computation at every position in the sequence (position-wise) and uses a structure of two linear transformations with a ReLU activation in between;
2-3-2) decoder: the decoder is similar in structure to the encoder, but each decoder layer adds one more multi-head attention sub-layer that attends over the encoder output;
3-3-2) construction of the Transformer translation model: build it with the Baidu PaddlePaddle, PyTorch, or TensorFlow framework;
4-3-2) after the model is constructed, load the segmented data from step 2-2) into the Transformer translation model for training to obtain a trained Transformer translation model, i.e. the Chinese-Thai translation model Ch-Th-Translation.model;
3) entity extraction: extract entities from the Chinese sentences with a currently open-source Chinese entity extraction tool such as Stanford NLP, or with a common Chinese named entity recognition model such as BiLSTM+CRF or CRF++;
4) entity translation and matching: entity translation combines currently common translation software with the Transformer translation model; the specific process is as follows:
1-4) first translate the Chinese entity NER-A extracted in step 3) with currently common translation software such as Google Translate, Youdao Translate, or Baidu Translate to obtain the translated entity NER1-A, then match it against the corresponding Thai sentence; if the match succeeds, proceed to align the next entity, and if it fails, go to step 2-4);
2-4) translate the entity NER-A that failed to match in step 1-4) with the Ch-Th-Translation.model trained in step 4-3-2) to obtain the translated entity NER2-A and match it against the corresponding Thai sentence; if the match succeeds, the entity NER-A in the Chinese sentence and its corresponding entity NER-B in the Thai sentence are obtained;
3-4) finally, store the aligned pair "NER-A:NER-B", completing entity alignment in the Chinese-Thai bilingual sentences.
Compared with the prior art, the method overcomes the low translation accuracy and poor alignment of existing translation software on less-known entities, improves the quality of multi-language entity alignment, and reduces the difficulty of constructing cross-language knowledge graphs.
Drawings
FIG. 1 is a schematic diagram of the network structure of the Transformer translation model in the embodiment;
FIG. 2 is a schematic diagram of the multi-head attention structure in the embodiment;
FIG. 3 is a schematic diagram of the Chinese-Thai bilingual entity alignment process in the embodiment;
FIG. 4 is an example of the jieba word segmentation key code in the embodiment;
FIG. 5 is an example of the data after jieba word segmentation in the embodiment;
FIG. 6 is an example of the cutkum word segmentation key code in the embodiment;
FIG. 7 is an example of the data after cutkum word segmentation in the embodiment;
FIG. 8 is an example of the Stanford NLP entity extraction key code in the embodiment.
Detailed Description
The invention will be further elucidated with reference to the drawings and examples, without however being limited thereto.
Example:
This example takes a Chinese-Thai bilingual dataset as input, Python as the development language, and Pycharm as the development environment.
referring to fig. 3, a cross-language knowledge graph-oriented Chinese Thai entity alignment method includes the following steps:
1) bilingual dataset acquisition: acquire Chinese-Thai aligned bilingual data from multi-language knowledge bases such as Wikidata and YAGO or from major Chinese-Thai bilingual websites; the data sets are aligned Chinese-Thai sentence pairs, so that every entity in a Chinese sentence has an aligned counterpart in the corresponding Thai sentence. In this example, as shown in Table A, each Chinese entity in Chinese sentence 1-A has an aligned Thai entity in Thai sentence 1-B;
Table A: example of aligned Chinese-Thai sentence data
2) constructing and training a machine translation model: construct a Transformer translation model and train it on the Chinese-Thai bilingual data set obtained in step 1) to obtain a trained Chinese-Thai translation model, which later translates the entities extracted in step 3) for use in step 4). The process is as follows:
1-2) data preprocessing: preprocess the Chinese-Thai bilingual data set obtained in step 1) and convert it into the standard data format for training a machine translation model, splitting it into a Chinese sentence file Ch.txt and a Thai sentence file Th.txt, where the sentences in Ch.txt correspond line by line to the sentences in Th.txt;
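The preprocessing step above can be sketched in a few lines of Python (a minimal illustrative sketch, not taken from the patent; the function name and the in-memory pair format are assumptions):

```python
# Minimal sketch of step 1-2): write aligned (Chinese, Thai) sentence pairs
# into two line-aligned files, Ch.txt and Th.txt. The pair format and the
# helper name are illustrative assumptions, not part of the patent.

def split_parallel_corpus(pairs, ch_path="Ch.txt", th_path="Th.txt"):
    """Write (chinese, thai) sentence pairs into two line-aligned files."""
    with open(ch_path, "w", encoding="utf-8") as ch_f, \
         open(th_path, "w", encoding="utf-8") as th_f:
        for ch_sent, th_sent in pairs:
            # one sentence per line; line i of Ch.txt aligns with line i of Th.txt
            ch_f.write(ch_sent.strip() + "\n")
            th_f.write(th_sent.strip() + "\n")
```

The line-by-line correspondence produced here is what the translation-model training step relies on.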
2-2) word segmentation: the Chinese data set Ch.txt is segmented with the jieba word segmentation tool (FIG. 4 shows an example of the jieba key code) and the segmented data is saved to the file Ch_Seq.txt with words separated by spaces, as shown in FIG. 5; the sentences of the Thai data set Th.txt are segmented with the cutkum tool (FIG. 6 shows an example of the cutkum key code) and the segmented data is saved to the file Th_Seq.txt, likewise with words separated by spaces, as shown in FIG. 7;
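Step 2-2) can be sketched with a pluggable tokenizer (the key code itself appears only as figures in the original; this sketch and the assumption that jieba would be plugged in as, e.g., `lambda s: list(jieba.cut(s))` and cutkum analogously for Thai are illustrative):

```python
# Sketch of step 2-2): read sentences, tokenize each, and write the tokens
# space-separated (e.g. Ch.txt -> Ch_Seq.txt). `tokenize` is any callable
# mapping a sentence to a list of words; jieba (Chinese) or cutkum (Thai)
# would be plugged in here in the embodiment.

def segment_file(in_path, out_path, tokenize):
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(" ".join(tokenize(line.strip())) + "\n")
```

Keeping the tokenizer as a parameter lets the same routine serve both languages.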
3-2) constructing a Transformer translation model: the Transformer model adopts the Encoder-Decoder architecture typical of Seq2Seq models, but unlike earlier Seq2Seq models, the Transformer encoder and decoder do not use recurrent neural network structures. The overall network structure is shown in FIG. 1, and the main structures of the encoder and decoder are as follows:
1-3-2) encoder: the encoding side of the Transformer model is a stack of identical layers, each consisting of two sub-layers: Multi-Head Attention and a fully-connected Feed-Forward network. Multi-head attention implements self-attention in the model; compared with an ordinary attention mechanism, it applies several linear transformations to the input in parallel, computes attention for each projection separately, concatenates all the results, and applies a final linear transformation to produce the output, as shown in FIG. 2. The attention uses dot products; to avoid entering the saturation region of softmax when dot products become large, the scores are scaled after the dot product. The fully-connected feed-forward network performs the same computation at every position in the sequence (position-wise) and uses a structure of two linear transformations with a ReLU activation in between;
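The scaled dot-product attention described above can be illustrated with a framework-free sketch (pure Python, single head; in the full model this computation runs over several projected heads in parallel and the results are concatenated):

```python
# Illustrative sketch of scaled dot-product attention:
# scores = Q·Kᵀ / sqrt(d_k), softmax over each row, weighted sum of V.
import math

def softmax(row):
    m = max(row)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    # scaling by 1/sqrt(d_k) keeps softmax out of its saturation region
    # when dot products grow with the dimension
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K]
              for qi in Q]
    weights = [softmax(row) for row in scores]
    # output_i = sum_j weights[i][j] * V[j]
    return [[sum(w * v[t] for w, v in zip(wi, V)) for t in range(len(V[0]))]
            for wi in weights]
```

With a query strongly aligned to one key, the output concentrates on that key's value, which is the behavior the scaling is meant to preserve.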
2-3-2) decoder: the decoder is similar in structure to the encoder, but each decoder layer adds one more multi-head attention sub-layer that attends over the encoder output;
3-3-2) construction and training of the Transformer model: a Transformer model is built with the Baidu PaddlePaddle framework; this example downloads the implementation from:
https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/machine_translation/transformer
4-3-2) after the model is constructed, load the segmented data from step 2-2) into the Transformer model for training to obtain a trained Transformer translation model, i.e. the Chinese-Thai translation model Ch-Th-Translation.model;
3) entity extraction: in this embodiment, Stanford NLP is used to extract entities from the Chinese sentences; the process is as follows:
1-3) first download the Stanford CoreNLP package from
http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
and decompress it; then download the Chinese model jar file from
http://nlp.stanford.edu/software/stanford-chinese-corenlp-2016-10-31-models.jar
and place it under the root directory;
2-3) a key code example of Stanford NLP entity extraction is shown in FIG. 8; the Stanford NLP tool is used to extract the entities of sentence 1-A in the Chinese file Ch.txt, yielding the Chinese entity NER-A;
4) entity translation and matching: entity translation combines currently common translation software with the Transformer translation model; the specific process is as follows:
1-4) first translate the Chinese entity NER-A extracted in step 2-3) with the currently common Google Translate software to obtain the translated entity NER1-A, then match it against the corresponding Thai sentence 1-B; if the match succeeds, proceed to align the next entity, and if it fails, go to step 2-4);
2-4) translate the entity NER-A that failed to match in step 1-4) with the Chinese-Thai translation model Ch-Th-Translation.model trained in step 4-3-2) to obtain the translated entity NER2-A and match it against the corresponding Thai sentence 1-B; if the match succeeds, the entity NER-A in sentence 1-A and its corresponding entity NER-B in sentence 1-B are obtained; if it fails, proceed to align the next entity;
3-4) finally, store the aligned pair "NER-A:NER-B", completing entity alignment in the Chinese-Thai bilingual sentences.
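The two-stage translate-and-match procedure of steps 1-4) to 3-4) can be sketched as follows (the two translator callables are hypothetical stand-ins for the general-purpose translation software and the trained Ch-Th-Translation.model, and substring matching is a simplification of the matching step):

```python
# Sketch of steps 1-4) to 3-4): translate each extracted Chinese entity with a
# general-purpose translator first, and fall back to the trained Chinese-Thai
# model when the result is not found in the aligned Thai sentence.
# `general_translate` and `custom_translate` are stand-in callables
# (str -> str or None), not real APIs.

def align_entities(entities, thai_sentence, general_translate, custom_translate):
    """Return {chinese_entity: thai_entity} for entities matched in thai_sentence."""
    aligned = {}
    for ner_a in entities:
        ner1_a = general_translate(ner_a)       # step 1-4): common software first
        if ner1_a and ner1_a in thai_sentence:
            aligned[ner_a] = ner1_a
            continue
        ner2_a = custom_translate(ner_a)        # step 2-4): trained model fallback
        if ner2_a and ner2_a in thai_sentence:
            aligned[ner_a] = ner2_a             # step 3-4): store the NER-A:NER-B pair
    return aligned
```

The fallback ordering reflects the patent's premise: general-purpose software handles well-known entities, while the trained model covers less-known ones seen in the bilingual corpus.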

Claims (1)

1. A Chinese-Thai entity alignment method oriented to cross-language knowledge graphs, characterized by comprising the following steps:
1) bilingual dataset acquisition: acquire a Chinese-Thai aligned bilingual data set from multi-language knowledge bases such as Wikidata and YAGO or from major Chinese-Thai bilingual websites; the data set consists of aligned Chinese-Thai sentence pairs, so that every entity in a Chinese sentence has an aligned counterpart in the corresponding Thai sentence;
2) constructing and training a machine translation model: construct a Transformer translation model and train it on the bilingual data set obtained in step 1) to obtain a trained Chinese-Thai translation model; the process is as follows:
1-2) data preprocessing: preprocess the Chinese-Thai bilingual data set obtained in step 1) and convert it into the standard data format for training a machine translation model, splitting it into a Chinese sentence file Ch.txt and a Thai sentence file Th.txt, where the sentences in Ch.txt correspond line by line to the sentences in Th.txt;
2-2) word segmentation: the Chinese data set is segmented with the jieba word segmentation tool and the Thai data set with the cutkum tool, with words separated by spaces;
3-2) constructing a Transformer translation model: the Transformer model adopts the Encoder-Decoder architecture typical of Seq2Seq models, but unlike earlier Seq2Seq models, the Transformer encoder and decoder do not use recurrent neural network structures; the main structures of the encoder and decoder are as follows:
1-3-2) encoder: the encoding side of the Transformer model is a stack of identical layers, each consisting of two sub-layers: Multi-Head Attention and a fully-connected Feed-Forward network; multi-head attention implements self-attention in the model by applying several linear transformations to the input in parallel, computing attention for each projection separately, concatenating all the results, and applying a final linear transformation to produce the output; the attention uses dot products (Dot-Product), with scaling applied after the dot product; the fully-connected feed-forward network performs the same computation at every position in the sequence (position-wise) and uses a structure of two linear transformations with a ReLU activation in between;
2-3-2) decoder: the decoder is similar in structure to the encoder, but each decoder layer adds one more multi-head attention sub-layer that attends over the encoder output;
3-3-2) construction of the Transformer translation model: build it with the Baidu PaddlePaddle, PyTorch, or TensorFlow framework;
4-3-2) after the model is constructed, load the segmented data from step 2-2) into the Transformer translation model for training to obtain a trained translation model, i.e. the Chinese-Thai translation model;
3) entity extraction: extract entities from the Chinese sentences with a currently open-source Chinese entity extraction tool or with a common Chinese named entity recognition model;
4) entity translation and matching: entity translation combines currently common translation software with the Transformer translation model; the specific process is as follows:
1-4) first translate the Chinese entity NER-A extracted in step 3) with currently common translation software to obtain the translated entity NER1-A, then match it against the corresponding Thai sentence; if the match succeeds, proceed to align the next entity, and if it fails, go to step 2-4);
2-4) translate the entity NER-A that failed to match in step 1-4) with the Chinese-Thai translation model trained in step 4-3-2) to obtain the translated entity NER2-A and match it against the corresponding Thai sentence; if the match succeeds, the entity NER-A in the Chinese sentence and its corresponding entity NER-B in the Thai sentence are obtained;
3-4) finally, store the aligned pair "NER-A:NER-B", completing entity alignment in the Chinese-Thai bilingual sentences.
CN202010578711.2A 2020-06-23 2020-06-23 Chinese-Thai entity alignment method oriented to cross-language knowledge graph Pending CN111723587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010578711.2A CN111723587A (en) 2020-06-23 2020-06-23 Chinese-Thai entity alignment method oriented to cross-language knowledge graph


Publications (1)

Publication Number Publication Date
CN111723587A true CN111723587A (en) 2020-09-29

Family

ID=72568256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010578711.2A Pending CN111723587A (en) 2020-06-23 2020-06-23 Chinese-Thai entity alignment method oriented to cross-language knowledge graph

Country Status (1)

Country Link
CN (1) CN111723587A (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101676898A (en) * 2008-09-17 2010-03-24 中国科学院自动化研究所 Method and device for translating Chinese organization name into English with the aid of network knowledge
CN103853710A (en) * 2013-11-21 2014-06-11 北京理工大学 Coordinated training-based dual-language named entity identification method
CN106682670A (en) * 2016-12-19 2017-05-17 Tcl集团股份有限公司 Method and system for identifying station caption
CN107633079A (en) * 2017-09-25 2018-01-26 重庆邮电大学 A kind of vehicle device natural language human-machine interactions algorithm based on database and neutral net
CN109190131A (en) * 2018-09-18 2019-01-11 北京工业大学 A kind of English word and its capital and small letter unified prediction based on neural machine translation
CN109670178A (en) * 2018-12-20 2019-04-23 龙马智芯(珠海横琴)科技有限公司 Sentence-level bilingual alignment method and device, computer readable storage medium
CN110083826A (en) * 2019-03-21 2019-08-02 昆明理工大学 A kind of old man's bilingual alignment method based on Transformer model
CN111159426A (en) * 2019-12-30 2020-05-15 武汉理工大学 Industrial map fusion method based on graph convolution neural network
CN111259652A (en) * 2020-02-10 2020-06-09 腾讯科技(深圳)有限公司 Bilingual corpus sentence alignment method and device, readable storage medium and computer equipment

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BLACK_SHUANG: "Illustrated Transformer model (Multi-Head Attention)", https://blog.csdn.net/black_shuang/article/details/95384597, 10 July 2019 (2019-07-10), pages 1-3 *
SHIZE KANG et al.: "Iterative Cross-Lingual Entity Alignment Based on TransC", IEICE Trans. Inf. & Syst., 30 May 2020 (2020-05-30), pages 1002-1005 *
ZEQUN SUN et al.: "Cross-lingual Entity Alignment via Joint Attribute-Preserving Embedding", arXiv:1708.05045v2, 26 September 2017 (2017-09-26), page 14 *
LIU Qingfeng et al.: "A domain-personalized machine translation method fusing external dictionary knowledge in conference scenarios", Journal of Chinese Information Processing, vol. 33, no. 10, 15 October 2019 (2019-10-15), pages 31-37 *
WU Huiwen: "Research on Thai word segmentation and entity extraction techniques", China Master's Theses Full-text Database (Information Science and Technology), no. 2, 15 February 2022 (2022-02-15), pages 138-1336 *
KANG Shize et al.: "A ... based on entity descriptions and knowledge-vector similarity" (title truncated in the record), Acta Electronica Sinica, vol. 47, no. 9, 15 September 2019 (2019-09-15), pages 1841-1847 *
ZHANG Jinpeng et al.: "Chinese-Thai bilingual person-name alignment fusing distributional features of person-name knowledge", http://kns.cnki.net/kcms/detail/11.2127.TP.20190305.1453.002.html, 6 March 2019 (2019-03-06), pages 1-11 *
HU Hongsi et al.: "Sentence alignment of bilingual comparable corpora based on ***", Journal of Chinese Information Processing, vol. 30, no. 1, 15 January 2016 (2016-01-15), pages 198-203 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417159A (en) * 2020-11-02 2021-02-26 武汉大学 Cross-language entity alignment method of context alignment enhanced graph attention network
CN112674734A (en) * 2020-12-29 2021-04-20 电子科技大学 Pulse signal noise detection method based on supervision Seq2Seq model
CN113220975A (en) * 2021-05-20 2021-08-06 北京欧拉认知智能科技有限公司 Atlas-based search analysis method and system
CN115455981A (en) * 2022-11-11 2022-12-09 合肥智能语音创新发展有限公司 Semantic understanding method, device, equipment and storage medium for multi-language sentences
CN115455981B (en) * 2022-11-11 2024-03-19 合肥智能语音创新发展有限公司 Semantic understanding method, device and equipment for multilingual sentences and storage medium

Similar Documents

Publication Publication Date Title
CN111723587A (en) Chinese-Thai entity alignment method oriented to cross-language knowledge graph
CN110334361B (en) Neural machine translation method for Chinese language
CN109684648B (en) Multi-feature fusion automatic translation method for ancient and modern Chinese
CN107632981B (en) Neural machine translation method introducing source language chunk information coding
JP2022028887A (en) Method, apparatus, electronic device and storage medium for correcting text errors
Schulz et al. Multi-modular domain-tailored OCR post-correction
CN110765791B (en) Automatic post-editing method and device for machine translation
CN109635288A (en) A kind of resume abstracting method based on deep neural network
CN112818712B (en) Machine translation method and device based on translation memory library
CN110110334B (en) Remote consultation record text error correction method based on natural language processing
CN110837736B (en) Named entity recognition method of Chinese medical record based on word structure
CN111680169A (en) Electric power scientific and technological achievement data extraction method based on BERT model technology
CN117010500A (en) Visual knowledge reasoning question-answering method based on multi-source heterogeneous knowledge joint enhancement
Hsu et al. Prompt-learning for cross-lingual relation extraction
CN115510864A (en) Chinese crop disease and pest named entity recognition method fused with domain dictionary
Serrano et al. Interactive handwriting recognition with limited user effort
CN111444720A (en) Named entity recognition method for English text
CN115860015B (en) Translation memory-based transcription text translation method and computer equipment
Shi et al. Adding Visual Information to Improve Multimodal Machine Translation for Low‐Resource Language
CN115392255A (en) Few-sample machine reading understanding method for bridge detection text
CN112380882B (en) Mongolian Chinese neural machine translation method with error correction function
Naranpanawa et al. Analyzing subword techniques to improve english to sinhala neural machine translation
CN114139610A (en) Traditional Chinese medicine clinical literature data structuring method and device based on deep learning
CN114139561A (en) Multi-field neural machine translation performance improving method
Romero et al. A historical document handwriting transcription end-to-end system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
  Effective date of registration: 20220429
  Address after: 100193 room 316, floor 3, building 4, yard 8, Dongbeiwang West Road, Haidian District, Beijing
  Applicant after: Beijing Tianyun Xin'an Technology Co., Ltd.
  Address before: 541004 1 Jinji Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region
  Applicant before: Guilin University of Electronic Technology
WD01 Invention patent application deemed withdrawn after publication
  Application publication date: 20200929