CN113051913A - Tibetan word segmentation information processing method, system, storage medium, terminal and application - Google Patents


Info

Publication number
CN113051913A
Authority
CN
China
Prior art keywords
***
word
information processing
word segmentation
processing method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110380044.1A
Other languages
Chinese (zh)
Inventor
刘清民 (Liu Qingmin)
程国艮 (Cheng Guogen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-04-09
Filing date
2021-04-09
Publication date
2021-06-29
Application filed by Global Tone Communication Technology Co ltd
Priority to CN202110380044.1A
Publication of CN113051913A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of information processing and discloses a Tibetan word segmentation information processing method and system, a storage medium, a terminal, and an application. The Tibetan word segmentation information processing system comprises a word vector preprocessing module, a model structure building module, a word vector training module, and a word vector training stop judging module. For Tibetan, the method adopts an artificial neural network and deep learning approach. It predicts word boundaries by learning Tibetan word vectors and applying a Convolutional Neural Network (CNN) model with a Conditional Random Field (CRF). The network is trained iteratively by matching the character sequence of each sentence against the sequence of manually labeled word boundaries, which yields the weights, i.e., the final parameters.

Description

Tibetan word segmentation information processing method, system, storage medium, terminal and application
Technical Field
The invention belongs to the technical field of information processing. In particular, it relates to a Tibetan word segmentation information processing method and system, a storage medium, a terminal, and an application.
Background
Tibetan (Figure 10000229453643) is the language used by the Tibetan people. It belongs to the Tibetic branch of the Tibeto-Burman group of the Sino-Tibetan language family. It is mainly used by Tibetans in China and by some people in Nepal, Bhutan, India, and Pakistan. Tibetan script is a phonetic script of the abugida type, in which vowel signs are attached to consonant letters. There are two accounts of its origin. Scholars hold that it was created in the 7th century CE, during the Tubo (Tibetan Empire) period: King Songtsen Gampo sent the scholar Thonmi Sambhota to northern India to study Sanskrit, and after returning home he introduced Sanskrit letters and created the script. The Yungdrung Bon tradition holds instead that Tibetan script evolved from the Zhangzhung script.
English belongs to the West Germanic branch of the Germanic group of the Indo-European family. It evolved from the languages spoken by the Angles, Saxons, and Jutes, Germanic tribes that migrated from continental Europe to the island of Great Britain in ancient times. It spread around the world through the activities of British colonists.
Tibetan differs from English in that Tibetan words are usually written continuously, without word boundary markers. English words, by contrast, are separated by explicit boundary markers such as spaces. For Tibetan, word segmentation is therefore one of the first tasks in building natural language processing applications such as topic classification, sentiment analysis, document similarity, and machine translation.
Text without word boundary markers is difficult for a computer to process. The prior art addresses this with artificial neural networks and deep learning. The Convolutional Neural Network (CNN) is a special type of neural network and currently one of the most successful models in NLP. Word boundaries are predicted by learning Tibetan word vectors (word2vec) and applying a CNN model with a Conditional Random Field (CRF). The network is trained iteratively by matching the character sequence of each sentence against the sequence of manually labeled word boundaries, which yields the weights, i.e., the final parameters. Few open corpora are publicly available and building a corpus is costly. As a result, experiments have so far been run with only a limited set of parameters, and other parameter settings may be tested later. The prior art therefore has two shortcomings: (1) the amount of training corpus needs to be increased; (2) there is room to optimize the choice of parameters.
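The training described above pairs each sentence's character sequence with a sequence of manually labeled word boundaries. The label scheme is not specified. A common choice, assumed here, is the four-tag B/M/E/S scheme (begin, middle, end, single). The following is a minimal sketch of converting a segmented sentence into aligned units and tags, and back again. Romanized placeholder syllables stand in for real Tibetan text.

```python
from typing import List, Tuple


def words_to_bmes(words: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten a segmented sentence into units and aligned B/M/E/S tags.

    Each word is a list of its units (characters or syllables).
    B = begins a multi-unit word, M = inside it, E = ends it, S = single-unit word.
    """
    units: List[str] = []
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty words
        units.extend(word)
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return units, tags


def bmes_to_words(units: List[str], tags: List[str]) -> List[List[str]]:
    """Inverse mapping: close a word after every E or S tag."""
    words: List[List[str]] = []
    current: List[str] = []
    for unit, tag in zip(units, tags):
        current.append(unit)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a malformed sequence that does not end in E or S
        words.append(current)
    return words


units, tags = words_to_bmes([["ka"], ["kha", "ga"], ["nga", "ca", "cha"]])
print(tags)  # ['S', 'B', 'E', 'B', 'M', 'E']
```

Under this scheme, the unit sequence is the model input and the tag sequence is the training target.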
Through the above analysis, the problems and defects of the prior art are as follows:
(1) In the prior art, processing text without word boundary markers with artificial neural networks and deep learning suffers from a shortage of open corpora, and building such corpora is costly.
(2) In the prior art, such experiments have been run only with limited parameters, so the choice of parameters leaves room for optimization, and different parameters could be tried later.
The difficulties in solving these problems are as follows. Manually annotating a word segmentation corpus is too costly, and choosing parameters requires many experiments to find a better configuration.
Disclosure of Invention
To address the problems in the prior art, the invention provides a Tibetan word segmentation information processing method and system, a storage medium, a terminal, and an application.
The Tibetan word segmentation information processing method learns from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field. From this it generates Tibetan word boundary rules and thereby performs Tibetan word segmentation. First, word2vec is used to learn representations of Tibetan words. Then, using the existing segmented corpus and the learned word vectors, a convolutional neural network and a conditional random field learn how likely a word boundary is at each position in Tibetan text. The text is segmented at positions where that likelihood is high.
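Segmenting "at positions where that likelihood is high" can be illustrated with a simple threshold. This minimal sketch assumes the model outputs, for each gap between adjacent units, a probability that a word boundary falls there. It also assumes a fixed threshold of 0.5, which is a hypothetical value. A CRF-based model would normally decode the whole label sequence jointly rather than thresholding each gap independently, so this shows only the idea of cutting at high-probability positions.

```python
from typing import List, Sequence


def split_at_boundaries(units: Sequence[str], boundary_probs: Sequence[float],
                        threshold: float = 0.5) -> List[List[str]]:
    """Cut the unit sequence wherever the predicted boundary probability is high.

    boundary_probs[i] is the probability of a word boundary between units[i] and
    units[i + 1], so there is one probability per gap: len(units) - 1 in total.
    """
    if len(boundary_probs) != max(len(units) - 1, 0):
        raise ValueError("expected one probability per gap between adjacent units")
    words: List[List[str]] = []
    current: List[str] = []
    for i, unit in enumerate(units):
        current.append(unit)
        if i < len(boundary_probs) and boundary_probs[i] >= threshold:
            words.append(current)  # close the word after this unit
            current = []
    if current:
        words.append(current)  # the last word always ends at the sentence end
    return words


print(split_at_boundaries(["a", "b", "c", "d"], [0.1, 0.9, 0.4]))
# [['a', 'b'], ['c', 'd']]
```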
Further, the Tibetan word segmentation information processing method predicts word boundaries by learning Tibetan word vectors with word2vec and using a Convolutional Neural Network (CNN) model and a Conditional Random Field (CRF).
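The patent gives no layer sizes or kernel widths for the CNN. This sketch uses NumPy with random weights and hypothetical dimensions to show the core operation such a model performs. It applies a 1-D convolution over the sequence of unit embeddings, zero-padded so the output keeps the input length, followed by a ReLU. A linear projection then produces one score per tag at each position. These per-position scores are what a CRF layer would consume as emission scores.

```python
import numpy as np


def conv1d_same(x: np.ndarray, kernel: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1-D convolution over positions with zero padding that preserves length.

    x: (seq_len, in_dim) unit embeddings
    kernel: (width, in_dim, out_dim), width must be odd
    bias: (out_dim,)
    returns: (seq_len, out_dim)
    """
    width = kernel.shape[0]
    if width % 2 == 0:
        raise ValueError("kernel width must be odd for length-preserving padding")
    pad = width // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))  # zero rows before and after the sequence
    out = np.empty((x.shape[0], kernel.shape[2]))
    for t in range(x.shape[0]):
        window = padded[t:t + width]  # the `width` positions centred on t
        out[t] = np.einsum("wi,wio->o", window, kernel) + bias
    return out


def emission_scores(embeddings: np.ndarray, conv_kernel: np.ndarray, conv_bias: np.ndarray,
                    proj: np.ndarray, proj_bias: np.ndarray) -> np.ndarray:
    """Convolution + ReLU, then a linear map to one score per tag: (seq_len, num_tags)."""
    hidden = np.maximum(conv1d_same(embeddings, conv_kernel, conv_bias), 0.0)
    return hidden @ proj + proj_bias


# Usage with random weights and hypothetical sizes (6 units, 8-dim embeddings, 4 tags).
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
scores = emission_scores(emb, rng.normal(size=(3, 8, 16)), np.zeros(16),
                         rng.normal(size=(16, 4)), np.zeros(4))
print(scores.shape)  # (6, 4)
```

In a trained model, the kernel and projection weights would be learned, and the embeddings would come from the word2vec vectors. The random values here only demonstrate the shapes involved.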
Further, the method iteratively trains the network by matching the character sequence of each sentence against the sequence of manually labeled word boundaries, obtaining the weights, i.e., the final parameters.
Further, the Tibetan word segmentation information processing method specifically comprises the following steps:
first, preprocessing the labeled word segmentation corpus and learning Tibetan word vectors (the representation of each word in deep learning) with word2vec, and building a dictionary of all labeled words, to which a special unknown-word position is added;
second, building a CNN model and computing the loss with a CRF;
third, training the built model on the labeled Tibetan text and the trained word vectors;
fourth, stopping training once accuracy on the development set reaches a certain level, thereby obtaining the word segmentation rules.
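The second step computes the loss with a CRF. This minimal NumPy sketch implements the standard linear-chain CRF negative log-likelihood under three assumptions:

- emission scores have shape (seq_len, num_tags);
- `transitions[i, j]` scores tag i followed by tag j;
- there are no separate start or end transitions, which is a simplification.

The loss is the log-partition function, computed with the forward algorithm in log space, minus the score of the gold tag path.

```python
import numpy as np


def _logsumexp(a: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def path_score(emissions: np.ndarray, transitions: np.ndarray, tags) -> float:
    """Unnormalised score of one tag path: emissions plus transitions."""
    score = emissions[0, tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    return float(score)


def log_partition(emissions: np.ndarray, transitions: np.ndarray) -> float:
    """log Z: log-sum-exp of the scores of all tag paths (forward algorithm)."""
    alpha = emissions[0].copy()  # alpha[j] = log-sum of scores of prefixes ending in tag j
    for t in range(1, emissions.shape[0]):
        # (alpha[i] + transitions[i, j]) reduced over the previous tag i, then add emission j
        alpha = _logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return float(_logsumexp(alpha, axis=0))


def crf_nll(emissions: np.ndarray, transitions: np.ndarray, tags) -> float:
    """Negative log-likelihood of the gold tags: log Z minus the gold path score."""
    return log_partition(emissions, transitions) - path_score(emissions, transitions, tags)
```

Minimizing this loss over the training corpus raises the score of the labeled boundary sequence relative to all alternative sequences. In a deep learning framework, the same expression would be differentiated automatically to update both the CNN and the transition scores.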
Further, the model structure is composed of a convolutional neural network plus a conditional random field.
It is a further object of the invention to provide a computer device comprising a memory and a processor. The memory stores a computer program which, when executed by the processor, causes the processor to perform the following steps: learning from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field to generate Tibetan word boundary rules, and thereby performing Tibetan word segmentation.
It is another object of the present invention to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps: learning from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field to generate Tibetan word boundary rules, and thereby performing Tibetan word segmentation.
The invention also aims to provide an information data processing terminal for implementing the Tibetan word segmentation information processing method.
Another object of the present invention is to provide a Tibetan word segmentation information processing system for implementing the Tibetan word segmentation information processing method. The system comprises:
a word vector preprocessing module, configured to learn Tibetan word vectors from word-segmented Tibetan text and to store the word vectors and the dictionary;
a model structure building module, configured to build the model structure, which consists of a convolutional neural network and a conditional random field;
a word vector training module, configured to train the model on the labeled Tibetan text and the trained word vectors;
and a word vector training stop judging module, configured to stop training once accuracy on the development set reaches a certain level.
The invention also aims to provide a computer information processing terminal for implementing the Tibetan word segmentation information processing method.
Taken together, the technical solutions above give the invention the following advantages and positive effects. The method learns from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field to generate Tibetan word boundary rules, and thereby segments Tibetan text. The resulting Tibetan word segmentation tool reaches an accuracy of 90% on a test set and can help achieve better translation quality in machine translation.
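The patent reports test-set accuracy without defining the metric. Word segmentation is commonly scored by comparing predicted and gold word spans. This sketch computes span-level precision, recall, and F1 for one sentence. It reflects an assumption about a reasonable metric, not the authors' evaluation script.

```python
from typing import Sequence, Set, Tuple


def word_spans(words: Sequence[Sequence[str]]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) unit offsets, end exclusive."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: Sequence[Sequence[str]],
                     pred: Sequence[Sequence[str]]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F1 for one sentence."""
    if sum(map(len, gold)) != sum(map(len, pred)):
        raise ValueError("gold and predicted segmentations must cover the same units")
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)  # words whose boundaries match exactly
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


print(segmentation_prf([["a"], ["b", "c"]], [["a"], ["b", "c"]]))  # (1.0, 1.0, 1.0)
```

For a whole test set, the counts of correct, predicted, and gold words would normally be summed over all sentences before dividing, rather than averaging per-sentence scores.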
For Tibetan, the method adopts an artificial neural network and deep learning approach. It predicts word boundaries by learning Tibetan word vectors (word2vec) and applying a Convolutional Neural Network (CNN) model with a Conditional Random Field (CRF). The network is trained iteratively by matching the character sequence of each sentence against the sequence of manually labeled word boundaries, which yields the weights, i.e., the final parameters.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a Tibetan word segmentation information processing method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a Tibetan word segmentation information processing system according to an embodiment of the present invention.
In Fig. 2: 1, word vector preprocessing module; 2, model structure building module; 3, word vector training module; 4, word vector training stop judging module.
Fig. 3 is an implementation flowchart of the Tibetan word segmentation information processing method according to an embodiment of the present invention.
Fig. 4 is a graph of the results achieved by the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
To address the problems in the prior art, the invention provides a Tibetan word segmentation information processing method and system, a storage medium, a terminal, and an application, which are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the Tibetan word segmentation information processing method provided by the present invention comprises the following steps:
S101: learning Tibetan word vectors with word2vec from word-segmented Tibetan text, and storing the word vectors and the dictionary;
S102: building the model structure, which consists of a Convolutional Neural Network (CNN) and a Conditional Random Field (CRF);
S103: training the model on the labeled Tibetan text and the trained word vectors;
S104: stopping training once accuracy on the development set reaches a certain level.
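Step S104 does not state the target accuracy. This minimal sketch assumes that `train_one_epoch` and `evaluate_dev` are hypothetical callables supplied by the caller and that `target_accuracy` is chosen by the user. An epoch cap, `max_epochs=100`, is added as an arbitrary safeguard in case the target is never reached.

```python
from typing import Callable, List, Tuple


def train_until_dev_accuracy(train_one_epoch: Callable[[], None],
                             evaluate_dev: Callable[[], float],
                             target_accuracy: float,
                             max_epochs: int = 100) -> Tuple[int, List[float]]:
    """Run epochs until dev accuracy reaches target_accuracy or max_epochs is hit.

    Returns the number of epochs run and the dev accuracy after each epoch.
    """
    history: List[float] = []
    for _ in range(max_epochs):
        train_one_epoch()           # one pass over the labeled training data
        accuracy = evaluate_dev()   # accuracy on the held-out development set
        history.append(accuracy)
        if accuracy >= target_accuracy:
            break                   # stopping criterion met
    return len(history), history


# Usage with stand-in callables that replay fixed dev accuracies.
fake_dev = iter([0.5, 0.8, 0.92, 0.95])
print(train_until_dev_accuracy(lambda: None, lambda: next(fake_dev), target_accuracy=0.9))
# (3, [0.5, 0.8, 0.92])
```

The returned history lets the caller check whether the target was actually reached or the cap was hit.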
The Tibetan word segmentation information processing method provided by the invention specifically comprises the following steps:
first, preprocessing the labeled word segmentation corpus and learning Tibetan word vectors (the representation of each word in deep learning) with word2vec, and adding to the dictionary of all segmented words a special unknown-word placeholder, which represents words not encountered during training;
second, building a CNN model and computing the loss with a CRF;
third, training the built model on the labeled Tibetan text and the trained word vectors;
fourth, stopping training once accuracy on the development set reaches a certain level, thereby obtaining the word segmentation rules.
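The first step builds a dictionary of all segmented words plus a placeholder for words not seen during training. This minimal sketch assumes a hypothetical `<unk>` token reserved at index 0 and an optional minimum frequency. In a full system, each index would select a row of the word2vec embedding matrix.

```python
from collections import Counter
from typing import Dict, Iterable, List

UNK = "<unk>"  # hypothetical placeholder for words never seen in training


def build_vocab(sentences: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each word to an integer id; id 0 is reserved for unknown words."""
    counts = Counter(word for sentence in sentences for word in sentence)
    vocab: Dict[str, int] = {UNK: 0}
    # Most frequent words first; ties broken alphabetically for reproducibility.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and word != UNK:
            vocab[word] = len(vocab)
    return vocab


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Look up ids, falling back to the unknown-word id for unseen words."""
    return [vocab.get(word, vocab[UNK]) for word in words]


vocab = build_vocab([["a", "b", "a"], ["c", "a"]])
print(encode(["a", "z", "c"], vocab))  # [1, 0, 3]
```

Raising `min_count` sends rare words to the `<unk>` row as well, which also gives that row training signal.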
Those skilled in the art can also implement the Tibetan word segmentation information processing method with other steps. The method shown in Fig. 1 is only one specific embodiment.
As shown in Fig. 2, the Tibetan word segmentation information processing system provided by the present invention includes:
a word vector preprocessing module 1, configured to learn Tibetan word vectors from word-segmented Tibetan text and to store the word vectors and the dictionary;
a model structure building module 2, configured to build the model structure, which consists of a convolutional neural network and a conditional random field;
a word vector training module 3, configured to train the model on the labeled Tibetan text and the trained word vectors;
and a word vector training stop judging module 4, configured to stop training once accuracy on the development set reaches a certain level.
The technical solution of the present invention is further described below with reference to the accompanying drawings.
The invention uses word vectors, a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF) to segment Tibetan text. It mainly addresses Tibetan word segmentation as a preprocessing step before neural machine translation training.
As shown in Fig. 3, the Tibetan word segmentation information processing method provided by the present invention comprises the following steps:
First, learn Tibetan word vectors with word2vec from word-segmented Tibetan text, and store the word vectors and a dictionary.
Second, build a model structure consisting of a Convolutional Neural Network (CNN) and a Conditional Random Field (CRF).
Third, train the model on the labeled Tibetan text and the trained word vectors.
Fourth, stop training once accuracy on the development set reaches a certain level.
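The first step relies on word2vec, and the patent does not say whether the CBOW or skip-gram variant is used. As an illustration, assume skip-gram, which trains the vectors on (center, context) pairs drawn from a window around each word. This minimal sketch generates those pairs; the window size of 2 is a hypothetical choice. The reference word2vec training also subsamples frequent words, randomly shrinks the window per position, and uses negative sampling or hierarchical softmax, all of which are omitted here. In practice, a library implementation such as gensim's `Word2Vec` class would normally be used instead of hand-written training.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["a", "b", "c"], window=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```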
Word vectors help the deep learning method better capture the relationships between words, and therefore better estimate where words should be split. The model obtained through CNN-CRF training also segments text quickly.
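The patent attributes segmentation speed to the CNN-CRF model but does not describe decoding. A linear-chain CRF is normally decoded with the Viterbi algorithm. It finds the highest-scoring tag sequence in O(n·K²) time for n units and K tags. This minimal NumPy sketch assumes emission scores of shape (n, K) and a transition matrix where `transitions[i, j]` scores tag i followed by tag j. It uses no separate start or end transitions.

```python
from typing import List, Tuple

import numpy as np


def viterbi(emissions: np.ndarray, transitions: np.ndarray) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path and its score.

    emissions: (seq_len, num_tags); transitions[i, j] scores tag i followed by tag j.
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()  # best score of any path ending in each tag so far
    backptr = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        cand = score[:, None] + transitions  # cand[i, j]: best path to i, then i -> j
        backptr[t] = np.argmax(cand, axis=0)  # best previous tag for each current tag
        score = np.max(cand, axis=0) + emissions[t]
    best_last = int(np.argmax(score))
    path = [best_last]
    for t in range(seq_len - 1, 0, -1):  # follow back-pointers from the end
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(score[best_last])


E = np.array([[5.0, 0.0], [0.0, 5.0], [5.0, 0.0]])
print(viterbi(E, np.zeros((2, 2))))  # ([0, 1, 0], 15.0)
```

Suppose a B/M/E/S labeling scheme is used (an assumption; the patent does not name one). The decoded tag sequence then becomes words by closing a word after every E or S tag.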
It should be noted that the embodiments of the present invention can be implemented in hardware, software, or a combination of the two. The hardware portion may be implemented using dedicated logic. The software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or processor control code. Such code may be provided, for example, on a carrier medium such as a disk, CD, or DVD-ROM, on a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules may be implemented in any of the following:
- hardware circuits, such as very large scale integrated circuits or gate arrays;
- semiconductors, such as logic chips and transistors;
- programmable hardware devices, such as field programmable gate arrays and programmable logic devices;
- software executed by various types of processors;
- a combination of hardware circuits and software, e.g., firmware.
The above description merely illustrates the present invention and is not to be construed as limiting its scope. All modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to fall within the scope of protection defined by the appended claims.

Claims (10)

1. A Tibetan word segmentation information processing method, characterized in that the method uses word2vec, an existing word-segmented corpus, and the learned word vectors, together with a convolutional neural network and a conditional random field, to learn the likelihood of a word boundary at each position in Tibetan text, and segments the Tibetan text at positions where this likelihood is high.
2. The Tibetan word segmentation information processing method of claim 1, wherein the method predicts word boundaries by learning Tibetan word vectors with word2vec and using a Convolutional Neural Network (CNN) model and a Conditional Random Field (CRF).
3. The Tibetan word segmentation information processing method of claim 2, wherein the method iteratively trains the network by matching the character sequence of each sentence against the sequence of manually labeled word boundaries, obtaining the weights, i.e., the final parameters.
4. The Tibetan word segmentation information processing method of claim 1, wherein the method specifically comprises:
first, preprocessing the labeled word segmentation corpus and learning Tibetan word vectors (the representation of each word in deep learning) with word2vec, and building a dictionary of all labeled words, to which a special unknown-word position is added;
second, building a CNN model and computing the loss with a CRF;
third, training the built model on the labeled Tibetan text and the trained word vectors;
fourth, stopping training once accuracy on the development set reaches a certain level, thereby obtaining the word segmentation rules.
5. The Tibetan word segmentation information processing method as claimed in claim 4, wherein the structure of the built model is composed of a convolutional neural network and a conditional random field.
6. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: learning from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field to generate Tibetan word boundary rules, and thereby performing Tibetan word segmentation.
7. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps: learning from a word-segmented corpus using word vectors, a convolutional neural network, and a conditional random field to generate Tibetan word boundary rules, and thereby performing Tibetan word segmentation.
8. An information data processing terminal, characterized in that the information data processing terminal is used for realizing the Tibetan word segmentation information processing method of any one of claims 1 to 5.
9. A Tibetan word segmentation information processing system for implementing the Tibetan word segmentation information processing method of any one of claims 1 to 5, the system comprising:
a word vector preprocessing module, configured to learn Tibetan word vectors from word-segmented Tibetan text and to store the word vectors and the dictionary;
a model structure building module, configured to build the model structure, which consists of a convolutional neural network and a conditional random field;
a word vector training module, configured to train the model on the labeled Tibetan text and the trained word vectors;
and a word vector training stop judging module, configured to stop training once accuracy on the development set reaches a certain level.
10. A computer information processing terminal, characterized in that it is used for implementing the Tibetan word segmentation information processing method of any one of claims 1 to 5.
CN202110380044.1A 2021-04-09 2021-04-09 Tibetan word segmentation information processing method, system, storage medium, terminal and application Pending CN113051913A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110380044.1A CN113051913A (en) 2021-04-09 2021-04-09 Tibetan word segmentation information processing method, system, storage medium, terminal and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110380044.1A CN113051913A (en) 2021-04-09 2021-04-09 Tibetan word segmentation information processing method, system, storage medium, terminal and application

Publications (1)

Publication Number Publication Date
CN113051913A (en) 2021-06-29

Family

ID=76519028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110380044.1A Pending CN113051913A (en) 2021-04-09 2021-04-09 Tibetan word segmentation information processing method, system, storage medium, terminal and application

Country Status (1)

Country Link
CN (1) CN113051913A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809176A (en) * 2015-04-13 2015-07-29 中央民族大学 (Minzu University of China) Entity relationship extracting method of Zang language
CN108268444A (en) * 2018-01-10 2018-07-10 南京邮电大学 (Nanjing University of Posts and Telecommunications) A kind of Chinese word cutting method based on two-way LSTM, CNN and CRF
CN110287961A (en) * 2019-05-06 2019-09-27 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.) Chinese word cutting method, electronic device and readable storage medium storing program for executing
EP3564964A1 (en) * 2018-05-04 2019-11-06 Avaintec Oy Method for utilising natural language processing technology in decision-making support of abnormal state of object
CN111126068A (en) * 2019-12-25 2020-05-08 中电云脑(天津)科技有限公司 (Zhongdian Yunnao (Tianjin) Technology Co., Ltd.) Chinese named entity recognition method and device and electronic equipment
US20200301919A1 (en) * 2017-05-05 2020-09-24 Ping An Technology (Shenzhen) Co., Ltd. Method and system of mining information, electronic device and readable storable medium
CN112328946A (en) * 2020-12-10 2021-02-05 青海民族大学 (Qinghai Minzu University) Method and system for automatically generating Tibetan language webpage abstract

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
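***, 杨鸿武 (Yang Hongwu), 宋志蒙 (Song Zhimeng): "基于多分类器的藏文文本分类方法" (A Tibetan text classification method based on multiple classifiers), 《南京邮电大学学报》 (Journal of Nanjing University of Posts and Telecommunications) *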

Similar Documents

Publication Publication Date Title
CN111079406B (en) Natural language processing model training method, task execution method, equipment and system
US20230080671A1 (en) User intention recognition method and apparatus based on statement context relationship prediction
CN108932226A (en) A kind of pair of method without punctuate text addition punctuation mark
CN107577662A (en) Towards the semantic understanding system and method for Chinese text
Xu et al. A deep neural network approach for sentence boundary detection in broadcast news.
CN111339750A (en) Spoken language text processing method for removing stop words and predicting sentence boundaries
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
He et al. Can chatgpt detect intent? evaluating large language models for spoken language understanding
CN111414745A (en) Text punctuation determination method and device, storage medium and electronic equipment
CN116523031B (en) Training method of language generation model, language generation method and electronic equipment
CN113723105A (en) Training method, device and equipment of semantic feature extraction model and storage medium
CN112016271A (en) Language style conversion model training method, text processing method and device
CN116595999B (en) Machine translation model training method and device
US20230334265A1 (en) Method and system for processing multilingual user inputs via application programming interface
WO2023184633A1 (en) Chinese spelling error correction method and system, storage medium, and terminal
KR20240006688A (en) Correct multilingual grammar errors
CN110287483B (en) Unregistered word recognition method and system utilizing five-stroke character root deep learning
Min et al. Exploring the integration of large language models into automatic speech recognition systems: An empirical study
Cheng et al. Research on automatic error correction method in English writing based on deep neural network
CN110377691A (en) Method, apparatus, equipment and the storage medium of text classification
CN113051913A (en) Tibetan word segmentation information processing method, system, storage medium, terminal and application
CN114818728A (en) Text style migration model training and text style migration method and device
CN113420121B (en) Text processing model training method, voice text processing method and device
CN111597827B (en) Method and device for improving accuracy of machine translation
Alfaidi et al. Exploring the performance of farasa and CAMeL taggers for arabic dialect tweets.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210629