CN113705223A - Personalized English text simplification method taking reader as center - Google Patents

Personalized English text simplification method taking reader as center

Info

Publication number
CN113705223A
CN113705223A
Authority
CN
China
Prior art keywords
sentence
word
reader
words
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111025610.3A
Other languages
Chinese (zh)
Inventor
强继朋
张峰
李云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University filed Critical Yangzhou University
Priority to CN202111025610.3A priority Critical patent/CN113705223A/en
Publication of CN113705223A publication Critical patent/CN113705223A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a personalized English text simplification method taking readers as centers, which comprises the following steps: step 1, setting the simplification level of the method according to the reader's current English level, and acquiring the lexicon corresponding to that level; step 2, performing sentence segmentation on the text read by the reader to obtain a sentence set; and step 3, simplifying each sentence in the sentence set in order from front to back with a sentence and word simplification method, obtaining a simplified sentence set, and returning it to the reader. The invention makes full use of a pre-trained language model and a dictionary, meets different readers' needs for English text simplification, and at the same time improves the accuracy of English text simplification.

Description

Personalized English text simplification method taking reader as center
Technical Field
The invention relates to the field of English text simplification, in particular to a personalized English text simplification method taking readers as centers.
Background
In recent years, with the development of the internet, a large amount of English text has come into public view. For example, many people choose to read professional papers downloaded from English journals directly, rather than first translating them into their native language. If such a text contains many rare and uncommon words, the reader's understanding of its meaning is severely restricted. Research shows that if 90% of the English words in a text are within the reader's cognitive range, the reader can easily understand its meaning even when the text is long and complex.
English text simplification aims to simplify the words or syntax in a text so that a reader can read and understand its meaning while the original information is retained to the greatest extent. For a given input text, a simplification system that replaces complex words with simple ones must satisfy two conditions: 1) the output text should preserve the meaning of the input as much as possible; 2) the output text should minimize the number of complex words (words the reader cannot understand). These two conditions can conflict: to reduce the complexity of the text, the system can always pick the simplest candidate substitute, but doing so cannot guarantee that the original semantics of the text are preserved. Existing text simplification algorithms do not consider the reader's cognitive level; they blindly mark some relatively low-frequency words as complex and replace them with words of similar sense, which achieves a simplification effect but increases the risk that the simplified text diverges in meaning from the original.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a personalized English text simplification method taking a reader as the center.
The purpose of the invention is realized as follows: a personalized English text simplification method taking a reader as a center comprises the following steps:
step 1, according to the reader's current English level, setting the simplification level of the method and acquiring the lexicon R corresponding to that level;
step 2, supposing that the reader is currently reading the document Text, splitting the Text with a sentence segmentation method to obtain a sentence set T = {c1, …, ci, …, cm}; m represents the number of sentences in the set T;
step 3, simplifying each sentence ci (1 ≤ i ≤ m) in T in order from front to back with the sentence and word simplification method, obtaining a simplified sentence set SS = {s1, …, si, …, sm}, and returning SS to the reader.
As a further limitation of the present invention, the step 2 specifically includes the following steps:
step 2.1: defining a set T, wherein the initial value is null;
step 2.2: deleting special symbols and redundant characters in the document Text;
step 2.3: segmenting the document Text on the '.' symbol to obtain an initial sentence set T_init;
step 2.4: sequentially traversing the sentences senta in the set T_init, with the initial value of a being 1.
As a further limitation of the present invention, the step 2.4 specifically includes the following steps:
step 2.4.1: for senta, judging whether senta contains the '?' or '!' symbol; if so, performing the following steps; otherwise adding senta to the set T and executing step 2.4.4;
step 2.4.2: if senta contains the '!' symbol, segmenting senta on '!' to obtain a clause set ta;
step 2.4.3: otherwise, if senta contains the '?' symbol, segmenting senta on '?' and sequentially adding the obtained clauses to the set T;
step 2.4.4: letting a be a + 1 and repeating step 2.4 until all sentences in the set T_init have been traversed.
As a further limitation of the present invention, the step 2.4.2 specifically comprises the following steps:
step 2.4.2.1: traversing the set ta and judging whether each clause contains the '?' symbol; if so, segmenting the clause on '?' and sequentially adding the obtained clauses to the set T; otherwise adding the clause to the set T directly.
As a further limitation of the present invention, the step 3 specifically includes the following steps:
step 3.1: using a word segmentation tool to tokenize the sentence ci, obtaining the corresponding word set with part-of-speech tags ci = {{w1, p1}, …, {wj, pj}, …, {wn, pn}}; wj (1 ≤ j ≤ n) represents the jth word in the sentence, pj is the part-of-speech tag of wj, and n represents the number of words in the sentence ci;
step 3.2: initializing j to 1 and assigning the original sentence ci to the simplified sentence si;
step 3.3: if j is equal to n + 1, returning the simplified sentence si and terminating the iteration; otherwise continuing with step 3.4;
step 3.4: judging whether wj is a stop word; if not, executing step 3.5; otherwise assigning j + 1 to j and executing step 3.3;
step 3.5: judging whether pj belongs to the part-of-speech set {noun (n), verb (v), adjective (adj), adverb (adv)}; if so, executing step 3.6; otherwise assigning j + 1 to j and executing step 3.3;
step 3.6: extracting the stem stemj of wj with a stemming tool and judging whether stemj belongs to the reader's lexicon R; if not, executing step 3.7; otherwise assigning j + 1 to j and executing step 3.3;
step 3.7: obtaining the synonym set Syn of the word wj from a public online synonym dictionary;
step 3.8: adopting a word simplification method based on the pre-trained language representation model Bert to obtain the candidate substitute words CS = {cs1, …, csk, …, csp} for the word wj in the sentence ci; csk (1 ≤ k ≤ p) represents the kth word in CS, and p is the number of candidate substitutes specified by the user;
step 3.9: screening the candidate substitutes CS to determine the final substitute subj, and replacing the original word wj in the simplified sentence si with subj.
As a further limitation of the present invention, the step 3.8 specifically includes the following steps:
step 3.8.1: obtaining an English pre-trained language representation model Bert;
step 3.8.2: using the "[MASK]" symbol peculiar to the Bert model to replace the word wj in the sentence ci; the sentence after substitution is defined as ci';
step 3.8.3: concatenating in order the symbol "[CLS]", the sentence ci, the sentence ci' and the symbol "[SEP]"; the sequence after splicing is defined as Q = {[CLS], ci, ci', [SEP]};
step 3.8.4: obtaining by formula (1) the generation order X = {x1, …, xy, …, xv} of all words in the vocabulary at the "[MASK]" position; xy (1 ≤ y ≤ v) represents the word ranked at the yth position, and v represents the vocabulary size of the Bert model;
X(·|[MASK]) = Bert(Q)    (1)
step 3.8.5: defining the set CS with an initial value of empty, and initializing y to 1;
step 3.8.6: if the number of words in the set CS is equal to p, terminating the iteration; otherwise continuing with step 3.8.7;
step 3.8.7: obtaining the stem of xy with the stemming tool; if it is not equal to the stem stemj of wj, adding xy to the set CS; in either case, assigning y + 1 to y and executing step 3.8.6.
As a further limitation of the present invention, the step 3.9 specifically includes the following steps:
step 3.9.1: initializing k to 1;
step 3.9.2: if k is equal to p + 1, assigning the original word wj to subj and terminating the iteration; otherwise continuing with step 3.9.3;
step 3.9.3: judging whether csk belongs to the synonym set Syn; if so, executing step 3.9.4; otherwise assigning k + 1 to k and executing step 3.9.2;
step 3.9.4: judging whether csk belongs to the reader's lexicon R; if so, assigning csk to subj and terminating the iteration; otherwise assigning k + 1 to k and executing step 3.9.2.
By adopting the above technical scheme, compared with the prior art, the invention has the following beneficial effects: 1. the method identifies the words in the text that fall outside the reader's cognitive range and replaces only those as complex words, thereby avoiding unnecessary simplification, preserving the original semantics of the text to the greatest extent, and taking the reader's reading ability into account.
2. The method uses both the candidate words generated by the pre-trained language model for the complex words and the corresponding synonyms in the dictionary, making the substituted text semantically more accurate.
3. The invention selects the intersection of the candidate word set generated by the pre-trained model and the synonym set in the dictionary, and retrieves from that intersection the words within the reader's cognitive range, thereby achieving the purpose of simplifying the text.
Detailed Description
A personalized English text simplification method taking readers as centers specifically comprises the following steps:
step 1, according to the reader's current English level, setting the simplification level of the method and acquiring the lexicon R corresponding to that level; the English levels in the invention are divided into 4 types: College English Test band 4 and band 6 (CET-4/CET-6) for college students, and Test for English Majors band 4 and band 8 (TEM-4/TEM-8) for English majors; the lexicon R can be obtained from https://github.com/mahavivo/english-wordlist.
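As an illustration, such a word list can be loaded into the lexicon R as a set of stems (a minimal Python sketch; the function name `load_lexicon` and the default identity stemmer are assumptions — in practice the stems would come from the same stemming tool used in step 3.6):

```python
def load_lexicon(path, stem=lambda w: w):
    """Build the reader's lexicon R: read a plain word list
    (one word per line) and store the stem of each entry, so that
    the membership test in step 3.6 compares stems, not surface forms."""
    with open(path, encoding="utf-8") as f:
        return {stem(line.strip().lower()) for line in f if line.strip()}
```

Storing stems rather than surface forms means an inflected word in the text (e.g. "running") still matches the list entry "run".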
Step 2, supposing that the reader is currently reading the document Text, splitting the Text with a sentence segmentation method to obtain a sentence set T = {c1, …, ci, …, cm}; m represents the number of sentences in the set T;
step 2.1: defining a set T, wherein the initial value is null;
step 2.2: deleting special symbols and redundant characters in the document Text, mainly removing redundant characters such as '\n', '\t' and unmatched brackets;
step 2.3: segmenting the document Text on the '.' symbol to obtain an initial sentence set T_init;
step 2.4: sequentially traversing the sentences senta in the set T_init, with the initial value of a being 1;
step 2.4.1: for senta, judging whether senta contains the '?' or '!' symbol; if so, performing the following steps; otherwise adding senta to the set T and executing step 2.4.4;
step 2.4.2: if senta contains the '!' symbol, segmenting senta on '!' to obtain a clause set ta;
step 2.4.2.1: traversing the set ta and judging whether each clause contains the '?' symbol; if so, segmenting the clause on '?' and sequentially adding the obtained clauses to the set T; otherwise adding the clause to the set T directly;
step 2.4.3: otherwise, if senta contains the '?' symbol, segmenting senta on '?' and sequentially adding the obtained clauses to the set T;
step 2.4.4: letting a be a + 1 and repeating step 2.4 until all sentences in the set T_init have been traversed.
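The segmentation in steps 2.1 to 2.4.4 can be sketched in Python as follows (a minimal illustration; the function name and the cleanup regex are assumptions, and the patent's fuller handling of unmatched brackets is omitted):

```python
import re

def split_sentences(text):
    """Split a document into clauses, following steps 2.1-2.4:
    clean the text, split on '.', then further split any piece
    that still contains '!' or '?'."""
    # Step 2.2 (partial): replace newlines and tabs with spaces
    text = re.sub(r"[\n\t]", " ", text)
    # Step 2.3: initial split on '.'
    t_init = [s.strip() for s in text.split(".") if s.strip()]
    result = []
    # Step 2.4: split each piece on '!' and then on '?'
    for sent in t_init:
        for part_a in sent.split("!"):
            for part_b in part_a.split("?"):
                part_b = part_b.strip()
                if part_b:
                    result.append(part_b)
    return result
```

Splitting on the raw characters follows the patent literally; a production system would more likely use a trained sentence tokenizer to avoid breaking on abbreviations like "e.g.".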
Step 3, simplifying each sentence ci (1 ≤ i ≤ m) in T in order from front to back with the sentence and word simplification method, obtaining a simplified sentence set SS = {s1, …, si, …, sm}, and returning SS to the reader;
step 3.1: using a word segmentation tool to tokenize the sentence ci, obtaining the corresponding word set with part-of-speech tags ci = {{w1, p1}, …, {wj, pj}, …, {wn, pn}}; wj (1 ≤ j ≤ n) represents the jth word in the sentence, pj is the part-of-speech tag of wj, and n represents the number of words in the sentence ci;
step 3.2: initializing j to 1 and assigning the original sentence ci to the simplified sentence si;
step 3.3: if j is equal to n + 1, returning the simplified sentence si and terminating the iteration; otherwise continuing with step 3.4;
step 3.4: judging whether wj is a stop word; if not, executing step 3.5; otherwise assigning j + 1 to j and executing step 3.3;
step 3.5: judging whether pj belongs to the part-of-speech set {noun (n), verb (v), adjective (adj), adverb (adv)}; if so, executing step 3.6; otherwise assigning j + 1 to j and executing step 3.3;
step 3.6: extracting the stem stemj of wj with a stemming tool and judging whether stemj belongs to the reader's lexicon R; if not, executing step 3.7; otherwise assigning j + 1 to j and executing step 3.3; the tokenizer, the stop-word list, the stemming tool and the part-of-speech tagger used in steps 3.1 to 3.6 all come from the nltk library of the Python language;
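The filtering in steps 3.4 to 3.6 can be sketched as a pure function over already-tagged tokens (hedged: the patent uses nltk's tokenizer, tagger, stop-word list and PorterStemmer; here `naive_stem` is a toy stand-in so the sketch stays self-contained, and the Penn Treebank tag prefixes NN/VB/JJ/RB are assumed as the tagger's noun/verb/adjective/adverb labels):

```python
def naive_stem(word):
    """Toy stand-in for a real stemmer (illustration only):
    strips a few common English suffixes."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def words_to_simplify(tagged_tokens, stop_words, reader_stems):
    """Steps 3.4-3.6: keep content words (noun/verb/adjective/adverb)
    that are not stop words and whose stem is outside the reader's
    lexicon R; these are the complex words to be replaced."""
    targets = []
    for word, tag in tagged_tokens:
        w = word.lower()
        if w in stop_words:                          # step 3.4
            continue
        if not tag.startswith(("NN", "VB", "JJ", "RB")):  # step 3.5
            continue
        if naive_stem(w) not in reader_stems:        # step 3.6
            targets.append(word)
    return targets
```

With nltk installed, `tagged_tokens` would come from `nltk.pos_tag(nltk.word_tokenize(sentence))` and `naive_stem` would be replaced by `PorterStemmer().stem`.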
step 3.7: obtaining the synonym set Syn of the word wj from a public online synonym dictionary; the thesaurus used here is Big Huge Thesaurus (https://words.bighugelabs.com/), and the synonym set Syn of the word wj is obtained through its online API, documented at https://words.bighugelabs.com/site/api;
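A sketch of consuming such an API response follows (the JSON shape `{"noun": {"syn": [...]}, ...}` is an assumption based on the service's public documentation, and the actual HTTP call, which requires an API key, is omitted):

```python
# Endpoint pattern (API key required); shown for context only, not called:
#   https://words.bighugelabs.com/api/2/{api_key}/{word}/json

def extract_synonyms(response_json, pos=None):
    """Flatten a thesaurus JSON response into one synonym set Syn;
    optionally keep only one part of speech ('noun', 'verb', ...).
    Antonyms and other relation types are ignored."""
    syns = set()
    for tag, relations in response_json.items():
        if pos is None or tag == pos:
            syns.update(relations.get("syn", []))
    return syns
```

Restricting Syn to the part of speech found in step 3.5 would avoid, e.g., replacing a verb with a noun synonym.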
step 3.8: adopting a word simplification method based on the pre-trained language representation model Bert to obtain the candidate substitute words CS = {cs1, …, csk, …, csp} for the word wj in the sentence ci; csk (1 ≤ k ≤ p) represents the kth word in CS, and p is the number of candidate substitutes specified by the user; the pre-trained language model Bert comes from the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", published in 2018; Bert trains a masked language model (MLM) on a massive text corpus. In the lexical simplification algorithm, the MLM masks the complex word, predicts for every word in the vocabulary the probability that it fills the masked position, and selects the words with high probability values as candidate substitutes;
step 3.8.1: obtaining an English pre-trained language representation model Bert; here a Bert model implemented on the PyTorch framework is selected, downloading "BERT-Large, Uncased (Whole Word Masking)" from https://github.com/***-research/bert;
step 3.8.2: using the "[MASK]" symbol peculiar to the Bert model to replace the word wj in the sentence ci; the sentence after substitution is defined as ci';
Step 3.8.3: concatenating symbols "[ CLS ] in sequence]", sentence ciSentence ci'sum symbol' [ SEP]", the sequence after splicing is defined as Q { [ CLS { [],ci,ci’,[SEP]}; the benefits of using sequence Q are: first, consider the sentence seniWord pairs in need of simplification in China]Predicting the influence of a result, and then improving the probability of words similar to the original words by using the NSP task of the Bert model;
step 3.8.4: obtaining by formula (1) the generation order X = {x1, …, xy, …, xv} of all words in the vocabulary at the "[MASK]" position; xy (1 ≤ y ≤ v) represents the word ranked at the yth position, and v represents the vocabulary size of the Bert model;
X(·|[MASK]) = Bert(Q)    (1)
step 3.8.5: defining the set CS with an initial value of empty, and initializing y to 1;
step 3.8.6: if the number of words in the set CS is equal to p, terminating the iteration; otherwise continuing with step 3.8.7; p here takes the value 100;
step 3.8.7: obtaining the stem of xy with the stemming tool; if it is not equal to the stem stemj of wj, adding xy to the set CS; in either case, assigning y + 1 to y and executing step 3.8.6.
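Steps 3.8.4 to 3.8.7 reduce to the following selection over the vocabulary ranking X (a hedged sketch: the ranking itself would come from the Bert model's masked-word prediction over the sequence Q, which is stubbed out here, and `stem` stands in for the nltk stemming tool):

```python
def select_candidates(ranked_words, original, p, stem):
    """Walk the vocabulary in descending MLM probability (the order X)
    and keep the first p words whose stem differs from the stem of the
    complex word, so that mere inflections of the original are skipped."""
    original_stem = stem(original)
    cs = []
    for word in ranked_words:
        if len(cs) == p:                      # step 3.8.6: stop at |CS| = p
            break
        if stem(word) != original_stem:       # step 3.8.7: drop same-stem words
            cs.append(word)
    return cs
```

With the Hugging Face transformers library, `ranked_words` could be obtained by running `BertForMaskedLM` on Q and sorting the vocabulary by the logits at the [MASK] position.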
Step 3.9: screening the candidate substitutes CS to determine the final substitute subj, and replacing the original word wj in the simplified sentence si with subj.
Step 3.9.1: initializing k to 1;
step 3.9.2: if k is equal to p + 1, assigning the original word wj to subj and terminating the iteration; otherwise continuing with step 3.9.3;
step 3.9.3: judging whether csk belongs to the synonym set Syn; if so, executing step 3.9.4; otherwise assigning k + 1 to k and executing step 3.9.2;
step 3.9.4: judging whether csk belongs to the reader's lexicon R; if so, assigning csk to subj and terminating the iteration; otherwise assigning k + 1 to k and executing step 3.9.2.
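Step 3.9 amounts to choosing the first candidate that is both a dictionary synonym and within the reader's lexicon, falling back to the original word when none qualifies (a minimal sketch; the function name is illustrative):

```python
def choose_substitute(candidates, synonyms, reader_lexicon, original):
    """Scan the ranked candidates CS in order; return the first word that
    is in both the synonym set Syn and the reader's lexicon R, otherwise
    keep the original complex word unchanged (step 3.9.2)."""
    for cs in candidates:
        if cs in synonyms and cs in reader_lexicon:
            return cs
    return original
```

Keeping the original word when no candidate passes both filters is what prevents the method from replacing a complex word with something the reader would not recognize or that distorts the meaning.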
The invention provides a personalized English text simplification method taking readers as centers which, according to the cognitive level of a given class of readers, simplifies only the parts of a text that those readers cannot understand, so that the readers can understand the text content well while the original text information is retained to the greatest extent. It makes full use of the pre-trained language model and the dictionary, meets different readers' needs for English text simplification, and at the same time improves the accuracy of English text simplification.
The present invention is not limited to the above-mentioned embodiments, and based on the technical solutions disclosed in the present invention, those skilled in the art can make some substitutions and modifications to some technical features without creative efforts according to the disclosed technical contents, and these substitutions and modifications are all within the protection scope of the present invention.

Claims (7)

1. A personalized English text simplification method taking a reader as a center is characterized by comprising the following steps:
step 1, according to the reader's current English level, setting the simplification level of the method and acquiring the lexicon R corresponding to that level;
step 2, supposing that the reader is currently reading the document Text, splitting the Text with a sentence segmentation method to obtain a sentence set T = {c1, …, ci, …, cm}; m represents the number of sentences in the set T;
step 3, simplifying each sentence ci (1 ≤ i ≤ m) in T in order from front to back with the sentence and word simplification method, obtaining a simplified sentence set SS = {s1, …, si, …, sm}, and returning SS to the reader.
2. The reader-centric personalized english text simplification method according to claim 1, characterized in that said step 2 specifically comprises the steps of:
step 2.1: defining a set T, wherein the initial value is null;
step 2.2: deleting special symbols and redundant characters in the document Text;
step 2.3: segmenting the document Text on the '.' symbol to obtain an initial sentence set T_init;
step 2.4: sequentially traversing the sentences senta in the set T_init, with the initial value of a being 1.
3. The reader-centric personalized english text simplification method according to claim 2, characterized in that said step 2.4 specifically comprises the steps of:
step 2.4.1: for senta, judging whether senta contains the '?' or '!' symbol; if so, performing the following steps; otherwise adding senta to the set T and executing step 2.4.4;
step 2.4.2: if senta contains the '!' symbol, segmenting senta on '!' to obtain a clause set ta;
step 2.4.3: otherwise, if senta contains the '?' symbol, segmenting senta on '?' and sequentially adding the obtained clauses to the set T;
step 2.4.4: let a be a +1, repeat step 2.4 until all sentences in the set T _ init have been traversed.
4. The reader-centric personalized english text simplification method according to claim 3, characterized in that said step 2.4.2 specifically comprises the steps of:
step 2.4.2.1: traversing the set ta and judging whether each clause contains the '?' symbol; if so, segmenting the clause on '?' and sequentially adding the obtained clauses to the set T; otherwise adding the clause to the set T directly.
5. The reader-centric personalized english text simplification method according to claim 1, characterized in that said step 3 specifically comprises the steps of:
step 3.1: using a word segmentation tool to tokenize the sentence ci, obtaining the corresponding word set with part-of-speech tags ci = {{w1, p1}, …, {wj, pj}, …, {wn, pn}}; wj (1 ≤ j ≤ n) represents the jth word in the sentence, pj is the part-of-speech tag of wj, and n represents the number of words in the sentence ci;
step 3.2: initializing j to 1 and assigning the original sentence ci to the simplified sentence si;
step 3.3: if j is equal to n + 1, returning the simplified sentence si and terminating the iteration; otherwise continuing with step 3.4;
step 3.4: judging whether wj is a stop word; if not, executing step 3.5; otherwise assigning j + 1 to j and executing step 3.3;
step 3.5: judging whether pj belongs to the part-of-speech set {noun (n), verb (v), adjective (adj), adverb (adv)}; if so, executing step 3.6; otherwise assigning j + 1 to j and executing step 3.3;
step 3.6: extracting the stem stemj of wj with a stemming tool and judging whether stemj belongs to the reader's lexicon R; if not, executing step 3.7; otherwise assigning j + 1 to j and executing step 3.3;
step 3.7: obtaining the synonym set Syn of the word wj from a public online synonym dictionary;
step 3.8: adopting a word simplification method based on the pre-trained language representation model Bert to obtain the candidate substitute words CS = {cs1, …, csk, …, csp} for the word wj in the sentence ci; csk (1 ≤ k ≤ p) represents the kth word in CS, and p is the number of candidate substitutes specified by the user;
step 3.9: screening the candidate substitutes CS to determine the final substitute subj, and replacing the original word wj in the simplified sentence si with subj.
6. The reader-centric personalized english text simplification method according to claim 5, characterized in that said step 3.8 specifically comprises the steps of:
step 3.8.1: obtaining an English pre-trained language representation model Bert;
step 3.8.2: using the "[MASK]" symbol peculiar to the Bert model to replace the word wj in the sentence ci; the sentence after substitution is defined as ci';
step 3.8.3: concatenating in order the symbol "[CLS]", the sentence ci, the sentence ci' and the symbol "[SEP]"; the sequence after splicing is defined as Q = {[CLS], ci, ci', [SEP]};
step 3.8.4: obtaining by formula (1) the generation order X = {x1, …, xy, …, xv} of all words in the vocabulary at the "[MASK]" position; xy (1 ≤ y ≤ v) represents the word ranked at the yth position, and v represents the vocabulary size of the Bert model;
X(·|[MASK]) = Bert(Q)    (1)
step 3.8.5: defining the set CS with an initial value of empty, and initializing y to 1;
step 3.8.6: if the number of words in the set CS is equal to p, terminating the iteration; otherwise continuing with step 3.8.7;
step 3.8.7: obtaining the stem of xy with the stemming tool; if it is not equal to the stem stemj of wj, adding xy to the set CS; in either case, assigning y + 1 to y and executing step 3.8.6.
7. The reader-centric personalized english text simplification method according to claim 5, characterized in that said step 3.9 specifically comprises the steps of:
step 3.9.1: initializing k to 1;
step 3.9.2: if k is equal to p + 1, assigning the original word wj to subj and terminating the iteration; otherwise continuing with step 3.9.3;
step 3.9.3: judging whether csk belongs to the synonym set Syn; if so, executing step 3.9.4; otherwise assigning k + 1 to k and executing step 3.9.2;
step 3.9.4: judging whether csk belongs to the reader's lexicon R; if so, assigning csk to subj and terminating the iteration; otherwise assigning k + 1 to k and executing step 3.9.2.
CN202111025610.3A 2021-09-02 2021-09-02 Personalized English text simplification method taking reader as center Withdrawn CN113705223A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111025610.3A CN113705223A (en) 2021-09-02 2021-09-02 Personalized English text simplification method taking reader as center

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111025610.3A CN113705223A (en) 2021-09-02 2021-09-02 Personalized English text simplification method taking reader as center

Publications (1)

Publication Number Publication Date
CN113705223A true CN113705223A (en) 2021-11-26

Family

ID=78658928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111025610.3A Withdrawn CN113705223A (en) 2021-09-02 2021-09-02 Personalized English text simplification method taking reader as center

Country Status (1)

Country Link
CN (1) CN113705223A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230289524A1 (en) * 2022-03-09 2023-09-14 Talent Unlimited Online Services Private Limited Artificial intelligence based system and method for smart sentence completion in mobile devices
US12039264B2 (en) * 2022-03-09 2024-07-16 Talent Unlimited Online Services Pr Artificial intelligence based system and method for smart sentence completion in mobile devices

Similar Documents

Publication Publication Date Title
Harrat et al. Machine translation for Arabic dialects (survey)
CN110543639B (en) English sentence simplification algorithm based on pre-training transducer language model
US8239188B2 (en) Example based translation apparatus, translation method, and translation program
WO2022057116A1 (en) Transformer deep learning model-based method for translating multilingual place name root into chinese
KR20180114781A (en) Apparatus and method for converting dialect into standard language
CN107391486A (en) A kind of field new word identification method based on statistical information and sequence labelling
CN111178061B (en) Multi-lingual word segmentation method based on code conversion
KR20230009564A (en) Learning data correction method and apparatus thereof using ensemble score
EP4276677A1 (en) Cross-language data enhancement-based word segmentation method and apparatus
CN110502759B (en) Method for processing Chinese-Yue hybrid network neural machine translation out-of-set words fused into classification dictionary
Tennage et al. Transliteration and byte pair encoding to improve tamil to sinhala neural machine translation
CN113705223A (en) Personalized English text simplification method taking reader as center
Fang et al. Non-Autoregressive Chinese ASR Error Correction with Phonological Training
Chiu et al. Chinese spell checking based on noisy channel model
Mekki et al. COTA 2.0: An automatic corrector of Tunisian Arabic social media texts
Singh et al. Urdu to Punjabi machine translation: An incremental training approach
Lu et al. An automatic spelling correction method for classical mongolian
Roy et al. Bangla-english neural machine translation with bidirectional long short-term memory and back translation
JP5298834B2 (en) Example sentence matching translation apparatus, program, and phrase translation apparatus including the translation apparatus
Sreeram et al. A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model.
Seresangtakul et al. Thai-Isarn dialect parallel corpus construction for machine translation
Raza et al. Saraiki Language Word Prediction And Spell Correction Framework
JP3825645B2 (en) Expression conversion method and expression conversion apparatus
Zalmout Morphological Tagging and Disambiguation in Dialectal Arabic Using Deep Learning Architectures
Astuti et al. Code-Mixed Sentiment Analysis using Transformer for Twitter Social Media Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211126