CN111767709A - Logic method for carrying out error correction and syntactic analysis on English text - Google Patents
- Publication number
- CN111767709A (application CN201910238788.2A)
- Authority
- CN
- China
- Prior art keywords
- english
- text
- error correction
- sentence
- english text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The invention provides a method for error correction and syntactic analysis of English text. The method processes English text, corrects erroneous sentences, and gives error prompt information; for correct sentences, it analyzes their grammatical phenomena and outputs information such as the basic sentence pattern, sentence constituent structure, phrase dependency relations, modifiers, and fixed collocations. The invention can help English beginners improve their writing ability more effectively, correct errors in a targeted way, and learn the sentence structures of well-written sentences.
Description
Technical Field
The invention belongs to the technical field of natural language processing and relates to two research areas: automatic detection of errors in English text, and syntactic analysis of English text. It is mainly applied to English writing tutoring; the two techniques can also be used independently.
Background
English is the most widely used language in the world and a necessary skill for communicating in a global environment. English text is a main medium of that communication, especially in academic papers and written business communication. Improving English writing ability has therefore become a need for more and more people. Most people try to improve their writing by reading well-written English texts on related topics, but the quality of material available online is uneven and beginners cannot choose well among it. A method for error correction and syntactic analysis of English text can help English beginners improve their writing ability effectively.
Existing related technology is largely limited to automatic error detection for English text and does not address the structural and grammatical analysis of correct sentences. For example, CN 108519974 A provides a method for automatically detecting and analyzing grammatical errors in English compositions: it performs sentence segmentation, word segmentation, and spell checking on the composition, performs part-of-speech tagging with the Stanford analyzer, corrects the tagging, constructs a negative-example rule flow chart, and returns the result. CN 101814065 B proposes a syntactic analysis apparatus and method that describe syntactic analysis rules using regular expressions. In addition, applications released abroad such as Grammarly, Ginger, and WhiteSmoke, as well as the composition-correction website commonly used by students in China, perform correction of English text.
Existing error correction methods and software focus only on flagging incorrect grammar in English text. They neglect the analysis of correct, well-written sentences, which also helps English learners improve their writing. Moreover, although most tools can perform basic text and grammar correction, their emphasis on generality ignores differences in language ability among user groups, so many prompts are not targeted and errors typical of a specific group may go largely uncorrected. For example, the common problems in primary school pupils' English writing differ clearly from the error correction priorities for scientific research and academic papers: the former centers on the use of grammatical structures and vocabulary, the latter on the accuracy and comprehensibility of professional vocabulary.
Disclosure of Invention
In view of the above, the present invention provides a method for error correction and syntactic analysis of English text. The method processes English text, corrects erroneous sentences, and gives error prompt information; for correct sentences, it analyzes their grammatical phenomena and outputs the basic sentence pattern, sentence constituent structure, phrase dependency relations, modifiers, and fixed collocations.
Supplementary English writing scenes, together with recognition and correction of the corresponding grammatical error patterns, can be added continually according to user requirements, and the method can be used in any English education tutoring program.
Compared with the prior art, the main innovations of the invention are:
1. With a suitable logical structure, the method not only corrects errors in English text but also outputs the grammatical phenomena of correct sentences. Most importantly, learners can use the syntactic analysis results of correct sentences (including well-written ones) to study the constituent structure and collocations of example sentences and so improve their writing;
2. Error correction is extended on the basis of the LanguageTool toolkit: in addition to the common error rule patterns, error patterns for different application scenarios and different users are added as rules, and the user patterns for each scenario are encapsulated.
The specific technical route of the logic method for error correction and syntactic analysis of English text is as follows:
The input text is routed, sentence by sentence, to one of two modules according to whether the sentence contains errors: a sentence with errors goes to the text error correction module, and a sentence without errors goes to the syntactic analysis module. A sentence whose only errors are spelling errors also goes to the syntactic analysis module.
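The routing rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Issue` type, its `kind` values ("spelling", "grammar"), and the module names returned by `route` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    kind: str      # hypothetical categories: "spelling" or "grammar"
    message: str   # prompt text shown to the learner


def route(issues: List[Issue]) -> str:
    """Choose the module for one sentence from the issues found in it."""
    # Any non-spelling issue sends the sentence to error correction.
    if any(issue.kind != "spelling" for issue in issues):
        return "error_correction"
    # No issues, or spelling issues only: go to syntactic analysis.
    return "syntactic_analysis"
```

A sentence with an empty issue list and a sentence with only spelling issues both reach the syntactic analysis module, matching the rule stated in the text.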
(I) The text error correction module proceeds in the following steps:
1) Identification of common grammatical errors in English text, using any error correction toolkit;
2) Addition of error rule patterns. Scenes are defined along two dimensions: the learning stage of English composition in China, and the application genre of the English text. The learning stages comprise three scenes: primary school pupils, junior high school students, and students above that level. The application genres comprise three scenes: letters, expository and narrative texts, and scientific research papers. Corresponding error patterns are added for each scene in these two dimensions.
Further, for step 2), supplementary English writing scenes and the recognition and correction of the corresponding grammatical error patterns can be added continually according to user requirements, and the method can be used in any English education tutoring program.
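One way to keep the scene-specific rules extensible, as step 2) requires, is a registry keyed by scene. The sketch below is an assumption made for illustration: the scene identifiers, the `Rule` signature, and the `RuleRegistry` class are hypothetical and do not come from the patent or from any toolkit.

```python
from typing import Callable, Dict, Iterable, List

# A rule takes one sentence and returns zero or more prompt messages.
Rule = Callable[[str], List[str]]

# Hypothetical identifiers for the six scenes named in the text.
STAGES = ("primary", "junior_high", "above")
GENRES = ("letter", "expository_narrative", "research_paper")


class RuleRegistry:
    """Stores additional error rules per scene so new ones can be added later."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Rule]] = {s: [] for s in STAGES + GENRES}

    def add(self, scene: str, rule: Rule) -> None:
        if scene not in self._rules:
            raise ValueError(f"unknown scene: {scene}")
        self._rules[scene].append(rule)

    def check(self, scenes: Iterable[str], sentence: str) -> List[str]:
        """Run every rule registered for the selected scenes."""
        messages: List[str] = []
        for scene in scenes:
            for rule in self._rules.get(scene, []):
                messages.extend(rule(sentence))
        return messages
```

A caller would typically select one learning-stage scene and one genre scene, for example `registry.check(["primary", "letter"], sentence)`.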
(II) The syntactic analysis module is implemented mainly with the automatic English text analysis technology of Patent I.
Drawings
FIG. 1 is a general logic flow diagram of the method of the present invention;
FIG. 2 is a detailed process flow diagram of the text correction module;
FIG. 3 is a detailed process flow diagram of the text parsing module.
Detailed Description
The English text error correction and syntactic analysis method mainly comprises a text error correction module and a syntactic analysis module; the logical relationship between them is shown in FIG. 1.
The text error correction module first corrects common grammatical errors and then applies additional grammar corrections according to the selected application scenario. The syntactic analysis module provides syntactic analysis of correct sentences, including information such as the basic sentence pattern, sentence constituent structure, syntactic sequence of the sentence, phrase dependency relations, modifiers, and fixed collocations.
2.2.1 Module one: text error correction module
The specific flow is shown in FIG. 2 and consists of two parts: the first part uses the LanguageTool toolkit to correct common problems, and the second part adds error rules for specific fields or levels according to the application scenario.
For the first part, the recognition of common grammatical errors in English text, we call the LanguageTool toolkit to correct common English text errors. Note that other text correction toolkits may be chosen instead; the effect is essentially the same and the processing procedure described here is unchanged, only the calling convention differs.
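Because the text allows any correction toolkit to fill the first part, the toolkit can be hidden behind a small interface so that only the adapter changes when the toolkit is swapped. The following is a hedged sketch: `Finding`, `Checker`, and `DoubledWordChecker` are hypothetical names, and the doubled-word checker is a toy stand-in, not a call into LanguageTool.

```python
import re
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Finding:
    offset: int    # start position of the flagged span in the text
    length: int    # length of the flagged span
    message: str   # prompt shown to the learner


class Checker(Protocol):
    """Interface that any common-error toolkit adapter would implement."""

    def check(self, text: str) -> List[Finding]: ...


class DoubledWordChecker:
    """Toy stand-in for a real toolkit: flags immediately repeated words."""

    _pattern = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

    def check(self, text: str) -> List[Finding]:
        return [
            Finding(m.start(), m.end() - m.start(), f"Repeated word: {m.group(1)!r}")
            for m in self._pattern.finditer(text)
        ]


def correct_common_errors(text: str, checker: Checker) -> List[Finding]:
    # The pipeline depends only on the Checker interface, not on one toolkit.
    return checker.check(text)
```

An adapter for LanguageTool or another toolkit would implement the same `check` method and translate that toolkit's results into `Finding` objects.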
For the second part, the addition of error rule patterns, scenes are defined along two dimensions: the learning stage of English composition in China, and the application genre of the English text. The learning stages comprise three scenes: primary school pupils, junior high school students, and students above that level. The application genres comprise three scenes: letters, expository and narrative texts, and scientific research papers. The emphasis of the added error patterns differs between scenes in the two dimensions, as follows:
A. For scenes divided by the learning stage of English composition in China, rules are added mainly through recognition of predefined rule patterns, covering grammatical phrase collocations, common writing errors, and the like for the corresponding stage.
Any rule-adding method can be used, such as the error pattern extension supported by the LanguageTool toolkit or any other negative-example-based method of adding rule patterns. These mainly use error pattern matching rules. For example, primary school pupils commonly produce the error pattern play + musical instrument, in which the article "the" is missing before the instrument; this can be added as a negative-example rule formed purely from word combinations (or regular expression matching). The error pattern help sb. to do, in which the preposition "to" is redundant, requires matching the words help + sb. + to followed by a token whose part-of-speech tag is VB (base-form verb). Error patterns of noun number, such as NNS (plural noun) + VBP (non-third-person singular present verb), depend entirely on combined judgment of part-of-speech tagging results.
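Two of the negative-example rules described above can be sketched as follows. This is an illustration under assumptions: the instrument list, the sets of "help" forms and object tags, and both function names are hypothetical, the person after "help" is simplified to a single token, and the part-of-speech tags are assumed to come from an external Penn Treebank style tagger.

```python
import re
from typing import List, Tuple

# Hypothetical sample list; a real rule set would use a fuller vocabulary.
INSTRUMENTS = ("piano", "guitar", "violin", "drums")

# Word-combination rule: "play" directly followed by an instrument name.
PLAY_INSTRUMENT = re.compile(
    r"\bplay(?:s|ed|ing)?\s+(" + "|".join(INSTRUMENTS) + r")\b",
    re.IGNORECASE,
)

HELP_FORMS = {"help", "helps", "helped", "helping"}
OBJECT_TAGS = {"PRP", "NN", "NNS", "NNP"}  # simplification: "sb." is one token


def check_play_instrument(sentence: str) -> List[str]:
    """Flag 'play + instrument' with no article before the instrument."""
    return [
        f"Missing 'the' before '{m.group(1)}'"
        for m in PLAY_INSTRUMENT.finditer(sentence)
    ]


def check_help_to(tagged: List[Tuple[str, str]]) -> List[str]:
    """Flag help + sb. + to + VB, given (word, POS tag) pairs."""
    messages = []
    for i in range(len(tagged) - 3):
        (w0, _), (_, t1), (w2, _), (w3, t3) = tagged[i:i + 4]
        if (w0.lower() in HELP_FORMS and t1 in OBJECT_TAGS
                and w2.lower() == "to" and t3 == "VB"):
            messages.append(f"Redundant 'to' before '{w3}'")
    return messages
```

The first rule needs only the raw words, while the second needs tagging results, which mirrors the distinction the text draws between word-combination rules and rules that depend on part-of-speech tags.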
B. For scenes divided by the application genre of the English text, the added rules mainly exploit differences between genres, that is, differences in how sentences are composed. For letters, the emphasis is on whether sentences are concise: the average number of words per sentence and the difficulty and commonness of the words are recorded, where difficulty and commonness are obtained by comparison with a self-compiled list of 5000 common words, and a reminder to split is given for overlong clauses. For scientific research papers, the main concern is the accuracy of professional vocabulary, handled by matching fixed expressions of professional vocabulary: the professional vocabulary of the field is compiled, along with words that commonly co-occur with it in the same text (that is, words that occur frequently in texts of this field but rarely in other fields); the occurrence frequency of these words is then counted for the given text, and a replacement reminder is given for inappropriate words. For expository and narrative texts, the frequency of each word is recorded: when a word occurs too often, a reminder to substitute a near-synonym is given; when the average number of words per sentence is low, a reminder to merge sentences is given; and when there are too few conjunctions between sentences, a reminder to add conjunctions is given.
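The three reminders for expository and narrative texts can be sketched with simple counting. All thresholds, the conjunction list, the function-word list, and the function names below are assumptions chosen for illustration; the patent does not state concrete values.

```python
import re
from collections import Counter
from typing import List

# Hypothetical sample lists; real lists would be much longer.
CONJUNCTIONS = {"and", "but", "so", "because", "however", "therefore", "although"}
FUNCTION_WORDS = {"the", "a", "an", "i", "to", "of", "in", "is", "it"} | CONJUNCTIONS


def split_sentences(text: str) -> List[str]:
    """Naive split on sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def narrative_reminders(text: str, max_word_share: float = 0.15,
                        min_avg_words: float = 6.0,
                        min_conj_ratio: float = 0.25) -> List[str]:
    sentences = split_sentences(text)
    tokens = [t for s in sentences for t in tokenize(s)]
    if not tokens:
        return []
    reminders = []
    # 1) Overused content words: suggest a near-synonym.
    counts = Counter(t for t in tokens if t not in FUNCTION_WORDS)
    for word, n in counts.items():
        if n >= 3 and n / len(tokens) > max_word_share:
            reminders.append(f"'{word}' is used {n} times; consider a synonym")
    # 2) Short sentences on average: suggest merging.
    if len(tokens) / len(sentences) < min_avg_words:
        reminders.append("sentences are short on average; consider merging some")
    # 3) Few conjunctions per sentence: suggest adding connectives.
    conj = sum(t in CONJUNCTIONS for t in tokens)
    if conj / len(sentences) < min_conj_ratio:
        reminders.append("few conjunctions between sentences; consider adding some")
    return reminders
```

The same counting approach would extend to the letter genre (average sentence length and a common-word list) and to research papers (term frequencies compared across fields), with different lists and thresholds.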
It is worth noting that methods of constructing negative-example rules through sentence segmentation, word segmentation, part-of-speech tagging, and tag correction already exist and are used in the rule expansion process. The claims of those prior patents, however, are directed to that construction process, whereas the protection sought by the present patent does not lie in the implementation of that process.
2.2.2 Module two: syntactic analysis module
The general processing flow is shown in FIG. 3. A more detailed description of the parsing method is given in Patent I, covering the full content of its Module I (data preprocessing), the whole syntactic analysis process of its Module II, and the full content of its Module III (result output).
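As a rough illustration of how a basic sentence pattern could be read off dependency parsing results, the sketch below classifies a clause from its dependency triples. It is an assumption made for illustration and is not the method of Patent I: the triples are taken as given from some external parser, and the relation names (nsubj, obj, iobj, cop) follow Universal Dependencies style.

```python
from typing import List, Tuple

# (head word, relation, dependent word); relation names are an assumption.
Dependency = Tuple[str, str, str]


def basic_pattern(deps: List[Dependency]) -> str:
    """Classify one clause into a basic sentence pattern."""
    subject_heads = [head for head, rel, _ in deps if rel == "nsubj"]
    if not subject_heads:
        return "unknown"
    root = subject_heads[0]
    # Only relations attached to the clause's main predicate matter here.
    rels = {rel for head, rel, _ in deps if head == root}
    if "cop" in rels:
        return "S + V + C"        # subject + linking verb + complement
    if "iobj" in rels and "obj" in rels:
        return "S + V + IO + DO"  # subject + verb + indirect + direct object
    if "obj" in rels:
        return "S + V + O"
    return "S + V"
```

For "She gave him a book", triples such as ("gave", "nsubj", "She"), ("gave", "iobj", "him"), and ("gave", "obj", "book") would yield the double-object pattern.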
Claims (5)
1. A logic method for error correction and syntactic analysis of English text, characterized in that the processing of the English text comprises two modules: one performs text error correction, and the other provides syntactic analysis for correct sentences.
2. The method of claim 1, wherein the text error correction module comprises the following two parts:
1) identifying common English grammatical errors by using a common English text error correction tool;
2) adding additional error rule patterns.
3. The method of claim 1, wherein the syntactic analysis module obtains the syntactic analysis from regular expression matching on the English text and from dependency parsing results, the analysis comprising information such as the basic sentence pattern, sentence constituent structure, syntactic sequence of the sentence, phrase dependency relations, modifiers, and fixed collocations.
4. The method of claim 2, wherein adding additional error rule patterns comprises adding error rules specific to the corresponding field or level according to the application scenario.
5. The method of claim 4, wherein scenes are defined along two dimensions, the learning stage of English composition in China and the application genre of the English text; the learning stages comprise three scenes: primary school pupils, junior high school students, and students above that level; and the application genres comprise three scenes: letters, expository and narrative texts, and scientific research papers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910238788.2A CN111767709A (en) | 2019-03-27 | 2019-03-27 | Logic method for carrying out error correction and syntactic analysis on English text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910238788.2A CN111767709A (en) | 2019-03-27 | 2019-03-27 | Logic method for carrying out error correction and syntactic analysis on English text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111767709A (en) | 2020-10-13 |
Family
ID=72717962
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910238788.2A Pending CN111767709A (en) | 2019-03-27 | 2019-03-27 | Logic method for carrying out error correction and syntactic analysis on English text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111767709A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112036135A (en) * | 2020-11-06 | 2020-12-04 | 腾讯科技(深圳)有限公司 | Text processing method and related device |
CN112667208A (en) * | 2020-12-22 | 2021-04-16 | 深圳壹账通智能科技有限公司 | Translation error recognition method and device, computer equipment and readable storage medium |
CN112988995A (en) * | 2021-03-05 | 2021-06-18 | 广州大学 | English composition reading system and method |
CN113205084A (en) * | 2021-07-05 | 2021-08-03 | 北京一起教育科技有限责任公司 | English dictation correction method and device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814065A (en) * | 2009-02-23 | 2010-08-25 | 富士通株式会社 | Syntactic analysis device and syntactic analysis method |
CN107783958A (en) * | 2016-08-31 | 2018-03-09 | 科大讯飞股份有限公司 | A kind of object statement recognition methods and device |
CN107807915A (en) * | 2017-09-27 | 2018-03-16 | 北京百度网讯科技有限公司 | Error correcting model method for building up, device, equipment and medium based on error correction platform |
CN109376355A (en) * | 2018-10-08 | 2019-02-22 | 上海起作业信息科技有限公司 | English word and sentence screening technique, device, storage medium and electronic equipment |
- 2019-03-27: CN application CN201910238788.2A (publication CN111767709A), status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101814065A (en) * | 2009-02-23 | 2010-08-25 | 富士通株式会社 | Syntactic analysis device and syntactic analysis method |
CN107783958A (en) * | 2016-08-31 | 2018-03-09 | 科大讯飞股份有限公司 | A kind of object statement recognition methods and device |
CN107807915A (en) * | 2017-09-27 | 2018-03-16 | 北京百度网讯科技有限公司 | Error correcting model method for building up, device, equipment and medium based on error correction platform |
CN109376355A (en) * | 2018-10-08 | 2019-02-22 | 上海起作业信息科技有限公司 | English word and sentence screening technique, device, storage medium and electronic equipment |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112036135A (en) * | 2020-11-06 | 2020-12-04 | 腾讯科技(深圳)有限公司 | Text processing method and related device |
CN112667208A (en) * | 2020-12-22 | 2021-04-16 | 深圳壹账通智能科技有限公司 | Translation error recognition method and device, computer equipment and readable storage medium |
CN112988995A (en) * | 2021-03-05 | 2021-06-18 | 广州大学 | English composition reading system and method |
CN113205084A (en) * | 2021-07-05 | 2021-08-03 | 北京一起教育科技有限责任公司 | English dictation correction method and device and electronic equipment |
CN113205084B (en) * | 2021-07-05 | 2021-10-08 | 北京一起教育科技有限责任公司 | English dictation correction method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||