CN111767709A

CN111767709A - Logic method for carrying out error correction and syntactic analysis on English text

Info

Publication number: CN111767709A
Application number: CN201910238788.2A
Authority: CN
Inventors: 戴翰波; 李辉; 王丽
Original assignee: Wuhan Huiren Information Technology Co ltd
Current assignee: Wuhan Huiren Information Technology Co ltd
Priority date: 2019-03-27
Filing date: 2019-03-27
Publication date: 2020-10-13

Abstract

The invention provides a method for English text error correction and syntactic analysis. For erroneous sentences, the method corrects the errors and gives error prompts. For correct sentences, it analyzes their grammatical phenomena and outputs information such as the basic sentence pattern, sentence constituent structure, phrase dependency relations, modifiers and fixed collocations. The invention can help English beginners improve their writing ability more effectively, correct their errors in a targeted way, and learn the structures of well-written sentences.

Description

Logic method for carrying out error correction and syntactic analysis on English text

Technical Field

The invention belongs to the technical field of natural language processing. It relates to two research areas: automatic detection of errors in English text, and syntactic analysis of English sentences. The method is mainly intended for English writing tutoring, although the two technologies can also be used independently.

Background

English is the most widely used language worldwide and an essential skill for communication in a global environment. Written English is a primary medium of that communication, especially in academic papers and written business communication. As a result, more and more people need to improve their English writing ability. Reading good English texts on related topics is the strategy most people choose, but the quality of material on the Internet is uneven and beginners cannot easily select good texts. A method for error correction and syntactic analysis of English text can help English beginners improve their writing ability effectively.

Existing related technology is largely limited to automatic error detection in English text and does not address the structural and grammatical analysis of correct sentences. For example, CN108519974A provides an automatic detection and analysis method for grammar errors in English compositions. It performs sentence segmentation, word tokenization and spell checking on the composition, tags parts of speech with the Stanford analyzer, corrects the part-of-speech tags, constructs a negative-example rule flow chart, and returns the result. CN101814065B proposes a syntactic analysis apparatus and a syntactic analysis method that use regular expressions to describe syntactic analysis rules. In addition, several tools correct English texts: applications published abroad such as Grammarly, Ginger and WhiteSmoke, and the online correction websites commonly used by students in China.

Existing error correction methods and software focus only on flagging grammatical errors in English text. They omit the analysis of correct, well-written sentences, which would also help English learners improve their writing ability. Most of them can perform general English text correction, including basic grammar correction. However, their emphasis on generality ignores differences in language proficiency among user groups. Many of their prompts are therefore untargeted, and errors typical of a particular group often go uncorrected. For example, the common problems in primary school students' English writing differ markedly from the focus of error correction for scientific and academic papers. The former concerns mainly grammatical structure and vocabulary use, while the latter concerns mainly the accuracy and clarity of technical vocabulary.

Disclosure of Invention

In view of the above, the present invention provides a method for English text error correction and syntactic analysis. The method corrects erroneous sentences and provides error prompts. For correct sentences, it parses their grammatical phenomena and provides the basic sentence pattern, sentence constituent structure, phrase dependency relations, modifiers and fixed collocations.

The set of English writing scenarios can be continually extended according to user needs, as can the recognition and correction of the grammar error patterns corresponding to each scenario. The method can be used in any English-education tutoring program.

Compared with the prior art, the main innovations of the invention are as follows:

1. Through a suitable logical structure, the method not only corrects errors in English text but also outputs the grammatical phenomena of correct sentences. Most importantly, the syntactic analysis results for correct sentences (including well-written ones) help learners write and study correct sentences, learn the constituent structure of example sentences and word collocations, and so raise their writing level;

2. Error correction is extended on top of the LanguageTool toolkit. Besides common error rule patterns, the method adds further error patterns in rule form (for example, regular-expression patterns) for different application scenarios and different users. These patterns are packaged into user modes, one for each scenario.

The specific technical route of the logic method for error correction and syntactic analysis of English text is as follows:

Input text is logically divided between two modules according to whether each sentence contains errors:

- a sentence with errors goes to the text error correction module;
- a sentence without errors goes to the syntactic analysis module;
- a sentence whose only errors are word spelling errors also goes to the syntactic analysis module.
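The routing rule above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation. The `Issue` record, its `kind` values and the function name `route` are hypothetical. The sketch assumes that an upstream checker has already labeled each detected issue either as a spelling error or as some other kind of error.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Hypothetical record for one detected problem in a sentence."""
    kind: str      # assumed labels: "misspelling", or any other string for non-spelling errors
    message: str


def route(issues: List[Issue]) -> str:
    """Pick the module for one sentence, following the routing rule of FIG. 1."""
    if not issues:
        return "syntactic_analysis"   # no errors: analyze the correct sentence
    if all(issue.kind == "misspelling" for issue in issues):
        return "syntactic_analysis"   # only spelling errors: still analyzed
    return "error_correction"         # any other error: send to error correction


if __name__ == "__main__":
    print(route([]))
    print(route([Issue("misspelling", "Possible typo: 'teh'")]))
    print(route([Issue("grammar", "Subject-verb disagreement")]))
```

The text does not say whether spelling errors are fixed before parsing. The sketch therefore only chooses the module.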

(I) The text error correction module processes text in the following steps:

1) Identification of common grammatical errors in English text, using any error correction toolkit;

2) Addition of error rule patterns. The predefined scenarios are divided along two dimensions: the learning stage of English composition in China, and the genre of the English text. By learning stage there are three scenarios: primary school students, junior high school students, and higher levels. By genre there are three scenarios: letters, expository and narrative writing, and scientific research papers. A separate set of error patterns is added for each scenario in these two dimensions.

Further, for step 2), the set of English writing scenarios can be continually extended according to user needs, as can the recognition and correction of the corresponding grammar error patterns. The method can be used in any English-education tutoring program.
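One way to organize the per-scenario error patterns ("user modes") is a registry. It combines the common rules with one rule pack per learning stage and one rule pack per genre. The sketch below is a hypothetical Python design under that assumption. The class and function names, the enum labels and the two toy rules are illustrative and do not come from the patent. New scenarios or rules can be added without changing the checking logic.

```python
import re
from enum import Enum
from typing import Callable, Dict, List, Optional

# A rule takes one sentence and returns a message, or None if it does not fire.
Rule = Callable[[str], Optional[str]]


class Stage(Enum):
    PRIMARY = "primary school"
    JUNIOR_HIGH = "junior high school"
    HIGHER = "higher levels"


class Genre(Enum):
    LETTER = "letter"
    EXPOSITORY_NARRATIVE = "expository/narrative"
    RESEARCH_PAPER = "research paper"


class RuleRegistry:
    """Common rules plus per-stage and per-genre rule packs."""

    def __init__(self) -> None:
        self.common: List[Rule] = []
        self.by_stage: Dict[Stage, List[Rule]] = {s: [] for s in Stage}
        self.by_genre: Dict[Genre, List[Rule]] = {g: [] for g in Genre}

    def rules_for(self, stage: Stage, genre: Genre) -> List[Rule]:
        # The user mode for a scenario = common rules + its stage pack + its genre pack.
        return self.common + self.by_stage[stage] + self.by_genre[genre]

    def check(self, sentence: str, stage: Stage, genre: Genre) -> List[str]:
        messages = []
        for rule in self.rules_for(stage, genre):
            message = rule(sentence)
            if message is not None:
                messages.append(message)
        return messages


# Hypothetical example rules, for illustration only.
def double_space(sentence: str) -> Optional[str]:
    return "Remove the double space." if "  " in sentence else None


def contraction(sentence: str) -> Optional[str]:
    return "Avoid contractions in papers." if re.search(r"\b\w+n't\b", sentence) else None


if __name__ == "__main__":
    registry = RuleRegistry()
    registry.common.append(double_space)
    registry.by_genre[Genre.RESEARCH_PAPER].append(contraction)
    print(registry.check("We don't  know.", Stage.HIGHER, Genre.RESEARCH_PAPER))
    print(registry.check("We don't  know.", Stage.HIGHER, Genre.LETTER))
```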

(II) The syntactic analysis module is implemented mainly with the technology for automatic analysis of English text described in Patent I.

Drawings

FIG. 1 is a general logic flow diagram of the method of the present invention;

FIG. 2 is a detailed process flow diagram of the text correction module;

FIG. 3 is a detailed process flow diagram of the text parsing module.

Detailed Description

The English text error correction and syntactic analysis method consists mainly of a text error correction module and a syntactic analysis module. The logical relationship between them is shown in FIG. 1.

The text error correction module first corrects common grammatical errors and then applies additional grammar checks according to the selected application scenario. The syntactic analysis module provides syntactic analysis for correct sentences. This analysis covers the basic sentence pattern, sentence constituent structure, syntactic order of the sentence, phrase dependency relations, modifiers and fixed collocations.
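Claim 3 states that the syntactic analysis uses regular-expression matching, and fixed collocations are one of the listed outputs. A minimal sketch of regex-based collocation spotting follows. The three-entry `COLLOCATIONS` table and the function name are hypothetical. A real table would be far larger and would need to cover more inflected forms.

```python
import re
from typing import List, Tuple

# Hypothetical collocation table: regex over the sentence -> canonical form shown to the learner.
COLLOCATIONS = {
    r"\blook(?:s|ed|ing)? forward to\b": "look forward to (doing) sth.",
    r"\b(?:am|is|are|was|were|be|been|being) interested in\b": "be interested in sth.",
    r"\b(?:take|takes|took|taken|taking) part in\b": "take part in sth.",
}


def find_collocations(sentence: str) -> List[Tuple[str, str]]:
    """Return (matched text, canonical collocation) pairs in order of appearance."""
    hits = []
    for pattern, canonical in COLLOCATIONS.items():
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), canonical))
    # Sort by start offset so results follow the sentence, not the table order.
    return [(text, canonical) for _, text, canonical in sorted(hits)]


if __name__ == "__main__":
    print(find_collocations("She was interested in music and took part in the club."))
```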

2.2.1 Module one: text error correction module

The specific flow is shown in FIG. 2. It consists of two parts. The first part uses the LanguageTool toolkit to correct common problems. The second part adds error rules specific to particular fields or levels according to the application scenario.

For the first part, recognition of common grammatical errors in English text, we call the LanguageTool toolkit to correct common English text errors. If another text correction toolkit is chosen instead, the effect is essentially the same and our processing flow is unchanged; only the calling interface differs.
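The text stresses that any correction toolkit can be swapped in and only the calling interface changes, so a thin adapter layer is a natural fit. The sketch below assumes the third-party `language_tool_python` package, which wraps LanguageTool and needs Java. The attribute names read from its match objects (`offset`, `errorLength`, `message`, `ruleIssueType`, `replacements`) are taken from that package's documentation as an assumption and should be verified against the installed version. The `Issue` type, the `WordListChecker` stand-in and the function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Protocol


@dataclass
class Issue:
    offset: int                      # character offset of the problem in the text
    length: int                      # length of the problematic span
    message: str
    kind: str                        # e.g. "misspelling" or "grammar"
    suggestions: List[str] = field(default_factory=list)


class GrammarChecker(Protocol):
    def check(self, text: str) -> List[Issue]: ...


class LanguageToolChecker:
    """Adapter over language_tool_python (requires Java; not run in the example below)."""

    def __init__(self, language: str = "en-US") -> None:
        import language_tool_python  # lazy import so the rest of the sketch runs without it
        self._tool = language_tool_python.LanguageTool(language)

    def check(self, text: str) -> List[Issue]:
        return [
            Issue(m.offset, m.errorLength, m.message, m.ruleIssueType, list(m.replacements))
            for m in self._tool.check(text)
        ]


class WordListChecker:
    """Toy stand-in toolkit: flags any word missing from a small vocabulary as a misspelling."""

    def __init__(self, vocabulary: Iterable[str]) -> None:
        self.vocabulary = {word.lower() for word in vocabulary}

    def check(self, text: str) -> List[Issue]:
        return [
            Issue(m.start(), len(m.group(0)), f"Possible spelling mistake: {m.group(0)}", "misspelling")
            for m in re.finditer(r"[A-Za-z]+", text)
            if m.group(0).lower() not in self.vocabulary
        ]


def correct_common_errors(text: str, checker: GrammarChecker) -> List[Issue]:
    """Part one of module one: delegate common-error detection to whichever toolkit is plugged in."""
    return checker.check(text)


if __name__ == "__main__":
    checker = WordListChecker(["the", "cat", "sat", "on", "mat"])
    for issue in correct_common_errors("The cat sta on the mat.", checker):
        print(issue)
```

To swap toolkits, write another adapter with the same `check` method; the rest of the pipeline is untouched.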

For the second part, addition of error rule patterns, the predefined scenarios are divided along two dimensions: the learning stage of English composition in China, and the genre of the English text. By learning stage there are three scenarios: primary school students, junior high school students, and higher levels. By genre there are three scenarios: letters, expository and narrative writing, and scientific research papers. The added error patterns emphasize different points for different scenarios, as follows:

A. For the scenarios divided by learning stage, rules are added mainly by recognizing predefined rule patterns. These cover grammatical phrase collocations, common writing errors and similar issues at the corresponding stage.

Any rule-adding method can be used, such as the error-pattern extension supported by the LanguageTool toolkit, or any other negative-example-based rule-adding method. The rules are mainly error-pattern matching rules. For example:

- Primary school students often write "play + musical instrument" without the article "the" before the instrument. A negative-example rule for this can be formed from the word combination alone (or from a regular-expression match).
- The error pattern "help sb. to do", in which "to" is redundant, needs a rule over the words help + sb. + to, followed by a word whose part-of-speech tag is VB (verb, base form).
- For singular/plural noun errors, the rule depends entirely on combining part-of-speech tags such as NNS (plural noun) and VBP (verb, non-third-person singular present).
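The three kinds of negative-example rule described above can be sketched over already-tagged tokens: word combination only, words plus a POS tag, and POS tags only. This is a hypothetical illustration. Tokens are assumed to come from any Penn Treebank-style POS tagger, and the word lists are small made-up samples. The agreement rule's tag pairs (NNS followed by VBZ, NN followed by VBP) are an assumed way to set up the NNS/VBP-based judgment that the text mentions.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag) from any POS tagger

# Hypothetical word lists for illustration.
PLAY_FORMS = {"play", "plays", "played", "playing"}
INSTRUMENTS = {"piano", "violin", "guitar", "flute", "drums"}
HELP_FORMS = {"help", "helps", "helped", "helping"}
SOMEBODY_TAGS = {"PRP", "NNP", "NN", "NNS"}          # tags accepted for "sb."
AGREEMENT_ERRORS = {("NNS", "VBZ"), ("NN", "VBP")}   # assumed tag pairs


def check_tokens(tokens: List[Token]) -> List[Tuple[int, str]]:
    """Return (token index, message) pairs for three example negative-example rules."""
    words = [w.lower() for w, _ in tokens]
    tags = [t for _, t in tokens]
    hits = []
    for i in range(len(tokens)):
        # Rule 1, words only: "play" directly followed by an instrument (missing "the").
        if words[i] in PLAY_FORMS and i + 1 < len(tokens) and words[i + 1] in INSTRUMENTS:
            hits.append((i + 1, f"Missing article: play the {words[i + 1]}"))
        # Rule 2, words plus a tag: help + sb. + to + VB (redundant "to").
        if (words[i] in HELP_FORMS and i + 3 < len(tokens) and tags[i + 1] in SOMEBODY_TAGS
                and words[i + 2] == "to" and tags[i + 3] == "VB"):
            hits.append((i + 2, "Redundant 'to': use help sb. do"))
        # Rule 3, tags only: noun followed by a verb of the wrong number.
        if i + 1 < len(tokens) and (tags[i], tags[i + 1]) in AGREEMENT_ERRORS:
            hits.append((i + 1, "Subject-verb number disagreement"))
    return hits


if __name__ == "__main__":
    print(check_tokens([("He", "PRP"), ("plays", "VBZ"), ("piano", "NN")]))
```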

B. For the scenarios divided by genre, the added rules exploit differences between genres, that is, differences in sentence structure.

- Letters: the emphasis is on whether sentences are concise. The average number of words per sentence is recorded, together with the difficulty and commonness of the words. Word difficulty and commonness are determined by comparison with a self-compiled list of 5000 common words. Overly long clauses trigger a suggestion to split them.
- Scientific research papers: the main concern is the accuracy of technical vocabulary, checked by matching fixed expressions of technical vocabulary. The technical vocabulary of the field is compiled, together with the words that commonly co-occur with it in texts of that field, that is, words that occur frequently in texts of this field but rarely in other fields. The occurrence frequency of these words in the given text is then counted, and replacement of inappropriate words is suggested.
- Expository and narrative texts: the frequency of each word is recorded. When a word occurs too often, replacement with a synonym is suggested. If sentences contain few words on average, merging sentences is suggested. If there are too few connectives between sentences, adding connectives is suggested.
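The genre-specific checks above can be sketched with simple counts. Everything here is an assumption for illustration: the naive sentence splitter, every threshold (`max_len`, `max_rare`, `max_repeat`, `min_avg`, `min_linked`, `min_ratio`, `min_count`), and the smoothed frequency-ratio test used to pick domain-characteristic words. The user would supply the 5000-word list, the connective list and the domain corpora. For research papers, the replacement step needs a mapping from inappropriate words to preferred terms. The text does not specify that mapping, so the sketch stops at counting term usage.

```python
import re
from collections import Counter
from typing import List, Set


def sentences_of(text: str) -> List[List[str]]:
    """Naive segmentation on . ! ? into lists of lower-case word tokens."""
    parts = re.split(r"[.!?]+", text)
    return [words for words in (re.findall(r"[a-z']+", p.lower()) for p in parts) if words]


def check_letter(text: str, common: Set[str], max_len: int = 25, max_rare: float = 0.3) -> List[str]:
    """Letters: flag overlong sentences and a high share of words outside the common-word list."""
    sents = sentences_of(text)
    notes = [f"Sentence {i + 1} has {len(s)} words; consider splitting it."
             for i, s in enumerate(sents) if len(s) > max_len]
    words = [w for s in sents for w in s]
    rare = sum(w not in common for w in words)
    if words and rare / len(words) > max_rare:
        notes.append(f"{rare}/{len(words)} words are outside the common-word list; prefer simpler words.")
    return notes


def check_expository_narrative(text: str, connectives: Set[str], stop: Set[str],
                               max_repeat: int = 3, min_avg: float = 6.0,
                               min_linked: float = 0.3) -> List[str]:
    """Expository/narrative: flag overused words, short sentences and missing connectives."""
    sents = sentences_of(text)
    counts = Counter(w for s in sents for w in s if w not in stop)
    notes = [f"'{w}' appears {c} times; consider a synonym." for w, c in counts.items() if c > max_repeat]
    if sents and sum(map(len, sents)) / len(sents) < min_avg:
        notes.append("Sentences are short on average; consider merging some.")
    linked = sum(any(w in connectives for w in s) for s in sents)
    if len(sents) > 1 and linked / len(sents) < min_linked:
        notes.append("Few connectives between sentences; consider adding some.")
    return notes


def domain_terms(domain: Counter, general: Counter, min_ratio: float = 5.0, min_count: int = 3) -> Set[str]:
    """Research papers: words much more frequent in a domain corpus than in a general one."""
    d_total, g_total = sum(domain.values()), sum(general.values())
    terms = set()
    for word, count in domain.items():
        if count < min_count:
            continue
        d_rate = count / d_total
        g_rate = (general[word] + 1) / (g_total + 1)  # add one so unseen words do not divide by zero
        if d_rate / g_rate >= min_ratio:
            terms.add(word)
    return terms


def term_usage(text: str, terms: Set[str]) -> Counter:
    """Count how often each domain-characteristic word occurs in the given text."""
    return Counter(w for s in sentences_of(text) for w in s if w in terms)
```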

Note that the rule-extension process does involve constructing negative-example rules through sentence segmentation, word tokenization, part-of-speech tagging and tag correction of the text. However, the claims of the prior patents lie in that construction process, whereas the protection sought here does not lie in the implementation of that process.

2.2.2 Module two: syntactic analysis module

The general processing flow is shown in FIG. 3. The more detailed parsing method is described in Patent I. It covers the entire data preprocessing module (Module I), the entire syntactic analysis process (Module II) and the entire result output module (Module III) described in that patent.
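Claim 3 also says the syntactic analysis draws on dependency parsing results. The sketch below is a hedged illustration of how a basic sentence pattern might be read off a parse. It classifies a clause into the five basic patterns commonly taught in English classes: S+V, S+V+C, S+V+O, S+V+IO+DO and S+V+O+C. It assumes a parse with spaCy-style (ClearNLP) English labels such as `nsubj`, `dobj`, `dative`, `attr`, `acomp` and `oprd`, supplied here by hand. The label-to-pattern mapping and all names are assumptions, not the method of Patent I.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DepToken:
    text: str
    dep: str    # dependency label (spaCy/ClearNLP-style labels assumed)
    head: int   # index of the head token; the root points to itself


SUBJECTS = {"nsubj", "nsubjpass"}
COMPLEMENTS = {"attr", "acomp"}      # subject complement after a linking verb
OBJECT_COMPLEMENTS = {"oprd"}        # object predicate, e.g. "made him captain" (assumed label)


def basic_pattern(tokens: List[DepToken]) -> str:
    """Classify a parsed clause by the labels of the root's direct dependents."""
    root = next(i for i, t in enumerate(tokens) if t.dep == "ROOT")
    labels = {t.dep for i, t in enumerate(tokens) if t.head == root and i != root}
    if not labels & SUBJECTS:
        return "other"
    if "dobj" in labels and "dative" in labels:
        return "S+V+IO+DO"
    if "dobj" in labels and labels & OBJECT_COMPLEMENTS:
        return "S+V+O+C"
    if "dobj" in labels:
        return "S+V+O"
    if labels & COMPLEMENTS:
        return "S+V+C"
    return "S+V"


if __name__ == "__main__":
    # "She gave him a book." parsed by hand.
    parse = [DepToken("She", "nsubj", 1), DepToken("gave", "ROOT", 1), DepToken("him", "dative", 1),
             DepToken("a", "det", 4), DepToken("book", "dobj", 1), DepToken(".", "punct", 1)]
    print(basic_pattern(parse))
```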

Claims

1. A logic method for error correction and syntactic analysis of English text, characterized in that the processing of the English text comprises two modules: one module performs text error correction, and the other provides syntactic analysis for correct sentences.

2. The method of claim 1, wherein the text error correction module comprises the following two parts:

1) identifying common English grammar errors with a general-purpose English text error correction tool;

2) adding additional error rule patterns.

3. The method of claim 1, wherein the syntactic analysis module obtains the syntactic analysis from regular-expression matching on the English text and from dependency parsing results, the syntactic analysis including the basic sentence pattern, sentence constituent structure, syntactic order of the sentence, phrase dependency relations, modifiers and fixed collocations.

4. The method of claim 2, wherein adding additional error rule patterns comprises adding error rules specific to the corresponding field or level according to the application scenario.

5. The method of claim 4, wherein the predefined scenarios are divided along two dimensions, the learning stage of English composition in China and the genre of the English text; by learning stage there are three scenarios: primary school students, junior high school students, and higher levels; by genre there are three scenarios: letters, expository and narrative writing, and scientific research papers.