CN114970569A - Automatic question solving method, device and storage medium for Chinese-English translation test questions - Google Patents

Publication number
CN114970569A
Authority
CN
China
Prior art keywords
chinese
translation
question
text
english
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210515567.7A
Other languages
Chinese (zh)
Inventor
崔寅生
胡科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunsizhixue Technology Co ltd
Original Assignee
Beijing Yunsizhixue Technology Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00 Electrically-operated teaching apparatus or devices working with questions and answers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an automatic question solving method, device and storage medium for Chinese-to-English translation test questions. The method comprises the following steps: training a pre-trained language model on a corpus of Chinese and English texts to obtain an automatic question solving model; performing text preprocessing on the text of a Chinese-to-English translation test question to obtain a test question text in a unified format; encoding the unified-format test question text based on the automatic question solving model to obtain an encoded text, wherein the encoded text contains mask characters that fill the part of the test question to be translated and answered; and searching and decoding the mask characters in the encoded text based on the automatic question solving model to automatically generate the answer to the test question. The invention helps provide guidance to students while they answer Chinese-to-English translation test questions, and enables automatic correction by comparing a student's answer with the automatically generated answer.

Description

Automatic question solving method, device and storage medium for Chinese-English translation test questions
Technical Field
The invention relates to the technical field of natural language processing, and in particular to an automatic question solving method, device and storage medium for Chinese-to-English translation test questions.
Background
To help primary and secondary school students better master and use English, English exercises for primary and secondary schools contain a large number of Chinese-to-English translation test questions. These questions come in various types. In one type, shown in FIG. 1, a Chinese sentence and part of an English sentence are given, and the words missing from the blanks in the English sentence are to be filled in according to the Chinese sentence. In another type, shown in FIG. 2, a usage scenario is described, and an English sentence is to be produced according to the scenario prompt.
Existing Chinese-to-English translation test questions are generally answered manually. When students answer such questions, people without sufficient English proficiency cannot give them guidance. When English teachers correct the answers, they must do so manually one by one; batch automatic correction is not possible, which increases the teachers' workload.
In view of this, the invention is proposed to solve the problem of automatically answering Chinese-to-English translation test questions. It helps provide guidance to students while they answer such questions and enables automatic correction of their answers.
Disclosure of Invention
In order to solve the above problems, the present invention provides an automatic question solving method, device and storage medium for Chinese-to-English translation test questions. Specifically, the following technical solution is adopted:
An automatic question solving method for Chinese-to-English translation test questions comprises the following steps:
training a pre-trained language model on a corpus of Chinese and English texts to obtain an automatic question solving model;
performing text preprocessing on the text of a Chinese-to-English translation test question to obtain a test question text in a unified format;
encoding the unified-format test question text based on the automatic question solving model to obtain an encoded text, wherein the encoded text contains mask characters that fill the part of the test question to be translated and answered;
and searching and decoding the mask characters in the encoded text based on the automatic question solving model to automatically generate the answer to the Chinese-to-English translation test question.
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, training the pre-trained language model on a corpus of Chinese and English texts to obtain the automatic question solving model includes:
arranging large-scale general parallel corpora uniformly into Chinese + English sentence pairs;
and training the pre-trained language model on the Chinese + English sentence pairs to obtain a preliminary automatic question solving model.
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, training the pre-trained language model on a corpus of Chinese and English texts to obtain the automatic question solving model includes:
obtaining corpus texts related to Chinese-to-English translation test questions in a set field, and arranging them uniformly into Chinese + English sentence pairs;
and performing enhanced training of the preliminary automatic question solving model on the Chinese + English sentence pairs arranged for the set field, to obtain an enhanced automatic question solving model;
optionally, when the set field is the education field, the corpus texts related to Chinese-to-English translation test questions in the education field include question banks, textbooks, and corpora similar to the education field that are selected from the general parallel corpora by a model.
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, training the pre-trained language model on a corpus of Chinese and English texts to obtain the automatic question solving model includes:
designing a MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format;
and continuing to train the enhanced automatic question solving model on the MASK task to obtain the final automatic question solving model.
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, designing the MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes:
the question type characteristic of the Chinese-to-English translation test question is that a partial prompt is given and the remaining words to be translated are to be completed;
the MASK task converts the Chinese-to-English translation test question into the unified input + output format:
[sentence start identifier] Chinese sentence part [sentence end identifier] + English sentence part + [MASK] part [sentence end identifier];
wherein the [MASK] part corresponds to the blank positions of the words to be completed in the English sentence of the test question.
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, designing the MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes:
the question type characteristic of the Chinese-to-English translation test question is that a scene prompt is given and a whole sentence is to be translated;
the MASK task converts the Chinese-to-English translation test question into the unified input + output format:
[sentence start identifier] + [question identifier] scene prompt Chinese sentence part [sentence end identifier] + English sentence part [sentence end identifier].
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, performing text preprocessing on the text of the Chinese-to-English translation test question to obtain the unified-format test question text includes:
scanning the text of the Chinese-to-English translation test question to be solved to obtain an original OCR text;
structuring the original OCR text to obtain a structured OCR text;
and constructing a unified MASK text based on the structured OCR text.
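The structuring step can be pictured with a minimal sketch. The source does not say how the OCR text is structured, so everything here is an assumption made for illustration: the function name `structure_ocr_text`, the record fields, the use of runs of underscores as blanks, and the premise that the Chinese prompt precedes the English stem. The example sentence is made up.

```python
import re
from typing import Dict

# CJK ideographs plus CJK and full-width punctuation (assumed to mark the Chinese part).
_CHINESE_CHARS = re.compile(r"[\u4e00-\u9fff\u3000-\u303f\uff00-\uffef]")


def structure_ocr_text(raw: str) -> Dict[str, object]:
    """Split raw OCR text into a Chinese prompt and an English stem (hypothetical format)."""
    text = " ".join(raw.split())                   # collapse OCR line breaks and extra spaces
    text = re.sub(r"^\d+\s*[.、．]\s*", "", text)   # drop a leading question number such as "3."
    # Everything up to the last Chinese character is treated as the Chinese part.
    last = max((m.end() for m in _CHINESE_CHARS.finditer(text)), default=0)
    chinese, english = text[:last].strip(), text[last:].strip()
    blank_count = len(re.findall(r"_{2,}", english))  # runs of underscores count as blanks
    return {"chinese": chinese, "english": english, "blank_count": blank_count}


record = structure_ocr_text("3. 她每天早上六点起床。\nShe ____ at six\nevery morning.")
print(record)
```

A record of this kind carries what the later MASK-text construction needs: the Chinese sentence, the partial English sentence, and the number of blanks.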
As an optional implementation manner of the present invention, in the automatic question solving method for Chinese-to-English translation test questions, when the mask characters in the encoded text are searched and decoded based on the automatic question solving model to automatically generate the answer to the test question, the search and decoding are performed using beam search.
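The search-and-decode step is commonly implemented as beam search, which keeps only the few best partial answers at each step. The sketch below is a generic beam search, not the patent's implementation: `toy_next_token_log_probs` and its tiny probability table are hypothetical stand-ins for the automatic question solving model, and the beam width and length limit are arbitrary.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, Tuple[str, ...]]  # (cumulative log-probability, tokens)


def beam_search(
    next_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 2,
    max_len: int = 5,
    eos: str = "[EOS]",
) -> List[str]:
    """Return the best token sequence found, without the end-of-sentence token."""
    beams: List[Hypothesis] = [(0.0, ())]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = [
            (score + lp, seq + (tok,))
            for score, seq in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:  # keep only the top beam_width
            (finished if seq[-1] == eos else beams).append((score, seq))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[0])
    return [tok for tok in best[1] if tok != eos]


def toy_next_token_log_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical model: a fixed table of next-token probabilities."""
    table = {
        (): {"that": math.log(0.6), "which": math.log(0.4)},
        ("that",): {"has": math.log(0.9), "[EOS]": math.log(0.1)},
        ("which",): {"has": math.log(0.5), "[EOS]": math.log(0.5)},
        ("that", "has"): {"[EOS]": 0.0},
        ("which", "has"): {"[EOS]": 0.0},
    }
    return table[prefix]


print(beam_search(toy_next_token_log_probs, beam_width=2))
```

Hypotheses are ranked here by raw cumulative log-probability; length normalization, which practical decoders often add, is left out to keep the sketch short.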
The invention also provides an automatic question solving device for Chinese-to-English translation test questions, which comprises:
an automatic question solving model training module, used for training a pre-trained language model on a corpus of large-scale Chinese and English texts to obtain an automatic question solving model;
and an automatic question solving module, which comprises:
a text format processing unit, used for performing text preprocessing on the text of a Chinese-to-English translation test question to obtain a test question text in a unified format;
a text encoding processing unit, used for encoding the unified-format test question text based on the automatic question solving model to obtain an encoded text, wherein the encoded text contains mask characters that fill the part of the test question to be translated and answered;
and a text decoding processing unit, used for searching and decoding the mask characters in the encoded text based on the automatic question solving model to automatically generate the answer to the Chinese-to-English translation test question.
The invention also provides a storage medium storing a computer executable program which, when executed, implements the automatic question solving method for Chinese-to-English translation test questions.
Compared with the prior art, the invention has the following beneficial effects:
In the automatic question solving method for Chinese-to-English translation test questions of the invention, a pre-trained language model is trained on a corpus of large-scale Chinese and English texts to obtain an automatic question solving model, and this model is used to answer Chinese-to-English translation test questions automatically. This helps provide guidance to students while they answer such questions, and enables automatic correction by comparing a student's answer with the automatically generated answer.
Description of the drawings:
FIG. 1 is an example of one question type of Chinese-to-English translation test question according to an embodiment of the present invention;
FIG. 2 is an example of another question type of Chinese-to-English translation test question according to an embodiment of the present invention;
FIG. 3 is a flow chart of the automatic question solving method for Chinese-to-English translation test questions according to an embodiment of the present invention;
FIG. 4 is a training flow chart of the automatic question solving model in the automatic question solving method for Chinese-to-English translation test questions according to an embodiment of the present invention;
FIG. 5 shows examples of the Chinese + English sentence pairs into which parallel corpora are uniformly arranged in the automatic question solving method for Chinese-to-English translation test questions according to an embodiment of the present invention;
FIG. 6 shows examples of the Chinese + English sentence pairs into which corpus texts related to Chinese-to-English translation test questions in the education field are uniformly arranged in the automatic question solving method according to an embodiment of the present invention;
FIG. 7 is a flow chart of automatic question solving in the automatic question solving method for Chinese-to-English translation test questions according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments.
Thus, the following detailed description of the embodiments of the invention is not intended to limit the scope of the invention as claimed, but is merely representative of some embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments of the present invention and the features and technical solutions thereof may be combined with each other without conflict.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that the terms "upper", "lower", and the like refer to orientations or positional relationships based on those shown in the drawings, or orientations or positional relationships that are conventionally arranged when the products of the present invention are used, or orientations or positional relationships that are conventionally understood by those skilled in the art, and such terms are used for convenience of description and simplification of the description, and do not refer to or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Referring to FIG. 3, the automatic question solving method for Chinese-to-English translation test questions of this embodiment includes:
training a pre-trained language model on a corpus of large-scale Chinese and English texts to obtain an automatic question solving model;
performing text preprocessing on the text of a Chinese-to-English translation test question to obtain a test question text in a unified format;
encoding the unified-format test question text based on the automatic question solving model to obtain an encoded text, wherein the encoded text contains mask characters that fill the part of the test question to be translated and answered;
and searching and decoding the mask characters in the encoded text based on the automatic question solving model to automatically generate the answer to the Chinese-to-English translation test question.
In the method of this embodiment, a pre-trained language model is trained on a corpus of large-scale Chinese and English texts to obtain an automatic question solving model, and this model is used to answer Chinese-to-English translation test questions automatically. This helps provide guidance to students while they answer such questions, and enables automatic correction by comparing a student's answer with the automatically generated answer.
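The comparison used for automatic correction is not specified in the source. As one possible reading, the sketch below marks a student's answer as correct when it equals the automatically generated answer after light normalization (case, punctuation, spacing). The function names and the exact-match rule are assumptions; a real system might well accept several valid translations.

```python
import re


def normalize_answer(text: str) -> str:
    """Lower-case, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def is_correct(student_answer: str, generated_answer: str) -> bool:
    """Assumed rule: correct if both answers match after normalization."""
    return normalize_answer(student_answer) == normalize_answer(generated_answer)


print(is_correct("  With a swimming pool. ", "with a swimming pool"))
print(is_correct("has a pool", "with a swimming pool"))
```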
The Chinese-to-English translation test questions addressed in this embodiment are mainly aimed at primary and secondary school students. The question type either gives a partial prompt and requires the remaining words to be completed, as shown in FIG. 1 (cloze), or gives a scene prompt and requires a whole sentence to be translated, as shown in FIG. 2 (scene translation). Given these question type characteristics, the pre-trained language model adopted is the UniLM model.
UniLM, the Unified pre-trained Language Model, is a pre-trained language model. It can perform unidirectional, sequence-to-sequence and bidirectional prediction tasks, combining the advantages of autoregressive (AR) and autoencoding (AE) language models. UniLM obtained the best performance on datasets for abstractive summarization, generative question answering and language generation.
UniLM is likewise a multi-layer Transformer network and is trained on three pre-training objectives at once. For each objective the model performs a cloze task, predicting a masked word from its context. The context differs between training objectives.
For the unidirectional language model, the context of a masked word is the words on one side of it, either the left or the right.
For the bidirectional language model, the context of a masked word is the words on both its left and its right.
For the sequence-to-sequence language model, the left sequence is called the source sequence and the right sequence the target sequence. The target sequence is what is to be predicted, so the context of a masked word is the entire source sequence plus the already predicted part of the target sequence to its left.
In line with these pre-training objectives, the cloze task works the same way for each language model: some WordPiece tokens are randomly selected and replaced with [MASK], the corresponding output vectors are computed by the Transformer network, and these vectors are fed into a softmax classifier to predict the masked tokens. The UniLM parameters are optimized to minimize the cross entropy between the predicted and true tokens at the [MASK] positions. Because a cloze task is used throughout, the same training procedure applies to all the language models, whether unidirectional or bidirectional.
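The cloze objective can be made concrete with a small numerical sketch: a softmax over vocabulary scores at one [MASK] position, followed by the cross entropy against the true token. The three-word vocabulary and the score values are invented for illustration and are not outputs of UniLM.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def masked_token_loss(logits: Dict[str, float], true_token: str) -> float:
    """Cross entropy at a single [MASK] position: -log p(true token)."""
    return -math.log(softmax(logits)[true_token])


# Hypothetical classifier scores at one [MASK] position.
logits_at_mask = {"with": 2.0, "has": 0.5, "in": 0.1}
probs = softmax(logits_at_mask)
loss = masked_token_loss(logits_at_mask, "with")
print(probs, loss)
```

Training lowers this loss, which is the same as raising the probability the classifier assigns to the true token at each masked position.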
The unidirectional language model:
The unidirectional language model is trained with both left-to-right and right-to-left objectives. Taking left-to-right as an example, to predict the [MASK] in the sequence 'X1 X2 [MASK] X4', only X1, X2 and the [MASK] position itself are available; X4 is not.
The bidirectional language model:
Taking 'X1 X2 [MASK] X4' again as the example, X1, X2, X4 and the [MASK] position itself are all available, so the bidirectional language model can produce better contextual token representations than the unidirectional one.
The sequence-to-sequence language model:
The left sequence is the known sequence, called the source sequence; the right sequence is the desired sequence, called the target sequence. The left sequence belongs to the encoding stage, so its tokens can all see one another. The right sequence belongs to the decoding stage, where each token can see the source sequence, the target tokens to its left, and itself. For example, with T1 T2 -> T3 T4 T5, the input becomes [SOS] T1 T2 [EOS] T3 T4 T5 [EOS]. T1 and T2 can see each other and the [SOS] and [EOS] on either side, while T4 can see [SOS], T1, T2, [EOS], T3 and itself.
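The three visibility rules above can be written as attention masks, where entry (i, j) is 1 when position i may look at position j. The sketch below is a plain-Python illustration; the function name and the mode strings are chosen here and do not come from the source or from any UniLM code.

```python
from typing import List


def build_attention_mask(src_len: int, tgt_len: int, mode: str) -> List[List[int]]:
    """mask[i][j] == 1 means position i may attend to position j."""
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "bidirectional":
                visible = True                 # every position sees every position
            elif mode == "left_to_right":
                visible = j <= i               # only the left context and itself
            elif mode == "seq2seq":
                if i < src_len:
                    visible = j < src_len      # source tokens see the whole source
                else:
                    visible = j <= i           # target tokens see source, left target, itself
            else:
                raise ValueError(f"unknown mode: {mode}")
            mask[i][j] = int(visible)
    return mask


tokens = ["[SOS]", "T1", "T2", "[EOS]", "T3", "T4", "T5", "[EOS]"]
seq2seq = build_attention_mask(src_len=4, tgt_len=4, mode="seq2seq")
t4_row = seq2seq[tokens.index("T4")]
print([tok for tok, seen in zip(tokens, t4_row) if seen])
```

For T4 the visible tokens are [SOS], T1, T2, [EOS], T3 and T4 itself, matching the example in the text.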
During training, tokens in both the source sequence and the target sequence can be randomly replaced with [MASK] for the model to learn from. Because the two sequences are packed together as one input, in predicting the [MASK] tokens the model implicitly learns the close relationship between them. This is very useful in NLG tasks such as abstractive summarization.
The UniLM model has three advantages:
The three different training objectives share the same network parameters.
Because the network parameters are shared, the model is kept from overfitting to any single language model, and what it learns is more general.
Because the sequence-to-sequence language model is included, the model can perform NLG tasks, such as abstractive summarization and question-and-answer generation, in addition to NLU tasks.
As an optional implementation manner of this embodiment, referring to FIG. 4, in the automatic question solving method for Chinese-to-English translation test questions of this embodiment, training the UniLM model on a corpus of large-scale Chinese and English texts to obtain the automatic question solving model includes:
arranging large-scale general parallel corpora uniformly into Chinese + English sentence pairs, as shown in FIG. 5;
and training the UniLM model on the Chinese + English sentence pairs to obtain a preliminary automatic question solving model.
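Arranging a parallel corpus into Chinese + English sentence pairs might look like the sketch below. The source does not give the concrete layout of FIG. 5, so the output string layout (start marker, Chinese, end marker, English, end marker) is an assumption borrowed from the sequence-to-sequence input format, and the cleaning rules (whitespace collapsing, dropping incomplete and duplicate pairs) are illustrative choices. The example sentences are made up.

```python
from typing import Iterable, List, Tuple

SOS, EOS = "[SOS]", "[EOS]"


def to_sentence_pairs(raw_pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Normalize (Chinese, English) pairs and render each as one training string."""
    seen = set()
    rendered = []
    for zh, en in raw_pairs:
        zh, en = " ".join(zh.split()), " ".join(en.split())  # collapse whitespace
        if not zh or not en or (zh, en) in seen:             # skip incomplete or duplicate pairs
            continue
        seen.add((zh, en))
        rendered.append(f"{SOS} {zh} {EOS} {en} {EOS}")
    return rendered


raw = [
    ("我喜欢苹果。", "I  like apples. "),
    ("", "Orphan line"),
    ("我喜欢苹果。", "I like apples."),
]
pairs = to_sentence_pairs(raw)
print(pairs)
```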
A corpus is the basic resource of corpus linguistics and the main resource of empirical approaches to language research. Corpora are used in lexicography, language teaching, traditional linguistic research, and statistical or example-based research in natural language processing. By language, corpora can be divided into monolingual, bilingual and multilingual corpora. By the way their texts are organized, bilingual and multilingual corpora can be further divided into parallel (aligned) corpora and comparable corpora. The texts of a parallel corpus stand in a translation relationship and are mainly used in applications such as machine translation and bilingual dictionary compilation; a comparable corpus collects texts in different languages that express the same content and is mainly used in contrastive language research. To realize automatic solving of Chinese-to-English translation test questions, the automatic question solving model of this embodiment is obtained by training the UniLM model on large-scale general parallel corpora.
Further, the automatic question solving method of this embodiment is particularly directed at Chinese-to-English translation test questions in a set field. This embodiment takes the education field as an example. Based on the question type characteristics of Chinese-to-English translation test questions in the education field, training the UniLM model on a corpus of large-scale Chinese and English texts to obtain the automatic question solving model includes:
obtaining corpus texts related to Chinese-to-English translation test questions in the education field, and arranging them uniformly into Chinese + English sentence pairs, as shown in FIG. 6;
performing enhanced training of the preliminary automatic question solving model on the Chinese + English sentence pairs arranged for the education field, to obtain an enhanced automatic question solving model;
wherein the corpus texts related to Chinese-to-English translation test questions in the education field include question banks, textbooks, and corpora similar to the education field that are selected from the general parallel corpora by a model.
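The source says only that education-like corpora are picked out of the general parallel corpora "through a model" and gives no detail about that model. The sketch below swaps in a deliberately simple stand-in, a vocabulary-overlap score with a threshold, purely to show the shape of such a selection step. The function names, the vocabulary, the threshold and the example sentences are all hypothetical.

```python
import re
from typing import Iterable, List, Set, Tuple


def education_score(english: str, education_vocab: Set[str]) -> float:
    """Fraction of English words that appear in an education-domain vocabulary."""
    words = re.findall(r"[a-z']+", english.lower())
    if not words:
        return 0.0
    return sum(word in education_vocab for word in words) / len(words)


def select_education_like(
    pairs: Iterable[Tuple[str, str]],
    education_vocab: Set[str],
    threshold: float = 0.5,
) -> List[Tuple[str, str]]:
    """Keep the (Chinese, English) pairs whose English side scores at or above the threshold."""
    return [(zh, en) for zh, en in pairs if education_score(en, education_vocab) >= threshold]


vocab = {"this", "is", "my", "schoolbag", "teacher", "homework"}
general_corpus = [
    ("这是我的书包。", "This is my schoolbag."),
    ("股市今天下跌。", "The stock market fell today."),
]
selected = select_education_like(general_corpus, vocab)
print(selected)
```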
The Chinese-to-English translation test questions of this embodiment either give a partial prompt and require the remaining words to be completed, as shown in FIG. 1 (cloze), or give a scene prompt and require a whole sentence to be translated, as shown in FIG. 2 (scene translation). The method of this embodiment therefore performs enhanced training targeted at these question type characteristics, so as to improve the accuracy of automatic question solving for Chinese-to-English translation test questions in the education field.
As an optional implementation manner of this embodiment, in the automatic question solving method for Chinese-to-English translation test questions of this embodiment, training the UniLM model on a corpus of large-scale Chinese and English texts to obtain the automatic question solving model includes:
designing a MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format;
and continuing to train the enhanced automatic question solving model on the MASK task to obtain the final automatic question solving model.
In a language model, the next word often has to be predicted from the preceding words. If self-attention is applied in the language model, or context on both sides is used at the same time, a mask is needed to cover the token to be predicted so that its label is not leaked. Different masking schemes have each been the subject of published papers. The automatic question solving model needs to perform an NLU task, whose function is to enable the machine to accurately understand natural language produced by humans. For the NLG task, let S1 be the source segment and S2 the target segment; the input is then "[SOS] S1 [EOS] S2 [EOS]". As in pre-training, some spans are randomly masked, and the objective is to maximize the probability of the masked tokens given the context. Note that the [EOS] marking the end of the target sequence may also be masked, which lets the model learn when to generate [EOS] and thus when to end text generation.
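Random span masking on the packed input can be sketched as follows. This is a simplified illustration under stated assumptions: a single contiguous span is masked, the span is drawn only from the target segment (including its closing [EOS], so the model can learn when to stop), and the function name, span length and example tokens are all chosen here rather than taken from the source.

```python
import random
from typing import Dict, List, Tuple


def mask_target_span(
    source: List[str], target: List[str], span_len: int, rng: random.Random
) -> Tuple[List[str], Dict[int, str]]:
    """Pack '[SOS] S1 [EOS] S2 [EOS]' and mask one random span inside 'S2 [EOS]'."""
    if not 1 <= span_len <= len(target) + 1:
        raise ValueError("span_len must fit inside the target segment plus its [EOS]")
    tokens = ["[SOS]"] + source + ["[EOS]"] + target + ["[EOS]"]
    target_start = len(source) + 2          # index of the first target token
    last_start = len(tokens) - span_len     # a span may end on the final [EOS]
    start = rng.randint(target_start, last_start)
    labels = {i: tokens[i] for i in range(start, start + span_len)}  # tokens to predict
    masked = ["[MASK]" if i in labels else tok for i, tok in enumerate(tokens)]
    return masked, labels


rng = random.Random(0)
masked, labels = mask_target_span(["我", "喜欢", "苹果"], ["I", "like", "apples"], 2, rng)
print(masked, labels)
```

The `labels` dictionary holds the original tokens at the masked positions, which is what the cross-entropy objective compares the predictions against.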
Specifically, in the automatic question solving method of this embodiment, designing the span mask task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes the following for cloze questions:
the question type characteristic of the test question is that a partial prompt is given and the remaining words to be translated must be filled in;
the span mask task converts the test question into the unified input + output format: [sentence start identifier] Chinese sentence part [sentence end identifier] + English sentence part + [MASK] part [sentence end identifier];
wherein the [MASK] part corresponds to the blank positions of the words to be filled in within the English sentence of the test question.
For the example test question in FIG. 1, the converted span mask format is: [SOS] I like a park with a swimming pool. [EOS] I love the park [MASK] [MASK] [MASK] A bathing pool [EOS].
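The cloze conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `build_cloze_text` is hypothetical, the English sentence is assumed to be already split into words, and each blank is represented by `None`. The Chinese sentence is passed through unchanged, so the example reuses the rendering given in the text.

```python
from typing import List, Optional


def build_cloze_text(chinese: str, english_tokens: List[Optional[str]]) -> str:
    """Build "[SOS] <Chinese> [EOS] <English with [MASK]> [EOS]".

    A None entry in english_tokens stands for one blank to be filled in
    (an assumed convention, not specified by the embodiment).
    """
    english = " ".join("[MASK]" if tok is None else tok for tok in english_tokens)
    return f"[SOS] {chinese} [EOS] {english} [EOS]"


if __name__ == "__main__":
    text = build_cloze_text(
        "I like a park with a swimming pool.",
        ["I", "love", "the", "park", None, None, None, "A", "bathing", "pool"],
    )
    print(text)
```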
Specifically, in the automatic question solving method of this embodiment, designing the span mask task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes the following for scene translation questions:
the question type characteristic of the test question is that a scene prompt is given and the whole sentence must be translated;
the span mask task converts the test question into the unified input + output format:
[sentence start identifier] + [question identification] scene prompt Chinese sentence part [sentence end identifier] + English sentence part [sentence end identifier].
For the example test question in FIG. 2, the converted span mask format is: [SOS] [ASK] Do you want to know which musician your friends like best? [EOS] Who is your favorite musician? [EOS].
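A minimal sketch of the scene translation format follows. The training-time function mirrors the format stated above. The inference-time function is an assumption: the text does not say how many [MASK] positions are supplied when the English sentence is unknown, so `max_answer_tokens` is a hypothetical parameter, as are both function names.

```python
def build_scene_training_text(scene_prompt: str, english: str) -> str:
    """Training sample: "[SOS] [ASK] <scene prompt> [EOS] <English> [EOS]"."""
    return f"[SOS] [ASK] {scene_prompt} [EOS] {english} [EOS]"


def build_scene_inference_text(scene_prompt: str, max_answer_tokens: int) -> str:
    """Inference input with the whole English part replaced by [MASK] tokens.

    max_answer_tokens is a hypothetical upper bound on the answer length.
    """
    if max_answer_tokens < 1:
        raise ValueError("max_answer_tokens must be at least 1")
    masks = " ".join(["[MASK]"] * max_answer_tokens)
    return f"[SOS] [ASK] {scene_prompt} [EOS] {masks}"


if __name__ == "__main__":
    prompt = "Do you want to know which musician your friends like best?"
    print(build_scene_training_text(prompt, "Who is your favorite musician?"))
    print(build_scene_inference_text(prompt, max_answer_tokens=6))
```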
Common identifiers in natural language processing:
<UNK>: low-frequency words or words not in the vocabulary;
<PAD>: padding characters;
<GO>/<SOS>: sentence start identifier;
<EOS>: sentence end identifier;
[SEP]: separator between two sentences;
[MASK]: placeholder for masked characters.
As an optional implementation of this embodiment, in the automatic question solving method for Chinese-to-English translation test questions, performing text preprocessing on the Chinese-to-English translation test question text to obtain the unified-format test question text includes:
scanning the Chinese-to-English translation test question text to be solved to obtain an original OCR text;
structuring the original OCR text to obtain a structured OCR text;
constructing a unified MASK text based on the structured OCR text.
The automatic question solving method of this embodiment answers Chinese-to-English translation test questions automatically using the automatic question solving model obtained by training. During automatic answering, the test question text must first be preprocessed into a unified MASK text that the model can take as input.
OCR (optical character recognition) technology recognizes the characters in an image as an editable character string. Early OCR technology mainly handled simple document images; with the development of deep learning, current OCR technology is widely applied to text recognition in images of various complex scenes. However, the output of OCR is only an editable character string without any structural information, whereas the business processing needs to recognize the text of the different layout blocks of a whole page. The original OCR text is therefore structured, so that the original OCR text of the Chinese-to-English translation test question can be processed in a targeted manner and more accurate information obtained. This ensures the accuracy of the automatic question solving model and improves the efficiency of automatic question solving.
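The structuring step can be sketched on a single OCR line. Everything about the layout here is an assumption made for illustration: a question number, then the Chinese sentence, then the English sentence with underscore runs marking blanks. The names `StructuredQuestion`, `parse_ocr_line`, and `to_unified_mask_text` and the sample Chinese sentence are hypothetical; the embodiment does not describe its structuring rules at this level of detail.

```python
import re
from dataclasses import dataclass


@dataclass
class StructuredQuestion:
    number: int
    chinese: str
    english: str


# Assumed layout: "<number>. <Chinese sentence> <English sentence with ___ blanks>"
_LINE = re.compile(r"^\s*(\d+)\s*[.、．]\s*([^A-Za-z_]+?)\s*([A-Za-z_].*)$")


def parse_ocr_line(line: str) -> StructuredQuestion:
    """Split one raw OCR line into question number, Chinese part, English part."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized OCR line: {line!r}")
    number, chinese, english = match.groups()
    return StructuredQuestion(int(number), chinese.strip(), english.strip())


def to_unified_mask_text(question: StructuredQuestion) -> str:
    """Turn each underscore run into one [MASK] and add [SOS]/[EOS] markers."""
    english = re.sub(r"_+", "[MASK]", question.english)
    return f"[SOS] {question.chinese} [EOS] {english} [EOS]"


if __name__ == "__main__":
    raw = "1. 我喜欢有游泳池的公园。 I love the park ___ ___ ___ a swimming pool."
    print(to_unified_mask_text(parse_ocr_line(raw)))
```

Real scanned pages would need layout analysis before a line-level rule like this could apply; the sketch only shows why structural information is needed to locate the blanks.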
Further, in the automatic question solving method for Chinese-to-English translation test questions of this embodiment, when search decoding is performed on the masked characters in the encoded text based on the automatic question solving model and the answer to the test question is automatically generated, the search decoding uses the beam search algorithm.
Beam search is a heuristic graph search algorithm, generally used when the solution space of the graph is large. To reduce the space and time taken by the search, at each step of depth expansion some low-quality nodes are pruned and some high-quality nodes are retained. This reduces space consumption and improves time efficiency, but the optimal solution may be discarded; beam search is therefore an incomplete algorithm and is typically used in systems with large solution spaces.
Common application scenarios for the algorithm include machine translation and speech recognition. When the data set of the system is large, computing resources are limited, and there is no unique optimal solution, the algorithm can quickly find a solution close to the best one.
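The pruning behaviour described above can be shown with a small, generic beam search. This is a sketch under assumptions, not the decoder of the embodiment: `next_log_probs` stands in for the model and returns log-probabilities of the next token for a given prefix, and `toy_model` is an invented distribution chosen so that a narrow beam discards the best sequence.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def beam_search(next_log_probs: Callable[[Prefix], Dict[str, float]],
                beam_width: int, max_len: int, eos: str = "[EOS]") -> List[str]:
    """Return the highest-scoring sequence found with the given beam width."""
    beams: List[Tuple[float, Prefix]] = [(0.0, ())]
    for _ in range(max_len):
        candidates: List[Tuple[float, Prefix]] = []
        for score, seq in beams:
            if seq and seq[-1] == eos:
                candidates.append((score, seq))  # finished hypothesis, carried over
                continue
            for token, logp in next_log_probs(seq).items():
                candidates.append((score + logp, seq + (token,)))
        # Prune: keep only the beam_width best hypotheses at this depth.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq and seq[-1] == eos for _, seq in beams):
            break
    return list(beams[0][1])


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Invented next-token distribution used only for this illustration."""
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "y": math.log(0.5)},
        ("b",): {"z": math.log(0.9), "w": math.log(0.1)},
    }
    return table.get(prefix, {"[EOS]": 0.0})


if __name__ == "__main__":
    print(beam_search(toy_model, beam_width=1, max_len=3))
    print(beam_search(toy_model, beam_width=2, max_len=3))
```

In the toy distribution the sequence starting with "b" has total probability 0.36, higher than either sequence starting with "a" (0.30 each). A beam of width 1 prunes "b" after the first step and so misses it, which is the incompleteness noted above; a beam of width 2 keeps it.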
This embodiment also provides an automatic question solving device for Chinese-to-English translation test questions, comprising:
an automatic question solving model training module, configured to train a pre-trained language model on a corpus of large-scale Chinese-English text to obtain an automatic question solving model;
and an automatic question solving module, the automatic question solving module comprising:
a text format processing unit, configured to perform text preprocessing on the Chinese-to-English translation test question text to obtain a unified-format test question text;
a text encoding processing unit, configured to encode the unified-format test question text based on the automatic question solving model to obtain an encoded text, the encoded text containing mask filler characters for the part of the corresponding test question that is to be translated and answered;
and a text decoding processing unit, configured to perform search decoding on the masked characters in the encoded text based on the automatic question solving model and to automatically generate the answer to the Chinese-to-English translation test question.
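The composition of the three units inside the automatic question solving module can be sketched as a simple pipeline. The class name `AutoSolvingModule`, the callable-based design, and the use of integer ids for the encoded text are assumptions made for illustration; the stub functions in the usage example do not implement any real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AutoSolvingModule:
    format_text: Callable[[str], str]      # text format processing unit
    encode: Callable[[str], List[int]]     # text encoding processing unit
    decode: Callable[[List[int]], str]     # text decoding processing unit

    def solve(self, question_text: str) -> str:
        """Run the units in order: preprocess, encode, then search-decode."""
        unified = self.format_text(question_text)
        encoded = self.encode(unified)
        return self.decode(encoded)


if __name__ == "__main__":
    # Stub units, for illustration only.
    module = AutoSolvingModule(
        format_text=lambda text: f"[SOS] {text} [EOS] [MASK]",
        encode=lambda unified: [len(token) for token in unified.split()],
        decode=lambda ids: f"decoded {len(ids)} tokens",
    )
    print(module.solve("question"))
```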
In the automatic question solving device of this embodiment, the automatic question solving model training module obtains the automatic question solving model by training a pre-trained language model on a corpus of large-scale Chinese-English text, and the automatic question solving module uses that model to answer Chinese-to-English translation test questions automatically. This helps to tutor students while they answer such test questions, and allows a student's answer to be compared with the automatically generated answer so that the test questions can be graded automatically.
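The comparison used for automatic grading is not specified in the text. As one possible illustration, the sketch below assumes exact match after light normalization (lower-casing, removing punctuation other than apostrophes, collapsing whitespace). The function names `normalize_answer` and `grade` are hypothetical.

```python
import re


def normalize_answer(text: str) -> str:
    """Lower-case, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def grade(student_answer: str, model_answer: str) -> bool:
    """Return True when the two answers match after normalization (assumed criterion)."""
    return normalize_answer(student_answer) == normalize_answer(model_answer)


if __name__ == "__main__":
    print(grade("Who is your favorite musician?", "who is your favorite musician"))
    print(grade("Who is you favorite musician?", "Who is your favorite musician?"))
```

A stricter or more lenient criterion, such as accepting several reference answers, would fit the same interface.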
The Chinese-to-English translation test questions targeted by this embodiment are mainly for primary and secondary school students. The question either gives a partial prompt and asks for the missing words to be filled in (cloze, as shown in FIG. 1) or gives a scene prompt and asks for a whole sentence to be translated (scene translation, as shown in FIG. 2). In view of these question type characteristics, the pre-trained language model trained by the automatic question solving model training module is a UniLM model.
As an optional implementation of this embodiment, referring to FIG. 4, in the automatic question solving device for Chinese-to-English translation test questions, the automatic question solving model training module training the UniLM model on a corpus of large-scale Chinese-English text to obtain the automatic question solving model includes:
unifying large-scale general parallel corpora into Chinese + English sentence pairs, as shown in FIG. 5;
and training the UniLM model on the Chinese + English sentence pairs to obtain a preliminary automatic question solving model.
Further, the automatic question solving device of this embodiment is intended in particular for automatically answering Chinese-to-English translation test questions in the education field. Based on the question type characteristics of such test questions, the automatic question solving model training module training the UniLM model on a corpus of large-scale Chinese-English text to obtain the automatic question solving model includes:
obtaining corpus text related to Chinese-to-English translation test questions in the education field, and uniformly arranging it into Chinese + English sentence pairs, as shown in FIG. 6;
performing enhanced training of the preliminary automatic question solving model on the Chinese + English sentence pairs arranged for the education field to obtain an enhanced automatic question solving model;
wherein the corpus text related to Chinese-to-English translation test questions in the education field includes question banks, textbooks, and corpora similar to the education field that are filtered out of the general parallel corpora by a model.
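The corpus preparation steps above can be sketched as follows. The filtering model is not described in the text, so it is represented by a caller-supplied scoring function `domain_score` with a hypothetical `threshold`; all function names are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Tuple

SentencePair = Tuple[str, str]  # (Chinese sentence, English sentence)


def to_sentence_pairs(zh_lines: Iterable[str], en_lines: Iterable[str]) -> List[SentencePair]:
    """Unify aligned Chinese and English lines into Chinese + English sentence pairs."""
    pairs: List[SentencePair] = []
    for zh, en in zip(zh_lines, en_lines):
        zh, en = zh.strip(), en.strip()
        if zh and en:  # skip pairs where either side is empty
            pairs.append((zh, en))
    return pairs


def select_domain_like(pairs: Iterable[SentencePair],
                       domain_score: Callable[[SentencePair], float],
                       threshold: float) -> List[SentencePair]:
    """Keep general-corpus pairs that the scoring model rates as close to the target field."""
    return [pair for pair in pairs if domain_score(pair) >= threshold]


def build_education_corpus(question_bank: Iterable[SentencePair],
                           textbook: Iterable[SentencePair],
                           general: Iterable[SentencePair],
                           domain_score: Callable[[SentencePair], float],
                           threshold: float) -> List[SentencePair]:
    """Combine question bank, textbook, and filtered general pairs for enhanced training."""
    return (list(question_bank) + list(textbook)
            + select_domain_like(general, domain_score, threshold))
```

The resulting list would then be used for the enhanced training stage, after the preliminary stage on the full general corpus.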
The Chinese-to-English translation test questions of this embodiment either give a partial prompt and ask for the missing words to be filled in (cloze, as shown in FIG. 1) or give a scene prompt and ask for a whole sentence to be translated (scene translation, as shown in FIG. 2). The automatic question solving model training module of this embodiment therefore performs enhanced training targeted at these question types, so as to improve the accuracy of automatic question solving for Chinese-to-English translation test questions in the education field.
As an optional implementation of this embodiment, the automatic question solving model training module training the UniLM model on a corpus of large-scale Chinese-English text to obtain the automatic question solving model includes:
designing a MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format;
and continuing to train the enhanced automatic question solving model on the MASK task to finally obtain the automatic question solving model.
A language model usually predicts the next word from the preceding words. However, if self-attention is applied in the language model, or if context on both sides is used at the same time, a mask is needed to hide the tokens to be predicted so that their label information is not leaked. Different masking schemes correspond to different published approaches. The automatic question solving model also has to perform a natural language understanding (NLU) task, whose function is to enable a machine to accurately understand natural language produced by humans. For the natural language generation (NLG) task, let S1 be the source segment and S2 the target segment; the input is then "[SOS] S1 [EOS] S2 [EOS]". As in pre-training, some spans are masked at random, and the goal is to maximize the probability of the masked tokens given the context. Note that the [EOS] marking the end of the target sequence may also be masked, because this lets the model learn when to generate [EOS] and thus when to end text generation.
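The random span masking of the target segment, including the possibility of masking the final [EOS], can be sketched as below. The function name `mask_random_span`, the choice of a single contiguous span, and the token-list representation are assumptions for illustration; the text does not state how spans are sampled.

```python
import random
from typing import List, Tuple

MASK = "[MASK]"


def mask_random_span(source: List[str], target: List[str], span_len: int,
                     rng: random.Random) -> Tuple[List[str], List[str]]:
    """Mask one contiguous span of "S2 [EOS]" and return (input tokens, labels).

    The final [EOS] is part of the maskable region, so the model can also
    be trained to predict where generation should stop.
    """
    target_with_eos = target + ["[EOS]"]
    span_len = max(1, min(span_len, len(target_with_eos)))
    start = rng.randrange(len(target_with_eos) - span_len + 1)
    labels = target_with_eos[start:start + span_len]
    masked = (target_with_eos[:start] + [MASK] * span_len
              + target_with_eos[start + span_len:])
    tokens = ["[SOS]"] + source + ["[EOS]"] + masked
    return tokens, labels


if __name__ == "__main__":
    tokens, labels = mask_random_span(["我", "喜欢"], ["I", "like"], 2, random.Random(0))
    print(tokens)
    print(labels)
```

The training objective would then maximize the probability of `labels` at the [MASK] positions given the remaining tokens.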
Specifically, in the automatic question solving device of this embodiment, the automatic question solving model training module designing the MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes the following for cloze questions:
the question type characteristic of the test question is that a partial prompt is given and the remaining words to be translated must be filled in;
the MASK task converts the test question into the unified input + output format:
[sentence start identifier] Chinese sentence part [sentence end identifier] + English sentence part + [MASK] part [sentence end identifier];
wherein the [MASK] part corresponds to the blank positions of the words to be filled in within the English sentence of the test question.
For the example test question in FIG. 1, the converted span mask format is: [SOS] I like a park with a swimming pool. [EOS] I love the park [MASK] [MASK] [MASK] A bathing pool [EOS].
Specifically, in the automatic question solving device of this embodiment, the automatic question solving model training module designing the span mask task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, includes the following for scene translation questions:
the question type characteristic of the test question is that a scene prompt is given and the whole sentence must be translated;
the span mask task converts the test question into the unified input + output format:
[sentence start identifier] + [question identification] scene prompt Chinese sentence part [sentence end identifier] + English sentence part [sentence end identifier].
For the example test question in FIG. 2, the converted span mask format is: [SOS] [ASK] Do you want to know which musician your friends like best? [EOS] Who is your favorite musician? [EOS].
As an optional implementation of this embodiment, in the automatic question solving device for Chinese-to-English translation test questions, the text format processing unit performing text preprocessing on the Chinese-to-English translation test question text to obtain the unified-format test question text includes:
scanning the Chinese-to-English translation test question text to be solved to obtain an original OCR text;
structuring the original OCR text to obtain a structured OCR text;
constructing a unified MASK text based on the structured OCR text.
The automatic question solving device of this embodiment answers Chinese-to-English translation test questions automatically using the automatic question solving model obtained by training. During automatic answering, the test question text must first be preprocessed into a unified MASK text that the model can take as input.
Further, in the automatic question solving device of this embodiment, when search decoding is performed on the masked characters in the encoded text based on the automatic question solving model and the answer to the test question is automatically generated, the search decoding uses the beam search algorithm.
The embodiment also provides a storage medium, which stores a computer executable program, and when the computer executable program is executed, the automatic problem solving method for Chinese-translation-English translation test questions is realized.
The storage medium of this embodiment may comprise a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. The readable medium may also be any readable medium, other than a readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The embodiment also provides an electronic device, which comprises a processor and a memory, wherein the memory is used for storing a computer executable program, and when the computer program is executed by the processor, the processor executes the automatic problem solving method for the Chinese-translation-English translation test question.
The electronic device is in the form of a general purpose computing device. The processor can be one or more and can work together. The invention also does not exclude that distributed processing is performed, i.e. the processors may be distributed over different physical devices. The electronic device of the present invention is not limited to a single entity, and may be a sum of a plurality of entity devices.
The memory stores a computer executable program, typically machine readable code. The computer readable program may be executed by the processor to enable an electronic device to perform the method of the invention, or at least some of the steps of the method.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may also be non-volatile memory, such as read-only memory (ROM).
It should be understood that elements or components not shown in the above examples may also be included in the electronic device of the present invention. For example, some electronic devices further include a display unit such as a display screen, and some electronic devices further include a human-computer interaction element such as a button, a keyboard, and the like. Electronic devices are considered to be covered by the present invention as long as the electronic devices are capable of executing a computer-readable program in a memory to implement the method of the present invention or at least a part of the steps of the method.
From the above description of the embodiments, those skilled in the art will readily appreciate that the present invention can be implemented by hardware capable of executing a specific computer program, such as the system of the present invention and the electronic processing units, servers, clients, mobile phones, control units, processors, etc. included in the system. The invention may also be implemented by computer software for performing the method of the invention, e.g. control software executed by a microprocessor, an electronic control unit, a client, a server, etc. It should be noted that the computer software for executing the method of the present invention is not limited to being executed by one specific hardware entity, and can also be realized in a distributed manner by non-specific hardware. The software product may be stored in a computer readable storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or may be distributed over a network, as long as it enables the electronic device to perform the method according to the present invention.
The above embodiments are only intended to illustrate the invention and not to limit the technical solutions described herein. Although the invention has been described in detail in this specification with reference to the above embodiments, it is not limited to them; any modification or equivalent replacement of the invention, and all such modifications and variations, are intended to fall within the scope of this disclosure and the appended claims.

Claims (10)

1. An automatic question solving method for Chinese-to-English translation test questions, characterized by comprising the following steps:
training a pre-trained language model on a corpus of Chinese-English text to obtain an automatic question solving model;
performing text preprocessing on the Chinese-to-English translation test question text to obtain a unified-format test question text;
encoding the unified-format test question text based on the automatic question solving model to obtain an encoded text, wherein the encoded text contains mask filler characters for the part of the corresponding test question that is to be translated and answered;
and performing search decoding on the masked characters in the encoded text based on the automatic question solving model, and automatically generating the answer to the Chinese-to-English translation test question.
2. The automatic question solving method for Chinese-to-English translation test questions according to claim 1, wherein training the pre-trained language model on a corpus of Chinese-English text to obtain the automatic question solving model comprises:
unifying large-scale general parallel corpora into Chinese + English sentence pairs;
and training the pre-trained language model on the Chinese + English sentence pairs to obtain a preliminary automatic question solving model.
3. The automatic question solving method for Chinese-to-English translation test questions according to claim 2, wherein training the pre-trained language model on a corpus of Chinese-English text to obtain the automatic question solving model comprises:
obtaining corpus text related to Chinese-to-English translation test questions in a set field, and uniformly arranging it into Chinese + English sentence pairs;
performing enhanced training of the preliminary automatic question solving model on the Chinese + English sentence pairs arranged for the set field to obtain an enhanced automatic question solving model;
optionally, when the set field is the education field, the corpus text related to Chinese-to-English translation test questions in the education field includes question banks, textbooks, and corpora similar to the education field that are filtered out of the general parallel corpora by a model.
4. The automatic question solving method for Chinese-to-English translation test questions according to claim 3, wherein training the pre-trained language model on a corpus of Chinese-English text to obtain the automatic question solving model comprises:
designing a MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format;
and continuing to train the enhanced automatic question solving model on the MASK task to finally obtain the automatic question solving model.
5. The automatic question solving method for Chinese-to-English translation test questions according to claim 4, wherein designing the MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, comprises:
the question type characteristic of the test question is that a partial prompt is given and the remaining words to be translated must be filled in;
the MASK task converts the test question into the unified input + output format:
[sentence start identifier] Chinese sentence part [sentence end identifier] + English sentence part + [MASK] part [sentence end identifier];
wherein the [MASK] part corresponds to the blank positions of the words to be filled in within the English sentence of the test question.
6. The automatic question solving method for Chinese-to-English translation test questions according to claim 4, wherein designing the MASK task according to the question type characteristics of the Chinese-to-English translation test questions, so as to convert test questions of various question types into a unified input + output format, comprises:
the question type characteristic of the test question is that a scene prompt is given and the whole sentence must be translated;
the MASK task converts the test question into the unified input + output format:
[sentence start identifier] + [question identification] scene prompt Chinese sentence part [sentence end identifier] + English sentence part [sentence end identifier].
7. The automatic question solving method for Chinese-to-English translation test questions according to any one of claims 1 to 6, wherein performing text preprocessing on the Chinese-to-English translation test question text to obtain the unified-format test question text comprises:
scanning the Chinese-to-English translation test question text to be solved to obtain an original OCR text;
structuring the original OCR text to obtain a structured OCR text;
constructing a unified MASK text based on the structured OCR text.
8. The automatic question solving method for Chinese-to-English translation test questions according to claim 7, wherein, when search decoding is performed on the masked characters in the encoded text based on the automatic question solving model and the answer to the test question is automatically generated, the search decoding uses beam search.
9. An automatic question solving device for Chinese-to-English translation test questions, comprising:
an automatic question solving model training module, configured to train a pre-trained language model on a corpus of large-scale Chinese-English text to obtain an automatic question solving model;
and an automatic question solving module, the automatic question solving module comprising:
a text format processing unit, configured to perform text preprocessing on the Chinese-to-English translation test question text to obtain a unified-format test question text;
a text encoding processing unit, configured to encode the unified-format test question text based on the automatic question solving model to obtain an encoded text, the encoded text containing mask filler characters for the part of the corresponding test question that is to be translated and answered;
and a text decoding processing unit, configured to perform search decoding on the masked characters in the encoded text based on the automatic question solving model and to automatically generate the answer to the Chinese-to-English translation test question.
10. A storage medium storing a computer-executable program, wherein the computer-executable program, when executed, implements the automatic question solving method for Chinese-to-English translation test questions according to any one of claims 1 to 8.
CN202210515567.7A 2022-05-12 2022-05-12 Automatic question solving method, device and storage medium for Chinese-English translation test questions Pending CN114970569A (en)

Publications (1)

Publication Number Publication Date
CN114970569A (en) 2022-08-30

Family

ID=82981218

Country Status (1)

Country Link
CN (1) CN114970569A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190131A (en) * 2018-09-18 2019-01-11 北京工业大学 A kind of English word and its capital and small letter unified prediction based on neural machine translation
CN111274764A (en) * 2020-01-23 2020-06-12 北京百度网讯科技有限公司 Language generation method and device, computer equipment and storage medium
CN113536801A (en) * 2020-04-16 2021-10-22 北京金山数字娱乐科技有限公司 Reading understanding model training method and device and reading understanding method and device
CN112559702A (en) * 2020-11-10 2021-03-26 西安理工大学 Transformer-based natural language problem generation method in civil construction information field
CN112699691A (en) * 2020-12-30 2021-04-23 北京百分点科技集团股份有限公司 Translation model generation method and device, readable storage medium and electronic equipment
CN113761944A (en) * 2021-05-20 2021-12-07 腾讯科技(深圳)有限公司 Corpus processing method, apparatus, device and storage medium for translation model
CN113297841A (en) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-training double-word vectors

Similar Documents

Publication Publication Date Title
Black et al. Statistically-driven computer grammars of English: The IBM/Lancaster approach
CN111414464B (en) Question generation method, device, equipment and storage medium
Guo et al. Fine-tuning by curriculum learning for non-autoregressive neural machine translation
CN106484682A (en) Statistics-based machine translation method, device and electronic equipment
CN112765345A (en) Text abstract automatic generation method and system fusing pre-training model
CN108287820A (en) Method and device for generating text representations
CN110795552A (en) Training sample generation method and device, electronic equipment and storage medium
CN116596347B (en) Multi-disciplinary interaction teaching system and teaching method based on cloud platform
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN117453898B (en) Cross-modal question-answering processing method and device based on thinking chain
Yang et al. Hierarchical neural data synthesis for semantic parsing
CN114970569A (en) Automatic question solving method, device and storage medium for Chinese-English translation test questions
CN114328857A (en) Statement extension method, device and computer readable storage medium
CN114372140A (en) Layered conference abstract generation model training method, generation method and device
KR102395702B1 (en) Method for providing english education service using step-by-step expanding sentence structure unit
CN114625759A (en) Model training method, intelligent question answering method, device, medium, and program product
CN114139535A (en) Keyword sentence making method and device, computer equipment and readable medium
Murugathas et al. Domain specific question & answer generation in tamil
CN109918651B (en) Synonym part-of-speech template acquisition method and device
CN114462428A (en) Translation evaluation method and system, electronic device and readable storage medium
CN114297353A (en) Data processing method, device, storage medium and equipment
Tschichold et al. Intelligent CALL and written language
Yang et al. Analysis of AI MT based on fuzzy algorithm
Nittala et al. Speaker Diarization and BERT-Based Model for Question Set Generation from Video Lectures
CN115169332A (en) Automatic problem solving method and device for blank filling test problems of natural language subject and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination