CN118197111A - Language online learning method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN118197111A
CN118197111A
Authority
CN
China
Prior art keywords
dialogue
user
voice
language
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410268958.2A
Other languages
Chinese (zh)
Inventor
陈颖豪
杨耀伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202410268958.2A priority Critical patent/CN118197111A/en
Publication of CN118197111A publication Critical patent/CN118197111A/en
Pending legal-status Critical Current

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The application provides a language online learning method, a device, electronic equipment and a storage medium, relates to the field of the educational metaverse, and is used for improving the online language learning efficiency of users. The method comprises the following steps: generating a first dialogue voice based on historical dialogue content in a set exercise scene, wherein the first dialogue voice is used for posing guiding questions to a user; and outputting the first dialogue voice.

Description

Language online learning method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of the educational metaverse, in particular to a language online learning method, a language online learning device, electronic equipment and a storage medium.
Background
The metaverse, a vast digital world consisting of countless 3D virtual spaces, has gradually become a new world in which billions of users interact, entertain themselves, work and learn online. It breaks the boundary between the real and the virtual, allowing people to communicate freely in an immersive environment and explore unlimited possibilities. Against this background, the educational metaverse has emerged, bringing revolutionary changes to online language learning.
The educational metaverse provides an unprecedented immersive experience for language learning by utilizing cutting-edge technologies such as virtual reality, augmented reality and artificial intelligence. This experience enables the user to interact with native speakers in real time as if immersed in a real language environment, thereby understanding and mastering knowledge more deeply.
However, although the development of technology brings great convenience to education, some challenges remain in practical applications. For example, in the related art, virtual classroom education based on the metaverse requires a teacher to put a lot of effort into preparing course content and even to participate in the user's entire learning process; however, a teacher's effort is often limited, and it is difficult for a teacher to devote themselves both to course preparation and to the whole of each student's learning process, which affects the user's learning efficiency. Meanwhile, traditional online learning platforms cannot provide timely and effective composition correction and examination feedback, which affects the user's learning progress and enthusiasm. Therefore, how to provide a method for users to learn languages online at any time and any place, and to improve the efficiency of users' online language learning, is a problem to be solved urgently.
Disclosure of Invention
The application provides a language online learning method, a device, electronic equipment and a storage medium, which are used for improving the efficiency of online language learning of a user.
In a first aspect, the present application provides a language online learning method, applied to a server, the method comprising: generating a first dialogue voice based on historical dialogue content in a set exercise scene, wherein the first dialogue voice is used for asking questions of a user; the first dialogue speech is output.
The technical scheme provided by the application has at least the following beneficial effects. Firstly, conducting dialogue exercises in the set exercise scene provides the user with a more realistic and practical language learning scene, enhances the user's participation and immersion, and improves the user's interest in and motivation for language learning. Secondly, generating and outputting the first dialogue voice based on the historical dialogue content in the set exercise scene, so as to pose guiding questions to the user, can help the user conduct targeted language exercises; by posing specific questions that guide the user to think and answer, it helps the user understand and master language knowledge points more deeply and improves language application and spoken expression ability. In summary, the application helps improve the user's online language learning efficiency through this interactive learning mode.
As a possible implementation manner, the method includes: the historical dialogue content in the set exercise scene is input into a first model to obtain first text information; the first model is used for analyzing the historical dialogue content and determining questions posed to the user; the first text information is a question raised for a user; the first text information is converted into first dialogue speech.
As a possible implementation manner, the method further includes: receiving second dialogue voice to be recognized, wherein the second dialogue voice is a questioning voice sent by a user; recognizing the second dialogue voice to obtain a recognition result; generating a third dialogue voice based on the recognition result and the context content of the set exercise scene, wherein the third dialogue voice is used for replying to the second dialogue voice; and outputting the third dialogue voice.
As a possible implementation manner, the recognition result includes second text information corresponding to the second dialogue voice; the method comprises the following steps: inputting the second text information and the context content of the set exercise scene into a second model to obtain third text information; the third text information is used for replying to the second text information; the second model is used for analyzing the second text information and setting the context content of the exercise scene and determining third text information; and converting the third text information into third dialogue voice.
As a possible implementation manner, the method includes: generating a virtual human figure matched with the set exercise scene; and transmitting configuration information of the virtual human figure to front-end equipment, so that the front-end equipment displays the virtual human figure in the set exercise scene. The method further comprises: sending the first dialogue voice to the front-end equipment, so that the front-end equipment renders the virtual human figure based on the first dialogue voice, and the first dialogue voice is broadcast by the virtual human figure.
As a possible implementation manner, the method further includes: receiving a test instruction, wherein the test instruction comprises the language of a test and the difficulty level of the test; inputting a test instruction into a third model to generate a test question, wherein the third model is used for determining the test question based on the language of the test and the difficulty level of the test; the test questions are sent to the front-end device so that the front-end device presents the test questions to the user.
As a possible implementation manner, the method further includes: receiving a composition text uploaded by a user, wherein the composition text is an answer text of a test question; and inputting the composition text into a fourth model to generate a correction result, wherein the fourth model is used for correcting the composition text.
In a second aspect, the present application provides a language online learning apparatus, the apparatus comprising: a generation module, configured to generate a first dialogue voice based on historical dialogue content in a set exercise scene, the first dialogue voice being used for asking questions of a user; and a communication module, configured to output the first dialogue voice.
As a possible implementation manner, the generating module is specifically configured to input the historical dialogue content in the set exercise scene into a first model to obtain first text information; the first model is used for analyzing the historical dialogue content and determining questions posed to the user; the first text information is a question raised for a user; the first text information is converted into first dialogue speech.
As a possible implementation manner, the communication module is further configured to receive a second dialogue voice to be recognized, where the second dialogue voice is a question voice sent by the user; the device further comprises: the recognition module is used for recognizing the second dialogue voice to obtain a recognition result; the generating module is further configured to generate a third dialogue voice based on the recognition result and the context content of the set exercise scene, where the third dialogue voice is used to answer the second dialogue voice; the communication module is further configured to output a third dialogue voice.
As a possible implementation manner, the recognition result includes second text information corresponding to the second dialogue voice; the generating module is further configured to input second text information and context content of a set exercise scene into a second model to obtain third text information; the third text information is used for replying to the second text information; the second model is used for analyzing the second text information and setting the context content of the exercise scene and determining third text information; and converting the third text information into third dialogue voice.
As a possible implementation manner, the generating module is further configured to generate a virtual human figure matched with the set exercise scene; the communication module is further configured to send configuration information of the virtual human figure to the front-end equipment, so that the front-end equipment displays the virtual human figure in the set exercise scene; and to send the first dialogue voice to the front-end equipment, so that the front-end equipment renders the virtual human figure based on the first dialogue voice, and the first dialogue voice is broadcast by the virtual human figure.
As a possible implementation manner, the communication module is further configured to receive a test instruction, where the test instruction includes a language of a test and a difficulty level of the test; the generation module is further used for inputting a test instruction into a third model to generate a test question, and the third model is used for determining the test question based on the language of the test and the difficulty level of the test; the communication module is further used for sending the test questions to the front-end equipment, so that the front-end equipment displays the test questions to the user.
As a possible implementation manner, the communication module is further configured to receive a composition text uploaded by a user, where the composition text is an answer text of a test question; the generating module is further configured to input the composition text into a fourth model, generate a correction result, and the fourth model is used for correcting the composition text.
In a third aspect, the present application provides an electronic device comprising a processor and a memory, the processor being coupled to the memory; the memory is used to store computer instructions that are loaded and executed by the processor to cause the electronic device to implement the language online learning method provided in the first aspect and any one of its possible implementations.
In a fourth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when executed on a computer, cause the computer to perform any of the language online learning methods provided in the first aspect above.
For descriptions of the second to fourth aspects of the present application, reference may be made to the detailed description of the first aspect; likewise, for the beneficial effects of the second to fourth aspects, reference may be made to the beneficial effect analysis of the first aspect, and details are not repeated here.
Drawings
FIG. 1 is a schematic diagram of a language online learning system according to some embodiments;
FIG. 2 is flowchart one of a method for online learning of a language according to some embodiments;
FIG. 3 is flowchart two of a method for online learning of a language according to some embodiments;
FIG. 4 is flowchart three of a method for online learning of a language according to some embodiments;
FIG. 5 is flowchart four of a method for online learning of a language according to some embodiments;
FIG. 6 is flowchart five of a method for online learning of a language according to some embodiments;
FIG. 7 is flowchart six of a method for online learning of a language according to some embodiments;
FIG. 8 is flowchart seven of a method for online learning of a language according to some embodiments;
FIG. 9 is flowchart eight of a method for online learning of a language according to some embodiments;
FIG. 10 is flowchart nine of a method for online learning of a language according to some embodiments;
FIG. 11 is flowchart ten of a method for online learning of a language according to some embodiments;
FIG. 12 is schematic diagram one of a language online learning apparatus according to some embodiments;
FIG. 13 is schematic diagram two of a language online learning apparatus according to some embodiments.
Detailed Description
The language online learning method provided by the application is described in detail below with reference to the accompanying drawings.
The term "and/or" herein merely describes an association relationship between associated objects, meaning that there may be three relationships; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone.
The terms "first" and "second" and the like in the description and in the drawings are used for distinguishing between different objects or between different processes of the same object and not for describing a particular order of objects.
Furthermore, references to the terms "comprising" and "having" and any variations thereof in the description of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more.
As described in the background, the metaverse is a virtual digital world consisting of multiple 3D virtual spaces that can accommodate billions of users online and support various forms of interaction, including social activity, entertainment, learning, work, and the like. The metaverse breaks the limits of reality and virtuality, so that people can communicate freely in an immersive environment and explore infinite possibilities. Against this background, the educational metaverse has emerged, bringing revolutionary changes to online language learning.
The educational metaverse, especially in the field of language learning, utilizes cutting-edge technologies such as virtual reality, augmented reality and artificial intelligence to provide an unprecedented immersive experience for language learning. This experience enables the user to interact with native speakers in real time as if immersed in a real language environment, thereby understanding and mastering knowledge more deeply. However, although the development of technology brings great convenience to education, some challenges remain in practical use.
First, a teacher needs to put a lot of effort into preparing course content and participating in the user's entire learning process, and a teacher's effort is often limited, so the teacher cannot provide sufficient guidance and feedback for each user. Second, for language courses, especially English, Japanese, Russian, German, French, Spanish, etc., users need to improve their writing through a large number of writing exercises. However, conventional online learning platforms often cannot provide timely and effective feedback for users, resulting in low learning efficiency. In addition, conventional language teaching platforms are inadequate at providing an immersive language learning scene for users.
Therefore, how to provide a method for users to learn languages online in an immersive manner at any time and any place against the background of the metaverse, and to improve the efficiency of users' online language learning, is a problem to be solved urgently.
To address these technical problems, an embodiment of the application provides a language online learning method, the idea of which is as follows: generating a first dialogue voice based on historical dialogue content in a set exercise scene, wherein the first dialogue voice is used for asking questions of a user; and outputting the first dialogue voice. It can be understood that, firstly, conducting dialogue exercises in the set exercise scene provides the user with a more realistic and practical language learning scene, enhances the user's participation and immersion, and improves the user's interest in and motivation for language learning; secondly, generating and outputting the first dialogue voice based on the historical dialogue content in the set exercise scene, so as to pose guiding questions to the user, can help the user conduct targeted language exercises, and by posing specific questions that guide the user to think and answer, helps the user understand and master language knowledge points more deeply and improves language application and spoken expression ability. In summary, the application helps improve the user's online language learning efficiency through this interactive learning mode.
The embodiments of the present application will be described in detail below with reference to the drawings attached to the specification.
Referring to fig. 1, a language online learning system is provided for an embodiment of the present application, the language online learning system including: a front-end device 100 and a server 200, wherein the front-end device 100 and the server 200 are communicatively connected. The front-end device 100 is used for human-machine interaction with a user.
In some embodiments, the front-end device 100 is configured to obtain behavior data, voice data, text data, etc. of a user, and send the obtained data to the server 200, where the obtained data is stored or processed by the server 200.
Illustratively, the front-end device 100 may obtain a question voice sent by a user, send the question voice sent by the user to the server 200, and perform processing such as storage or voice recognition by the server 200.
In some embodiments, the front-end device 100 is further configured to display exercise scenes, text data, image data and virtual human figures to a user, play voice data, and the like.
Illustratively, the front-end device 100 receives the configuration information of the exercise scene and of the virtual human figure transmitted by the server 200, configures the corresponding exercise scene and virtual human figure according to that configuration information, and thereby displays the exercise scene and the virtual human figure to the user.
Illustratively, the front-end device 100 receives the voice for questioning the user transmitted by the server 200 and plays it to the user.
In some embodiments, the front-end device 100 may provide different login ports depending on the type of user. In the embodiment of the application, the users of the language online learning system comprise common users (i.e., students or persons to be taught) and teachers, and the login ports comprise a student end and a teacher end.
The student end is the entrance through which a common user (i.e., a student or person to be taught) logs in to the language online learning system; the language online learning system logged in from the student end comprises a scene dialogue module and a test module.
The teacher end is the entrance through which a teacher logs in to the language online learning system; the language online learning system logged in from the teacher end comprises a teaching management module, which can generate teaching exercise scenes, modify scene dialogues, modify the correction results of composition texts, and the like.
In some embodiments, the front-end device 100 may be a cell phone, tablet, desktop computer, laptop, handheld computer, notebook, ultra-mobile personal computer (UMPC), netbook, or the like. The embodiment of the present application does not particularly limit the specific form of the front-end device 100. The front-end device 100 can perform human-machine interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device, and the like.
The server 200 is used for processing and storing data transmitted from the front-end device 100.
For example, the server 200 can correct the composition text uploaded by the user.
In some embodiments, the server 200 may be a single server, or may be a server cluster formed by a plurality of servers. In some implementations, the server cluster may also be a distributed cluster.
In some embodiments, server 200 includes an artificial intelligence center and storage.
The artificial intelligence center is configured with various models for processing data, for example, a generative artificial intelligence model capable of producing language text with a certain logic and coherence, a large-scale language model for understanding and learning from text data, and the like.
In some embodiments, the artificial intelligence center is further configured with a speech recognition function based on automatic speech recognition (ASR) technology, a speech synthesis function based on text-to-speech (TTS) technology, an image-text recognition function based on optical character recognition (OCR) technology, and the like.
For example, the voice recognition function of the artificial intelligence center can recognize the questioning voice sent by the user to obtain a recognition result.
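The way the artificial intelligence center exposes several named capabilities (ASR, TTS, OCR) through one entry point can be sketched as a small dispatcher. The class, method names, and stub behaviour below are illustrative assumptions, not the patent's implementation; the lambdas merely stand in for real recognition and synthesis engines.

```python
# Illustrative sketch of an AI-center capability registry (assumed design,
# not the patent's actual implementation). Each capability -- ASR, TTS,
# OCR -- is registered under a name and invoked through one entry point.

class AICenter:
    def __init__(self):
        self._functions = {}

    def register(self, name, fn):
        """Register a capability (e.g. 'asr', 'tts', 'ocr') under a name."""
        self._functions[name] = fn

    def invoke(self, name, payload):
        """Dispatch a request to the named capability."""
        if name not in self._functions:
            raise KeyError(f"unknown capability: {name}")
        return self._functions[name](payload)


# Stub capabilities standing in for real ASR/TTS/OCR engines.
center = AICenter()
center.register("asr", lambda audio: f"<text from {len(audio)} bytes of audio>")
center.register("tts", lambda text: text.encode("utf-8"))  # pretend synthesis
center.register("ocr", lambda image: "<text from image>")

print(center.invoke("asr", b"\x00\x01\x02"))
```

A real deployment would register model-backed callables here; the registry pattern lets the server route a user's question voice to "asr" and the generated question text to "tts" without the caller knowing which engine is behind each name.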
In some embodiments, the artificial intelligence center may be a processor, where the processor may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, such as one or more digital signal processors (DSPs) or one or more field-programmable gate arrays (FPGAs).
In some embodiments, the artificial intelligence center may also be a controller, wherein the controller may be a central controller, a hardware controller, a data controller, an algorithm controller, or the like.
And the storage device is used for storing various voice data, text data, image data, virtual human figures and the like according to the embodiment of the application.
In some embodiments, the storage may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
It should be noted that, the system architecture described in the embodiments of the present application is for more clearly describing the technical solution of the embodiments of the present application, and does not constitute a limitation on the technical solution provided by the embodiments of the present application, and those skilled in the art can know that, along with the evolution of the system architecture, the technical solution provided by the embodiments of the present application is equally applicable to similar technical problems.
Referring to fig. 2, a language online learning method is provided for an embodiment of the present application, and the method may be applied to a server 200 in the language online learning system shown in fig. 1, and includes the following steps S101 to S102.
S101, generating a first dialogue voice based on historical dialogue content in a set exercise scene.
The first dialogue voice is used for asking questions to the user.
The set exercise scene refers to a virtual environment created to achieve a specific learning goal and apply language skills.
By way of example, the set exercise scene may be a hospital, campus, tourist attraction, restaurant, supermarket, library, office, etc.
In some embodiments, the historical dialogue content in the set exercise scene is dialogue content in a set language. The set language may be English, Korean, French, Japanese, German, Russian, or another foreign language.
Illustratively, assuming that user A needs to learn a first foreign language, the historical dialogue content in the set exercise scene is dialogue content in that first foreign language.
In some embodiments, the server stores historical dialog content in the set exercise scenario in a storage device of the server; when the server needs to ask a question to the user, the server calls the historical dialogue content in the set exercise scene from the storage device, and then recognizes and analyzes the historical dialogue content in the set exercise scene to generate a first dialogue voice for asking the user.
S102, outputting the first dialogue voice.
In some embodiments, the step S102 may be implemented as: the server sends the first dialogue voice to the front-end equipment; accordingly, the front-end device receives the first dialogue voice, and then plays the first dialogue voice to the user through an audio device such as a speaker.
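The flow of step S102 (the server sends the first dialogue voice to the front-end device, which then plays it through a speaker) can be sketched as follows. The classes and the queue-based transport are illustrative assumptions; in a real system the queue would be replaced by a network connection and `play_pending` by an actual audio device.

```python
# Toy sketch of step S102 (assumed interfaces, not the patent's code):
# the server hands the synthesized speech to the front-end device, which
# "plays" it. An in-memory queue stands in for the real network transport.
import queue


class FrontEndDevice:
    def __init__(self):
        self.inbox = queue.Queue()   # utterances received from the server
        self.played = []             # utterances sent to the (simulated) speaker

    def receive(self, audio: bytes):
        self.inbox.put(audio)

    def play_pending(self):
        """Play every received utterance through the simulated speaker."""
        while not self.inbox.empty():
            self.played.append(self.inbox.get())


class Server:
    def __init__(self, front_end: FrontEndDevice):
        self.front_end = front_end

    def output_dialogue_speech(self, audio: bytes):
        """Step S102: send the first dialogue voice to the front-end device."""
        self.front_end.receive(audio)


device = FrontEndDevice()
server = Server(device)
server.output_dialogue_speech(b"first-dialogue-voice")
device.play_pending()
print(len(device.played), "utterance(s) played")
```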
In some embodiments, after the server generates the first dialogue speech, the first dialogue speech may also be sent to a storage device of the server for storage.
It can be understood that the storage device of the server stores the first dialogue voice, on the one hand, to back up the data and ensure its reliability; that is, even if the data is damaged unexpectedly during transmission or processing, it can be quickly recovered, avoiding data loss. On the other hand, storing the first dialogue voice provides a data source for subsequent data analysis and applications, such as recording the first dialogue voice together with other dialogue content to form historical dialogue content for subsequent analysis.
In some embodiments, as shown in FIG. 3, the above step S101 is embodied as the following steps S1011-S1012.
S1011, inputting the history dialogue content in the set exercise scene into a first model to obtain first text information.
The first model is used for analyzing the historical dialogue content and determining the question posed to the user; the first text information is the question posed to the user.
In some embodiments, the first model may be a large-scale language model; wherein the large-scale language model is trained based on a large amount of text data.
By way of example, the first model may be trained on web text, books, articles, forum discussions, chat conversations, etc., to learn statistical rules and patterns of language to enable understanding and generation of natural language text.
In some embodiments, after the server obtains the historical dialogue content in the setting exercise scene from the storage device, the historical dialogue content in the setting exercise scene is input into the first model to obtain the first text information.
S1012, converting the first text information into first dialogue speech.
In some embodiments, after the server determines the first text information, the first text information is converted to a first conversational speech by a speech synthesis function of the server.
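Steps S1011 and S1012 together form a two-stage pipeline: the first model turns the historical dialogue content into first text information, and speech synthesis turns that text into the first dialogue voice. The sketch below uses toy stand-ins (`first_model` and `synthesize` are hypothetical names, and their logic is purely illustrative) rather than the patent's actual large-scale language model or TTS function.

```python
# Sketch of steps S1011-S1012 under stated assumptions: `first_model`
# stands in for the large-scale language model that derives a guiding
# question from the historical dialogue, and `synthesize` stands in for
# the server's TTS function. Neither reflects the patent's real models.

def first_model(history):
    """Toy 'first model': ask about the last thing the user mentioned."""
    last_user_turn = next(
        (turn for speaker, turn in reversed(history) if speaker == "user"),
        "your studies",
    )
    return f"Can you tell me more about {last_user_turn}?"

def synthesize(text):
    """Toy TTS: return fake audio bytes for the given text."""
    return ("AUDIO:" + text).encode("utf-8")


history = [
    ("system", "Welcome to the campus scene."),
    ("user", "my favourite course"),
]
first_text = first_model(history)               # S1011: history -> question text
first_dialogue_voice = synthesize(first_text)   # S1012: text -> speech
print(first_text)
```

The point of the split is that the text stage (S1011) and the audio stage (S1012) are independent: the same generated question text could also be stored as historical dialogue content or displayed by the front-end device.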
It can be understood that the application poses questions to the user according to the historical dialogue content in the set exercise scene and plays them to the user as the first dialogue voice, thereby providing the user with an immersive dialogue exercise scene, effectively improving the efficiency and effect of the user's online learning, and enhancing the user's learning experience and interest.
In some embodiments, as shown in fig. 4, the method further includes the following steps S201 to S204 before the step S101.
S201, receiving second dialogue voice to be recognized, wherein the second dialogue voice is a question voice sent by a user.
In some embodiments, after a user selects a set exercise scene through the front-end device, the server generates guiding questions corresponding to the set exercise scene through a generative artificial intelligence model and then sends them to the front-end device; accordingly, the front-end device receives the guiding questions and presents them to the user via a display or the like.
For example, assuming that the set exercise scene selected by user A is a campus, the guiding questions generated by the server through the generative artificial intelligence model may be: What activities does your school have? What subjects do you need to study? Which course do you like most? Can you introduce your school to me? And so on.
It should be noted that a generative artificial intelligence model is a machine learning model based on deep learning that can generate new data similar to, but not identical to, the original data by learning from a large amount of data; specifically, given some initial conditions (such as a noise vector) as input, it gradually generates new data using a deep neural network and a probability model.
In some embodiments, after the front-end device presents the guiding question to the user, the user sends a questioning voice (i.e., the second dialogue voice) to the server in the set language based on the guiding question.
Illustratively, assume that the set language selected by user A is a first foreign language; user A then uses the first foreign language to ask the server questions based on the guiding question.
In some embodiments, the server may also store the guiding question corresponding to the set exercise scene in the storage device of the server, and the front-end device retrieves it from the storage device when needed.
In some embodiments, after the user sends the second dialogue voice, an audio device such as a microphone of the front-end device collects the second dialogue voice and then sends it to the server for recognition and storage.
S202, recognizing the second dialogue voice to obtain a recognition result.
The recognition result is an understandable text, and the recognition result comprises second text information corresponding to the second dialogue voice.
In some embodiments, after the server receives the second dialogue voice, the server identifies the language type of the second dialogue voice and then converts the second dialogue voice into the second text information (i.e., the recognition result) through the server's speech recognition function.
In some embodiments, after the server obtains the recognition result, it may perform subsequent analysis and other processing on the recognition result (for example, the following step S203), or store the recognition result in the storage device of the server.
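Step S202 can be sketched as below. Again this is an illustration only: `detect_language` and `recognize_speech` are hypothetical stand-ins for the server's language-type identification and speech recognition functions, which the application does not specify.

```python
def detect_language(audio_bytes):
    """Stand-in for language-type identification; always reports English
    here, whereas a real system would classify the audio."""
    return "en"

def recognize_speech(audio_bytes):
    """Stand-in for the server's speech recognition function; a real
    implementation would run ASR, here we just decode the bytes."""
    return audio_bytes.decode("utf-8")

def recognize_second_dialogue(audio_bytes):
    language = detect_language(audio_bytes)
    second_text = recognize_speech(audio_bytes)  # second text information
    return {"language": language, "text": second_text}  # the recognition result

result = recognize_second_dialogue("What clubs does your school have?".encode("utf-8"))
```

The returned dictionary corresponds to the recognition result described above: understandable text plus the detected language type.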
S203, generating a third dialogue voice based on the recognition result and the context content of the set exercise scene.
Wherein the third dialogue voice is used for replying to the second dialogue voice. Context content refers to background information, historical data, or environmental details related to what is currently being discussed or processed; in the embodiments of the application, it refers to the historical dialogue content between the user and the server in the set exercise scene.
As one possible implementation, the server may perform semantic analysis on the second text information (i.e., the recognition result) converted from the second dialogue voice together with the context content of the set exercise scene, thereby generating the third dialogue voice for replying to the second dialogue voice.
S204, outputting a third dialogue voice.
In some embodiments, step S204 may be implemented as: sending the third dialogue voice to the front-end device; accordingly, the front-end device receives the third dialogue voice and plays it to the user through an audio device such as a speaker.
In some embodiments, the historical dialogue content in step S101 may include the second dialogue voice and the third dialogue voice, so step S101 may be implemented as: generating the first dialogue voice based on the second dialogue voice and the third dialogue voice.
Illustratively, the second text information corresponding to the second dialogue voice and the third text information corresponding to the third dialogue voice are input into the first model to obtain the first text information, and the first text information is converted into the first dialogue voice.
In some embodiments, as shown in fig. 5, the step S203 is specifically implemented as the following steps S2031 to S2032.
S2031, inputting the second text information and the context content of the set exercise scene into the second model to obtain third text information.
The third text information is used for replying to the second text information; the second model is used for analyzing the second text information and setting the context of the exercise scene and determining third text information.
It should be noted that the second model is trained based on a large-scale language model, and has strong language generation and context understanding capabilities.
And S2032, converting the third text information into third dialogue voice.
In some embodiments, after the server obtains the third text information, the third text information is converted into a third dialogue speech by a speech synthesis function of the server.
In some embodiments, after the server generates the third dialogue speech, the third dialogue speech may be stored, or the third dialogue speech may be sent to the front-end device, and accordingly, the front-end device receives the third dialogue speech and plays the third dialogue speech to the user through an audio device such as a speaker.
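Steps S2031 to S2032 can be sketched as follows; `second_model` and `synthesize_speech` are hypothetical stand-ins, since the application specifies neither the second model's interface nor a concrete speech synthesis engine.

```python
def second_model(second_text, context):
    """Stand-in for the second model: produces the third text information,
    a reply to the user's question conditioned on the context content of
    the set exercise scene (represented here as a dict)."""
    return f"In the {context['scene']} scene, here is a reply to: {second_text}"

def synthesize_speech(text):
    """Stand-in for the server's speech synthesis function."""
    return text.encode("utf-8")

def generate_third_dialogue_speech(second_text, context):
    third_text = second_model(second_text, context)    # S2031: third text information
    return third_text, synthesize_speech(third_text)   # S2032: third dialogue speech

third_text, third_audio = generate_third_dialogue_speech(
    "What clubs does your school have?", {"scene": "campus"})
```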
In some embodiments, as shown in fig. 6, before step S101, the language online learning method provided by the embodiment of the present application further includes the following steps S301 to S302.
S301, generating a virtual portrait matched with the set exercise scene.
In some embodiments, after the user selects the set exercise scene through the front-end device, the server may also generate a virtual human figure matching the set exercise scene through the generative artificial intelligence model.
For example, the virtual human figure may be a character that could plausibly exist in the set scene; assuming that the set exercise scene selected by the user is a campus, the virtual human figure generated by the server may be a student, a teacher, or the like.
In some embodiments, the server can also generate a scene picture corresponding to the set exercise scene through the generative artificial intelligence model.
For example, assuming that the exercise scenario is set as a campus, the server may generate a picture of the scenario within the campus.
S302, sending configuration information of the virtual human figure to the front-end device, so that the front-end device displays the virtual human figure in the set exercise scene.
In some embodiments, after the server generates the virtual human figure, it may store the virtual human figure for subsequent retrieval, or send the configuration information of the virtual human figure to the front-end device; accordingly, the front-end device receives the configuration information and configures the virtual human figure according to it, and after the configuration is completed, displays the virtual human figure to the user through a display or similar device.
In some embodiments, step S102 may be further specifically implemented as: sending the first dialogue voice to the front-end device; accordingly, the front-end device receives the first dialogue voice, renders the virtual human figure based on the first dialogue voice, and has the virtual human figure broadcast the first dialogue voice.
In some embodiments, in the above process, the front-end device is able to render the set exercise scene as well as the virtual human figure based on technologies such as the web three-dimensional engine Babylon.js.
The above steps are described below by way of example for ease of understanding.
Illustratively, as shown in fig. 7, taking a user logging in to the language online learning system from the teacher side as an example, the above steps are specifically implemented as the following steps a1 to a8.
Step a1, the user selects a set exercise scene at the teacher end.
Step a2, the server receives the set exercise scene sent by the front-end device and inputs it into the generative artificial intelligence model to generate the virtual human figure, the scene picture corresponding to the set exercise scene, and the first guiding question corresponding to the set exercise scene.
Step a3, the server converts the first guiding question into a first guiding voice, and stores the virtual human figure, the scene picture corresponding to the set exercise scene, the first guiding question, and the first guiding voice in the storage device.
Step a4, the user views the virtual human figure, the scene picture corresponding to the set exercise scene, and the first guiding question at the teacher end, and determines whether the first guiding question needs to be modified.
Step a5, in the case that the first guiding question needs to be modified, the user sends a modification instruction to the server through the front-end device.
Step a6, the server receives the modification instruction and modifies the first guiding question to obtain a second guiding question.
Step a7, the server converts the second guiding question into a second guiding voice and stores the second guiding voice.
Step a8, the server sends the virtual human figure, the scene picture corresponding to the set exercise scene, the second guiding question, and the second guiding voice to the front-end device, so that the front-end device can render the virtual human figure, the set exercise scene, and the scene picture based on technologies such as the web three-dimensional engine Babylon.js, and display them to the user through a display or similar device.
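The review-and-modify portion of the teacher-side flow (steps a4 to a7) can be sketched as follows. The function name and the guiding-question strings are invented for illustration; the stand-in TTS call mirrors the sketches above.

```python
def review_guiding_question(first_question, modification=None):
    """Sketch of steps a4-a7: the teacher reviews the first guiding
    question; if a modification instruction is given, the modified text
    becomes the second guiding question, which is re-synthesized to
    speech (stand-in TTS: encoded bytes)."""
    final_question = modification if modification else first_question
    guiding_speech = final_question.encode("utf-8")
    return final_question, guiding_speech

question, speech = review_guiding_question(
    "What subjects do you need to study?",
    modification="Can you introduce your school to me?")
```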
For example, as shown in fig. 8, taking a user logging in to the language online learning system from the student side as an example, the above steps are specifically implemented as the following steps b1 to b8.
Step b1, the user selects a set exercise scene at the student end.
Step b2, the server receives the set exercise scene sent by the front-end device and obtains the virtual human figure, the scene picture corresponding to the set exercise scene, and the guiding question corresponding to the set exercise scene.
For example, the server inputs the set exercise scene into the generative artificial intelligence model to generate the virtual human figure, the scene picture corresponding to the set exercise scene, and the guiding question corresponding to the set exercise scene.
Alternatively, if the virtual human figure, the scene picture, and the guiding question corresponding to the set exercise scene are already stored in the storage device of the server, they may be obtained from the storage device.
Step b3, the server sends the virtual human figure, the scene picture corresponding to the set exercise scene, and the guiding question corresponding to the set exercise scene to the front-end device, so that the front-end device renders the virtual human figure, the set exercise scene, and the scene picture based on technologies such as the web three-dimensional engine Babylon.js, and displays them to the user through a display or similar device.
In some embodiments, the user sends a questioning voice, namely the second dialogue voice, to the server according to the guiding question corresponding to the set exercise scene displayed by the front-end device.
Step b4, the server receives the second dialogue voice to be recognized.
Step b5, the server recognizes the second dialogue voice to obtain a recognition result.
Step b6, the server generates a third dialogue voice based on the recognition result and the context content of the set exercise scene, wherein the third dialogue voice is used for replying to the second dialogue voice.
In some embodiments, the historical dialogue content in the set exercise scene may be formed based on at least one of the second dialogue voice and the third dialogue voice.
Step b7, the server generates the first dialogue voice based on the historical dialogue content in the set exercise scene.
Step b8, the server sends the first dialogue voice to the front-end equipment.
In some embodiments, in the above process, the front-end device may render the virtual human figure, the set exercise scene, and the scene picture in real time based on technologies such as the web three-dimensional engine Babylon.js.
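Steps b4 to b8 amount to one server-side dialogue turn, which can be sketched in a single function. Everything here is a stand-in (recognition, reply generation, and question generation are plain string operations), illustrating only the data flow the steps above describe.

```python
def run_dialogue_turn(second_audio, scene, history):
    """One turn of the student-side flow: recognize the user's speech
    (b5), generate a reply (b6), extend the historical dialogue content
    (input to b7), generate the next guiding question (b7), and return
    both as stand-in speech bytes (b8)."""
    second_text = second_audio.decode("utf-8")                      # b5: recognition result
    third_text = f"Reply in the {scene} scene to: {second_text}"    # b6: third dialogue
    history.extend([second_text, third_text])                       # history for b7
    first_text = f"Follow-up question about: {history[-1]}"         # b7: first dialogue
    return third_text.encode("utf-8"), first_text.encode("utf-8")   # b8: speech out

history = []
reply_audio, question_audio = run_dialogue_turn(
    b"I like the library best.", "campus", history)
```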
It can be understood that the application provides a language online learning method that generates dialogue speech by simulating a real dialogue scene and drawing on the historical dialogue, thereby providing an immersive language learning environment for the user; at the same time, the user can practice the language anytime and anywhere without depending on other people, effectively improving the efficiency of language learning.
In some embodiments, as shown in fig. 9, the language online learning method provided by the embodiment of the present application further includes the following steps S401 to S403.
S401, receiving a test instruction sent by a user.
The test instruction comprises the language of the test and the difficulty level of the test.
In some embodiments, the language of the test includes English, Japanese, Korean, French, German, Russian, and so on; the difficulty level of the test includes a first level, a second level, a third level, and so on, where the difficulty increases from the first level through the third level.
In some embodiments, after the user selects the language of the test and the difficulty level of the test in the front-end device, the front-end device sends the test instruction sent by the user to the server.
S402, inputting a test instruction into the third model to generate a test question.
The third model is used for determining test questions based on the language of the test and the difficulty level of the test.
In some embodiments, after receiving the test instruction sent by the front-end device, the server inputs the test instruction into the third model to randomly generate a test question corresponding to the test instruction.
S403, sending the test questions to the front-end equipment so that the front-end equipment displays the test questions to the user.
In some embodiments, after the server generates the test question, the test question may be stored in the storage device of the server, or may be sent to the front-end device, and accordingly, the front-end device receives the test question and displays the test question to the user through a display device or other devices.
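Steps S401 to S402 can be sketched as follows. The third model is a stand-in: the prompt table and difficulty levels here are invented examples, since the application only states that the model determines a question from the language and difficulty level of the test.

```python
def third_model(language, difficulty_level):
    """Stand-in for the third model: picks a writing prompt by language
    and difficulty level (invented example prompts; a real model would
    generate questions rather than look them up)."""
    prompts = {
        ("English", 1): "Describe your best friend.",
        ("English", 2): "Discuss the pros and cons of online learning.",
    }
    return prompts.get((language, difficulty_level), "Write about your day.")

def handle_test_instruction(instruction):
    """S401-S402: receive the test instruction, generate a test question."""
    return third_model(instruction["language"], instruction["difficulty"])

question = handle_test_instruction({"language": "English", "difficulty": 2})
```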
In some embodiments, as shown in fig. 10, after the step S403, the language online learning method provided by the present application further includes the following steps S501 to S502.
S501, receiving a composition text uploaded by a user. The composition text is the answer text of the test questions.
In some embodiments, after the front-end device presents the test question to the user, the user answers the test question and then uploads the answer through the front-end device.
In some embodiments, the user may upload the composition text in a variety of ways, such as a photo upload or online answering.
In some embodiments, if the user uploads the composition text by photographing it, the server receives a picture of the composition text. In this case, the server needs to recognize the picture through an image text recognition function, converting the words in the picture into an editable and understandable text format, thereby obtaining the composition text.
In some embodiments, if the user selects to input the composition text in an online answer mode, the composition text is sent to the server in a text form, the server does not need to recognize through an image text recognition function, and the server can directly acquire and process the composition text.
In some embodiments, after the server obtains the composition text, the server may store or post-process the composition text.
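The two upload modes of step S501 can be sketched as below; `ocr_extract` is a hypothetical stand-in for the image text recognition function, which the application does not name.

```python
def ocr_extract(image_bytes):
    """Stand-in for the image text recognition function; a real server
    would run OCR on the photographed composition."""
    return image_bytes.decode("utf-8")

def receive_composition(upload, mode):
    """Sketch of step S501: photographed uploads go through image text
    recognition, online answers are already editable text."""
    if mode == "photo":
        return ocr_extract(upload)
    return upload  # mode == "online": plain text, used directly

text_a = receive_composition(b"My school life is busy.", "photo")
text_b = receive_composition("My school life is busy.", "online")
```

Either branch yields the same editable composition text for step S502.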
S502, inputting the composition text into a fourth model to generate a modification result.
The fourth model is used for correcting the composition text; the correction result includes composition error annotations, a composition evaluation, and an example template.
In some embodiments, after the server obtains the composition text, the composition text may be input into a fourth model to obtain a composition error annotation, a composition evaluation, and text of the example template, and then the composition evaluation and the text of the example template may be converted to speech by a speech synthesis function.
In some embodiments, after the server corrects the composition, the correction result of the composition and the voice corresponding to the correction result can be stored in the storage device, so that the user can modify and evaluate the correction result.
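Step S502 can be sketched as follows. The fourth model is a stand-in whose outputs are invented; only the three-part shape of the correction result (error annotations, evaluation, example template) comes from the description above.

```python
def fourth_model(composition_text):
    """Stand-in for the fourth model: returns the three parts of the
    correction result named in the application (contents invented)."""
    return {
        "error_annotations": ["'go' should be 'went' (past tense)"],
        "evaluation": "Clear structure; watch verb tenses.",
        "example_template": "Last weekend I went to ...",
    }

correction = fourth_model("Last weekend I go to the park.")
# Stand-in TTS: the evaluation and example template can then be
# converted to speech, as the description above notes.
evaluation_speech = correction["evaluation"].encode("utf-8")
```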
For ease of understanding, the language online learning method described above is described below by way of example.
For example, as shown in fig. 11, in the case where the user sends a test instruction to the server through the front-end device, the above method is specifically implemented as the following steps c1 to c7.
Step c1, the server receives the test instruction sent by the user.
Step c2, the server inputs the test instruction into a third model to generate a test question, wherein the third model is used for determining the test question based on the language of the test and the difficulty level of the test.
Step c3, the server sends the test question to the front-end device so that the front-end device displays the test question to the user.
In some embodiments, after the front-end device presents the test question to the user, the user answers the test question and uploads the composition text.
Step c4, the server receives the composition text uploaded by the user, wherein the composition text is the answer text of the test question.
Step c5, the server inputs the composition text into a fourth model to generate a correction result, wherein the fourth model is used for correcting the composition text.
Step c6, the server stores the correction result of the composition text and the voice corresponding to the correction result.
In some embodiments, after the server stores the correction result of the composition text and the corresponding voice, a teacher may check the correction result through the teacher end and modify it.
Step c7, the server sends the correction result of the composition text and the corresponding voice to the front-end device; meanwhile, the front-end device can render the virtual human figure, the set exercise scene, and the scene picture based on technologies such as the web three-dimensional engine Babylon.js.
It can be understood that the application realizes rapid generation of test questions and timely correction of composition texts in multiple languages through the third model and the fourth model, so that a user can practice composition anytime and anywhere, which is conducive to rapid improvement of the user's writing ability and language learning efficiency.
It can be seen that the foregoing description of the solution provided by the embodiments of the present application has been presented mainly from a method perspective. To achieve the above-mentioned functions, embodiments of the present application provide corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the functional modules of the network node according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. Optionally, the division of the modules in the embodiment of the present application is schematic, which is merely a logic function division, and other division manners may be implemented in practice.
Referring to fig. 12, a schematic structural diagram of a language online learning device according to an embodiment of the present application is provided. The modules in the device shown in fig. 12 have the functions of implementing the steps in fig. 2 to 11, and achieve the corresponding technical effects. As shown in fig. 12, the language online learning apparatus 1200 may include: a generation module 1201 and a communication module 1202.
A generating module 1201, configured to generate a first dialogue voice based on the historical dialogue content in the setting exercise scene, where the first dialogue voice is used to question the user.
The communication module 1202 is configured to output a first dialogue voice.
In some embodiments, the generating module 1201 is specifically configured to input the historical dialogue content in the set exercise scene into a first model to obtain first text information; the first model is used for analyzing the historical dialogue content and determining questions posed to the user; the first text information is a question raised for a user; the first text information is converted into first dialogue speech.
In some embodiments, the communication module 1202 is configured to receive a second dialogue voice to be recognized, where the second dialogue voice is a questioning voice sent by the user; the device further comprises: the recognition module 1203 is configured to recognize the second dialogue speech to obtain a recognition result; the generating module 1201 is further configured to generate a third dialogue voice based on the recognition result and the context content of the set exercise scene, where the third dialogue voice is used to answer the second dialogue voice; the communication module 1202 is further configured to output a third dialogue voice.
In some embodiments, the recognition result includes second text information corresponding to a second dialog voice; the generating module 1201 is further configured to input the second text information and the context content of the set exercise scene into the second model to obtain third text information; the third text information is used for replying to the second text information; the second model is used for analyzing the second text information and setting the context content of the exercise scene and determining third text information; and converting the third text information into third dialogue voice.
In some embodiments, the generating module 1201 is further configured to generate a virtual portrait matching the set exercise scene; the communication module 1202 is further configured to send configuration information of the virtual human figure to the front-end device, so that the front-end device displays the virtual human figure in a set exercise scene; the first dialogue voice is sent to the front-end equipment, so that the front-end equipment renders the virtual human figure based on the first dialogue voice, and the first dialogue voice is broadcasted by the virtual human figure.
In some embodiments, the communication module 1202 is further configured to receive a test instruction, where the test instruction includes a language of a test and a difficulty level of the test; the generating module 1201 is further configured to input a test instruction into a third model, and generate a test question, where the third model is used to determine the test question based on the language of the test and the difficulty level of the test; the communication module 1202 is further configured to send the test question to the front-end device, so that the front-end device displays the test question to the user.
In some embodiments, the communication module 1202 is further configured to receive a composition text uploaded by a user, where the composition text is an answer text of a test question; the generating module 1201 is further configured to input the composition text into a fourth model, and generate a correction result, where the fourth model is used to correct the composition text.
In the case of implementing the functions of the integrated modules in the form of hardware, the embodiment of the present invention provides another possible structural schematic diagram of the language online learning device related to the embodiment. As shown in fig. 13, the language online learning apparatus 1300 includes: a processor 1302, a communication interface 1303, and a bus 1304. Optionally, the language online learning device may further include a memory 1301.
The processor 1302 implements or executes the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor 1302 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor 1302 may also be a combination implementing computing functions, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor.
A communication interface 1303 for connecting with other devices through a communication network. The communication network may be an ethernet, a radio access network, a wireless local area network (wireless local area networks, WLAN), etc.
Memory 1301, which may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), magnetic disk storage or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
As a possible implementation, the memory 1301 may exist separately from the processor 1302, and the memory 1301 may be connected to the processor 1302 by a bus 1304 for storing instructions or program code. The processor 1302, when calling and executing instructions or program code stored in the memory 1301, can implement the language online learning method provided by the embodiment of the present invention.
In another possible implementation, the memory 1301 may also be integrated with the processor 1302.
Bus 1304 may be an extended industry standard architecture (EISA) bus or the like. The bus 1304 may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 13, but this does not mean that there is only one bus or one type of bus.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the language online learning device is divided into different functional modules, so as to perform all or part of the functions described above.
The embodiment of the application also provides a computer-readable storage medium. All or part of the flow in the above method embodiments may be implemented by computer instructions instructing related hardware; the program may be stored in the computer-readable storage medium, and when executed, may include the flows of the above method embodiments. The computer-readable storage medium may be the memory of any of the foregoing embodiments. It may also be an external storage device of the language online learning apparatus, for example, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, or the like, provided on the language online learning apparatus. Further, the computer-readable storage medium may include both the internal storage unit and the external storage device of the language online learning apparatus. The computer-readable storage medium is used for storing the computer program and other programs and data required by the language online learning apparatus, and may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform any of the language online learning methods provided in the above embodiments.
The present application is not limited to the above embodiments, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (10)

1. A method for online learning of a language, applied to a server, the method comprising:
generating a first dialogue voice based on historical dialogue content in a set exercise scene, wherein the first dialogue voice is used for asking questions of a user;
And outputting the first dialogue voice.
2. The method of claim 1, wherein generating the first dialogue speech based on the historical dialogue content in the set exercise scene comprises:
Inputting the historical dialogue content in the set exercise scene into a first model to obtain first text information; the first model is used for analyzing the historical dialogue content and determining questions posed to the user; the first text information is a question raised to the user;
and converting the first text information into the first dialogue voice.
3. The method of claim 1, wherein before generating the first dialogue speech based on the historical dialogue content in the set exercise scene, the method further comprises:
Receiving second dialogue voice to be recognized, wherein the second dialogue voice is a question voice sent by a user;
identifying the second dialogue voice to obtain an identification result;
Generating a third dialogue voice based on the recognition result and the context content of the set exercise scene, wherein the third dialogue voice is used for replying to the second dialogue voice;
And outputting the third dialogue voice.
4. The method of claim 3, wherein the recognition result comprises second text information corresponding to the second dialogue voice, and generating the third dialogue voice based on the recognition result and the context content of the set exercise scene comprises:
inputting the second text information and the context content of the set exercise scene into a second model to obtain third text information, wherein the second model is used for analyzing the second text information and the context content of the set exercise scene and determining the third text information for replying to the second text information;
and converting the third text information into the third dialogue voice.
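Claims 3 and 4 describe the reverse direction: recognize the user's question voice, then reply in the context of the exercise scene. A minimal sketch, with `recognize` and `second_model` as hypothetical placeholders for the patent's speech recognition and second model:

```python
def recognize(second_dialogue_voice):
    """Stand-in ASR: yields the 'second text information'; a real
    recognizer would transcribe the audio payload."""
    return second_dialogue_voice["transcript"]

def second_model(second_text, scene_context):
    """Stand-in for the 'second model': a reply grounded in the
    context content of the set exercise scene."""
    return f"In this {scene_context} scene, here is a reply to '{second_text}'"

def reply_to_user(second_dialogue_voice, scene_context):
    second_text = recognize(second_dialogue_voice)          # recognition result
    third_text = second_model(second_text, scene_context)   # third text information
    return {"format": "wav", "transcript": third_text}      # third dialogue voice

third = reply_to_user({"transcript": "How do I order coffee?"}, "cafe")
```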
5. The method of claim 1, wherein the method further comprises:
generating a virtual human figure matched with the set exercise scene;
transmitting configuration information of the virtual human figure to a front-end device so that the front-end device displays the virtual human figure in the set exercise scene;
and wherein outputting the first dialogue voice comprises:
sending the first dialogue voice to the front-end device, so that the front-end device renders the virtual human figure based on the first dialogue voice and broadcasts the first dialogue voice through the virtual human figure.
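The server-to-front-end handoff in claim 5 can be sketched as pairing an avatar configuration with the speech payload. The configuration fields and role mapping below are invented for illustration; the patent does not specify a schema.

```python
def avatar_config(scene):
    """Hypothetical configuration of the virtual human figure that the
    server sends to the front-end device; fields are illustrative only."""
    roles = {"cafe": "barista", "airport": "check-in agent"}
    return {"scene": scene, "role": roles.get(scene, "tutor"), "voice": "en-US"}

def broadcast(config, first_dialogue_voice):
    """The front end would render the avatar and lip-sync it to the speech;
    here we simply pair the configuration with the audio payload."""
    return {"avatar": config, "speech": first_dialogue_voice}

frame = broadcast(avatar_config("cafe"), {"format": "wav", "transcript": "Hello!"})
```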
6. The method of claim 1, wherein the method further comprises:
receiving a test instruction, wherein the test instruction comprises the language of a test and the difficulty level of the test;
inputting the test instruction into a third model to generate a test question, wherein the third model is used for determining the test question based on the language of the test and the difficulty level of the test;
and sending the test question to a front-end device so that the front-end device displays the test question to the user.
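Claim 6 maps a (language, difficulty) test instruction to a generated question. In this sketch `third_model` is a hypothetical stand-in; the difficulty label and question template are assumptions, not the patent's implementation.

```python
def third_model(language, difficulty):
    """Stand-in for the 'third model': determines a test question
    from the test's language and difficulty level."""
    return f"[{language}/{difficulty}] Write a short essay introducing yourself."

def handle_test_instruction(instruction):
    question = third_model(instruction["language"], instruction["difficulty"])
    return {"question": question}   # payload to be sent to the front-end device

payload = handle_test_instruction({"language": "English", "difficulty": "B1"})
```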
7. The method of claim 6, wherein the method further comprises:
receiving a composition text uploaded by the user, wherein the composition text is an answer text for the test question;
and inputting the composition text into a fourth model to generate a correction result, wherein the fourth model is used for correcting the composition text.
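Claim 7 closes the loop by grading the uploaded composition. The toy `fourth_model` below is a placeholder that scores by word count purely for illustration; the patent's fourth model and its correction criteria are not disclosed.

```python
def fourth_model(composition_text):
    """Stand-in grader for the 'fourth model': produces a correction
    result; the word-count scoring rule here is entirely illustrative."""
    words = len(composition_text.split())
    score = min(100, words * 2)
    remarks = "too short" if words < 50 else "adequate length"
    return {"score": score, "remarks": remarks}

result = fourth_model("I like learning languages because ...")
```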
8. A language online learning apparatus, the apparatus comprising:
a generation module, configured to generate a first dialogue voice based on historical dialogue content in a set exercise scene, wherein the first dialogue voice is used for posing guiding questions to a user;
and a communication module, configured to output the first dialogue voice.
9. An electronic device comprising a processor and a memory, the processor being coupled to the memory, wherein the memory is configured to store computer instructions that are loaded and executed by the processor to cause the electronic device to implement the language online learning method of any one of claims 1 to 7.
10. A computer-readable storage medium comprising computer-executable instructions that, when run on a computer, cause the computer to perform the language online learning method of any one of claims 1 to 7.
CN202410268958.2A 2024-03-08 2024-03-08 Language online learning method and device, electronic equipment and storage medium Pending CN118197111A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410268958.2A CN118197111A (en) 2024-03-08 2024-03-08 Language online learning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118197111A true CN118197111A (en) 2024-06-14

Family

ID=91392364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410268958.2A Pending CN118197111A (en) 2024-03-08 2024-03-08 Language online learning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118197111A (en)

Similar Documents

Publication Publication Date Title
CN105632251B (en) 3D virtual teacher system and method with phonetic function
CN111290568A (en) Interaction method and device and computer equipment
Davidson Young children's engagement with digital texts and literacies in the home: Pressing matters for the teaching of English in the early years of schooling.
CN107992195A (en) A kind of processing method of the content of courses, device, server and storage medium
CN111477049A (en) Intelligent training interaction system for education innovation entrepreneurship training
CN113377200B (en) Interactive training method and device based on VR technology and storage medium
Sghaier et al. Development of an intelligent system based on metaverse learning for students with disabilities
CN102881199A (en) Method and system for interactively reciting words
CN110796338A (en) Online teaching monitoring method and device, server and storage medium
CN110062290A (en) Video interactive content generating method, device, equipment and medium
CN110767005A (en) Data processing method and system based on intelligent equipment special for children
Andujar et al. Mobile-mediated communication and students' listening skills: a case study
WO2022196880A1 (en) Avatar-based interaction service method and device
Rani et al. Arabic language learning based on technology (Opportunities and challenges in the digital era)
CN117615182A (en) Live broadcast and interaction dynamic switching method, system and terminal based on number of participants
CN113850898A (en) Scene rendering method and device, storage medium and electronic equipment
CN104966428B (en) CAL implementation method and system based on 3D printing
CN107436949A (en) A kind of efficient study cell phone application based on autonomous interactive model
CN118197111A (en) Language online learning method and device, electronic equipment and storage medium
CN110070869A (en) Voice interface generation method, device, equipment and medium
Ţucă et al. Speech recognition in education: Voice geometry painter application
KR100447667B1 (en) Interactive Language Teaching System consisted of Dolls and Voice Recognizing Computer
JP2015060056A (en) Education device and ic and medium for education device
CN113837010A (en) Education assessment system and method
CN113382311A (en) Online teaching interaction method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination