CN117615182B - Live broadcast interaction dynamic switching method, system and terminal - Google Patents

Live broadcast interaction dynamic switching method, system and terminal

Info

Publication number
CN117615182B
Authority
CN
China
Prior art keywords
user
semantics
voice
interaction
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410091243.4A
Other languages
Chinese (zh)
Other versions
CN117615182A (en)
Inventor
周雪松
孙政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Oudi Electronic Technology Co ltd
Original Assignee
Jiangsu Oudi Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Oudi Electronic Technology Co ltd filed Critical Jiangsu Oudi Electronic Technology Co ltd
Priority to CN202410091243.4A
Publication of CN117615182A
Application granted
Publication of CN117615182B
Active legal status (current)
Anticipated expiration

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866Management of end-user data
    • H04N21/25891Management of end-user data being end-user preferences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1822Parsing for meaning understanding
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computing Systems (AREA)
  • Social Psychology (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application belongs to the field of live teaching and discloses a live broadcast interaction dynamic switching method. The method comprises: acquiring the number of terminals in the current online video interaction, the terminal identity identifiers, and the user expressions, user actions and user voices currently input by each terminal; analyzing the user expressions, user actions and user voices to determine semantics; labeling the semantics to determine semantic tags in multiple dimensions; and inputting the semantic tags into a neural network model to determine the current interaction mode. With the method, a teacher does not need to spend extra effort observing students' states during a live class, and can adjust the classroom mode in time according to the suggested interaction mode so as to mobilize students' enthusiasm in class.

Description

Live broadcast interaction dynamic switching method, system and terminal
Technical Field
The application belongs to the field of live broadcast teaching, and particularly relates to a live broadcast interaction dynamic switching method, system and terminal.
Background
With the development of live broadcasting technology, the teaching environment and teaching modes of teachers have changed accordingly. Learning in a live classroom is more interactive, the teaching of knowledge concepts is more visual, and the teaching mode is more personalized.
Although live classrooms improve the diversity of classroom interaction to a certain extent, teachers and students cannot communicate face to face during live teaching, and existing live teaching cannot monitor students' classroom states in real time. Teachers focus on their own teaching content most of the time and cannot discover students' in-class states in time so as to adjust their own interaction mode during live teaching. As a result, students are more prone to becoming distracted in live classes, which leads to low classroom efficiency and low enthusiasm for attending class. Therefore, how to adjust the teacher-student interaction mode in time during online live teaching and mobilize students' enthusiasm in class is a technical problem to be solved urgently.
Disclosure of Invention
The application aims to provide a live interaction dynamic switching method that can help teachers adjust their interaction mode with students in time and mobilize students' enthusiasm in class.
In a first aspect, an embodiment of the present application provides a live interaction dynamic switching method, where the method includes:
acquiring the number of terminals in the current online video interaction, the terminal identity identifiers, and the user expressions, user actions and user voices currently input by each terminal;
analyzing the user expressions, user actions and user voices, and determining semantics;
labeling the semantics, and determining semantic tags in multiple dimensions;
and inputting the semantic tags into a neural network model to determine the current interaction mode.
In a possible implementation, the analyzing the user expression, the user action and the user voice and determining the semantics includes:
According to a preset face information base, combining face information in the online video, determining the position of each terminal user, and capturing human body actions in the position area of each user to obtain action semantics of each user; the human body motion includes lip motion dynamics, facial dynamics, and torso motion;
Determining the voice sent by each user according to the lip motion dynamics intercepted in the position area where each user is positioned and the audio in the online video, and obtaining the voice semantics of each user;
And determining the facial expression of each user according to the facial dynamics intercepted in the position area of each user, and combining the facial expression analysis model to obtain the emotion semantics of each user.
In a possible implementation, the determining, according to the lip motion dynamics intercepted in the location area where each user is located and the audio in the online video, the voice sent by each user and obtaining the voice semantics of each user includes:
Determining estimated voice corresponding to each user lip motion dynamic state according to a preset lip motion dynamic library;
Analyzing the audio in the online video to generate a plurality of segments of actual voice;
And matching each estimated voice with the actual voice, determining the actual voice corresponding to each lip action, and taking the actual voice corresponding to each lip action as the voice semantic of the corresponding user.
In a possible implementation, the determining the facial expression of each user according to the facial dynamics intercepted in the location area of each user and combining the facial expression analysis model to obtain the emotion semantics of each user includes:
determining the estimated facial expression corresponding to each user according to a preset facial expression dynamic library;
extracting facial feature points of the estimated facial expression and the actual facial expression, and determining the matching degree of the estimated facial expression and the actual facial expression according to the distance between the feature points;
and taking the facial expression with the highest matching degree as emotion semantics of the corresponding user.
In a possible implementation, the labeling the semantics and determining the semantic tags in multiple dimensions includes:
determining a current semantic tag of the user according to a preset semantic tag system and the emotion semantics, action semantics and voice semantics of the user;
and determining semantic tags on different dimensions of the user according to the current semantic tag of the user.
In a possible implementation, the method further comprises:
acquiring a history live broadcast watching interaction mode, a user terminal test result and history live broadcast watching times, and determining interaction feedback information characteristics;
And correcting the interaction mode by utilizing the interaction feedback information characteristics.
In a possible implementation, the correcting the interaction mode by using the interaction feedback information characteristics includes:
Determining an interaction score according to the interaction feedback information characteristics;
analyzing the interaction score and the user terminal test score to generate a direct proportion relation between the interaction score and the user terminal test score;
And correcting the interaction mode according to the proportional relation.
In a second aspect, the present application further provides a live interaction dynamic switching system, where the system has a function of implementing the method in the first aspect or any possible implementation manner thereof. In particular, the system comprises means for implementing the method of the first aspect or any possible implementation thereof.
In one embodiment thereof, the system comprises:
the acquisition module is used for acquiring the number of terminals in the current online video interaction, the terminal identity identifiers, and the user expressions, user actions and user voices currently input by each terminal;
the analysis module is used for analyzing the user expressions, user actions and user voices and determining the semantics;
the definition module is used for labeling the semantics and determining semantic tags in multiple dimensions;
and the output module is used for dynamically inputting the semantic tags into the neural network model to determine the current interaction mode.
In a third aspect, the present application further provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of the above implementations when executing the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements a method as described in one of the above.
In a fifth aspect, the application also provides a computer program product for causing an electronic device to perform the method of any one of the implementations of the first aspect described above when the computer program product is run on the electronic device.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
According to the method, the semantics of all users are determined by analyzing the expressions, actions and voices of all terminal users, the semantics are classified according to preset semantic tags, the model is finally trained with the semantic tags, and suitable interaction modes for different time periods of the live broadcast are determined.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an application environment diagram of a live interaction dynamic switching method provided by the application;
fig. 2 is a schematic flow chart of a live interaction dynamic switching method provided by the application;
FIG. 3 is a schematic flow chart of determining semantics according to user expressions, actions and voices;
FIG. 4 is a schematic flow chart of determining the speech semantics of each user according to the present application;
FIG. 5 is a schematic flow chart for determining emotion semantics of each user according to the present application;
FIG. 6 is a schematic diagram of the actual facial expression flow of student A provided by the present application;
FIG. 7 is a schematic diagram of a flow of estimated facial expressions of student A according to the present application;
FIG. 8 is a schematic flow chart of a semantic tag for determining multiple dimensions according to the present application;
fig. 9 is another flow chart of a live interaction dynamic switching method provided by the application;
FIG. 10 is a flow chart of the method for correcting an interaction mode by using interaction feedback information according to the present application;
fig. 11 is a schematic structural diagram of a live interaction dynamic switching system provided by the application;
fig. 12 is a schematic structural diagram of an electronic device provided by the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may, depending on the context, be interpreted as meaning "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
It should be understood that, the sequence number of each step in this embodiment does not mean the execution sequence, and the execution sequence of each process should be determined by its function and internal logic, and should not limit the implementation process of the embodiment of the present application in any way.
Fig. 1 is an application environment diagram of the live interaction dynamic switching method provided by the application, which can be used to adjust the interaction mode of a classroom in real time. As shown in fig. 1, students attend class through live broadcast software. In existing live teaching, because there are many students and the teacher does not teach them face to face, the teacher cannot monitor the students' learning state at every moment. At the same time, lacking the teacher's on-site supervision, students are more likely to become tired, fidget or lose focus during the lesson, and students and teachers can only interact through a comment area or a like button, so the effect of live teaching is often unsatisfactory. The application determines the currently optimal interaction mode by acquiring and analyzing the classroom states of the students or the teacher at the client. In this way, teachers can adjust the classroom interaction mode in time, which mobilizes students' activity in class and ultimately helps students improve their learning results.
The following description will be made with reference to a specific scheme of the present application, and fig. 2 is a schematic flow chart of a live interaction dynamic switching method provided by the present application.
S201, the number of terminals of the current online video interaction, terminal identification and user expression, user action and user voice input by each terminal are obtained.
S202, analyzing the user expression, the user action and the user voice, and determining the semantics.
S203, labeling the semantics, and determining semantic labels with multiple dimensions.
S204, inputting the semantic tags into a neural network model, and determining the current interaction mode.
In the application, the number of terminals for online video interaction is the number of people participating in live broadcasting class.
By way of example, the total number of viewers in the live room can be obtained from the current live room information, and a teaching live room can generally distinguish teacher accounts from student accounts according to the registrant information. During a live class, the students' expressions, actions and voices can be acquired through the camera and the microphone. From a student's expressions, actions and voices, the user's semantics can be determined. For example, when student A is detected frowning, it indicates that student A has some doubt about, or is dissatisfied with, the content currently being explained; when student B's action is detected as raising a hand, it indicates that student B currently wants to ask the teacher a question interactively. After all students' expressions, actions and voices are obtained and the semantics are determined, the semantics are classified, for example into the student being interested, the student being uninterested, the student answering, or the student making small movements in class. The students' semantics are then labeled according to the duration of interest, the duration of disinterest, the duration of answering, the duration of small movements and the duration of the whole live class, so as to judge the students' participation, activity, emotion and answer relevance. All semantic tags are used as input to a hybrid model of a recurrent neural network (RNN) and a convolutional neural network (CNN) to determine the interaction mode of the current live room. For example, when students have doubts, the teacher can be reminded to ask whether this part of the content needs to be explained again; when a student raises an interaction request, the teacher can be reminded whether to perform one-to-one interaction.
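The patent does not disclose the internal structure of the RNN/CNN hybrid model, so the following Python sketch (using PyTorch) is only an assumed illustration of this step: per-student semantic tags are integer-encoded, passed through a 1-D convolution and an LSTM, and mapped to a score per candidate interaction mode. The tag vocabulary size, the mode names and all hyper-parameters are assumptions introduced for illustration only.

```python
# Hedged sketch: a CNN+RNN hybrid that maps a sequence of semantic-tag IDs
# to an interaction mode. Tag vocabulary and mode names are assumptions,
# not taken from the patent text.
import torch
import torch.nn as nn

MODES = ["one_to_many_lecture", "one_to_one_interaction", "re_explain", "quiz"]

class InteractionModeNet(nn.Module):
    def __init__(self, num_tags: int, embed_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_tags, embed_dim)
        self.conv = nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(MODES))

    def forward(self, tag_ids: torch.Tensor) -> torch.Tensor:
        # tag_ids: (batch, seq_len) integer-encoded semantic tags
        x = self.embed(tag_ids)            # (batch, seq, embed)
        x = self.conv(x.transpose(1, 2))   # (batch, hidden, seq)
        x = torch.relu(x).transpose(1, 2)  # back to (batch, seq, hidden)
        _, (h, _) = self.rnn(x)            # h: (1, batch, hidden)
        return self.head(h[-1])            # (batch, num_modes) logits

# Example: 10 tag IDs drawn from an assumed 50-tag vocabulary for one session.
model = InteractionModeNet(num_tags=50)
logits = model(torch.randint(0, 50, (1, 10)))
print(MODES[logits.argmax(dim=1).item()])
```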
In this way, not only can the students' class states be monitored in real time, but the teacher can also be reminded of the most appropriate interaction mode for the class at that moment, so that the teacher can change the classroom interaction mode in time and further mobilize the students' activity in class.
In one embodiment of the application, the semantic features include actions and speech; as shown in fig. 3, the analyzing the user expression, the user action and the user voice and determining the semantics includes:
S301, determining the position of each terminal user according to a preset face information base and combining face information in the online video, and intercepting human body actions in the position area of each user to obtain action semantics of each user; the human body movements include lip movement dynamics, facial dynamics, and torso movements.
S302, determining the voice sent by each user according to the lip motion dynamics intercepted in the position area of each user and the audio in the online video, and obtaining the voice semantics of each user.
S303, determining the facial expression of each user according to the facial dynamics intercepted in the position area of each user, and combining the facial expression analysis model to obtain the emotion semantics of each user.
By way of example, the face information in the live room of a mathematics live class is first determined, and the identity information of the teacher and the students is determined in combination with a preset face information base. After the identity information of everyone is obtained, the human body actions of the teacher and of the students are intercepted. While the teacher lectures, the teacher's voice semantics, action semantics and expression semantics can be determined from the torso actions, facial dynamics and lip actions in combination with the acquired audio. At this time, if a student shows not only torso actions and facial dynamics but also lip actions, the student's voice information can be determined in combination with the acquired audio, and the student's voice semantics, action semantics and expression semantics are obtained.
Specifically, when a student raises a hand in the live classroom, the action is analyzed and the expressed semantics is a request to interact with the teacher; that is, the hand-raising action semantics is a request to interact about the part of the teacher's explanation the student is interested in, and the teacher can then decide whether to interact with the student according to the related prompt. The student's expression semantics are determined from the student's facial expression dynamics in the live classroom: the current expression semantics, such as "happy", "angry" or "puzzled", are determined according to the facial expressions in a preset expression library.
In addition, when a student raises a hand in class, the left arm may be raised from the seat. The interaction language feature database stores a standard hand-raising interaction posture, but in practice the student's action is not necessarily standard and deviates to some extent from the standard interaction posture entered into the system. To accurately recognize the gesture with which a student requests interaction, the vector distance between the actual fingertip and the fingertip of the standard interaction posture is calculated to determine the deviation; when the vector distance is within a preset threshold range, the posture is judged to be the same gesture. More generally, the similarity between the actual posture and the standard interaction posture is determined by calculating the vector distance between each joint of the human body in the actual posture and the corresponding joint in the standard posture, and whether the student's posture is a gesture labeled in the database is determined according to this similarity.
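A minimal sketch of this joint-distance matching, assuming normalised 2-D joint coordinates, is given below; the joint names, template coordinates and threshold value are illustrative assumptions and not values from the patent.

```python
# Hedged sketch of the joint-distance matching described above.
import math

STANDARD_HAND_RAISE = {          # assumed normalised (x, y) joint positions
    "wrist": (0.20, 0.05), "elbow": (0.25, 0.35), "shoulder": (0.30, 0.60),
}

def gesture_distance(observed: dict, standard: dict) -> float:
    """Mean Euclidean distance over the joints shared by both poses."""
    joints = standard.keys() & observed.keys()
    dists = [math.dist(observed[j], standard[j]) for j in joints]
    return sum(dists) / len(dists)

def is_hand_raise(observed: dict, threshold: float = 0.15) -> bool:
    # Within the preset threshold -> treat as the labeled "request interaction" gesture.
    return gesture_distance(observed, STANDARD_HAND_RAISE) <= threshold

detected = {"wrist": (0.22, 0.08), "elbow": (0.24, 0.33), "shoulder": (0.31, 0.58)}
print(is_hand_raise(detected))   # True for this near-standard pose
```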
The semantics of each student and each teacher are accurately obtained through the method, the teacher and student information is further matched, and the accuracy of determining the classroom interaction mode is improved.
In an embodiment of the present application, as shown in fig. 4, the determining, according to the lip motion dynamics intercepted in the location area where each user is located and the audio in the online video, the voice sent by each user, and obtaining the voice semantics of each user include:
S401, determining estimated voice corresponding to lip motion dynamics of each user according to a preset lip motion dynamic library;
S402, analyzing the audio in the online video to generate a plurality of segments of actual voice;
S403, matching each estimated voice with the actual voice, determining the actual voice corresponding to each lip action, and determining the voice semantics of the corresponding user according to the actual voice corresponding to each lip action.
For example, when a Chinese-language teacher teaches a pinyin lesson, the estimated voice corresponding to the detected lip action in the video is "a"; the online video of the lesson is parsed to generate the actual voice at the corresponding moment, which is also "a". Through matching, it is determined that the actual voice corresponding to the lip action is "a", and the teacher's voice semantics at that moment, namely teaching, is determined. In this way it is known that the teacher is explaining phonetic symbols at that moment rather than interacting with the students.
For another example, assume that the estimated voice corresponding to a student's lip action in the online classroom is "I still have not understood the solution method for this problem", and the actual voice obtained by parsing is also "I still have not understood the solution method for this problem". Through matching, it is determined that the actual voice corresponding to the lip action is that the student cannot understand the solution method for the problem. From the student's actual voice it is known that the student has a great doubt about the problem being explained and needs to interact with the teacher, and the student's voice semantics at that moment, namely questioning this part of the explained content, is determined.
According to this method, by matching a person's lip actions with the actually parsed voice, the voice semantics of students and teachers are distinguished more accurately, which facilitates labeling the students' semantics and better monitoring the students' learning states.
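The matching of estimated voice against the parsed actual voice segments (steps S401-S403) can be sketched as a simple text-similarity search. The lip-motion lookup and the audio parsing are assumed to be available upstream, and the similarity measure and threshold below are assumptions made for illustration.

```python
# Hedged sketch of steps S401-S403: only the matching step is shown; the
# lip-motion library lookup and the speech-to-text parsing are stubbed out.
from difflib import SequenceMatcher

def best_matching_segment(estimated_text: str, actual_segments: list,
                          min_ratio: float = 0.6):
    """Return the parsed speech segment that best matches the estimated speech."""
    scored = [(SequenceMatcher(None, estimated_text, seg).ratio(), seg)
              for seg in actual_segments]
    ratio, segment = max(scored)
    return segment if ratio >= min_ratio else None

segments = ["please continue", "I still have not understood the solution method"]
estimate = "I have not understood the solution method"   # from lip-motion library
print(best_matching_segment(estimate, segments))
```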
In an embodiment of the present application, as shown in fig. 5, the determining the facial expression of each user according to the facial dynamics intercepted in the location area of each user, and combining the facial expression analysis model to obtain the emotion semantics of each user includes:
s501, determining the estimated facial expression corresponding to each user according to a preset facial expression dynamic library.
S502, extracting facial feature points of the estimated facial expression and the actual facial expression, and determining the matching degree of the estimated facial expression and the actual facial expression according to the distance between the feature points.
S503, taking the estimated facial expression with the highest matching degree as the actual facial expression of the corresponding user, and determining the emotion semantics of the user according to the actual facial expression of the user.
For example, in an online classroom, the estimated facial expression of student A at a certain moment is determined to be a smile according to the preset facial expression library, and the facial features of student A are extracted; the actual facial expression of student A is shown in fig. 6 and the estimated facial expression of student A is shown in fig. 7. The eyes and the corners of the mouth of student A are taken as facial feature points, the distances between the corresponding facial feature points are calculated, and the matching degree between the actual facial expression and each estimated facial expression is determined from these distances. The estimated facial expression with the highest matching degree is selected as the actual facial expression of student A. Since student A is smiling at this moment, it is determined that student A is interested in the teaching content and is paying attention.
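As a hedged illustration of steps S501-S503, the sketch below compares detected facial feature points against templates from an assumed expression library and picks the template with the smallest total feature-point distance (i.e. the highest matching degree); the landmark names and coordinates are invented for the example.

```python
# Hedged sketch of steps S501-S503. Landmark names, template values and the
# distance-based matching score are assumptions for illustration only.
import math

EXPRESSION_TEMPLATES = {
    "smile": {"mouth_left": (0.35, 0.72), "mouth_right": (0.65, 0.72), "eyes": (0.50, 0.40)},
    "frown": {"mouth_left": (0.38, 0.80), "mouth_right": (0.62, 0.80), "eyes": (0.50, 0.36)},
}

def match_expression(actual_landmarks: dict) -> str:
    """Pick the template whose feature points are closest to the detected ones."""
    def total_distance(template: dict) -> float:
        return sum(math.dist(actual_landmarks[k], template[k]) for k in template)
    return min(EXPRESSION_TEMPLATES, key=lambda name: total_distance(EXPRESSION_TEMPLATES[name]))

detected = {"mouth_left": (0.36, 0.71), "mouth_right": (0.64, 0.73), "eyes": (0.50, 0.41)}
print(match_expression(detected))   # -> "smile", read as emotion semantics "interested"
```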
Since students do not speak all the time during a class, analyzing facial expressions makes it possible to accurately judge the students' emotion semantics at any moment, which helps the teacher monitor the class states of all students in real time and prevents students from losing focus during the lecture, which would lower classroom efficiency.
In an implementation of the present application, as shown in fig. 8, the labeling the semantics and determining the semantic tags in multiple dimensions includes:
S801, determining a current semantic tag of the user according to a preset semantic tag system and the emotion semantics, action semantics and voice semantics of the user;
S802, determining semantic tags on different dimensions of the user according to the current semantic tag of the user.
Illustratively, assume an intelligent voice assistant system that can interact with the user through speech recognition and natural language processing. The semantic tag system preset in the system includes semantic tags in dimensions such as emotion, action, time, participation and activity.
A student's emotion semantics may include emotional states such as happiness, depression, anger and doubt; the action semantics may include user actions such as raising a hand or writing; the voice semantics may include the user's specific speech content, such as a student saying "please explain this question to me again".
In this scenario, the user says a sentence: "Please explain this problem to me again." The system can convert the speech into text through speech recognition and then analyze the user's intent through natural language processing.
In stage S801, the system determines the student's current semantic tag according to the student's speech content and the semantic tag system. In this example, the system may determine that the student's current semantic tag is "confused about the explained content".
In stage S802, the system may determine semantic tags in different dimensions of the user based on the student's current semantic tag "confused about the explained content". For example, the tag in the emotion dimension may be "puzzled"; the tag in the action dimension may be "raising a hand"; the tag in the time dimension may be the current time; the tag for the user's participation may be "high"; and the tag for the user's activity may be "active".
Therefore, in the step, the system determines the current semantic tags of the user according to the voice semantics of the user, and further determines the semantic tags in different dimensions, so that the system can better understand the intention of the user and make corresponding response and behavior.
Illustratively, the application uses statistical graphs to collect and count related data of the student class semantic tags, such as class state line graphs, class state pie charts and the like.
The method realizes the digitization of the student classroom state, and enables teachers to more conveniently and intuitively know the student classroom situation.
In an implementation of the present application, as shown in fig. 9, the method further includes:
S901, acquiring a history live broadcast watching interaction mode, a user terminal test result and history live broadcast watching times, and determining interaction feedback information characteristics.
S902, correcting the interaction mode by utilizing the interaction feedback information characteristics.
Illustratively, assume that in a live classroom students can watch the live broadcast on a platform and interact, for example by commenting, liking or sending gifts.
On the platform, the users' historical live-viewing interaction patterns can be collected, including each user's behavior when watching live broadcasts, such as whether the user likes to comment, like or send gifts. Meanwhile, the students' terminal test results can be obtained, including information such as their usual homework scores and examination scores, and the number of times each student has watched live broadcasts in the past is obtained.
In step S901, the interaction feedback information characteristics are determined using these data. For example, the students' historical live-viewing counts and interaction patterns can be analyzed to observe whether the students like to interact during live broadcasts and which interaction mode they prefer; meanwhile, the students' terminal test results can be analyzed to observe the association between students' examination results and the live interaction mode.
In step S902, the interaction mode can be corrected by using the interaction feedback information characteristics. For example, if it is found that students like to comment and like during live classes but few such interactions occur in the current live class, the interaction mode of the live class can be modified, for example by changing the current lecturing mode to a targeted interaction mode, or by improving the speed at which the teacher responds to the students' current questions, so as to improve the students' classroom satisfaction and experience.
Therefore, in this example, the interaction feedback information characteristics are determined from information such as the historical live-viewing interaction patterns, the user terminal test results and the historical live-viewing counts, and the interaction mode is corrected using these characteristics, which improves the students' satisfaction, experience and concentration in class.
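Step S901 can be sketched as building a small feature record from the collected history; the field names and the way the averages are computed below are assumptions for illustration only.

```python
# Hedged sketch of step S901: turning the raw viewing/test history into a
# feature record. Field names are assumptions, not taken from the patent.
from dataclasses import dataclass

@dataclass
class FeedbackFeatures:
    avg_comments_per_session: float
    avg_likes_per_session: float
    avg_test_score: float
    sessions_watched: int

def build_features(comments: list, likes: list, test_scores: list) -> FeedbackFeatures:
    n = max(len(comments), 1)
    return FeedbackFeatures(
        avg_comments_per_session=sum(comments) / n,
        avg_likes_per_session=sum(likes) / n,
        avg_test_score=sum(test_scores) / max(len(test_scores), 1),
        sessions_watched=len(comments),
    )

print(build_features([3, 5, 2], [10, 4, 7], [82.0, 90.5]))
```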
In an embodiment of the present application, as shown in fig. 10, the correcting the interaction mode by using the interaction feedback information characteristics includes:
s1001, determining an interaction score according to the interaction feedback information characteristics;
S1002, analyzing the interaction score and the user terminal test score, and generating a direct proportion relation between the interaction score and the user terminal test score;
S1003, correcting the interaction mode according to the proportional relation.
Illustratively, the interaction feedback information characteristics are determined from the students' historical live-viewing interaction patterns (for example, their interaction demands in past live classes), the user terminal test results (such as a student's usual homework scores and examination scores), and the students' historical live-viewing counts. For example, student A loves to think about questions and often asks the teacher questions in class, so loving to ask questions is student A's interaction feedback characteristic. The interaction feedback characteristics are scored according to preset rules to determine each student's interaction score in the live class; since student A loves to ask questions, student A's interaction score is higher. The relationship between the students' interaction scores and their usual homework results is then analyzed; it is found that students who often interact with the teacher generally complete their homework with higher quality, which determines the live broadcast factors that are in direct proportion to the interaction score. Based on all these data, the live interaction mode can be continuously corrected, a mode suitable for teacher-student interaction can be found, the interaction mode with the students can be adjusted in time, and the students' enthusiasm in class can be mobilized.
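Steps S1001-S1003 can be illustrated with a hedged sketch that scores the feedback features and estimates the direct-proportion factor between interaction score and terminal test score as a least-squares slope through the origin; the weights, the scoring formula and the sample numbers are assumptions, not values from the patent.

```python
# Hedged sketch of steps S1001-S1003: score the interaction-feedback features,
# then fit test_score ≈ k * interaction_score (slope through the origin).
def interaction_score(avg_comments: float, avg_likes: float, sessions: int) -> float:
    # Assumed weighting of the feedback features.
    return 0.5 * avg_comments + 0.3 * avg_likes + 0.2 * sessions

def proportional_factor(scores: list, test_scores: list) -> float:
    """Least-squares slope through the origin between the two score series."""
    num = sum(s * t for s, t in zip(scores, test_scores))
    den = sum(s * s for s in scores)
    return num / den if den else 0.0

students = [(3.0, 10.0, 12), (1.0, 2.0, 4), (5.0, 8.0, 20)]   # per-student histories
scores = [interaction_score(*s) for s in students]
k = proportional_factor(scores, [88.0, 71.0, 93.0])
print(f"estimated proportional factor k = {k:.2f}")  # larger k: interaction pays off
```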
According to the method, the teacher-student interaction mode in live broadcast teaching is continuously corrected, so that the teaching efficiency of a live broadcast classroom is further improved, and the enthusiasm of students in the live broadcast classroom is mobilized.
In addition, the method applies different interaction mode standards to classes of different natures.
For example, a class can be divided into three parts: pre-class communication between teachers and students, in-class communication, and post-class communication. The pre-class communication involves many steps: the teacher issues materials and topics, the students preview the class according to the issued materials, and the students are required to raise their own questions about the parts they do not understand; therefore the interaction demand is larger in the period before class, more interaction between the teacher and the students is needed, and a specific interaction mode is particularly required. During the class, the teacher generally checks attendance first, then organizes the students to discuss during the lecture, and finally explains the released test; the whole process is mainly the teacher's explanation, not much student interaction is needed, and the classroom interaction mode at this time is one-to-many explanation, occasionally adjusted to purposeful one-to-one interaction in response to changes in students' activity. The post-class communication mainly consists of the teacher correcting the students' homework and answering the students' questions, and this part also needs a specific interaction mode similar to the pre-class interaction.
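These phase-dependent standards can be captured, under assumed mode labels, by a simple lookup of the default interaction mode per class phase; a real system would combine such a lookup with the model output from the earlier sketch.

```python
# Hedged sketch of phase-dependent interaction standards. The phase names and
# mode labels are assumptions introduced for illustration.
PHASE_MODES = {
    "pre_class":  "question_driven_one_to_one",   # preview material, collect questions
    "in_class":   "one_to_many_lecture",          # switch to one-to-one when activity drops
    "post_class": "homework_review_one_to_one",   # corrections and Q&A
}

def default_mode(phase: str) -> str:
    return PHASE_MODES.get(phase, "one_to_many_lecture")

print(default_mode("pre_class"))
```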
The method of the embodiments of the present application is mainly described above with reference to the drawings. It should be noted that all values presented above are only examples and do not limit the application in any way. It should also be understood that, although the steps in the flowcharts related to the above embodiments are shown in order, these steps are not necessarily performed in the order shown in the figures. Unless explicitly stated herein, the steps are not strictly limited in execution order and may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some other steps or stages. The system of the embodiments of the application is described below with reference to the accompanying drawings. For brevity, the description of the system is appropriately simplified; for related content, reference may be made to the related description of the method above, which is not repeated here.
Fig. 11 is a schematic structural diagram of a live interaction dynamic switching system according to an embodiment of the present application.
As shown in fig. 11, the system 1100 includes an acquisition module 1101, an analysis module 1102, a definition module 1103, and an output module 1104. The system 1100 is capable of performing any of the live classroom interaction mode determination methods described above. For example, the acquisition module 1101 may be used to perform step S201, the analysis module 1102 may be used to perform step S202, the definition module 1103 may be used to perform step S203, and the output module may be used to perform step S204.
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 12, the electronic apparatus 3000 of this embodiment includes: at least one processor 3001 (only one is shown in fig. 12), a memory 3002, and a computer program 3003 stored in the memory 3002 and executable on the at least one processor 3001, the steps in the above embodiments being implemented by the processor 3001 when executing the computer program 3003.
The processor 3001 may be a central processing unit (CPU); the processor 3001 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 3002 may be an internal storage unit of the electronic device 3000 in some embodiments, such as a hard disk or memory of the electronic device 3000. The memory 3002 may also be an external storage device of the electronic device 3000 in other embodiments, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the electronic device 3000. Further, the memory 3002 may also include both an internal storage unit and an external storage device of the electronic device 3000. The memory 3002 is used for storing an operating system, application programs, boot loader data, other programs and the like, such as the program code of the computer program. The memory 3002 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the content of information interaction and execution process between the above units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that the above-described functional units are merely illustrated in terms of division for convenience and brevity, and that in practical applications, the above-described functional units and modules may be allocated to different functional units or modules according to needs, i.e., the internal structure of the system may be divided into different functional units or modules to perform all or part of the above-described functions. The functional units in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present application. The specific working process of the units in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor implements steps of the above-described respective method embodiments.
Embodiments of the present application provide a computer program product enabling the implementation of the above-mentioned methods when the computer program product is run on a computer.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the method of the above embodiments, and may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of each of the method embodiments described above. Wherein the computer program comprises computer program code which may be in source code form, object code form, executable file or some intermediate form etc. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a camera device/electronic apparatus, a recording medium, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
It should be understood that the sequence number of each step in the foregoing embodiment does not mean that the execution sequence of each process should be determined by the function and the internal logic, and should not limit the implementation process of the embodiment of the present application. In the description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Furthermore, in the description of the application and the claims that follow, the terms "comprise," "include," "have" and variations thereof are used to mean "include but are not limited to," unless specifically noted otherwise.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, computer device and method may be implemented in other manners. For example, the apparatus, computer device embodiments described above are merely illustrative, e.g., the partitioning of elements is merely a logical functional partitioning, and there may be additional partitioning in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A live interaction dynamic switching method, characterized by comprising the following steps:
acquiring the terminal identity of the current online video interaction, and the user expression, user action and user voice currently input by each terminal;
analyzing the user expression, the user action and the user voice to determine emotion semantics, action semantics and voice semantics, respectively;
labeling the emotion semantics, the action semantics and the voice semantics to determine semantic tags of multiple dimensions;
inputting the semantic tags into a neural network model to determine the current interaction mode.
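The decision step of claim 1 can be illustrated in outline: the multi-dimensional semantic tags are encoded as a feature vector and passed to a classifier whose output selects the interaction mode. The Python sketch below is illustrative only; the tag vocabularies, mode names and untrained placeholder weights are assumptions and do not reproduce the claimed neural network model.

```python
# A minimal sketch (not the patented implementation) of encoding multi-dimensional
# semantic tags and feeding them to a small classifier that selects an interaction mode.
import numpy as np

EMOTION_TAGS = ["happy", "neutral", "confused"]      # assumed emotion dimension
ACTION_TAGS  = ["hand_raised", "nodding", "idle"]    # assumed action dimension
VOICE_TAGS   = ["question", "statement", "silence"]  # assumed voice dimension
MODES = ["lecture", "q_and_a", "group_discussion"]   # assumed interaction modes

def one_hot(tag, vocab):
    """Encode one tag as a one-hot vector over its dimension's vocabulary."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(tag)] = 1.0
    return vec

def encode_tags(emotion, action, voice):
    """Concatenate the one-hot encodings of the three semantic dimensions."""
    return np.concatenate([one_hot(emotion, EMOTION_TAGS),
                           one_hot(action, ACTION_TAGS),
                           one_hot(voice, VOICE_TAGS)])

rng = np.random.default_rng(0)
W = rng.normal(size=(len(MODES), 9))   # untrained placeholder weights
b = np.zeros(len(MODES))

def select_mode(emotion, action, voice):
    """Single linear layer + softmax; returns the arg-max interaction mode."""
    x = encode_tags(emotion, action, voice)
    logits = W @ x + b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return MODES[int(np.argmax(probs))], probs

mode, probs = select_mode("confused", "hand_raised", "question")
print(mode, probs.round(3))
```

In practice the weights would be trained on logged tag/mode pairs; the sketch only shows how the tag dimensions map onto the classifier input.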
2. The live interaction dynamic switching method according to claim 1, wherein analyzing the user expression, the user action and the user voice to determine emotion semantics, action semantics and voice semantics, respectively, comprises:
determining the position of each terminal user according to a preset face information base combined with the face information in the online video, and capturing the human body motion in the position area of each user to obtain the action semantics of each user, wherein the human body motion includes lip motion dynamics, facial dynamics and torso motion;
determining the voice uttered by each user according to the lip motion dynamics captured in the position area of each user and the audio in the online video, to obtain the voice semantics of each user;
determining the facial expression of each user according to the facial dynamics captured in the position area of each user, and obtaining the emotion semantics of each user in combination with a facial expression analysis model.
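To illustrate the per-user localization in claim 2: once a face from the preset face information base is located, its bounding box can be expanded into a region that covers the lips, face and torso, and that crop is what the later motion analysis operates on. The face boxes and expansion factors in this Python sketch are illustrative assumptions.

```python
# A minimal sketch of cropping a per-user region around a known face bounding box
# so that lip, facial and torso motion can be analysed separately for each user.
import numpy as np

def user_region(face_box, frame_shape, expand_w=1.5, expand_h=4.0):
    """Expand a face box (x, y, w, h) outwards and downwards to cover the torso."""
    x, y, w, h = face_box
    H, W = frame_shape[:2]
    x0 = max(0, int(x - (expand_w - 1) / 2 * w))
    x1 = min(W, int(x + w + (expand_w - 1) / 2 * w))
    y0 = max(0, y)
    y1 = min(H, int(y + expand_h * h))
    return x0, y0, x1, y1

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # placeholder video frame
face_boxes = {"user_a": (100, 80, 120, 140),       # assumed matches from the
              "user_b": (700, 90, 110, 130)}       # preset face information base

crops = {}
for user, box in face_boxes.items():
    x0, y0, x1, y1 = user_region(box, frame.shape)
    crops[user] = frame[y0:y1, x0:x1]              # per-user region for motion capture
    print(user, crops[user].shape)
```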
3. The live interaction dynamic switching method according to claim 2, wherein determining the voice uttered by each user according to the lip motion dynamics captured in the position area of each user and the audio in the online video, and obtaining the voice semantics of each user, comprises:
determining the estimated voice corresponding to each user's lip motion dynamics according to a preset lip motion dynamics library;
analyzing the audio in the online video to generate a plurality of segments of actual voice;
matching each estimated voice with the actual voice, determining the actual voice corresponding to each lip action, and determining the voice semantics of the corresponding user according to the actual voice corresponding to each lip action.
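The matching step of claim 3 can be sketched as a similarity search between lip-derived estimates and transcribed audio segments. The example texts and the use of `difflib` as the matcher are assumptions; the patent does not specify the matching algorithm.

```python
# A minimal sketch of matching lip-reading estimates against transcribed audio segments.
from difflib import SequenceMatcher

estimated = {"user_a": "can you repeat the question",
             "user_b": "i agree with that"}
actual_segments = ["i agree with that point",
                   "could you repeat the question please"]

def best_match(estimate, segments):
    """Return (similarity, segment) for the audio segment closest to the estimate."""
    scored = [(SequenceMatcher(None, estimate, seg).ratio(), seg) for seg in segments]
    return max(scored)

for user, est in estimated.items():
    score, segment = best_match(est, actual_segments)
    print(user, "->", segment, f"(similarity {score:.2f})")
```

The matched actual voice, rather than the noisier lip estimate, is what would then be analysed for voice semantics.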
4. The live interaction dynamic switching method according to claim 2, wherein determining the facial expression of each user according to the facial dynamics captured in the position area of each user, and obtaining the emotion semantics of each user in combination with the facial expression analysis model, comprises:
determining the estimated facial expression corresponding to each user according to a preset facial expression dynamics library;
extracting facial feature points of the estimated facial expression and the actual facial expression, and determining the matching degree between the estimated facial expression and the actual facial expression according to the distances between the feature points;
determining the actual facial expression of the corresponding user according to the estimated facial expression with the highest matching degree, and determining the emotion semantics of the user according to the actual facial expression of the user.
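The feature-point matching in claim 4 amounts to scoring each expression template by how close its landmarks lie to the observed landmarks. The landmark arrays and the distance-to-score mapping below are illustrative assumptions, not the patent's matching formula.

```python
# A minimal sketch of scoring expression templates against observed facial landmarks.
import numpy as np

def matching_degree(estimated_pts, actual_pts):
    """Convert the mean landmark distance into a score in (0, 1]; closer means higher."""
    d = np.linalg.norm(estimated_pts - actual_pts, axis=1).mean()
    return 1.0 / (1.0 + d)

actual = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [1.0, -1.0]])   # observed landmarks
templates = {                                                           # assumed expression library
    "smile":   np.array([[0.0, 0.0], [1.0, 0.3], [2.0, 0.0], [1.0, -0.8]]),
    "neutral": np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [1.0, -1.0]]),
}

scores = {name: matching_degree(pts, actual) for name, pts in templates.items()}
best = max(scores, key=scores.get)   # template with the highest matching degree
print(scores, "->", best)
```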
5. The live interaction dynamic switching method according to claim 1, wherein labeling the emotion semantics, the action semantics and the voice semantics to determine semantic tags of multiple dimensions comprises:
determining the current semantic tag of a user according to a preset semantic tag system and the emotion semantics, action semantics and voice semantics of the user;
determining the semantic tags of the user in different dimensions according to the current semantic tag of the user.
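One way to picture the preset semantic tag system of claim 5 is as a lookup from combinations of semantics to per-dimension labels. The dimensions, label names and lookup keys in this sketch are assumptions made for illustration.

```python
# A minimal sketch of a preset tag system mapping a user's semantics to
# labels on several tag dimensions.
TAG_SYSTEM = {
    "engagement": {("happy", "nodding"): "engaged", ("confused", "idle"): "disengaged"},
    "intent":     {("question",): "wants_to_speak", ("silence",): "listening"},
}

def tag_user(emotion, action, voice):
    """Look up the per-dimension semantic tags for one user's semantics."""
    tags = {}
    tags["engagement"] = TAG_SYSTEM["engagement"].get((emotion, action), "neutral")
    tags["intent"] = TAG_SYSTEM["intent"].get((voice,), "listening")
    return tags

print(tag_user("confused", "idle", "question"))
# {'engagement': 'disengaged', 'intent': 'wants_to_speak'}
```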
6. The live interaction dynamic switching method according to any one of claims 1 to 5, further comprising:
acquiring the historical live-viewing interaction mode, the user terminal test result and the historical number of live-viewing sessions, and determining interaction feedback information features;
correcting the interaction mode by using the interaction feedback information features.
7. The live interaction dynamic switching method according to claim 6, wherein correcting the interaction mode by using the interaction feedback information features comprises:
determining an interaction score according to the interaction feedback information features;
analyzing the interaction score and the user terminal test score to generate a direct proportion relation between the interaction score and the user terminal test score;
correcting the interaction mode according to the direct proportion relation.
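The direct proportion relation of claim 7 can be approximated by fitting a single proportionality constant between the two scores and then using it to adjust the score that drives mode switching. The sample values, the least-squares fit through the origin and the blending weight are illustrative assumptions.

```python
# A minimal sketch of deriving a direct-proportion relation between interaction
# scores and terminal test scores, then using it to correct the interaction score.
import numpy as np

interaction_scores = np.array([2.0, 4.0, 6.0, 8.0])      # from interaction feedback features
terminal_scores    = np.array([55.0, 62.0, 70.0, 78.0])  # user terminal test results

# Least-squares slope through the origin: terminal ~= k * interaction
k = float(interaction_scores @ terminal_scores / (interaction_scores @ interaction_scores))

def corrected_interaction_score(raw_score, terminal_score, weight=0.5):
    """Blend the raw score with the score implied by the terminal test result."""
    implied = terminal_score / k
    return (1 - weight) * raw_score + weight * implied

print(round(k, 2), round(corrected_interaction_score(5.0, 66.0), 2))
```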
8. A live interaction dynamic switching system, characterized by comprising:
an acquisition module, configured to acquire the number of terminals of the current online video interaction, the terminal identity, and the user expression, user action and user voice currently input by each terminal;
an analysis module, configured to analyze the user expression, the user action and the user voice and determine emotion semantics, action semantics and voice semantics, respectively;
a definition module, configured to label the emotion semantics, the action semantics and the voice semantics and determine semantic tags of multiple dimensions;
an output module, configured to dynamically input the semantic tags into the neural network model to determine the current interaction mode.
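The four modules of claim 8 can be wired as a simple pipeline, as in the skeleton below. The class and method names are illustrative, and each body stands in for the kinds of routines sketched for claims 1 to 7 above.

```python
# A minimal sketch of the four modules of the claimed system as plain Python classes.
class AcquisitionModule:
    def collect(self):
        """Return terminal identities plus each terminal's expression/action/voice inputs."""
        return {"terminal_1": {"expression": ..., "action": ..., "voice": ...}}

class AnalysisModule:
    def analyze(self, inputs):
        """Map raw inputs to emotion/action/voice semantics per terminal."""
        return {t: {"emotion": "neutral", "action": "idle", "voice": "silence"} for t in inputs}

class DefinitionModule:
    def label(self, semantics):
        """Turn the semantics into multi-dimensional semantic tags."""
        return {t: {"engagement": "neutral", "intent": "listening"} for t in semantics}

class OutputModule:
    def switch(self, tags):
        """Feed the tags to the mode classifier and return the chosen interaction mode."""
        return "lecture"

acq, ana, dfn, out = AcquisitionModule(), AnalysisModule(), DefinitionModule(), OutputModule()
print(out.switch(dfn.label(ana.analyze(acq.collect()))))
```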
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
CN202410091243.4A 2024-01-23 2024-01-23 Live broadcast interaction dynamic switching method, system and terminal Active CN117615182B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410091243.4A CN117615182B (en) 2024-01-23 2024-01-23 Live broadcast interaction dynamic switching method, system and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410091243.4A CN117615182B (en) 2024-01-23 2024-01-23 Live broadcast interaction dynamic switching method, system and terminal

Publications (2)

Publication Number Publication Date
CN117615182A CN117615182A (en) 2024-02-27
CN117615182B true CN117615182B (en) 2024-04-26

Family

ID=89958309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410091243.4A Active CN117615182B (en) 2024-01-23 2024-01-23 Live broadcast interaction dynamic switching method, system and terminal

Country Status (1)

Country Link
CN (1) CN117615182B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118038559B (en) * 2024-04-09 2024-06-18 电子科技大学 Statistical analysis method, device, system and storage medium for learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019132459A1 (en) * 2017-12-28 2019-07-04 주식회사 써로마인드로보틱스 Multimodal information coupling method for recognizing user's emotional behavior, and device therefor
CN111967739A (en) * 2020-07-31 2020-11-20 深圳市木愚科技有限公司 Concentration degree-based online teaching method and system
CN112712450A (en) * 2021-01-04 2021-04-27 中南民族大学 Real-time interaction method, device, equipment and storage medium based on cloud classroom
CN116962727A (en) * 2022-04-15 2023-10-27 青岛聚看云科技有限公司 Multi-terminal live broadcast interaction control method, display equipment and server

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111681474A (en) * 2020-06-17 2020-09-18 中国银行股份有限公司 Online live broadcast teaching method and device, computer equipment and readable storage medium
CN114422820A (en) * 2022-01-26 2022-04-29 广州观自在文化教育科技有限公司 Education interactive live broadcast system and live broadcast method

Also Published As

Publication number Publication date
CN117615182A (en) 2024-02-27

Similar Documents

Publication Publication Date Title
US11551804B2 (en) Assisting psychological cure in automated chatting
Worsley et al. Multimodal Learning Analytics' Past, Present, and Potential Futures.
US10614298B2 (en) Generating auxiliary information for a media presentation
CN110091335B (en) Method, system, device and storage medium for controlling learning partner robot
CN109710748B (en) Intelligent robot-oriented picture book reading interaction method and system
CN107203953A (en) It is a kind of based on internet, Expression Recognition and the tutoring system of speech recognition and its implementation
CN117615182B (en) Live broadcast interaction dynamic switching method, system and terminal
CN111651497B (en) User tag mining method and device, storage medium and electronic equipment
CN111833861A (en) Artificial intelligence based event evaluation report generation
CN110767005A (en) Data processing method and system based on intelligent equipment special for children
CN114299617A (en) Teaching interaction condition identification method, device, equipment and storage medium
CN109754653B (en) Method and system for personalized teaching
Ochoa Multimodal systems for automated oral presentation feedback: A comparative analysis
CN111563697A (en) Online classroom student emotion analysis method and system
US20220309936A1 (en) Video education content providing method and apparatus based on artificial intelligence natural language processing using characters
CN115601829A (en) Classroom analysis method based on human face and human body behavior and action image recognition
Komatsu et al. Feature extraction with SHAP value analysis for student performance evaluation in remote collaboration
Ali et al. Social skills training with virtual assistant and real-time feedback
KR20190112499A (en) Server with built-in learner-tailored coding cloud training platform, learner-tailored coding education system and its methods
WO2023079370A1 (en) System and method for enhancing quality of a teaching-learning experience
CN111951628A (en) Interactive learning system based on turnover learning
US10453354B2 (en) Automatically generated flash cards
CN111209817A (en) Assessment method, device and equipment based on artificial intelligence and readable storage medium
CN112634684B (en) Intelligent teaching method and device
Huang Ideal construction of chatbot based on intelligent depression detection techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant