CN113467604A - Data interaction method and related equipment

Info

Publication number
CN113467604A
Authority
CN
China
Prior art keywords
instruction
audio
target
video
image
Legal status
Pending
Application number
CN202010467086.4A
Other languages
Chinese (zh)
Inventor
矫佩佩
高雪松
陈维强
Current Assignee
Hisense Group Co Ltd
Hisense Co Ltd
Original Assignee
Hisense Co Ltd
Application filed by Hisense Co Ltd
Priority to CN202010467086.4A
Publication of CN113467604A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/04817 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, using icons
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a data interaction method and related equipment, used to provide an additional operation mode in the user interaction process and to simplify the operation flow of user interaction. The method comprises the following steps: acquiring video and audio of a first object, sending the video and audio of the first object to at least one second device, and receiving the video of the second object corresponding to each second device, collected by the at least one second device; recognizing gesture information in each frame of image of the acquired video of the first object, and determining a target instruction corresponding to the recognized gesture information; and sending the target instruction, wherein the target instruction instructs a target device among the second devices to execute an operation corresponding to the target instruction.

Description

Data interaction method and related equipment
Technical Field
The present application relates to the field of data processing, and in particular, to a data interaction method and related device.
Background
With the arrival of the big-data era, the number of online-education users has grown rapidly, and online education has become an important teaching mode in the education world. Online teaching frees teachers and students from situations in which offline teaching is impossible, such as during winter and summer vacations or when participants cannot attend offline teaching for other reasons, and helps teachers and students take part in teaching activities. In the future, online education will be used in ever more scenarios.
In existing online teaching scenarios, teachers and students must select functions by clicking buttons, and sometimes must hunt for a button or option by switching display interfaces and performing other operations; the operation flow is complex and the form of interaction is limited.
Disclosure of Invention
Exemplary embodiments of the present application provide a data interaction method and related devices, so as to offer an additional operation mode, enrich the operation modes available in online teaching scenarios, and simplify the operation flow.
According to a first aspect of the exemplary embodiments, there is provided a data interaction method, including:
acquiring video and audio of a first object, sending the video and audio of the first object to at least one second device, and receiving the video of a second object corresponding to the second device, acquired by the at least one second device; and
recognizing gesture information of each frame of image in the acquired video of the first object, and determining a target instruction corresponding to the recognized gesture information;
and sending a target instruction, wherein the target instruction is used for instructing a target device in the second device to execute an operation corresponding to the target instruction.
In the above embodiment, the target instruction is determined by recognizing gesture information in each frame of image of the acquired video of the first object, and sending the target instruction causes the target device to execute the corresponding operation, thereby controlling the target device. The first object can trigger the sending of the target instruction simply by making a gesture, which simplifies the operation flow by which the first object controls the target device, improves the efficiency of interaction between the first object and the users corresponding to the second devices, provides teachers and students with an additional operation mode in online-education scenarios, and simplifies teacher-student interactive operations.
In some exemplary embodiments, sending a target instruction, where the target instruction is used to instruct a target device in the second device to perform an operation corresponding to the target instruction, further includes:
identifying an object identifier contained in the audio of the first object, and determining that the second device corresponding to the object identifier is the target device, wherein the duration between the acquisition time of the audio and the acquisition time of the image to which the recognized gesture information belongs is less than a preset duration.
In the above embodiment, the second device corresponding to the object identifier identified from the audio of the first object may be determined as the target device, so that the target instruction for interaction between the first object and the user corresponding to the target device is determined by identifying the audio and the video of the first object, the target device is controlled to execute the operation corresponding to the target instruction, the operation flow for interaction between the first object and the user corresponding to the target device is simplified, the processing time for user interaction between the first object and the target device is shortened, and the interaction efficiency between the first object and the user corresponding to the target device is improved.
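To make the timing condition concrete, here is a minimal sketch assuming timestamped gesture and audio events; the ten-second window, type names, and helper function are illustrative assumptions, since the embodiment only requires the gap to be less than a preset duration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical window; the embodiment only says "less than the preset duration".
PRESET_WINDOW_SECONDS = 10.0

@dataclass
class GestureEvent:
    frame_time: float        # capture time of the image containing the gesture
    instruction: str         # target instruction mapped from the gesture

@dataclass
class AudioSegment:
    capture_time: float        # capture time of the audio
    object_id: Optional[str]   # object identifier recognized in the audio, if any

def resolve_target_device(gesture: GestureEvent,
                          audio: AudioSegment) -> Optional[str]:
    """Return the object identifier naming the target device, or None when no
    identifier was recognized or the audio lies outside the preset window."""
    if audio.object_id is None:
        return None
    if abs(audio.capture_time - gesture.frame_time) < PRESET_WINDOW_SECONDS:
        return audio.object_id
    return None
```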
In some exemplary embodiments, sending a target instruction, where the target instruction is used to instruct a target device in the second device to perform an operation corresponding to the target instruction, includes:
if the target instruction is a control sound starting instruction, sending the control sound starting instruction so that the target equipment sends the acquired audio of the second object corresponding to the target equipment; or
If the target instruction is a control sound closing instruction, sending the control sound closing instruction, and controlling the second equipment to stop sending the acquired audio of the second object corresponding to the second equipment; or
If the target instruction is an image display instruction, sending the image display instruction so that the target equipment determines and controls to display an icon corresponding to the image display instruction; or
And if the target instruction is a sound effect playing instruction, sending the sound effect playing instruction so that the target equipment determines and controls playing of the audio corresponding to the sound effect playing instruction.
In the foregoing embodiment, a plurality of target instructions are provided so that the first object can control the target device to perform a plurality of operations. For example, controlling the target device to send the audio of its corresponding second object to the first device lets the first device receive that audio, implementing teacher-student question-and-answer interaction. Controlling the second device to stop sending the audio of its corresponding second object means the first device no longer receives it, ending the question-and-answer interaction or enforcing classroom discipline. Controlling the target device to determine and display the icon corresponding to an image display instruction, or to determine and play the audio corresponding to a sound effect playing instruction, lets the teacher give students feedback during teacher-student interaction. Controlling the target device to execute the operation corresponding to the target instruction can thus improve both the quality and the efficiency of teacher-student interaction.
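For illustration, a hypothetical sender-side dispatcher covering the four branches above might look as follows; the command strings, payload fields, and send_to_device transport are assumptions rather than anything fixed by the text.

```python
from typing import Optional

SOUND_ON, SOUND_OFF = "control_sound_start", "control_sound_stop"
SHOW_IMAGE, PLAY_SOUND = "image_display", "sound_effect_play"

def send_to_device(device_id: str, payload: dict) -> None:
    # Placeholder transport: a real system would send this directly or
    # via the cloud server described later in the text.
    print(f"-> {device_id}: {payload}")

def dispatch_target_instruction(instruction: str, target_device: str,
                                icon_id: Optional[str] = None,
                                audio_id: Optional[str] = None) -> None:
    if instruction == SOUND_ON:
        # Target device should start sending its captured audio.
        send_to_device(target_device, {"cmd": SOUND_ON})
    elif instruction == SOUND_OFF:
        # Target device (or every device) stops sending captured audio.
        send_to_device(target_device, {"cmd": SOUND_OFF})
    elif instruction == SHOW_IMAGE:
        # Carries an image identifier so the target can choose the icon.
        send_to_device(target_device, {"cmd": SHOW_IMAGE, "icon": icon_id})
    elif instruction == PLAY_SOUND:
        # Carries an audio identifier so the target can choose the clip.
        send_to_device(target_device, {"cmd": PLAY_SOUND, "audio": audio_id})
```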
In some exemplary embodiments, the method further comprises:
and controlling to display the video of the second object corresponding to the second equipment acquired by the at least one second equipment.
In the above embodiment, in an online-education scenario, by controlling a display screen included in the first device, or a display screen connected to the first device, to display the videos of the at least one second object, the teacher can watch the videos of the students corresponding to the second devices, which assists the teacher in supervising and grasping the students' classroom state.
According to a second aspect of the exemplary embodiments, there is provided a data interaction method, including:
acquiring a video of a second object, sending the video of the second object to first equipment, and receiving the video and audio of the first object acquired by the first equipment;
receiving a target instruction sent by first equipment, and executing an operation corresponding to the target instruction;
wherein the target instruction is determined by the first device based on gesture information recognized from the acquired video of the first object.
In the above embodiment, the received target instruction is determined by the first device by recognizing gesture information from the acquired video of the first object, and executing the operation corresponding to the target instruction lets the first device control the target device. This simplifies the operation flow by which the first object corresponding to the first device controls the target device and improves the efficiency of user interaction between the first object and the second device.
In some exemplary embodiments, the method further comprises:
recognizing gesture information of each frame of image in the obtained video of the second object;
and if the recognized gesture information is determined to be the preset gesture information, acquiring the audio of the second object, and sending the audio of the second object to the first equipment.
In the above embodiment, the second object can send audio to the first device by making the preset gesture, which simplifies interaction between the second object and the first object corresponding to the first device, for example the process of a student asking the teacher a question.
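A minimal sketch of this student-side flow, assuming per-frame recognition; recognize_gesture, capture_audio, and send_audio_to_first_device are placeholder hooks, and "raise_hand" is an invented name for the preset gesture.

```python
from typing import Optional

PRESET_GESTURE = "raise_hand"   # hypothetical name for the preset gesture

def recognize_gesture(frame) -> Optional[str]:
    """Placeholder for the per-frame gesture recognizer."""
    return None

def capture_audio() -> bytes:
    """Placeholder for the audio capture device."""
    return b""

def send_audio_to_first_device(audio: bytes) -> None:
    """Placeholder transport to the first device (the teacher end)."""

def on_frame(frame) -> None:
    # If the recognized gesture matches the preconfigured one, the second
    # device captures its user's audio and forwards it to the first device.
    if recognize_gesture(frame) == PRESET_GESTURE:
        send_audio_to_first_device(capture_audio())
```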
In some exemplary embodiments, a target instruction sent by a first device is received, and an operation corresponding to the target instruction is executed, including:
if the target instruction is a control sound starting instruction, sending the acquired audio of the second object; or
If the target instruction is a control sound closing instruction, stopping sending the acquired audio of the second object; or
If the target instruction is an image display instruction, determining and controlling to display an icon corresponding to the image display instruction according to the image display instruction; or
And if the target instruction is a sound effect playing instruction, determining and controlling to play the audio corresponding to the sound effect playing instruction according to the sound effect playing instruction.
In the above embodiment, the target device executes the operation corresponding to the received instruction. For example, sending the audio of the second object to the first device lets the first device receive that audio, implementing teacher-student question-and-answer interaction; stopping the sending of the second object's audio means the first device no longer receives it, ending the question-and-answer interaction or enforcing classroom discipline; determining and controlling a display contained in or connected to the second device to show the icon corresponding to an image display instruction, or an audio playing device contained in or connected to the second device to play the audio corresponding to a sound effect playing instruction, delivers the teacher's feedback to the student during teacher-student interaction. Executing the operation corresponding to the instruction can thus improve both the quality and the efficiency of teacher-student interaction.
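On the receiving side, the target device's handling of the four instruction types can be pictured as a small handler; the command strings mirror the hypothetical sender-side sketch above, and the icon and sound-effect tables are invented placeholders.

```python
ICONS = {"clap": "clap.png", "like": "thumbs_up.png"}    # assumed assets
SOUND_EFFECTS = {"applause": "applause.wav"}             # assumed assets

class SecondDevice:
    def __init__(self) -> None:
        self.sending_audio = False

    def handle(self, msg: dict) -> None:
        cmd = msg.get("cmd")
        if cmd == "control_sound_start":
            self.sending_audio = True     # start streaming captured audio
        elif cmd == "control_sound_stop":
            self.sending_audio = False    # stop streaming captured audio
        elif cmd == "image_display":
            icon = ICONS.get(msg.get("icon", ""))
            if icon:
                self.display(icon)        # show e.g. a clapping or praise icon
        elif cmd == "sound_effect_play":
            clip = SOUND_EFFECTS.get(msg.get("audio", ""))
            if clip:
                self.play(clip)           # play e.g. applause audio

    def display(self, icon: str) -> None:
        print(f"displaying {icon}")       # stand-in for the connected display

    def play(self, clip: str) -> None:
        print(f"playing {clip}")          # stand-in for the audio playing device
```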
In some exemplary embodiments, the method further comprises:
and controlling to play the video and the audio of the first object acquired by the first equipment.
In the above embodiment, in an online-education scenario, a student can follow the teacher's explanation and lecture content by having the second device control a display screen included in or connected to it to play the video of the first object, and an audio playing device included in or connected to it to play the audio of the first object.
According to a third aspect of the exemplary embodiments, there is provided a gesture recognition method including:
acquiring a video of a target object;
matching a target image containing a human face in each frame image of the video with a human face image corresponding to a preset target object;
and recognizing gesture information in the target image matched with the target face image.
In the above embodiment, each frame of image in the video of the target object is examined, the face in the image is matched against the face image corresponding to the preset target object, and gesture information is recognized in the target image that matches the target face image. This improves the accuracy of recognizing the target object's gesture information and thereby the quality of teacher-student interaction.
In some exemplary embodiments, identifying gesture information in a target image that matches a target face image includes:
if the target image only contains a single hand, determining gesture information of the single hand image; or
And if the target image comprises a plurality of hands, determining gesture information of a hand image which is closest to the position of the face image matched with the target face image in the target image.
In the above embodiment, the number of hands contained in the target image determines how the target object's gesture information is obtained. If only one hand is present in the target image, the gesture information of that hand image is taken as the target object's gesture information; if multiple hands are present, the gesture information of the hand image closest to the matched target face image is taken as the target object's gesture information. This improves the accuracy of gesture recognition for the target object and prevents another person's hand from being recognized as the target object's gesture.
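A minimal sketch of this selection rule, assuming face and hand detectors that return axis-aligned bounding boxes; measuring the distance between box centres is an assumption, since the text only requires the hand image closest to the matched face.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]   # x, y, width, height

def centre(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def select_hand(face_box: Box, hand_boxes: List[Box]) -> Optional[Box]:
    """Pick the hand whose gesture should count as the target object's."""
    if not hand_boxes:
        return None
    if len(hand_boxes) == 1:
        return hand_boxes[0]              # single hand: use it directly
    fx, fy = centre(face_box)
    # Multiple hands: choose the one closest to the matched face.
    return min(hand_boxes, key=lambda b: math.dist((fx, fy), centre(b)))
```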
In some exemplary embodiments, identifying gesture information in a target image that matches a target face image includes:
and determining gesture information contained in the gesture image by utilizing a pre-trained gesture recognition neural network model, wherein the gesture recognition neural network model is generated by training by taking a plurality of pre-collected gesture image samples and gesture information contained in each gesture image sample as input and taking the gesture information contained in each gesture image sample as output.
According to a fourth aspect of the exemplary embodiments, there is provided an electronic device, comprising: a memory, a processor;
the memory is used for storing computer programs or instructions;
the processor is used to execute the computer program or instructions in the memory to implement the following processes:
acquiring video and audio of a first object, sending the video and audio of the first object to at least one second device, and receiving the video of a second object corresponding to the second device, acquired by the at least one second device; and
recognizing gesture information of each frame of image in the acquired video of the first object, and determining a target instruction corresponding to the recognized gesture information;
and sending a target instruction, wherein the target instruction is used for instructing a target device in the second device to execute an operation corresponding to the target instruction.
In some exemplary embodiments, the apparatus further comprises: the device comprises a camera device, an audio acquisition device, a communication unit and a display;
the camera device is used for acquiring a video of a first object;
the audio acquisition device is used for acquiring the audio of the first object;
the communication unit is used for sending the video and the audio of the first object to at least one second device, receiving the video of a second object corresponding to the second device and collected by the at least one second device, and sending a target instruction to a target device in the second device so that the target device can execute an operation corresponding to the target instruction;
and the display is used for displaying the video of the second object which is acquired by the at least one second device and corresponds to the second device.
In some exemplary embodiments, the processor is further configured to:
identifying an object identifier contained in the audio of the first object, and determining that the second device corresponding to the object identifier is the target device, wherein the duration between the acquisition time of the audio and the acquisition time of the image to which the recognized gesture information belongs is less than a preset duration.
In some exemplary embodiments, the processor is specifically configured to:
if the target instruction is a control sound starting instruction, sending the control sound starting instruction so that the target equipment sends the acquired audio of the second object corresponding to the target equipment; or
If the target instruction is a control sound closing instruction, sending the control sound closing instruction, and controlling the second equipment to stop sending the acquired audio of the second object corresponding to the second equipment; or
If the target instruction is an image display instruction, sending the image display instruction so that the target equipment determines and controls to display an icon corresponding to the image display instruction; or
And if the target instruction is a sound effect playing instruction, sending the sound effect playing instruction so that the target equipment determines and controls playing of the audio corresponding to the sound effect playing instruction.
In some exemplary embodiments, the processor is further configured to:
and controlling to display the video of the second object corresponding to the second equipment acquired by the at least one second equipment.
According to a fifth aspect of the exemplary embodiments, there is provided an electronic device, comprising: a memory, a processor;
the memory is used for storing computer programs or instructions;
the processor is used to execute the computer program or instructions in the memory to implement the following processes:
acquiring a video of a second object, sending the video of the second object to first equipment, and receiving the video and audio of the first object acquired by the first equipment;
receiving a target instruction sent by first equipment, and executing an operation corresponding to the target instruction;
wherein the target instruction is determined by the first device based on gesture information recognized from the acquired video of the first object.
In some exemplary embodiments, the apparatus further comprises: the device comprises a camera device, an audio acquisition device, a communication unit, a display and an audio playing device;
the camera device is used for acquiring a video of a second object;
the communication unit is used for sending the video of the second object to the first equipment, receiving the video and the audio of the first object collected by the first equipment and receiving a target instruction sent by the first equipment;
the display is used for displaying the video of the first object acquired by the first equipment;
and the audio playing device is used for playing the audio of the first object acquired by the first equipment.
In some exemplary embodiments, the processor is further configured to:
recognizing gesture information of each frame of image in the obtained video of the second object;
and if the recognized gesture information is determined to be the preset gesture information, acquiring the audio of the second object, and sending the audio of the second object to the first equipment.
In some exemplary embodiments, the processor is specifically configured to:
if the target instruction is a control sound starting instruction, sending the acquired audio of the second object; or
If the target instruction is a control sound closing instruction, stopping sending the acquired audio of the second object; or
If the target instruction is an image display instruction, determining and controlling to display an icon corresponding to the image display instruction according to the image display instruction; or
And if the target instruction is a sound effect playing instruction, determining and controlling to play the audio corresponding to the sound effect playing instruction according to the sound effect playing instruction.
In some exemplary embodiments, the processor is further configured to:
and controlling to play the video and the audio of the first object acquired by the first equipment.
According to a sixth aspect of the exemplary embodiments, there is provided an electronic device comprising a memory and a processor;
a memory for storing processor-executable instructions;
the processor is used to execute the computer program or instructions in the memory to implement the following processes:
acquiring a video of a target object;
matching a target image containing a human face in each frame image of the video with a human face image corresponding to a preset target object;
and recognizing gesture information in the target image matched with the target face image.
In some exemplary embodiments, the processor is specifically configured to:
if the target image only contains a single hand, determining gesture information of the single hand image; or
And if the target image comprises a plurality of hands, determining gesture information of a hand image which is closest to the position of the face image matched with the target face image in the target image.
In some exemplary embodiments, the processor is specifically configured to:
and determining gesture information contained in the gesture image by utilizing a pre-trained gesture recognition neural network model, wherein the gesture recognition neural network model is generated by training by taking a plurality of pre-collected gesture image samples and gesture information contained in each gesture image sample as input and taking the gesture information contained in each gesture image sample as output.
According to a seventh aspect of the exemplary embodiments, there is provided a data interaction device, including:
the acquisition unit is used for acquiring the video and the audio of the first object, sending the video and the audio of the first object to at least one second device and receiving the video of the second object corresponding to the second device, which is acquired by the at least one second device; and
the processing unit is used for identifying gesture information of each frame of image in the acquired video of the first object and determining a target instruction corresponding to the identified gesture information;
and the sending unit is used for sending the target instruction, and the target instruction is used for instructing the target equipment in the second equipment to execute the operation corresponding to the target instruction.
In some exemplary embodiments, the processing unit is further configured to:
identifying an object identifier contained in the audio of the first object, and determining that the second device corresponding to the object identifier is the target device, wherein the duration between the acquisition time of the audio and the acquisition time of the image to which the recognized gesture information belongs is less than a preset duration.
In some exemplary embodiments, the sending unit is specifically configured to:
if the target instruction is a control sound starting instruction, sending the control sound starting instruction so that the target equipment sends the acquired audio of the second object corresponding to the target equipment; or
If the target instruction is a control sound closing instruction, sending the control sound closing instruction, and controlling the second equipment to stop sending the acquired audio of the second object corresponding to the second equipment; or
If the target instruction is an image display instruction, sending the image display instruction so that the target equipment determines and controls to display an icon corresponding to the image display instruction; or
And if the target instruction is a sound effect playing instruction, sending the sound effect playing instruction so that the target equipment determines and controls playing of the audio corresponding to the sound effect playing instruction.
In some exemplary embodiments, the processing unit is further configured to:
and controlling to display the video of the second object corresponding to the second equipment acquired by the at least one second equipment.
According to an eighth aspect of the exemplary embodiments, there is provided a data interaction apparatus, including:
the audio and video processing unit is used for acquiring a video of a second object, sending the video of the second object to the first equipment, and receiving the video and the audio of the first object acquired by the first equipment;
the instruction processing unit is used for receiving a target instruction sent by the first equipment and executing an operation corresponding to the target instruction;
wherein the target instruction is determined by the first device based on gesture information recognized from the acquired video of the first object.
In some exemplary embodiments, the audio and video processing unit is further configured to:
recognizing gesture information of each frame of image in the obtained video of the second object;
and if the recognized gesture information is determined to be the preset gesture information, acquiring the audio of the second object, and sending the audio of the second object to the first equipment.
In some exemplary embodiments, the instruction processing unit is specifically configured to:
if the target instruction is a control sound starting instruction, sending the acquired audio of the second object; or
If the target instruction is a control sound closing instruction, stopping sending the acquired audio of the second object; or
If the target instruction is an image display instruction, determining and controlling to display an icon corresponding to the image display instruction according to the image display instruction; or
And if the target instruction is a sound effect playing instruction, determining and controlling to play the audio corresponding to the sound effect playing instruction according to the sound effect playing instruction.
In some exemplary embodiments, the audio and video processing unit is further configured to:
and controlling to play the video and the audio of the first object acquired by the first equipment.
According to a ninth aspect in the exemplary embodiments, there is provided a gesture recognition apparatus including:
an acquisition unit configured to acquire a video of a target object;
the matching unit is used for matching a target image containing a human face in each frame image of the video with a human face image corresponding to a preset target object;
and the processing unit is used for identifying gesture information in the target image matched with the target face image.
In some exemplary embodiments, the processing unit is specifically configured to:
if the target image only contains a single hand, determining gesture information of the single hand image; or
And if the target image comprises a plurality of hands, determining gesture information of a hand image which is closest to the position of the face image matched with the target face image in the target image.
In some exemplary embodiments, the processing unit is specifically configured to:
and determining gesture information contained in the gesture image by utilizing a pre-trained gesture recognition neural network model, wherein the gesture recognition neural network model is generated by training by taking a plurality of pre-collected gesture image samples and gesture information contained in each gesture image sample as input and taking the gesture information contained in each gesture image sample as output.
According to a tenth aspect of the exemplary embodiments, the present application further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the methods of the first, second, and third aspects.
In addition, for the technical effects brought by any implementation of the fifth to tenth aspects, reference may be made to the technical effects brought by the corresponding implementations of the first, second, and third aspects, which are not repeated here.
On the basis of common knowledge in the art, the above preferred features can be arbitrarily combined to obtain the preferred embodiments of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram illustrating a communication manner between an electronic device used by a teacher and an electronic device used by a student in the present application;
fig. 2 schematically illustrates a structural diagram of an electronic device according to an embodiment of the present invention;
fig. 3 schematically illustrates a structural diagram of another electronic device provided in an embodiment of the present invention;
fig. 4 is a schematic structural diagram illustrating another electronic device provided by an embodiment of the invention;
fig. 5 is a schematic structural diagram illustrating another electronic device provided by an embodiment of the invention;
FIG. 6 is a flow chart illustrating a data interaction method provided by an embodiment of the invention;
FIG. 7 is a flow chart illustrating another data interaction method provided by the embodiment of the invention;
FIG. 8 is a schematic diagram illustrating the data interaction flow when the teacher end and student ends jointly establish an online classroom;
FIG. 9 is a schematic diagram illustrating the data interaction flow in an online-class teaching scenario between the teacher end and student ends;
FIG. 10 is a flow chart illustrating a gesture recognition method according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram illustrating a data interaction device according to an embodiment of the present invention;
FIG. 12 is a schematic diagram illustrating an alternative data interaction apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram illustrating a gesture recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. In the description of the embodiments herein, "/" means "or" unless otherwise specified; for example, A/B may mean A or B. "And/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
The terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as implying or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature, and in the description of embodiments of the application, unless stated otherwise, "plurality" means two or more.
The teacher end in the embodiments of the present application is the electronic device on the teacher side, and the student end is the electronic device on the student side. The electronic device and the data interaction method provided in the embodiments are not limited to online teaching scenarios; they can also be applied to offline teaching scenarios, such as computer classes. Beyond education-related scenarios, they can also be applied to online meeting scenarios such as remote video conferences and group discussion meetings.
Fig. 1 is a schematic diagram illustrating the communication between an electronic device used by a teacher and electronic devices used by students in the present application. As shown in fig. 1, an electronic device 10 may interact with one or more electronic devices 20, exchanging audio/video data and instructions through a wired or wireless connection. In a typical online teaching scenario, the electronic device 10 and the electronic devices 20 send audio/video data and instructions to the opposite end through the cloud server 30. The electronic device 10 and an electronic device 20 may also interact through a wired connection, or through a wireless connection within the same local area network 31.
In one example, the electronic device 10 and the electronic device 20 have the same hardware structure, and the memory of each device stores both the computer program code run by the electronic device 10 and the computer program code run by the electronic device 20. The user can therefore be offered a choice between a "teacher mode" and a "student mode": selecting "teacher mode" causes the device's processor to execute the data interaction method performed by the electronic device 10, and selecting "student mode" causes it to execute the data interaction method performed by the electronic device 20.
Fig. 2 exemplarily shows a schematic structural diagram of the electronic device 10, and as shown in fig. 2, the electronic device 10 includes a processor 201, a camera 202, an audio capture device 203, a communication unit 204, and a display 205. The electronic device 10 may be implemented as a digital television, a web television, an Internet Protocol Television (IPTV), or the like.
The electronic device 10 also exchanges data with the electronic device 20 through a variety of communication means. The electronic device 10 can establish communication connections over a local area network (LAN), a wireless local area network (WLAN), and other networks, and can provide various content and interactions to the electronic device 20; by way of example, it may transmit video, audio, and instructions. The electronic device 10 may be in data communication with the electronic device 20 via one or more servers of one or more types.
When acquiring the video of the first object, the processor 201 of the electronic device 10 may control the camera 202, or an external camera connected to the electronic device 10, to capture the video of the first object. When acquiring the audio of the first object, it may control the audio capture device 203, or an external audio capture device connected to the electronic device 10, to capture the audio, and transmit the captured video and audio of the first object to the electronic device 20 through the communication unit 204, implementing the online-education service function of transmitting the teacher's video and audio to the students.
The communication unit 204 may also receive the video of the second object transmitted by the electronic device 20; the processor 201 then controls the display 205, or an external display device connected to the electronic device 10, to display the videos of the second objects collected by the at least one electronic device 20, implementing the online-education service function of the teacher watching the videos of the students being taught.
The processor 201 executes the stored computer program code or computer program code stored in the memory of the electronic device 10 to implement the following processes:
and identifying gesture information of each frame of image in the acquired video of the first object, and determining a target instruction corresponding to the identified gesture information.
In an actual application scenario, the correspondence between gestures and instructions is configured in advance. For example, an 'OK' hand gesture may correspond to the control sound starting instruction, a 'crossed hands' gesture to the control sound closing instruction, a 'thumbs-up' gesture to an image display instruction, a 'clapping' gesture to an image display instruction and/or a sound effect playing instruction, and a 'heart' gesture to an image display instruction. Other gesture-instruction correspondences can also be configured; the embodiments of the present application do not enumerate them all. The preconfigured gestures may be static gestures or dynamic gestures (dynamic actions).
One possible correspondence is shown in Table 1:
TABLE 1
Instruction | Instruction function
Control sound starting instruction | Controls the target device to send its captured audio
Control sound closing instruction | Controls the target device, or all electronic devices, to stop sending captured audio
Image display instruction | Controls the target device to display an icon
Sound effect playing instruction | Controls the target device to play audio
It should be noted that the image display instruction provided in the embodiments of the present application may carry an image identifier, so that the target device determines which icon to display according to the image display instruction, and the sound effect playing instruction may carry an audio identifier, so that the target device determines which audio to play according to the sound effect playing instruction.
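One way to encode Table 1, together with the example gestures above, is a simple lookup table; the gesture names and payload fields below are illustrative only.

```python
from typing import Optional

# Hypothetical encoding of the preconfigured gesture-instruction mapping.
GESTURE_TO_INSTRUCTION = {
    "ok":            {"cmd": "control_sound_start"},
    "crossed_hands": {"cmd": "control_sound_stop"},
    "thumbs_up":     {"cmd": "image_display", "icon": "like"},
    "clap":          {"cmd": "image_display", "icon": "clap"},  # and/or a sound effect
    "heart":         {"cmd": "image_display", "icon": "heart"},
}

def instruction_for(gesture: str) -> Optional[dict]:
    """Return the target instruction for a recognized gesture, if configured."""
    return GESTURE_TO_INSTRUCTION.get(gesture)
```

A recognizer would then call instruction_for() on each recognized gesture and dispatch any non-None result as the target instruction.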
The processor 201 may recognize gesture information in each frame of image of the video of the first object. When it recognizes that the current frame contains a preset gesture, it may determine the instruction corresponding to the recognized gesture (the target instruction) according to the preconfigured gesture-instruction correspondence, and control the communication unit 204 to send the target instruction to the electronic device 20, or to the cloud server 30, which forwards it to the corresponding electronic device 20.
If the target instruction is a control sound closing instruction, the control sound closing instruction is sent, controlling the electronic device 20 to stop sending the captured audio of its corresponding second object.
The electronic device 10 may receive audio transmitted by the electronic device 20 and play it through a speaker of the electronic device 10 or a sound output device connected to the electronic device 10, implementing the online-education service function of the teacher listening to students' questions or answers. When the teacher no longer needs to listen to a student's question or answer, the teacher makes a 'crossed hands' gesture to send a control sound closing instruction to the electronic device 20, so that the electronic device 20 stops sending the captured audio of its second object to the electronic device 10, or stops capturing that audio. By making the 'crossed hands' gesture, the teacher may send the control sound closing instruction to the electronic device 20 whose audio is currently being received, or to all electronic devices 20 in order to mute all students.
If the teacher wants to question or encourage a specific student, the teacher may make an 'OK', 'thumbs-up', 'clapping', or 'heart' gesture. After the processor 201 recognizes that the current frame contains any of these gestures, it recognizes whether the captured audio of the first object contains an object identifier; the electronic device 20 corresponding to the recognized object identifier is the target device. The processor then controls the communication unit 204 to send the target instruction to the electronic device 20 corresponding to the object identifier, or to send the target instruction and the object identifier (or a target instruction carrying the object identifier) to the cloud server 30, which forwards the target instruction to the electronic device 20 corresponding to the object identifier.
For example, when the current frame contains the 'OK' gesture, the processor 201 recognizes an object identifier in the audio of the first object captured within a preset duration of the frame's capture time, for example within 10 seconds after that moment. Object identifiers correspond one-to-one with second objects; an object identifier may be, for example, a student's name or student number. The communication unit 204 then sends a control sound starting instruction to the electronic device 20 corresponding to the object identifier.
If the target instruction is an image display instruction, the image display instruction is sent to the electronic device 20 corresponding to the object identifier, or the target instruction and the object identifier (or the target instruction carrying the object identifier) are sent to the cloud server 30, and the cloud server 30 sends the target instruction to the electronic device 20 corresponding to the object identifier, so that the electronic device 20 corresponding to the object identifier determines and controls to display the icon corresponding to the image display instruction.
If the target instruction is a sound effect playing instruction, the sound effect playing instruction is sent to the electronic device 20 corresponding to the object identifier, or the target instruction and the object identifier (or the target instruction carrying the object identifier) are sent to the cloud server 30, and the cloud server 30 sends the target instruction to the electronic device 20 corresponding to the object identifier, so that the electronic device 20 corresponding to the object identifier determines and controls playing of the audio corresponding to the sound effect playing instruction.
If the target instruction is a control sound starting instruction, the control sound starting instruction is sent to the electronic device 20 corresponding to the object identifier, or the target instruction and the object identifier (or the target instruction carrying the object identifier) are sent to the cloud server 30, and the cloud server 30 sends the target instruction to the electronic device 20 corresponding to the object identifier, so that the electronic device 20 corresponding to the object identifier sends the captured audio of its corresponding second object.
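The forwarding step through the cloud server 30 can be pictured as a registry keyed by object identifier; the register/relay API below is an assumption, not an interface defined by the text.

```python
class CloudServer:
    """Sketch of cloud server 30 forwarding instructions by object identifier."""

    def __init__(self) -> None:
        self.devices: dict = {}            # object identifier -> device handle

    def register(self, object_id: str, device) -> None:
        # A student device announces which object identifier it serves.
        self.devices[object_id] = device

    def relay(self, object_id: str, instruction: dict) -> None:
        # Deliver the target instruction to the matching electronic device 20.
        device = self.devices.get(object_id)
        if device is not None:
            device.handle(instruction)
```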
In a possible embodiment, the processor 201 may recognize in real time an object identifier contained in the audio of the first object, recognize gesture information in each frame of video whose capture time lies within the preset duration of the audio capture time, take the electronic device 20 corresponding to the object identifier as the target device, and take the instruction corresponding to the gesture information as the target instruction. This implementation sends instructions to the designated student.
The electronic device 10 provided in the embodiments of the present application gives the teacher a way to interact with students during teaching: by controlling the students' electronic devices 20 with gestures, it simplifies the teacher's interactive operations in class, makes effective teaching interaction convenient for teachers and students, and thereby improves the quality of online teaching.
Fig. 3 exemplarily shows a schematic structural diagram of an electronic device 20, and as shown in fig. 3, the electronic device 20 includes a processor 301, a camera 302, an audio capture device 303, a communication unit 304, a display 305, and an audio playing device 306. The electronic device 20 may be implemented as a digital television, a web television, an Internet Protocol Television (IPTV), or the like.
When the processor 301 of the electronic device 20 acquires the video of the second object, the camera 302 or an external camera connected to the electronic device 20 may be controlled to capture the video of the second object. When the audio of the second object is acquired, the audio capture device 303 or an external audio capture device connected to the electronic device 20 may be controlled to capture the audio of the second object, and the communication unit 304 may be controlled to transmit the video of the second object to the electronic device 10.
The communication unit 304 may receive the video and audio of the first object acquired by the electronic device 10 and the target instruction transmitted by the electronic device 10.
The processor 301 may control the display 305 to display the video of the first object captured by the electronic device 10, and may also control the audio playing device 306 to play the audio of the first object captured by the electronic device 10, implementing the online-education service function of students watching the teaching content.
The processor 301 performs the operation corresponding to the target instruction, which the electronic device 10 determined based on gesture information recognized from the captured video of the first object.
The processor 301 executes the operation corresponding to the target instruction, realizing interaction with the electronic device 10. If the target instruction is a control sound starting instruction, the processor 301 controls the communication unit 304 to send the captured audio of the second object to the electronic device 10, enabling question-and-answer interaction in online teaching with the first object (e.g., a teacher).
If the target instruction is a control sound closing instruction, the processor 301 controls the communication unit 304 to stop sending the captured audio of the second object, so that the electronic device 10 can control it to stop sending the audio of the second object; in an online teaching scenario, this is how a teacher mutes all students.
If the target instruction is an image display instruction, the processor 301 determines the icon corresponding to the image display instruction and controls the display 305, or an external display device connected to the electronic device 20, to display it, for example a clapping icon or a praise icon, so that the second object corresponding to the electronic device 20 receives feedback, such as reward feedback, from the first object corresponding to the electronic device 10. The image display instruction may carry an icon identifier, by which the processor 301 determines the icon to display to the second object.
If the target instruction is a sound effect playing instruction, the processor 301 determines the audio corresponding to the sound effect playing instruction and controls the audio playing device 306, or an external audio playing device connected to the electronic device 20, to play it, for example applause audio or cheering audio, so that the second object corresponding to the electronic device 20 receives feedback, such as reward feedback, from the first object corresponding to the electronic device 10.
In a possible implementation manner, the processor 301 may recognize gesture information in each frame of image of the acquired video of the second object; when it recognizes that the current frame contains a preset gesture, it controls the audio capture device 303 to capture the audio of the second object and controls the communication unit 304 to send the captured audio to the electronic device 10, so that the second object interacts with the first object corresponding to the electronic device 10 through the preset gesture, for example a student asking the teacher a question.
The electronic device 10 may also be embodied as a control device connected to a device having a display function, such as a television, a display, or a screen projection device. Fig. 4 schematically shows a structural diagram of such an electronic device 10, which includes an over-the-top (OTT) box 40, i.e. a set-top box that delivers content via the Internet, and a central control device 41. The electronic device 10 may also be connected to an external camera device (e.g., a camera), an external audio capture device (e.g., a microphone), and an external audio playing device (e.g., a speaker, a sound box, or an earphone).
In an actual application scenario, an application program corresponding to the data interaction method provided by the present application is further stored in the OTT box 40. When the application program runs, the central control device 41 may implement a data interaction process between the electronic device 10 and the electronic device 20 in the above embodiments of the present application.
The processor 401 in the OTT box 40 may receive an operation instruction of the first object (a teacher user) and transmit the operation instruction to the central control device 41. The operation instruction may be an instruction to open the application, an instruction to close the application, an instruction to select an interaction object (a student user), a mode control instruction, a voice control instruction, and the like.
The OTT box 40 may further receive an audio sent by the central control device 41 (an audio obtained by the central control device 41 from the cloud server 30), and transmit the audio to a connected audio playing device through the audio interface 402 and a connection line for playing. The OTT box 40 may also receive a video sent by the central control device 41 (a video obtained by the central control device 41 from the cloud server 30), and transmit the video to a connected display through the video interface 403 and a connection line for displaying.
The central control device 41 may receive operation instructions sent by the OTT box 40 and acquire video and audio from the cloud server 30 according to those instructions. It uploads the acquired video and audio (captured by the connected external camera device and audio capture device) to the cloud server 30 for other central control devices to retrieve and display. The central control device 41 may recognize the gesture information of each frame of image in the video captured by the connected external camera device, and may also recognize the object identifiers contained in the audio captured by the connected external audio capture device; that is, it can perform gesture and voice recognition on the local audio/video streams, determine the corresponding target instruction according to the recognized gesture information and/or object identifier, and upload the target instruction to the cloud server 30, so that the cloud server 30 sends the target instruction to other central control devices or to a designated central control device.
In one possible implementation, the OTT box 40 includes a processor 401, an audio interface 402 connected to an audio playback device, and a video interface 403 connected to a display. The OTT box 40 may also include a communication unit if the processor 401 does not have the capability to send and receive information.
The central control device 41 includes a processor 411 and a communication unit 412. The processor 411 may control the camera device connected to the electronic device 10 to capture a video of the first object (teacher), control the audio capture device connected to the electronic device 10 to capture an audio of the teacher, control the communication unit 412 to upload the video and the audio of the teacher to the cloud server 30, and send the audio and the video of the teacher to the electronic device 20 through the cloud server 30. The processor 411 may recognize gesture information (gestures) of each frame of image in the teacher video and determine a target instruction corresponding to the recognized gesture information.
The processor 411 controls the communication unit 412 to upload the target instruction to the cloud server 30, and sends the target instruction to the electronic device 20 through the cloud server 30, so that the electronic device 20 executes an operation corresponding to the target instruction, thereby controlling the electronic device 20.
The electronic device 20 at the student end in the online teaching scenario may also be implemented as a control device connected to a device with a display function, such as a television, a display, or a screen projection device. Fig. 5 exemplarily shows a structural schematic diagram of the electronic device 20, which includes an OTT box 50 and a central control device 51. The electronic device 20 may also be connected to a camera device (e.g., a camera), an audio capture device (e.g., a microphone), and an audio playback device (e.g., a speaker, a sound box, or an earphone). In an actual application scenario, an application program corresponding to the data interaction method provided by the present application is stored in the OTT box 50; when the application program runs, the data interaction process between the electronic device 10 and the electronic device 20 in the above embodiments can be implemented.
The processor 501 in the OTT box 50 may receive an operation instruction of the second object (student or student user) and send the operation instruction to the central control device 51. The operation instruction can be an instruction for opening an application program, an instruction for closing the application program, an instruction for receiving and establishing a classroom, an instruction for refusing to establish the classroom, a mode control instruction, a voice control instruction and the like.
The OTT box 50 may further receive an audio sent by the central control device 51 (an audio obtained by the central control device 51 from the cloud server 30), and transmit the audio to an external audio playing device connected to the electronic device 20 through the audio interface 502 and the connection line for playing. The OTT box 50 may also receive a video sent by the central control device 51 (a video obtained by the central control device 51 from the cloud server 30), and transmit the video to an external display connected to the electronic device 20 through the video interface 503 and a connection line for displaying.
The central control device 51 may receive an operation instruction transmitted from the OTT box 50, and obtain video and audio from the cloud server 30 according to the instruction. The central control device 51 uploads the acquired video or audio (video captured by an external camera device connected to the electronic device 20, audio captured by an external audio capture device connected to the electronic device 20) to the cloud server 30 for the electronic device 10 to call for display. The central control device 51 may recognize gesture information of each frame of image in a video acquired by an external camera device connected to the electronic device 20, that is, may perform gesture recognition on a video stream acquired by the external camera device connected to the electronic device 20, determine a corresponding instruction according to the recognized gesture information, for example, a control sound start instruction, and execute an operation corresponding to the recognized instruction, for example, upload the acquired audio of the second object to the cloud server 30, so that the cloud server 30 sends the audio to the electronic device 10.
In one possible implementation, the OTT box 50 includes a processor 501, an audio interface 502 connected to an external audio playback device, and a video interface 503 connected to an external display. The OTT box 50 may also include a communication unit if the processor 501 does not have the capability to send and receive information.
The central control device 51 includes a processor 511 and a communication unit 512. The processor 511 may control an external camera device connected to the electronic device 20 to capture a video of the second object (student), control an external audio capture device connected to the electronic device 20 to capture an audio of the second object (student), control the communication unit 512 to upload the video and the audio of the student to the cloud server 30, and send the audio and the video of the second object to the electronic device 10 through the cloud server 30. The processor 511 may perform an operation corresponding to the target instruction sent by the electronic device 10, recognize gesture information (gesture) of each frame of image in the second object (student) video, determine an instruction corresponding to the recognized gesture information, and perform an operation corresponding to the instruction.
In one possible implementation, the processor 511 controls the communication unit 512 to upload the recognized instruction to the cloud server 30, which sends the instruction to other electronic devices 20 so that those devices execute the corresponding operations, for example, displaying a "clapping" icon or a "like" icon, or playing a "clapping" sound effect or a "cheering" sound effect. This realizes interaction among students, helps the online classroom approximate the teaching atmosphere of an offline classroom, and improves the experience of users such as teachers and students.
In a scenario where a teacher uses the electronic device 10 and at least one student uses the electronic device 20 to perform online teaching, the teacher triggers an online teaching application (e.g., "interactive classroom" APP) running in the OTT box 40, sets the electronic device 10 to "teacher mode", and selects the student to establish an online classroom (which may be a video call). The student triggers the online teaching application in the OTT box 50 to run, sets the electronic device 20 to "student mode", and waits for the teacher to set up a classroom instruction.
The central control device 41 receives the classroom building instruction sent by the OTT box 40 and sends it to the cloud server 30; the cloud server 30 sends the classroom building instruction to the central control device 51 at the student end to notify the student to prepare to enter the online classroom. The central control device 41 also controls the student-end audio channel to be closed (that is, controls the central control device 51 at the student end to stop uploading the collected student audio to the cloud server 30, or to stop collecting the student audio).
The central control device 41 controls the central control device 51 at the student end to upload the students' videos and audios to the cloud server 30. After receiving the audios and videos uploaded by the student-end central control devices 51, the cloud server 30 sends them back to the central control device 41 at the teacher end, which displays them through the connected display screen.
In one possible embodiment, when the online classroom is successfully established, the central control device 41 controls all students to be in the mute mode. The central control device 41 can recognize the teacher's gestures and voice in real time, and determine, according to the recognized gesture information and/or object identifier, a target instruction for controlling the central control device 51 at the student end to execute the corresponding operation.
If the teacher wants a student to speak and answer a question during the course of teaching, the teacher can first make an "open sound" gesture and issue a voice instruction such as "please ask student 1 to answer the question". After recognizing the "open sound" gesture information, the central control device 41 recognizes the object identifier "student 1" (for example, the student's name) in the audio captured within a preset duration before or after the gesture, and sends a target instruction (for example, a control sound opening instruction) to the central control device 51 of the student end corresponding to that object identifier. That central control device 51 then opens its audio channel (uploads the collected student audio to the cloud server 30, or starts to collect and upload the student audio to the cloud server 30). The central control device 51 at the student end may further recognize that a period of time (e.g., 5 seconds) has passed after the student finishes answering, and automatically switch back to the mute mode (stop uploading the collected student audio to the cloud server 30, or stop collecting the student audio).
It should be noted that the voice instruction provided in the present application is not limited to "please ask XXX to answer the question"; it may also be, for example, "please welcome XXX" or "XXX, please answer". That is, the specific form of the voice instruction is not limited by the present application; the above forms are for illustration only.
If a student A wants to speak actively during the course of teaching, after student A makes a "raise hand to speak" gesture, the central control device 51 at the student end recognizes the gesture information, determines the corresponding instruction, and executes the corresponding operation, that is, opens the student-end audio channel: for example, uploading the collected audio of student A to the cloud server 30, or starting to collect and upload the audio of student A to the cloud server 30, which sends the audio to the central control device 41 of the teacher and the central control devices 51 of the other student ends for listening. A preset duration (for example, 5 s) after student A finishes answering, the central control device 51 of student A automatically switches back to the mute mode (stops uploading the collected audio of student A to the cloud server 30, or stops collecting the audio of student A).
During the course of teaching, the teacher can use pre-configured gestures to "like" or "send flowers to" students who answer correctly. The central control device 41 recognizes that the collected teacher video contains gesture information and determines the corresponding target instruction, recognizes the student identifier contained in the teacher audio, and sends the target instruction, directly or through the cloud server 30, to the central control device 51 corresponding to that student identifier. The student-end central control device 51 receives the target instruction and executes the corresponding operation; if the target instruction is an image display instruction, it determines the icon corresponding to the instruction and displays it through the display. In this way the teacher can, through a "like" or similar gesture, show the effect of a "like" icon on the display of a designated student, which provides convenience for teacher-student teaching interaction and helps teachers and students build the teaching atmosphere.
In a possible implementation manner, the teacher end and the student ends upload the collected local video and audio to the cloud server 30 and send them to the opposite end. The central control device 41 at the teacher end acquires the audio and video of the student ends from the cloud server 30 and performs channel separation on them. The processor 411 of the central control device 41 then synthesizes the channel-separated video streams, compressing and splicing the multiple frames of images into one frame of data, and sends it to the OTT box 40 for display, so that the teacher can observe the classroom performance of multiple students on one display.
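As a hedged illustration of the splicing step (the patent does not specify the algorithm), multiple channel-separated student frames could be tiled into one composite frame as follows; the tile size and grid width are assumptions:

```python
import cv2
import numpy as np

def compose_frames(frames, tile_size=(480, 270), cols=2):
    # Resize each student's frame and splice the resized tiles into a
    # single grid image, so one video channel can carry all students.
    tiles = [cv2.resize(f, tile_size) for f in frames]
    rows = []
    for i in range(0, len(tiles), cols):
        row = tiles[i:i + cols]
        # Pad the last row with black tiles so hstack shapes match.
        while len(row) < cols:
            row.append(np.zeros((tile_size[1], tile_size[0], 3), np.uint8))
        rows.append(np.hstack(row))
    return np.vstack(rows)
```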
In a possible implementation manner, the central control device provided by the present application may be a Jetson Xavier AGX embedded hardware device, which has built-in audio/video codec modules and computing power of 30 TOPS; the device is small and consumes little power, and can meet the requirements of real-time processing and merging of audio and video streams.
The OTT box provided by the application can be used for deploying application programs applied to an android system.
The video data of the electronic device 10 and the electronic device 20 provided by the present application are in RTSP format.
In the present application, operation instructions are transmitted between the OTT box and the central control device through socket signals, and audio and video are transmitted through the MIPI or UVC protocol.
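For illustration only, an operation instruction could be forwarded over a socket as follows; the address, port, and JSON framing are assumptions not given in the document:

```python
import json
import socket

def send_instruction(instruction, host="192.168.1.50", port=9000):
    # The OTT box forwards an operation instruction to the central
    # control device over a plain TCP socket; host, port and JSON
    # encoding are illustrative assumptions.
    payload = json.dumps(instruction).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload)

# e.g. forwarding a hypothetical "build classroom" instruction:
# send_instruction({"type": "build_classroom", "targets": ["student_1"]})
```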
In an actual application scenario, the processor in the electronic device or the central control device provided by the present application may also rebuild, using TensorRT, the neural networks for recognizing gesture information in each frame of a video and object identifiers in audio, so as to optimize network parameters and data precision, accelerate network inference, and improve processing speed. Building a TensorRT network engine generally takes a long time, so after it is built successfully the engine is serialized and saved to a txt-format file. When gesture information or object identifiers are to be recognized, the saved TensorRT network engine is deserialized from the txt file and then used.
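A minimal sketch of this engine caching, using the TensorRT Python API; the file name follows the document's txt-format convention:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def save_engine(engine, path="gesture_engine.txt"):
    # Serialize the built engine once and cache it on disk, so the
    # costly build step is skipped on later runs.
    with open(path, "wb") as f:
        f.write(engine.serialize())

def load_engine(path="gesture_engine.txt"):
    # Deserialize the cached engine before running recognition.
    runtime = trt.Runtime(TRT_LOGGER)
    with open(path, "rb") as f:
        return runtime.deserialize_cuda_engine(f.read())
```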
When the processor in the electronic device or the central control device recognizes the gesture information of each frame of image in the video captured at the local end, OpenCV can be used to acquire each frame of the local video in real time and perform face detection (to detect whether the image contains the teacher object or the student object corresponding to the electronic device).
Image preprocessing operations such as scaling, cropping, conversion to floating point, and mean subtraction are performed on the acquired image, and the preprocessed image is used as the original image for face detection. A face detection network in the TensorRT network engine determines whether the image contains faces; the number of detected faces, the face image data (coordinate data of the face frames), and the original image containing the faces are stored in memory. If no face is detected, the original image is stored in memory and the corresponding face count identifier is recorded as 0. The face count identifier, the face image data, and the original image data are all recorded in memory.
One possible form is shown in table 2:
TABLE 2
Face count identifier | Face image data | Original image data
2 | face frame 1 position coordinates; face frame 2 position coordinates | image identifier 1
0 | 0 | image identifier 2
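A minimal sketch of the preprocessing and the Table 2 memory record in Python; the network input size and mean values are assumptions the document does not specify:

```python
import cv2
import numpy as np

def preprocess(frame, size=(300, 300), mean=(104.0, 117.0, 123.0)):
    # Scaling, conversion to floating point, and per-channel mean
    # subtraction, as described above; size and mean are assumptions.
    img = cv2.resize(frame, size).astype(np.float32)
    img -= np.array(mean, dtype=np.float32)
    return img.transpose(2, 0, 1)  # HWC -> CHW for the detection network

def make_record(image_id, face_boxes):
    # Memory record in the form of Table 2: face count identifier,
    # face frame coordinates, and the original image identifier.
    return {"face_count": len(face_boxes),
            "face_boxes": face_boxes,
            "image_id": image_id}
```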
When gesture information is recognized, the processor reads the face count identifier from memory; if it is greater than 0, face data was detected in that frame, and the face image data and original image data of the frame are read.
The corresponding original image is cropped according to the read face image data (the face frame coordinates) so that the cropped image includes both a face image and a gesture image. If the face count identifier is greater than 1, the position coordinates of each face frame are read respectively and multiple crops are taken from the original image, each containing one face image and the associated gesture image.
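An illustrative sketch of this cropping step; the expansion factor that brings the nearby hand region into the crop is an assumption:

```python
def crop_face_and_gesture(image, face_box, expand=1.5):
    # Crop the original image around the face frame, enlarged so that
    # the hand/gesture region near the face is included; `expand` is
    # the half-extent of the crop in units of the face frame size.
    x, y, w, h = face_box
    cx, cy = x + w / 2.0, y + h / 2.0
    x0 = max(int(cx - expand * w), 0)
    y0 = max(int(cy - expand * h), 0)
    x1 = min(int(cx + expand * w), image.shape[1])
    y1 = min(int(cy + expand * h), image.shape[0])
    return image[y0:y1, x0:x1]
```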
The cropped image is input into the gesture recognition network for gesture detection. If a pre-configured (valid) gesture is detected, the gesture flag bit is recorded as 1 together with the detection result; if no valid gesture is detected, the gesture flag bit is recorded as 0. One possible form is shown in table 3:
TABLE 3
Gesture flag bit | Gesture information | Face image data | Original image data
1 | gesture 3; null | face frame 1; face frame 2 | image identifier 1
0 | (none) | (none) | image identifier 3
The cropped image is also input into a face recognition network in the TensorRT network engine for face matching against the face image of a preset target object (the teacher object or a student object), i.e., the target face image. If the cropped image corresponding to face frame n matches the target face image, the gesture information (gesture identifier or gesture type) corresponding to face frame n is read from memory, and the read gesture information, e.g. gesture m, is recorded as the gesture information of the target object.
In an actual application scenario, the gesture information corresponding to face frame n may be null, indicating that no valid gesture was recognized in the crop corresponding to face frame n; it can then be determined that the target object did not perform a gesture-triggered action. The gesture information corresponding to face frame n may be gesture m, indicating that the crop contains a valid gesture whose gesture information is gesture m, so the action of the gesture-triggered instruction corresponding to gesture m can be determined for the target object. There may also be multiple pieces of gesture information corresponding to face frame n, for example gesture 2 and gesture 3; in that case, the gesture information whose gesture frame is closest to the face may be selected as the gesture information of the target object, or the gesture information of the target object determined in the previous frame may be reused for the current frame.
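A minimal sketch of the nearest-hand rule described above; the detection record format is an assumption:

```python
def pick_gesture_for_face(face_box, hand_detections):
    # When several hands appear in the crop, keep the gesture whose
    # box centre is closest to the face centre, as described above.
    fx = face_box[0] + face_box[2] / 2.0
    fy = face_box[1] + face_box[3] / 2.0

    def dist(det):
        hx = det["box"][0] + det["box"][2] / 2.0
        hy = det["box"][1] + det["box"][3] / 2.0
        return (hx - fx) ** 2 + (hy - fy) ** 2

    nearest = min(hand_detections, key=dist, default=None)
    return nearest["gesture"] if nearest else None
```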
In a possible implementation manner, the processor may implement the above process of recognizing gesture information in each frame of the locally captured video through two processes. For example, process 1 performs image acquisition, face detection, face recognition, and decision making (determining the gesture information of the target object) on the local video, while process 2 performs gesture recognition. Both processes can write data to and read data from the shared memory, so that each process can obtain the face image data and gesture information recognized by the other.
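A hedged sketch of this two-process split using Python's multiprocessing, with a managed dict standing in for the shared memory area; the record contents are illustrative:

```python
import multiprocessing as mp

def face_process(records):
    # Process 1: acquisition, face detection/recognition, decision;
    # writes the per-frame record (as in Tables 2 and 3) for process 2.
    records["frame_1"] = {"face_count": 1, "face_boxes": [(40, 60, 80, 80)]}

def gesture_process(records):
    # Process 2: reads the shared record and runs gesture detection
    # on the corresponding crop when faces were found.
    rec = records.get("frame_1", {})
    if rec.get("face_count", 0) > 0:
        pass  # run the gesture network on the crop here

if __name__ == "__main__":
    with mp.Manager() as mgr:
        shared = mgr.dict()  # stands in for the shared memory area
        p1 = mp.Process(target=face_process, args=(shared,))
        p2 = mp.Process(target=gesture_process, args=(shared,))
        p1.start(); p1.join()
        p2.start(); p2.join()
```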
It should be noted that, in the data interaction process of the electronic device 10 and the electronic device 20 provided in the present application, the gesture information for identifying each frame of image in the video acquired by the local terminal may use the above gesture information determination process, or may use other face recognition algorithms, face detection algorithms, and gesture recognition algorithms.
In an interaction scenario between the electronic device 10 and the electronic device 20, fig. 6 exemplarily shows a schematic flowchart of a data interaction method provided by the present application, and is applied to the electronic device 10, as shown in fig. 6, the method includes:
step S601, acquiring a video and an audio of the first object, sending the video and the audio of the first object to at least one second device, and receiving a video of a second object corresponding to the second device, acquired by the at least one second device.
In specific implementation, the teacher-end electronic device may acquire, or control its connected external devices to acquire, the local audio and video, that is, the video and audio of the first object (teacher) using the electronic device 10, and send them to the student-end electronic devices (second devices). The teacher-end electronic device may also receive the video of the second object (student) captured by the student end, and display it through a display of the electronic device 10 or an external display connected to it, so that the teacher can observe the classroom state of the students. If the online classroom contains multiple students, the teacher-end electronic device may perform video synthesis processing on the received student videos and display them on the display screen, or display each frame of the synthesized video through one video channel. The student end can receive the video and audio sent by the teacher end, realizing the basic teaching function of the online classroom.
Step S602, identify gesture information of each frame of image in the acquired video of the first object, and determine a target instruction corresponding to the identified gesture information.
During specific implementation, in order to simplify the operation process of a teacher in the process of giving lessons in an online classroom, the teacher can initiate a control command through gesture actions, and does not need to click a button and the like to interact with students. The teacher end identifies the gesture information of each frame of image in the collected teacher video, and can determine a target instruction corresponding to the identified gesture information according to the corresponding relation between the gesture information and the instruction configured in advance.
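As an illustrative sketch only (the patent does not fix particular gestures or identifiers), the pre-configured correspondence between gesture information and instructions could be a simple lookup table; all names below are assumptions:

```python
# Hypothetical pre-configured correspondence between recognized
# gesture information and target instructions.
GESTURE_TO_INSTRUCTION = {
    "ok": "control_sound_on",
    "mute": "control_sound_off",
    "like": "image_display",
    "clap": "sound_effect_play",
}

def instruction_for(gesture):
    # Returns None when the gesture has no configured instruction.
    return GESTURE_TO_INSTRUCTION.get(gesture)
```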
If the target instruction corresponding to the recognized gesture information is a control sound opening instruction, the control sound opening instruction is sent, so that the target device sends the collected audio of the second object corresponding to the target device.
In specific implementation, if the teacher makes an "OK" gesture, selection of a designated student to answer the question may be triggered. The object identifier contained in the audio of the first object is recognized by voice recognition, and the second device corresponding to that object identifier is determined to be the target device, where the duration between the capture time of the audio and the capture time of the image to which the recognized gesture information belongs is less than a preset duration.
The teacher's audio may contain a sentence similar to "please ask student 1 to answer the question", which includes an object identifier; the object identifier may be the student's name or an identifier corresponding to the student, such as a student number. The process of recognizing whether the audio contains an object identifier may occur within a preset duration before the teacher's gesture information is recognized, or within a preset duration after it is recognized. In other words, the gesture recognition and voice recognition processes in the data interaction method provided by the present application may be performed in real time, either synchronously or asynchronously within a preset duration.
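A minimal sketch of this time-window matching, assuming a 5-second window and hypothetical record formats (neither is specified by the document):

```python
def match_target(gesture_time, audio_segments, window=5.0):
    # Keep only object identifiers recognized from audio whose capture
    # time lies within the preset window around the gesture frame.
    for seg in audio_segments:
        if abs(seg["time"] - gesture_time) < window and seg.get("object_id"):
            return seg["object_id"]  # e.g. "student 1"
    return None
```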
The teacher end sends the control sound opening instruction, directly or through the cloud server, to the student end or to a designated student end, controlling or instructing it to send the student's audio to the teacher end, thereby realizing the teacher-student question-and-answer interaction.
And if the target instruction corresponding to the recognized gesture information is a control sound closing instruction, sending the control sound closing instruction to enable the second equipment to stop sending the collected audio of the second object corresponding to the second equipment.
The teacher end sends the control sound closing instruction, directly or through the cloud server, to the student end or to a designated student end, controlling or instructing it to stop sending the student's audio to the teacher end, or to stop collecting the student's audio, so that the student is muted and the teacher-student question-and-answer interaction ends. The control sound closing instruction may also be sent to all student ends, realizing classroom discipline management.
And if the target instruction corresponding to the recognized gesture information is an image display instruction, sending the image display instruction so that the target equipment determines and displays an icon corresponding to the image display instruction.
The teacher end sends the image display instruction, directly or through the cloud server, to a designated student end, controlling or instructing it to display a designated icon, such as a "clapping" icon or a "flower" icon, in the display interface, assisting the teacher in giving evaluation feedback to students during teacher-student interaction.
And if the target instruction corresponding to the recognized gesture information is a sound effect playing instruction, sending the sound effect playing instruction so that the target equipment determines and plays the audio corresponding to the sound effect playing instruction.
The teacher end sends the sound effect playing instruction, directly or through the cloud server, to a designated student end, controlling or instructing it to play a designated audio, such as "cheering" audio or "applause" audio; a designated icon may also be displayed while the audio is played, assisting the teacher in giving evaluation feedback to students or building the atmosphere of the online teaching classroom.
Step S603, sending a target instruction, where the target instruction is used to instruct a target device in the second device to execute an operation corresponding to the target instruction.
In specific implementation, the teacher end sends the target instruction to the student end directly or through the cloud server, so that the student end executes the operation corresponding to the target instruction. Since the target instruction is triggered by the teacher through a gesture, the operation process of teacher-student interaction is simplified, the processing time of the interaction is shortened, and the interaction efficiency is improved.
In an interaction scenario between the electronic device 10 and the electronic device 20, fig. 7 exemplarily shows a schematic flowchart of a data interaction method provided by the present application, which is applied to the electronic device 20, and as shown in fig. 7, the method includes:
step S701, acquiring a video of the second object, sending the video of the second object to the first device, and receiving the video and the audio of the first object acquired by the first device.
In specific implementation, the student-end electronic device acquires, or controls its connected external devices to acquire, the local audio and video, that is, the video of the second object (student) using the electronic device 20, and sends the student's video to the teacher-end electronic device (first device). The student-end electronic device may also receive the video of the first object (teacher) from the teacher end and present it, so that the student can listen to the teacher's teaching content. The teacher-end electronic device can receive the student videos sent by the student ends, so that the teacher can observe the students' classroom states, realizing the basic teaching function of the online classroom.
Step S702, receiving a target instruction sent by the first device, and executing an operation corresponding to the instruction, where the target instruction is determined by the first device based on the gesture information recognized by the acquired video of the first object.
The received target instruction is determined by the teacher end according to the gesture information recognized from the captured teacher video; the student end executes the operation corresponding to the target instruction, so that the teacher controls the student end, which simplifies the operation process of controlling the student end and improves the efficiency of teacher-student interaction.
If the target instruction is a control sound starting instruction, the student end collects audio of students and uploads the audio of the students to the cloud server, so that the teacher end obtains the audio of the students.
And if the target instruction is a control sound closing instruction, the student end stops collecting the audio of the student.
And if the target instruction is an image display instruction, the student end determines and controls to display an icon corresponding to the image display instruction according to the image display instruction.
If the target instruction is a sound effect playing instruction, the student end determines and controls playing of the corresponding audio according to the sound effect playing instruction.
The student end executes the operation corresponding to the instruction. For example, sending the student's audio to the teacher end realizes the teacher-student question-and-answer interaction; stopping the sending of student-end audio means the teacher end no longer acquires the student's audio, ending the question-and-answer interaction or realizing classroom discipline management; determining and displaying the icon corresponding to an image display instruction, or determining and playing the audio corresponding to a sound effect playing instruction, delivers the teacher's evaluation of the student during teacher-student interaction. Executing the operations corresponding to these instructions can improve the quality and efficiency of teacher-student interaction.
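A minimal sketch of the student-end dispatch of the four target instructions listed above; the `device` methods and instruction field names are illustrative assumptions:

```python
def execute_target_instruction(instr, device):
    # Dispatch the received target instruction to the corresponding
    # student-end operation; all method names are hypothetical.
    if instr["type"] == "control_sound_on":
        device.start_uploading_audio()
    elif instr["type"] == "control_sound_off":
        device.stop_uploading_audio()
    elif instr["type"] == "image_display":
        device.show_icon(instr.get("icon_id", "clap"))
    elif instr["type"] == "sound_effect_play":
        device.play_audio(instr.get("audio_id", "applause"))
```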
In some exemplary embodiments, the student side may further recognize gesture information of each frame of image in the acquired video of the second object. And if the recognized gesture information is determined to be the preset gesture information, acquiring the audio of the second object, and sending the audio of the second object to the first equipment.
During specific implementation, the students can send audio to the teacher end by making preset gesture information, so that interaction between the students and the teacher is simplified, for example, an interaction process of asking questions to the teacher by the students is realized.
The embodiment of the application also provides an application program, and the process of data interaction between the teacher end and the student end can be implemented through the application program. FIG. 8 is a flow chart illustrating data interaction in the building of an online classroom scenario.
In step S801, the teacher obtains an instruction.
In specific implementation, a teacher or a student performs mode selection after starting an application program, and can select a student mode and a teacher mode. The teacher user may select "teacher mode" and the student user may select "student mode".
In step S802, the teacher end determines that the instruction is a classroom building instruction and that the user has selected the teacher mode.
In specific implementation, it is detected whether the mode selected by the user is the teacher mode; a teacher can initiate classroom building through the teacher mode. If the obtained instruction is a classroom building instruction and the user is in teacher mode, the user can be determined to be a teacher user.
And step S803, the teacher end determines the target equipment according to the classroom establishing instruction.
In specific implementation, the teacher user may select one or more students to participate in the online classroom. The central control device 41 at the teacher end may determine the target devices according to the classroom building instruction sent by the teacher-end OTT box 40, and send the classroom building instruction and the target device list to the cloud server 30, so that the cloud server 30 sends the classroom building instruction to the target devices.
And step S804, the student side waits for the instruction sent by the teacher side.
In specific implementation, after the student end starts the application program and selects the student mode, the student end waits for a classroom building instruction sent by the teacher end.
And step S805, the teacher end collects teacher audio and video.
In specific implementation, the central control device 41 at the teacher end can control the camera device and the audio acquisition device to acquire the video and audio of the teacher.
And step S806, the teacher sends the collected audio and video and the classroom establishing instruction to the student.
In specific implementation, the teacher end uploads the acquired or collected audio and video to the cloud server 30 for the student-end devices to acquire, and also uploads the classroom building instruction to the cloud server 30, so that the cloud server 30 sends the classroom building instruction to the corresponding target devices.
In step S807, the student end determines whether the response to the classroom building instruction is to join the classroom; if so, step S808 is executed, otherwise step S809 is executed.
In specific implementation, after receiving the classroom building instruction, the student end displays an interface for agreeing or refusing to join the classroom, and the student triggers the corresponding instruction through the displayed "agree" or "disagree" button.
And step S808, the student end collects student videos and sends the student videos to the teacher end.
In specific implementation, the student end detects the instruction triggered by the student clicking the "agree" button, acquires or collects the local video, that is, the student's video, and uploads it to the cloud server 30 to be acquired by the teacher end. A default all-member mute mode may be configured, so that after a student joins the classroom, the student's audio is not uploaded to the cloud server 30 or is not collected.
And step S809, the student end sends an instruction of refusing to join the classroom.
When the system is specifically implemented, the student end detects that the student clicks the 'disagreement' button to trigger the instruction for refusing to join the classroom, uploads the instruction to the cloud server 30, and sends the instruction to the teacher end through the cloud server 30.
Step S810, the teacher end judges whether the classroom is successfully established; if so, step S811 is executed, otherwise step S812 is executed.
In specific implementation, if the teacher end receives instructions refusing to join the classroom from all target devices, it determines that the classroom building fails; otherwise, it determines that the classroom building succeeds.
And step S811, the teacher end acquires the video sent by the student end and adjusts the display window.
In specific implementation, the teacher end obtains the video uploaded by the student end who agrees to join the classroom from the cloud server 30, and displays the video through the display interface of the teacher end. The display window can be adjusted during display, and videos uploaded by a preset number of or all student terminals are displayed.
Step S812, the classroom building ends.
In specific implementation, if all target devices send instructions refusing to join the classroom to the teacher end, the teacher end cannot establish a video connection with the students corresponding to those target devices, and the classroom building ends.
The process of data interaction between the teacher end and the student end can be implemented through the application program. FIG. 9 is a flow chart illustrating data interaction in a teaching scene in an online classroom.
In step S901, the teacher obtains a video of the student.
In specific implementation, after the teacher end and the student ends successfully establish the classroom, the teacher end pulls the student videos uploaded to the cloud server 30 from the cloud server in real time, and displays the student videos, so that the teacher can observe the classroom state of the students conveniently.
And step S902, the student side acquires the video and audio uploaded by the teacher side.
In specific implementation, after the teacher end and the student end successfully establish the classroom, the student end pulls the teacher video and audio uploaded to the cloud server 30 from the cloud server in real time, and plays the audio and video of the teacher, so that the student can listen to the teaching content of the teacher.
And step S903, the teacher end identifies the gesture information in each frame of image of the acquired teacher video.
When the teaching method is specifically implemented, a teacher can trigger interaction with students through gestures in the teaching process. And the teacher end identifies whether preset gestures exist in each frame of acquired teacher video images in real time and determines gesture information corresponding to the recognized gestures.
In step S904, the teacher end determines whether the recognized gesture information corresponds to an image display instruction; if so, step S905 is executed, otherwise step S907 is executed.
In specific implementation, if the gesture initiated by the teacher is a gesture such as "heart compare", "like", "clap", and the like, according to gesture information (gesture identification or gesture type) corresponding to the gesture, it is determined that the instruction corresponding to the gesture information is an image display instruction.
In step S905, the teacher end sends an image display instruction to the designated student end.
In specific implementation, the teacher end can send the image display instruction directly to the designated student end, or upload it to the cloud server 30, which sends it to the designated student end. The designated student end may be determined by recognizing the object identifier in the teacher's voice, as the device corresponding to the recognized identifier, or may be the device corresponding to a button selected by the teacher in the display interface.
And step S906, the student end controls the display of the corresponding icon according to the received image display instruction.
In specific implementation, the student end receives the image display instruction, and displays icons corresponding to the image display instruction, such as icons of "heart-to-heart", "like", "applause", and the like, in the display interface.
In step S907, the teacher end determines whether the recognized gesture information corresponds to a control sound opening instruction; if so, step S908 is executed, otherwise step S911 is executed.
In specific implementation, if the gesture initiated by the teacher is an "OK" gesture, determining that the corresponding instruction is a control sound starting instruction according to gesture information corresponding to the gesture.
Step S908, the teacher end identifies the object identifier in the teacher audio, and determines that the device corresponding to the identified object identifier is the designated student end device.
In specific implementation, according to the control sound opening instruction, the teacher end recognizes whether audio whose capture time is within a preset duration of the capture time of the image to which the gesture information belongs contains an object identifier, and determines the device corresponding to the recognized object identifier to be the designated student end.
In step S909, the teacher end sends a control sound opening instruction to the designated student end.
In specific implementation, the teacher end can send the control sound opening instruction directly to the student end, or upload it to the cloud server 30, which sends it to the designated student end.
In step S910, the student end sends the audio of the student according to the received control sound opening instruction.
In specific implementation, according to the received control sound opening instruction, the student end starts to acquire or collect its student audio and uploads it to the cloud server 30, or starts to send the acquired or collected student audio to the cloud server 30, so that the teacher end can acquire the student's audio from the cloud server. This realizes the function of the teacher receiving the student's answer.
The student end may also recognize whether the local student video contains preset gesture information corresponding to a control sound opening instruction. According to that instruction, the student end starts to acquire or collect its student audio and uploads it to the cloud server 30, or starts to send the collected student audio to the cloud server 30, so that the teacher end can acquire the student's audio from the cloud server. This realizes the function of students asking questions to the teacher.
And step S911, the teacher end sends a command for controlling the sound to be closed to all the student ends.
In specific implementation, if the gesture initiated by the teacher is a "mute" gesture, the corresponding instruction is determined to be a control sound closing instruction according to the gesture information. The teacher end uploads the control sound closing instruction to the cloud server 30, which sends it to all student ends.
In step S912, the student end stops sending the student audio according to the received control sound closing instruction.
In specific implementation, according to the received control sound closing instruction, the student end can stop acquiring or collecting its student audio, or stop sending the collected student audio to the cloud server 30, realizing a mute state for all student ends in the classroom.
In an interaction scenario between the electronic device 10 and the electronic device 20, fig. 10 exemplarily shows a gesture recognition method that may be used to recognize the gesture information of each frame of image in a video when implementing the data interaction method provided by the present application; it is applied to the electronic device 10 and the electronic device 20. As shown in fig. 10, the method includes:
in step S1001, a video of a target object is acquired.
Step S1002, matching a target image containing a face in each frame image of the video with a face image corresponding to a preset target object.
In one possible implementation, the gesture information contained in the gesture image is determined using a pre-trained gesture recognition neural network model, where the model is generated by training with a plurality of pre-collected gesture image samples as input and the gesture information contained in each gesture image sample as output.
In one possible embodiment, if the target image only contains a single hand, the gesture information of the single hand image is determined.
In one possible implementation, if the target image includes a plurality of hands, the gesture information of the hand image in the target image, which is closest to the position of the face image matched with the target face image, is determined.
The number of hands contained in the target image is determined first. If there is only one hand in the target image, the gesture information of that single-hand image is determined as the gesture information of the target object. If there are multiple hands, the gesture information of the hand image closest to the face image matching the target face image is determined as the gesture information of the target object. This improves the accuracy of gesture information recognition for the target object and prevents the gesture information of other objects from being recognized as that of the target object.
And step S1003, recognizing gesture information in the target image matched with the target face image.
Each frame of image in the video of the target object is recognized, the face in the image is matched against the face image corresponding to the target object, and gesture information is recognized only in target images that match that face image, improving the accuracy of recognizing the target object's gesture information and the quality of teacher-student interaction.
The implementation of the gesture recognition method provided by the embodiment of the present application is the same as the process of recognizing the gesture information of each frame of image in the video provided in the above embodiment, and is not repeated here.
Fig. 11 is a schematic structural diagram illustrating a data interaction device, and as shown in fig. 11, the device includes:
an obtaining unit 1101, configured to obtain video and audio of a first object, send the video and audio of the first object to at least one second device, and receive video of a second object corresponding to the second device, which is obtained by the at least one second device; and
the processing unit 1102 is configured to identify gesture information of each frame of image in the acquired video of the first object, and determine a target instruction corresponding to the identified gesture information;
a sending unit 1103, configured to send a target instruction, where the target instruction is used to instruct a target device in the second device to perform an operation corresponding to the target instruction.
In some exemplary embodiments, the processing unit 1102 is further configured to:
identifying an object identifier contained in the audio of the first object, and determining that the second device corresponding to the object identifier is the target device, wherein the duration between the acquisition time of the audio and the acquisition time of the image to which the identified gesture information belongs is less than the preset duration.
In some exemplary embodiments, the sending unit 1103 is specifically configured to:
if the target instruction is a control sound starting instruction, sending the control sound starting instruction so that the target equipment sends the acquired audio of the second object corresponding to the target equipment; or
If the target instruction is a control sound closing instruction, sending the control sound closing instruction, and controlling the second equipment to stop sending the acquired audio of the second object corresponding to the second equipment; or
If the target instruction is an image display instruction, sending the image display instruction so that the target equipment determines and controls to display an icon corresponding to the image display instruction; or
And if the target instruction is a sound effect playing instruction, sending the sound effect playing instruction so that the target device determines and controls playing of the audio corresponding to the sound effect playing instruction.
In some exemplary embodiments, the processing unit 1102 is further configured to:
and controlling to display the video of the second object corresponding to the second equipment acquired by the at least one second equipment.
Fig. 12 is a schematic structural diagram illustrating a data interaction device, and as shown in fig. 12, the device includes:
the audio/video processing unit 1201 is configured to acquire a video of a second object, send the video of the second object to the first device, and receive the video and the audio of the first object acquired by the first device;
the instruction processing unit 1202 is configured to receive a target instruction sent by a first device, and execute an operation corresponding to the instruction;
wherein the target instruction is determined by the first device based on the acquired video-recognized gesture information of the first object.
In some exemplary embodiments, the av processing unit 1201 is further configured to:
recognizing gesture information of each frame of image in the obtained video of the second object;
and if the recognized gesture information is determined to be the preset gesture information, acquiring the audio of the second object, and sending the audio of the second object to the first equipment.
In some exemplary embodiments, the instruction processing unit 1202 is specifically configured to:
if the target instruction is a control sound starting instruction, sending the acquired audio of the second object; or
If the target instruction is a control sound closing instruction, stopping sending the acquired audio of the second object; or
If the target instruction is an image display instruction, determining and controlling to display an icon corresponding to the image display instruction according to the image display instruction; or
And if the target instruction is a sound effect playing instruction, determining and controlling playing of the corresponding audio according to the sound effect playing instruction.
In some exemplary embodiments, the av processing unit 1201 is further configured to:
and controlling to play the video and the audio of the first object acquired by the first equipment.
Fig. 13 is a schematic structural diagram illustrating a gesture recognition apparatus, and as shown in fig. 13, the apparatus includes:
an obtaining unit 1301, configured to obtain a video of a target object;
a matching unit 1302, configured to match a target image containing a face in each frame image of the video with a face image corresponding to a preset target object;
and the processing unit 1303 is used for recognizing the gesture information in the target image matched with the target face image.
In some exemplary embodiments, the processing unit 1303 is specifically configured to:
if the target image only contains a single hand, determining gesture information of the single hand image; or
And if the target image comprises a plurality of hands, determining gesture information of a hand image which is closest to the position of the face image matched with the target face image in the target image.
In some exemplary embodiments, the processing unit 1303 is specifically configured to:
and determining the gesture information contained in the gesture image by using a pre-trained gesture recognition neural network model, wherein the gesture recognition neural network model is generated by training with a plurality of pre-collected gesture image samples as input and the gesture information contained in each gesture image sample as output.
In addition, in an exemplary embodiment, the present application further provides a storage medium, and when instructions in the storage medium are executed by a processor of the electronic device, the electronic device is enabled to implement the data interaction method in the embodiments of the present disclosure.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A data interaction method, applied to a first device, the method comprising:
obtaining video and audio of a first object, sending the video and audio of the first object to at least one second device, and receiving, from the at least one second device, a video of a second object corresponding to that second device;
recognizing gesture information in each frame of the acquired video of the first object, and determining a target instruction corresponding to the recognized gesture information; and
sending the target instruction, wherein the target instruction is used to instruct a target device among the second devices to execute an operation corresponding to the target instruction.
2. The method of claim 1, wherein the target instruction is used to instruct a target device among the second devices to execute an operation corresponding to the target instruction, and the method further comprises:
identifying an object identifier contained in the audio of the first object, and determining that the second device corresponding to the object identifier is the target device, wherein the duration between the acquisition time of the audio and the acquisition time of the image to which the recognized gesture information belongs is less than a preset duration.
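By way of illustration (not part of the claim), the temporal constraint in claim 2 amounts to pairing a spoken object identifier with a gesture frame captured within the preset duration. In the sketch below, the audio_events list of (timestamp, identifier) pairs, the devices mapping, and the 2-second window are all hypothetical.

def resolve_target_device(audio_events, gesture_time, devices, max_gap_s=2.0):
    """audio_events: (timestamp, object_identifier) pairs from speech
    recognition; devices: mapping from object identifier to a second device.
    Returns the target device, or None if no identifier falls in the window."""
    for timestamp, identifier in audio_events:
        if abs(timestamp - gesture_time) < max_gap_s and identifier in devices:
            return devices[identifier]
    return None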
3. The method according to claim 1, wherein the sending the target instruction, the target instruction being used to instruct a target device among the second devices to execute an operation corresponding to the target instruction, comprises:
if the target instruction is a control sound starting instruction, sending the control sound starting instruction, so that the target device sends the acquired audio of the second object corresponding to the target device; or
if the target instruction is a control sound closing instruction, sending the control sound closing instruction, so that the target device stops sending the acquired audio of the second object corresponding to the target device; or
if the target instruction is an image display instruction, sending the image display instruction, so that the target device determines and controls display of an icon corresponding to the image display instruction; or
if the target instruction is a sound effect playing instruction, sending the sound effect playing instruction, so that the target device determines and controls playing of the audio corresponding to the sound effect playing instruction.
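By way of illustration (not part of the claim), the four branches of claim 3 behave like a dispatch table on the first device. In the sketch below, the Instruction enum and the handler names on the target device (start_audio, stop_audio, show_icon, play_audio) are invented for readability.

from enum import Enum, auto

class Instruction(Enum):
    SOUND_ON = auto()       # control sound starting instruction
    SOUND_OFF = auto()      # control sound closing instruction
    SHOW_ICON = auto()      # image display instruction
    PLAY_EFFECT = auto()    # sound effect playing instruction

def dispatch(instruction, target_device, payload=None):
    """Forward the target instruction and trigger the matching operation."""
    handlers = {
        Instruction.SOUND_ON:    target_device.start_audio,
        Instruction.SOUND_OFF:   target_device.stop_audio,
        Instruction.SHOW_ICON:   lambda: target_device.show_icon(payload),
        Instruction.PLAY_EFFECT: lambda: target_device.play_audio(payload),
    }
    handlers[instruction]()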
4. The method according to any one of claims 1-3, further comprising:
controlling display of the video, acquired by the at least one second device, of the second object corresponding to that second device.
5. A data interaction method, applied to a second device, the method comprising:
acquiring a video of a second object, sending the video of the second object to a first device, and receiving video and audio of a first object acquired by the first device; and
receiving a target instruction sent by the first device, and executing an operation corresponding to the target instruction;
wherein the target instruction is determined by the first device based on gesture information recognized from the acquired video of the first object.
6. The method of claim 5, further comprising:
recognizing gesture information in each frame of the acquired video of the second object; and
if the recognized gesture information is determined to be preset gesture information, acquiring audio of the second object and sending the audio of the second object to the first device.
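By way of illustration (not part of the claim), claim 6 gates audio capture on a preset gesture. Every name in the sketch below (the recognizer, the preset gesture value, capture_audio, send_to_first_device) is a hypothetical stand-in for the second device's camera, recognition, and network components.

def maybe_send_audio(video_frames, recognize_gesture, preset_gesture,
                     capture_audio, send_to_first_device):
    # Scan the second object's frames; once the recognized gesture matches
    # the preset gesture, acquire the audio and send it to the first device.
    for frame in video_frames:
        if recognize_gesture(frame) == preset_gesture:
            send_to_first_device(capture_audio())
            return True
    return False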
7. The method according to claim 5, wherein the receiving a target instruction sent by the first device and executing an operation corresponding to the target instruction comprises:
if the target instruction is a control sound starting instruction, sending the acquired audio of the second object; or
if the target instruction is a control sound closing instruction, stopping sending the acquired audio of the second object; or
if the target instruction is an image display instruction, determining and controlling display of an icon corresponding to the image display instruction according to the image display instruction; or
if the target instruction is a sound effect playing instruction, determining and controlling playing of the audio corresponding to the sound effect playing instruction according to the sound effect playing instruction.
8. The method according to any one of claims 5-7, further comprising:
controlling playback of the video and audio of the first object acquired by the first device.
9. An electronic device, characterized in that the device comprises: a memory and a processor;
the memory is configured to store a computer program or instructions; and
the processor is configured to execute the computer program or instructions in the memory to implement the following process:
obtaining video and audio of a first object, sending the video and audio of the first object to at least one second device, and receiving, from the at least one second device, a video of a second object corresponding to that second device;
recognizing gesture information in each frame of the acquired video of the first object, and determining a target instruction corresponding to the recognized gesture information; and
sending the target instruction, wherein the target instruction is used to instruct a target device among the second devices to execute an operation corresponding to the target instruction.
10. An electronic device, characterized in that the device comprises: a memory and a processor;
the memory is configured to store a computer program or instructions; and
the processor is configured to execute the computer program or instructions in the memory to implement the following process:
acquiring a video of a second object, sending the video of the second object to a first device, and receiving video and audio of a first object acquired by the first device;
receiving a target instruction sent by the first device, and executing an operation corresponding to the target instruction;
wherein the target instruction is determined by the first device based on gesture information recognized from the acquired video of the first object.
CN202010467086.4A 2020-05-28 2020-05-28 Data interaction method and related equipment Pending CN113467604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010467086.4A CN113467604A (en) 2020-05-28 2020-05-28 Data interaction method and related equipment


Publications (1)

Publication Number Publication Date
CN113467604A 2021-10-01

Family

ID=77868185


Country Status (1)

Country Link
CN (1) CN113467604A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115662219A (en) * 2022-11-03 2023-01-31 深圳职业技术学院 Intelligent online system based on gesture recognition and use method thereof

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103096017A (en) * 2011-10-31 2013-05-08 鸿富锦精密工业(深圳)有限公司 Control method and control system of computer manipulation right
CN105204743A (en) * 2015-09-28 2015-12-30 百度在线网络技术(北京)有限公司 Interaction control method and device for speech and video communication
CN105867595A (en) * 2015-01-21 2016-08-17 武汉明科智慧科技有限公司 Human-machine interaction mode combing voice information with gesture information and implementation device thereof
CN105933637A (en) * 2016-04-26 2016-09-07 上海与德通讯技术有限公司 Video communication method and system
CN109754801A (en) * 2019-01-15 2019-05-14 东莞松山湖国际机器人研究院有限公司 A kind of voice interactive system and method based on gesture identification
CN110770693A (en) * 2017-06-21 2020-02-07 三菱电机株式会社 Gesture operation device and gesture operation method
CN110910874A (en) * 2019-11-08 2020-03-24 深圳明心科技有限公司 Interactive classroom voice control method, terminal equipment, server and system


Similar Documents

Publication Publication Date Title
CN107633719B (en) Anthropomorphic image artificial intelligence teaching system and method based on multi-language human-computer interaction
JP5195106B2 (en) Image correction method, image correction system, and image correction program
CN209980508U (en) Wisdom blackboard, and wisdom classroom's teaching system
KR20200010564A (en) Insight-based cognitive aids, methods, and systems for enhancing the user's experience in learning, reviewing, practicing, and memorizing
CN106227335A (en) Interactive learning method for preview lecture and video course and application learning client
CN112667086B (en) Interaction method and device for VR house watching
CN111290568A (en) Interaction method and device and computer equipment
CN107967830A (en) Online teaching interaction method, device, equipment and storage medium
CN112652200A (en) Man-machine interaction system, man-machine interaction method, server, interaction control device and storage medium
CN111462561B (en) Cloud computing-based dual-teacher classroom management method and platform
Chen Conveying conversational cues through video
CN114387829A (en) Language learning system based on virtual scene, storage medium and electronic equipment
CN114339285A (en) Knowledge point processing method, video processing method and device and electronic equipment
CN109657099A (en) Learning interaction method and learning client
CN112131361A (en) Method and device for pushing answer content
CN113467604A (en) Data interaction method and related equipment
CN112367526B (en) Video generation method and device, electronic equipment and storage medium
CN112185195A (en) Method and device for controlling remote teaching classroom by AI (Artificial Intelligence)
JP2019215502A (en) Server, sound data evaluation method, program, and communication system
CN108924648B (en) Method, apparatus, device and medium for playing video data to a user
US20220385700A1 (en) System and Method for an Interactive Digitally Rendered Avatar of a Subject Person
US11582424B1 (en) System and method for an interactive digitally rendered avatar of a subject person
CN107195211B (en) Bypass-based on-line lesson monitoring system
Friedman et al. Virtual substitute teacher: introducing the concept of a classroom proxy
US20220150290A1 (en) Adaptive collaborative real-time remote remediation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination