CN116633909B - Conference management method and system based on artificial intelligence - Google Patents
- Publication number
- CN116633909B CN116633909B CN202310875230.1A CN202310875230A CN116633909B CN 116633909 B CN116633909 B CN 116633909B CN 202310875230 A CN202310875230 A CN 202310875230A CN 116633909 B CN116633909 B CN 116633909B
- Authority
- CN
- China
- Prior art keywords
- speaking
- period
- voice
- users
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/109—Time management, e.g. calendars, reminders, meetings or time accounting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/4038—Arrangements for multi-party communication, e.g. for conferences with floor control
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The artificial-intelligence-based conference management method and system provided by the invention determine, using a voice interaction degree model, the voice interaction degree between the spoken voice of a speaking user over a period of time and a plurality of non-speaking users; determine, using a video interaction degree model based on the speaking user's shared video over a period of time and the conference information, the video interaction degree between that shared video and the plurality of non-speaking users; determine the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree; determine a plurality of users to be muted based on that association degree; and mute the users to be muted.
Description
Technical Field
The invention relates to the technical field of conference management, in particular to a conference management method and system based on artificial intelligence.
Background
With the development of technology, more and more users hold teleconferences through mobile applications. Because a teleconference has many participants, leaving every participant's microphone open introduces noise that disturbs the progress of the conference. An administrator therefore closes the microphones of participants who are not speaking, preventing ambient noise from being transmitted into the conference and interfering with it.
How to manage participants' speaking rights quickly while improving the user experience is therefore a problem to be solved.
Disclosure of Invention
The invention mainly solves the technical problem of how to rapidly manage participants' speaking rights and improve the user experience.
According to a first aspect, the present invention provides an artificial-intelligence-based conference management method, including: acquiring conference information; acquiring the spoken voice of a speaking user over a period of time and the shared video of the speaking user over that period; determining, using a voice interaction degree model based on the spoken voice and the conference information, the voice interaction degree between the spoken voice and a plurality of non-speaking users; determining, using a video interaction degree model based on the shared video and the conference information, the video interaction degree between the shared video and the plurality of non-speaking users; determining the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree; and determining a plurality of users to be muted based on that association degree, and muting the users to be muted.
Still further, the method further comprises: performing speech recognition on the spoken voice of the speaking user to obtain speech recognition text; judging whether the speech recognition text includes the name of a non-speaking user; and, if it does, unmuting the non-speaking user corresponding to that name.
Further, the voice interaction degree model is a long short-term memory (LSTM) neural network model whose input is the spoken voice of the speaking user over a period of time together with the conference information, and whose output is the voice interaction degree between that spoken voice and a plurality of non-speaking users; the video interaction degree model is likewise an LSTM neural network model whose input is the shared video of the speaking user over a period of time together with the conference information, and whose output is the video interaction degree between that shared video and the plurality of non-speaking users.
Still further, determining the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree includes: assigning different weights to the voice interaction degree and the video interaction degree and computing their weighted sum to obtain the association degree between the speaking user and each of the non-speaking users.
According to a second aspect, the present invention provides an artificial-intelligence-based conference management system comprising: a first acquisition module for acquiring conference information; a second acquisition module for acquiring the spoken voice of a speaking user over a period of time and the shared video of the speaking user over that period; a voice determination module for determining, using a voice interaction degree model based on the spoken voice and the conference information, the voice interaction degree between the spoken voice and a plurality of non-speaking users; a video determination module for determining, using a video interaction degree model based on the shared video and the conference information, the video interaction degree between the shared video and the plurality of non-speaking users; an association degree determination module for determining the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree; and a mute module for determining a plurality of users to be muted based on that association degree and muting them.
Still further, the system is further configured to: perform speech recognition on the spoken voice of the speaking user to obtain speech recognition text; judge whether the speech recognition text includes the name of a non-speaking user; and, if it does, unmute the non-speaking user corresponding to that name.
Further, the voice interaction degree model is a long short-term memory (LSTM) neural network model whose input is the spoken voice of the speaking user over a period of time together with the conference information, and whose output is the voice interaction degree between that spoken voice and a plurality of non-speaking users; the video interaction degree model is likewise an LSTM neural network model whose input is the shared video of the speaking user over a period of time together with the conference information, and whose output is the video interaction degree between that shared video and the plurality of non-speaking users.
Furthermore, the association degree determination module is further configured to assign different weights to the voice interaction degree and the video interaction degree and compute their weighted sum to obtain the association degree between the speaking user and the plurality of non-speaking users.
According to a third aspect, the present invention provides an electronic device comprising: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the method described above.
According to a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as in any of the above aspects.
In summary, the artificial-intelligence-based conference management method and system provided by the invention determine, using a voice interaction degree model, the voice interaction degree between the spoken voice of a speaking user over a period of time and a plurality of non-speaking users; determine, using a video interaction degree model based on the speaking user's shared video over a period of time and the conference information, the video interaction degree between that shared video and the plurality of non-speaking users; determine the association degree between the speaking user and the plurality of non-speaking users based on the two interaction degrees; determine a plurality of users to be muted based on that association degree; and mute the users to be muted.
Drawings
Fig. 1 is a schematic flow chart of a conference management method based on artificial intelligence according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of managing user mute authorities according to an embodiment of the present invention;
fig. 3 is a schematic diagram of an artificial intelligence-based conference management system according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In an embodiment of the present invention, there is provided an artificial intelligence based conference management method as shown in fig. 1, where the artificial intelligence based conference management method includes steps S1 to S6:
step S1, meeting information is obtained.
The conference information includes the start time of the conference, the conference topic, information on the participating users, a conference profile, the conference agenda, introductory conference materials, and so on. The information on participating users includes user identity, user age, and the like.
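The patent does not fix a concrete data layout for the conference information; as an illustration only, the fields listed above could be grouped in a small structure like the following (all field and class names here are assumptions invented for the sketch, not part of the disclosed method):

```python
from dataclasses import dataclass, field

@dataclass
class ParticipantInfo:
    # Per-user record: identity and age are the attributes named in the description.
    name: str
    identity: str
    age: int

@dataclass
class ConferenceInfo:
    # Conference-level record mirroring the fields named in the description.
    start_time: str                      # e.g. ISO-8601 start time
    topic: str
    participants: list                   # list of ParticipantInfo
    profile: str = ""                    # brief profile of the conference
    agenda: list = field(default_factory=list)
    materials: list = field(default_factory=list)

info = ConferenceInfo(
    start_time="2023-07-17T09:00:00",
    topic="Quarterly review",
    participants=[ParticipantInfo("Alice", "manager", 35)],
)
```

A structure like this would be the "conference information" input passed to the interaction degree models in steps S3 and S4.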
Step S2, the spoken voice of the speaking user in a period of time and the shared video of the speaking user in a period of time are obtained.
The spoken voice of the speaking user over a period of time is the voice data obtained by recording the speaking user's voice while speaking. The period may be, for example, 5 seconds, 10 seconds, or 30 seconds.
The shared video of the speaking user over a period of time is obtained by recording the video picture while the speaking user shares video. The period may likewise be 5, 10, or 30 seconds. The periods of the spoken voice and the shared video may be the same or different; typically, a speaking user speaks while sharing video.
For the speaking users in each conference, their voice and video must be continuously recorded and stored during the conference, so that the spoken voice and shared video over a period of time can be obtained.
Step S3, determining, using a voice interaction degree model based on the spoken voice of the speaking user over a period of time and the conference information, the voice interaction degree between that spoken voice and a plurality of non-speaking users.
The voice interaction degree model is a long short-term memory (LSTM) neural network model; LSTMs are one implementation of artificial intelligence. An LSTM can process sequences of arbitrary length, capturing the relationships between earlier and later elements of the sequence and producing output based on them. Using an LSTM to process the spoken voice over a continuous period lets the model take into account the relationships among the voice at each point in time, making the output features more accurate and comprehensive. The voice interaction degree model can be trained on training samples by gradient descent.
The input of the voice interaction degree model comprises the spoken voice of the speaking user over a period of time and the conference information; its output is the voice interaction degree between that spoken voice and a plurality of non-speaking users.
The voice interaction degree between the spoken voice of the speaking user over a period of time and the plurality of non-speaking users represents how strongly that spoken voice interacts with each of the other, non-speaking users. The greater the voice interaction degree, the more likely it is that there is interaction between the speaking user's voice and a non-speaking user, and the more likely that non-speaking user needs to turn on the microphone to communicate with the speaking user. For example, if the spoken voice is "Next I would like to invite Xiao Liu to explain the PPT material", the interaction degree with Xiao Liu is high, and Xiao Liu's microphone needs to be turned on for voice communication. As another example, if the spoken voice is "Each student in study group A, please talk about your own learning experience", the interaction degree with each student in study group A is high, and each of their microphones needs to be turned on. The voice interaction degree may be a value between 0 and 1; the larger the value, the more likely interaction exists between the speaking user and the non-speaking user.
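The description does not disclose the model's architecture details or weights. As an illustration only, the following toy sketch runs a single scalar LSTM cell over a short sequence of hypothetical acoustic feature values and squashes the final hidden state into a [0, 1] interaction degree; every weight and feature value here is invented for the example, whereas a real voice interaction degree model would be trained by gradient descent as described above:

```python
import math

def lstm_step(x, h, c, w):
    # One LSTM cell step on a scalar input x with scalar hidden/cell
    # state (h, c), using the standard gate equations.
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    i = sig(w["wi"] * x + w["ui"] * h + w["bi"])        # input gate
    f = sig(w["wf"] * x + w["uf"] * h + w["bf"])        # forget gate
    o = sig(w["wo"] * x + w["uo"] * h + w["bo"])        # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate state
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def voice_interaction_degree(frames, w):
    # Run the LSTM over a sequence of toy acoustic features and map the
    # final hidden state into a [0, 1] interaction degree via a sigmoid.
    h = c = 0.0
    for x in frames:
        h, c = lstm_step(x, h, c, w)
    return 1.0 / (1.0 + math.exp(-h))

# Placeholder weights (all 0.5) purely for demonstration.
w = {k: 0.5 for k in ["wi", "ui", "bi", "wf", "uf", "bf",
                      "wo", "uo", "bo", "wg", "ug", "bg"]}
degree = voice_interaction_degree([0.2, 0.8, 0.5], w)
```

The sequential dependence on `h` and `c` across frames is what lets an LSTM capture the relationships among the voice at each point in time mentioned above.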
Step S4, determining, using a video interaction degree model based on the shared video of the speaking user over a period of time and the conference information, the video interaction degree between that shared video and the plurality of non-speaking users.
The video interaction degree model is likewise an LSTM neural network model; its input is the shared video of the speaking user over a period of time and the conference information, and its output is the video interaction degree between that shared video and the plurality of non-speaking users.
The video interaction degree between the shared video of the speaking user over a period of time and the plurality of non-speaking users represents how strongly the shared video interacts with each of the other, non-speaking users. The greater the video interaction degree, the more likely it is that there is interaction between the shared video and a non-speaking user, and the more likely that user needs to turn on the microphone to communicate with the speaking user. For example, if the shared video over a period of time shows a clip of a particular non-speaking user, the interaction degree with that user is high, and that user's microphone needs to be turned on for voice communication. As another example, if the shared video shows a clip of three students in study group A, the interaction degree with those three students is high, and their microphones need to be turned on. The video interaction degree may be a value between 0 and 1; the larger the value, the more likely interaction exists between the shared video and the non-speaking user.
Step S5, determining the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree determined above.
In some embodiments, the voice interaction degree and the video interaction degree may each be assigned a different weight and then summed to obtain the association degree between the speaking user and the plurality of non-speaking users. For example, both interaction degrees may be given a weight of 0.5 and their weighted sum taken as the association degree.
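The weighted summation above can be sketched in a few lines; the user names and the equal 0.5/0.5 weighting are just the example values from the text:

```python
def association_degree(voice_deg, video_deg, w_voice=0.5, w_video=0.5):
    # Per-user weighted sum of the voice and video interaction degrees,
    # yielding the association degree with the speaking user.
    return {u: w_voice * voice_deg[u] + w_video * video_deg[u]
            for u in voice_deg}

voice = {"user_a": 0.9, "user_b": 0.1}
video = {"user_a": 0.7, "user_b": 0.2}
assoc = association_degree(voice, video)   # user_a ≈ 0.8, user_b ≈ 0.15
```

With both weights at 0.5 this is a plain average; unequal weights let one modality (voice or video) dominate the association degree.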
In some embodiments, the association degree between the speaking user and the plurality of non-speaking users may instead be determined from a preset correspondence between the voice interaction degree and the video interaction degree.
Step S6, determining a plurality of users to be muted based on the association degree between the speaking user and the plurality of non-speaking users, and muting those users.
In some embodiments, an association degree threshold may be set. A non-speaking user whose association degree is below the threshold is considered unrelated to the topic under discussion, or to have little need to speak, and may be muted. Muting includes turning off the user's microphone and setting the user to a muted state.
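The thresholding step can be sketched as a simple filter; the threshold value 0.3 and the user names are invented for the example:

```python
def users_to_mute(association, threshold=0.3):
    # Non-speaking users whose association degree with the speaking user
    # falls below the threshold are selected for muting.
    return [u for u, deg in sorted(association.items()) if deg < threshold]

association = {"user_a": 0.8, "user_b": 0.15, "user_c": 0.25}
muted = users_to_mute(association)   # ["user_b", "user_c"]
```

The resulting list would then be handed to whatever conference backend actually turns microphones off and marks the users as muted.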
In some embodiments, the conference remaining time may also be determined by a graph neural network model.
The graph neural network model includes a graph neural network (Graph Neural Network, GNN) and a full connectivity layer. A graph neural network is a neural network that acts directly on a graph, which is a data structure made up of two parts, nodes and edges. The graph neural network is one implementation of artificial intelligence.
The input of the graph neural network model comprises a plurality of nodes and a plurality of edges. The nodes comprise a speaking-user node and non-speaking-user nodes, and the edges represent the working relationships among the nodes; working relationships may include superior/subordinate, same team, different teams, and so on. Each node carries several node features. The speaking-user node's features include the spoken voice of the speaking user over a period of time, the shared video of the speaking user over that period, the conference information, and the speaking user's position. A non-speaking-user node's features include that user's voice interaction degree with the spoken voice, that user's video interaction degree with the shared video, and that user's position. The output of the graph neural network model is the conference remaining time, for example 20 minutes. By estimating the remaining time of the conference, users can plan their subsequent work in advance, improving the user experience.
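The disclosed graph model's architecture and weights are not given; as an illustration under that caveat, the following toy sketch performs one round of mean-neighbour message passing over scalar node features (standing in for the real feature vectors) followed by a linear readout over the graph mean. All weights, feature values, and the readout scale are invented placeholders:

```python
def gnn_remaining_minutes(node_feats, edges, w_self=0.6, w_neigh=0.4, w_out=30.0):
    # One message-passing round: each node mixes its own feature with the
    # mean of its neighbours' features along the (undirected) working-
    # relationship edges, then a linear readout over the graph mean
    # produces a remaining-time estimate in minutes.
    neigh = {n: [] for n in node_feats}
    for a, b in edges:
        neigh[a].append(node_feats[b])
        neigh[b].append(node_feats[a])
    updated = {
        n: w_self * x + w_neigh * (sum(neigh[n]) / len(neigh[n]) if neigh[n] else 0.0)
        for n, x in node_feats.items()
    }
    graph_mean = sum(updated.values()) / len(updated)
    return w_out * graph_mean

feats = {"speaker": 0.9, "listener_1": 0.4, "listener_2": 0.2}
edges = [("speaker", "listener_1"), ("speaker", "listener_2")]
minutes = gnn_remaining_minutes(feats, edges)   # ≈ 17.4 with these toy values
```

A trained GNN would of course use vector features, learned weight matrices, and typically several message-passing rounds before the readout.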
In some embodiments, the user mute rights can also be managed by the method shown in fig. 2.
Fig. 2 is a schematic flow chart of managing user mute authorities according to an embodiment of the present invention, and fig. 2 includes steps S21 to S23:
and S21, performing voice recognition on the spoken voice of the speaking user to obtain voice recognition characters.
The speech recognition method may be based on template matching, on a hidden Markov model, or on deep learning.
The speech recognition text is text obtained by performing speech recognition on the spoken speech.
Step S22, judging whether the speech recognition text includes the name of a non-speaking user.
If the speech recognition text includes the name of a non-speaking user, the speaking user is addressing that non-speaking user, and interaction between them is likely. For example, if the speech recognition text invites a named non-speaking user to explain the PPT next, there may be interaction between the speaking user and the user of that name.
Step S23, if the speech recognition text is judged to include the name of a non-speaking user, the non-speaking user corresponding to that name is unmuted.
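Steps S21 to S23 can be sketched as a simple name lookup over the recognized text; the names and the example sentence are invented, and a real system would match against the participant names from the conference information:

```python
def users_to_unmute(recognized_text, muted_users):
    # If the speech recognition text mentions a muted (non-speaking)
    # user's name, that user is selected for unmuting so they can reply.
    return [name for name in muted_users if name in recognized_text]

muted = ["Alice", "Bob", "Carol"]
text = "Next, I would like to invite Bob to explain the slides."
to_unmute = users_to_unmute(text, muted)   # ["Bob"]
```

Plain substring matching is only a sketch; in practice the recognizer's output would need normalization (case, homophones, partial names) before comparing against participant names.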
Based on the same inventive concept, fig. 3 is a schematic diagram of an artificial intelligence-based conference management system according to an embodiment of the present invention, where the artificial intelligence-based conference management system includes:
a first obtaining module 31, configured to obtain conference information;
a second obtaining module 32, configured to obtain the spoken voice of the speaking user during a period of time and the shared video of the speaking user during a period of time;
a voice determination module 33, configured to determine, using a voice interaction degree model based on the spoken voice of the speaking user over a period of time and the conference information, the voice interaction degree between that spoken voice and a plurality of non-speaking users;
a video determination module 34, configured to determine, using a video interaction degree model based on the shared video of the speaking user over a period of time and the conference information, the video interaction degree between that shared video and the plurality of non-speaking users;
an association degree determination module 35, configured to determine the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree;
and the muting module 36 is configured to determine a plurality of users to be muted based on the association degree between the speaking user and the plurality of non-speaking users, and mute the plurality of users to be muted.
Based on the same inventive concept, an embodiment of the present invention provides an electronic device, as shown in fig. 4, including:
a processor 41; a memory 42; and a computer program, wherein the computer program is stored in the memory 42 and configured to be executed by the processor 41 to implement the artificial-intelligence-based conference management method provided above, the method comprising: acquiring conference information; acquiring the spoken voice of a speaking user over a period of time and the shared video of the speaking user over that period; determining, using a voice interaction degree model based on the spoken voice and the conference information, the voice interaction degree between the spoken voice and a plurality of non-speaking users; determining, using a video interaction degree model based on the shared video and the conference information, the video interaction degree between the shared video and the plurality of non-speaking users; determining the association degree between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree; and determining a plurality of users to be muted based on that association degree, and muting the users to be muted.
Based on the same inventive concept, the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by the processor 41, implements the artificial intelligence based conference management method provided above, the method comprising: acquiring conference information; acquiring the spoken voice of the speaking user in a period of time and the shared video of the speaking user in the period of time; determining, based on the spoken voice and the conference information, the voice interaction degree between the spoken voice and a plurality of non-speaking users by using a voice interaction degree model; determining, based on the shared video and the conference information, the video interaction degree between the shared video and the plurality of non-speaking users by using a video interaction degree model; determining the degree of association between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree; and determining a plurality of users to be muted based on the degree of association, and muting the plurality of users to be muted.
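The method recited above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the two interaction-degree models are stubbed with fixed placeholder scores, and the user names, weights (`w_voice`, `w_video`), and mute threshold are assumptions not taken from the patent.

```python
# Illustrative sketch of the claimed pipeline. The interaction-degree
# models are stubbed with placeholder per-user scores; the weights and
# the mute threshold below are assumptions, not values from the patent.

def voice_interaction_degree(spoken_voice, conference_info):
    # Stub for the voice interaction degree model: one score in [0, 1]
    # per non-speaking user.
    return {"alice": 0.9, "bob": 0.2, "carol": 0.1}

def video_interaction_degree(shared_video, conference_info):
    # Stub for the video interaction degree model.
    return {"alice": 0.8, "bob": 0.3, "carol": 0.2}

def association_degrees(spoken_voice, shared_video, conference_info,
                        w_voice=0.6, w_video=0.4):
    voice = voice_interaction_degree(spoken_voice, conference_info)
    video = video_interaction_degree(shared_video, conference_info)
    # Weighted summation of the two interaction degrees per non-speaking
    # user, as in claim 3.
    return {u: w_voice * voice[u] + w_video * video[u] for u in voice}

def users_to_mute(assoc, threshold=0.5):
    # Mute every non-speaking user whose association degree falls below
    # the (assumed) threshold.
    return sorted(u for u, d in assoc.items() if d < threshold)

assoc = association_degrees(b"pcm-audio", b"h264-frames", {"topic": "budget"})
print(users_to_mute(assoc))  # ['bob', 'carol']
```

With the stub scores above, alice's association degree (0.86) stays above the threshold while bob (0.24) and carol (0.14) are selected for muting.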
Claims (8)
1. An artificial intelligence based conference management method, comprising:
acquiring conference information;
acquiring the spoken voice of the speaking user in a period of time and the shared video of the speaking user in the period of time;
determining, by using a voice interaction degree model, the voice interaction degree between the spoken voice of the speaking user in the period of time and a plurality of non-speaking users based on the spoken voice of the speaking user in the period of time and the conference information, wherein the voice interaction degree model is a long short-term memory (LSTM) neural network model, the input of the voice interaction degree model is the spoken voice of the speaking user in the period of time and the conference information, and the output of the voice interaction degree model is the voice interaction degree between the spoken voice of the speaking user in the period of time and the plurality of non-speaking users;
determining, by using a video interaction degree model, the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users based on the shared video of the speaking user in the period of time and the conference information, wherein the video interaction degree model is a long short-term memory (LSTM) neural network model, the input of the video interaction degree model is the shared video of the speaking user in the period of time and the conference information, and the output of the video interaction degree model is the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users;
determining the degree of association between the speaking user and the plurality of non-speaking users based on the voice interaction degree and the video interaction degree;
and determining a plurality of users to be muted based on the degree of association between the speaking user and the plurality of non-speaking users, and muting the plurality of users to be muted.
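Claim 1 only specifies that each interaction degree model is a long short-term memory (LSTM) neural network that takes the speech or video plus the conference information as input and outputs per-user interaction degrees. A generic single-layer LSTM forward pass of the kind the claim names can be sketched as below; the hidden size, the randomly initialized weights, and the sigmoid scoring head are illustrative assumptions — the patent's description mentions training by gradient descent, whereas this sketch is untrained.

```python
import numpy as np

# Minimal single-cell LSTM forward pass, illustrating the kind of long
# short-term memory model the claim names. Feature extraction from the
# audio/video is out of scope; `features` is assumed to be a (T, D)
# sequence already derived from the speech/video and conference info.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # W: (4H, D), U: (4H, H), b: (4H,) — gates stacked as [i, f, o, g].
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])        # input gate
    f = sigmoid(z[H:2*H])      # forget gate
    o = sigmoid(z[2*H:3*H])    # output gate
    g = np.tanh(z[3*H:4*H])    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def interaction_degree(features, n_users, hidden=8, seed=0):
    # Run the feature sequence through the LSTM and map the final hidden
    # state to one score in (0, 1) per non-speaking user. Weights are
    # random here purely for illustration; a real model would be trained.
    rng = np.random.default_rng(seed)
    D = features.shape[1]
    W = rng.standard_normal((4 * hidden, D)) * 0.1
    U = rng.standard_normal((4 * hidden, hidden)) * 0.1
    b = np.zeros(4 * hidden)
    head = rng.standard_normal((n_users, hidden)) * 0.1
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in features:
        h, c = lstm_step(x, h, c, W, U, b)
    return sigmoid(head @ h)

scores = interaction_degree(
    np.random.default_rng(1).standard_normal((20, 5)), n_users=3)
print(scores.shape)  # (3,)
```

Because the forget/input gates carry state across the whole sequence, an LSTM can weigh interaction cues from the entire period of speaking time rather than a single frame, which is presumably why the patent chooses this architecture.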
2. The artificial intelligence based conference management method of claim 1, wherein the method further comprises:
performing voice recognition on the spoken voice of the speaking user to obtain recognized text;
determining whether the recognized text includes the name of a non-speaking user;
and if the recognized text includes the name of a non-speaking user, unmuting the non-speaking user corresponding to that name.
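The unmute-on-mention step of claim 2 reduces to matching non-speaking users' names against the recognized text. A minimal sketch, assuming the ASR output is already available as a string and that matching is plain case-insensitive substring search (the patent does not specify the matching rule):

```python
# Sketch of claim 2: scan the recognized text of the speaking user's
# voice for muted users' names and report any that are mentioned, so
# the conference system can unmute them. The ASR step itself is assumed
# to have already produced `recognized_text`.

def users_to_unmute(recognized_text, muted_users):
    # Case-insensitive substring match against each muted user's name.
    text = recognized_text.lower()
    return [u for u in muted_users if u.lower() in text]

muted = ["Alice", "Bob", "Carol"]
print(users_to_unmute("Bob, could you share your screen?", muted))  # ['Bob']
```

A production system would likely match against fuzzy or phonetic variants of the name, since ASR output rarely reproduces names exactly; substring matching is only the simplest possible realization of the claimed step.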
3. The artificial intelligence based conference management method of claim 1, wherein the determining the degree of association between the speaking user and the plurality of non-speaking users based on the voice interaction degree between the spoken voice of the speaking user in the period of time and the plurality of non-speaking users and the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users comprises: assigning respective weights to the voice interaction degree and to the video interaction degree, and performing a weighted summation to obtain the degree of association between the speaking user and the plurality of non-speaking users.
4. An artificial intelligence based conference management system, comprising:
the first acquisition module is used for acquiring conference information;
the second acquisition module is used for acquiring the spoken voice of the speaking user in a period of time and the shared video of the speaking user in the period of time;
the voice determining module is used for determining, by using a voice interaction degree model, the voice interaction degree between the spoken voice of the speaking user in the period of time and a plurality of non-speaking users based on the spoken voice of the speaking user in the period of time and the conference information, wherein the voice interaction degree model is a long short-term memory (LSTM) neural network model, the input of the voice interaction degree model is the spoken voice of the speaking user in the period of time and the conference information, and the output of the voice interaction degree model is the voice interaction degree between the spoken voice of the speaking user in the period of time and the plurality of non-speaking users;
the video determining module is used for determining, by using a video interaction degree model, the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users based on the shared video of the speaking user in the period of time and the conference information, wherein the video interaction degree model is a long short-term memory (LSTM) neural network model, the input of the video interaction degree model is the shared video of the speaking user in the period of time and the conference information, and the output of the video interaction degree model is the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users;
the relevancy determination module is used for determining the degree of association between the speaking user and the plurality of non-speaking users based on the voice interaction degree between the spoken voice of the speaking user in the period of time and the plurality of non-speaking users and the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users;
and the mute module is used for determining a plurality of users to be muted based on the degree of association between the speaking user and the plurality of non-speaking users, and muting the plurality of users to be muted.
5. The artificial intelligence based meeting management system of claim 4, wherein the system is further configured to:
performing voice recognition on the spoken voice of the speaking user to obtain recognized text;
determining whether the recognized text includes the name of a non-speaking user;
and if the recognized text includes the name of a non-speaking user, unmuting the non-speaking user corresponding to that name.
6. The artificial intelligence based meeting management system of claim 4, wherein the relevancy determination module is further configured to: assign respective weights to the voice interaction degree between the spoken voice of the speaking user in the period of time and the plurality of non-speaking users and to the video interaction degree between the shared video of the speaking user in the period of time and the plurality of non-speaking users, and perform a weighted summation to obtain the degree of association between the speaking user and the plurality of non-speaking users.
7. An electronic device, comprising: a memory; a processor; a computer program; wherein the computer program is stored in the memory and configured to be executed by the processor to implement the artificial intelligence based conference management method of any one of claims 1 to 3.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the artificial intelligence based conference management method as claimed in any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310875230.1A CN116633909B (en) | 2023-07-17 | 2023-07-17 | Conference management method and system based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310875230.1A CN116633909B (en) | 2023-07-17 | 2023-07-17 | Conference management method and system based on artificial intelligence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116633909A CN116633909A (en) | 2023-08-22 |
CN116633909B true CN116633909B (en) | 2023-12-19 |
Family
ID=87602804
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310875230.1A Active CN116633909B (en) | 2023-07-17 | 2023-07-17 | Conference management method and system based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116633909B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108346034A (en) * | 2018-02-02 | 2018-07-31 | Shenzhen Yingshuo Technology Co., Ltd. | Intelligent conference management method and system |
CN108399923A (en) * | 2018-02-01 | 2018-08-14 | Shenzhen Yingshuo Technology Co., Ltd. | Speaker recognition method and device for multi-speaker conversations |
CN111694479A (en) * | 2020-06-11 | 2020-09-22 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Mute processing method and device in teleconference, electronic device and storage medium |
CN113920986A (en) * | 2021-09-29 | 2022-01-11 | Ping An Life Insurance Company of China, Ltd. | Conference record generation method, device, equipment and storage medium |
CN114727047A (en) * | 2021-01-07 | 2022-07-08 | Meta Platforms, Inc. | System and method for resolving overlapping speech in a communication session |
CN115831155A (en) * | 2021-09-16 | 2023-03-21 | Tencent Technology (Shenzhen) Co., Ltd. | Audio signal processing method and device, electronic equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10776073B2 (en) * | 2018-10-08 | 2020-09-15 | Nuance Communications, Inc. | System and method for managing a mute button setting for a conference call |
US11838340B2 (en) * | 2021-09-20 | 2023-12-05 | International Business Machines Corporation | Dynamic mute control for web conferencing |
- 2023-07-17: application CN202310875230.1A filed (CN); granted as patent CN116633909B, status Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108399923A (en) * | 2018-02-01 | 2018-08-14 | Shenzhen Yingshuo Technology Co., Ltd. | Speaker recognition method and device for multi-speaker conversations |
CN108346034A (en) * | 2018-02-02 | 2018-07-31 | Shenzhen Yingshuo Technology Co., Ltd. | Intelligent conference management method and system |
CN111694479A (en) * | 2020-06-11 | 2020-09-22 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Mute processing method and device in teleconference, electronic device and storage medium |
CN114727047A (en) * | 2021-01-07 | 2022-07-08 | Meta Platforms, Inc. | System and method for resolving overlapping speech in a communication session |
CN115831155A (en) * | 2021-09-16 | 2023-03-21 | Tencent Technology (Shenzhen) Co., Ltd. | Audio signal processing method and device, electronic equipment and storage medium |
CN113920986A (en) * | 2021-09-29 | 2022-01-11 | Ping An Life Insurance Company of China, Ltd. | Conference record generation method, device, equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
3GPP, SP-030434, 3GPP tsg_sa\TSG_SA, 2003, (TSGS_21), full text. * |
Research and implementation of a real-time echo cancellation algorithm for conference calls; Chen Lin; China Master's Theses Full-Text Database; full text * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10915570B2 (en) | Personalized meeting summaries | |
US10431205B2 (en) | Dialog device with dialog support generated using a mixture of language models combined using a recurrent neural network | |
US12002464B2 (en) | Systems and methods for recognizing a speech of a speaker | |
US9495350B2 (en) | System and method for determining expertise through speech analytics | |
CN107818798A (en) | Customer service quality evaluating method, device, equipment and storage medium | |
US9210269B2 (en) | Active speaker indicator for conference participants | |
US10652655B1 (en) | Cognitive volume and speech frequency levels adjustment | |
US20150154960A1 (en) | System and associated methodology for selecting meeting users based on speech | |
US10699709B2 (en) | Conference call analysis and automated information exchange | |
CN113228074A (en) | Urgency and emotional state matching for automatic scheduling by artificial intelligence | |
CN111556279A (en) | Monitoring method and communication method of instant session | |
CN114727047A (en) | System and method for resolving overlapping speech in a communication session | |
CN116569197A (en) | User promotion in collaboration sessions | |
CN111696538A (en) | Voice processing method, apparatus and medium | |
Soofastaei | Introductory chapter: Virtual assistants | |
CN109961152B (en) | Personalized interaction method and system of virtual idol, terminal equipment and storage medium | |
CN116633909B (en) | Conference management method and system based on artificial intelligence | |
CN113539261A (en) | Man-machine voice interaction method and device, computer equipment and storage medium | |
CN111756939A (en) | Online voice control method and device and computer equipment | |
CN110865789A (en) | Method and system for intelligently starting microphone based on voice recognition | |
WO2023040456A1 (en) | Dynamic mute control for web conferencing | |
Palinko et al. | How should a robot interrupt a conversation between multiple humans | |
KR102408455B1 (en) | Voice data synthesis method for speech recognition learning, and computer program recorded on record-medium for executing method therefor | |
Bumbalek et al. | Cloud-based assistive speech-transcription services | |
CN115294987A (en) | Conference record generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20231127 Address after: 363000 Factory Building 3 Nanpu Road, Xiangcheng District, Zhangzhou City, Fujian Province (2nd Floor, Building C) Applicant after: Fujian Yizhaoguang Intelligent Equipment Co.,Ltd. Address before: No. 3001A, Floor 4, Building 1-2, No. 69, Junlong Street, Jinjiang District, Chengdu, Sichuan 610000 (self number) Applicant before: Chengdu Haojie Technology Co.,Ltd. |
GR01 | Patent grant | ||