CN113259620B - Video conference data synchronization method and device - Google Patents


Info

Publication number: CN113259620B
Application number: CN202110630970.XA
Authority: CN (China)
Legal status: Active (the legal status is an assumption and not a legal conclusion; Google has not performed a legal analysis)
Prior art keywords: user, user side, priority, acquiring, content
Other languages: Chinese (zh)
Other versions: CN113259620A
Inventors: 陈刚, 方烨
Current and original assignee: Guangzhou Lango Electronic Science and Technology Co Ltd (the listed assignees may be inaccurate)
Application filed by Guangzhou Lango Electronic Science and Technology Co Ltd; priority to CN202110630970.XA
Publication of CN113259620A; application granted; publication of CN113259620B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a video conference data synchronization method and device, relating to video conference technology. Trigger times of a plurality of user-side microphones are collected in real time; a first user side is selected based on the trigger times, and the remaining user sides serve as second user sides, where the trigger time of the first user side is earlier than that of the second user sides. First voice information collected by the first user side is acquired and sent to the first and second user sides for playback. Second voice information collected by the second user sides is acquired and converted into text information, which is sent to the first and second user sides for display.

Description

Video conference data synchronization method and device
Technical Field
The present invention relates to video conferencing technologies, and in particular, to a method and an apparatus for synchronizing video conferencing data.
Background
A video conference is a conference in which people located at two or more locations have a face-to-face conversation via a communication device and a network.
In the prior art, a video conference provides a free mode and a mute mode. In mute mode, only a designated person is allowed to talk and everyone else is silenced. In free mode, anyone can talk, so the number of simultaneous speakers cannot be controlled; when two, three, or more people talk at once, their voices superpose and the speech environment becomes noisy and chaotic.
Therefore, improving the voice environment in a video conference is a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the invention provides a video conference data synchronization method and device, which can improve the voice environment during a video conference.
In a first aspect of the embodiments of the present invention, a method for synchronizing video conference data is provided, where the method includes:
acquiring trigger time of a plurality of user side microphones in real time, acquiring a first user side based on the trigger time, and taking the rest user sides as second user sides, wherein the trigger time of the first user side is earlier than that of the second user side;
acquiring first voice information acquired by the first user side, and sending the first voice information to the first user side and the second user side for playing;
and acquiring second voice information acquired by the second user side for conversion processing, generating text information, and sending the text information to the first user side and the second user side for display.
Optionally, in a possible implementation manner of the first aspect, the method further includes:
acquiring a video frame corresponding to each user side, and processing the background of the video frame based on different colors to generate a state frame;
where green indicates that the user's voice is being played, yellow indicates that the user is muted, and red indicates that the user is speaking but the voice is not being played.
Optionally, in a possible implementation manner of the first aspect, after the acquiring the first user end based on the plurality of trigger times, the method further includes:
and performing noise analysis on the first voice information, and if the first voice information is noise information, closing a microphone of the first user end.
Optionally, in a possible implementation manner of the first aspect, after the sending the text message to the first user end and the second user end for displaying, the method further includes:
and modifying the text information based on the editing operation of the second user side.
Optionally, in a possible implementation manner of the first aspect, before the acquiring, in real time, trigger times of a plurality of user-side microphones, the method further includes:
receiving the setting of the priority of each user side, and giving authority to each user side based on the priority, wherein the priority comprises a first priority and a second priority;
after the acquiring the first user end based on the plurality of trigger times, further comprising:
and receiving an interruption request of the user terminal with a first priority, and closing a microphone of the first user terminal.
Optionally, in a possible implementation manner of the first aspect, after the acquiring the first user end based on the plurality of trigger times, the method further includes:
receiving multiple types of operation requests from a plurality of user sides with the second priority, and generating a predicted value based on a preset rule, wherein the multiple types of operation requests comprise a content supplement request, a content error correction request and a content question request;
and closing the microphone of the first user terminal based on a preset value and the predicted value.
Optionally, in a possible implementation manner of the first aspect, the preset rule includes:
N=0.2x+0.5y+0.3z
where N represents the predicted value, x represents the number of content supplement requests sent by user sides with the second priority, y represents the number of content error correction requests sent by user sides with the second priority, and z represents the number of content question requests sent by user sides with the second priority.
Optionally, in a possible implementation manner of the first aspect, after the receiving of the multiple types of operation requests from the user sides with the second priority, the method further includes:
displaying, based on a preset curve model, how the number of content supplement requests, the number of content error correction requests, and the number of content question requests sent by the user sides with the second priority change over time.
Optionally, in a possible implementation manner of the first aspect, before the acquiring, in real time, trigger times of a plurality of user-side microphones, and acquiring, based on a plurality of the trigger times, the first user side, the method further includes:
and determining that the number of triggered user side microphones is greater than a preset number.
In a second aspect of the embodiments of the present invention, a video conference data synchronization apparatus is provided, including:
the time module is used for acquiring the trigger time of a plurality of user-side microphones in real time, acquiring a first user side based on the trigger time, and taking the rest user sides as second user sides, wherein the trigger time of the first user side is earlier than that of the second user side;
the playing module is used for acquiring first voice information acquired by the first user end and sending the first voice information to the first user end and the second user end for playing;
and the text module is used for acquiring second voice information acquired by the second user end, converting the second voice information to generate text information, and sending the text information to the first user end and the second user end for displaying.
In a third aspect of the embodiments of the present invention, there is provided a video conference data synchronization apparatus, including: memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the method of the first aspect of the invention as well as various possible aspects of the first aspect.
A fourth aspect of the embodiments of the present invention provides a readable storage medium, in which a computer program is stored, the computer program being, when executed by a processor, configured to implement the method according to the first aspect of the present invention and various possible aspects of the first aspect.
The invention provides a video conference data synchronization method and device that collect the speaking times of a plurality of user sides, play only the voice of the user side that spoke earliest, and convert the voices of the remaining speaking user sides into text. In other words, when several people speak at the same time, only the earliest speaker's voice is played, which avoids superposed sound and improves the voice environment during a video conference.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flowchart of a video conference data synchronization method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a text message according to an embodiment of the present invention;
FIG. 4 is a schematic view of a curve provided by an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a video conference data synchronization apparatus according to an embodiment of the present invention;
fig. 6 is a schematic hardware structure diagram of a video conference data synchronization apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between objects and covers three cases; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B, and C are included; "comprises A, B or C" means that one of A, B, and C is included; "comprises A, B and/or C" means that any one, any two, or all three of A, B, and C are included.
It should be understood that in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may be determined from A and/or other information. "A matches B" means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention. In the figure, four users A, B, C, and D are in free mode; when all four speak simultaneously, four voices are superimposed, the speech environment becomes noisy, the sound output by the speaker cannot be heard clearly, and the quality of the video conference suffers.
Referring to fig. 2, which is a schematic flowchart of a video conference data synchronization method according to an embodiment of the present invention, an execution subject of the method shown in fig. 2 may be a software and/or hardware device. The execution subject of the present application may include, but is not limited to, at least one of: user equipment, network equipment, etc. The user equipment may include, but is not limited to, a computer, a smart phone, a Personal Digital Assistant (PDA), the above mentioned electronic equipment, and the like. The network device may include, but is not limited to, a single network server, a server group of multiple network servers, or a cloud of numerous computers or network servers based on cloud computing, wherein cloud computing is one type of distributed computing, a super virtual computer consisting of a cluster of loosely coupled computers. The present embodiment does not limit this. The method comprises the following steps of S101 to S103:
s101, acquiring trigger time of a plurality of user side microphones in real time, acquiring a first user side based on the trigger time, and taking the rest user sides as second user sides, wherein the trigger time of the first user side is earlier than that of the second user side.
Specifically, the server detects the trigger times of a plurality of user-side microphones in real time; that is, when a user speaks through a user side, that user side collects the voice along with its trigger time, and the trigger times are then used to divide the user sides into a first user side and second user sides.
For example, four users A, B, C, and D are in a video conference. The speaking time of user side A is 10:10:01, of user side B is 10:10:02, of user side C is 10:10:03, and of user side D is 10:10:04, and the corresponding trigger times are collected. Since the speaking time of user side A (10:10:01) is earlier than that of the other user sides, user side A is taken as the first user side and the rest (B, C, D) as the second user sides.
In practical applications, before the trigger times of the plurality of user-side microphones are collected in real time and the first user side is acquired based on the plurality of trigger times, the method further includes: determining that the number of triggered user-side microphones is greater than a preset number. It can be understood that if only one user is speaking, the method of the present invention is not needed; when more than one user speaks, the voice environment becomes noisy, and the method of the present invention can be used to improve it.
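The selection in S101, together with the preset-number check above, can be sketched in a few lines of Python. This is an illustration only — the patent contains no code, and the function and parameter names (`classify_user_sides`, `preset_number`) are hypothetical:

```python
from datetime import datetime

def classify_user_sides(trigger_times, preset_number=1):
    """Split user sides into one first user side (earliest microphone
    trigger) and the remaining second user sides.

    trigger_times: dict mapping user-side id -> trigger datetime.
    Returns None when no more than preset_number microphones are
    triggered, since a single speaker needs no arbitration.
    """
    if len(trigger_times) <= preset_number:
        return None
    # The earliest trigger time wins the speaking slot.
    first_user = min(trigger_times, key=trigger_times.get)
    second_users = [u for u in trigger_times if u != first_user]
    return first_user, second_users

# The A/B/C/D example from the text, one second apart.
times = {
    "A": datetime(2021, 6, 7, 10, 10, 1),
    "B": datetime(2021, 6, 7, 10, 10, 2),
    "C": datetime(2021, 6, 7, 10, 10, 3),
    "D": datetime(2021, 6, 7, 10, 10, 4),
}
first, second = classify_user_sides(times)
```

With the example times from the text, user side A wins the speaking slot and B, C, and D become second user sides.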
S102, acquiring the first voice information collected by the first user side, and sending the first voice information to the first user side and the second user side for playing.
Specifically, after the user sides are classified, the sound of the first user side is sent to every user side and played through each user side's speaker; it can be understood that each user side hears only the speech of the first user side.
In practical application, noise analysis may be performed on the first voice information, and if the first voice information is noise, the microphone of the first user side is turned off. It can be understood that if the user side with the earliest trigger time is only producing noise, such as electrical hum or other background sounds, it needs to be shielded so that it does not keep occupying the first-user-side slot.
In some embodiments, noise may also be determined using voiceprints: each user side may be matched to a voiceprint feature in advance, the voiceprint features collected by the user side are then recognized and compared, and if they do not match, the sound collected by that user side is judged to be noise.
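A minimal sketch of the voiceprint check, assuming voiceprint features are plain numeric vectors and using cosine similarity with a fixed threshold — the patent specifies neither the feature representation nor the matching metric, so both are illustrative choices:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_noise(collected, enrolled, threshold=0.8):
    # If the collected voiceprint does not match the one enrolled in
    # advance closely enough, the captured audio is treated as noise.
    return cosine_similarity(collected, enrolled) < threshold

enrolled = [0.9, 0.1, 0.4]                        # voiceprint matched in advance
matching = is_noise([0.88, 0.12, 0.41], enrolled)  # same speaker
mismatch = is_noise([0.1, 0.9, 0.0], enrolled)     # unrelated sound
```

In a real system the features would come from a speaker-verification model; the threshold would be tuned to trade off false muting against missed noise.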
S103, acquiring second voice information acquired by the second user side for conversion processing, generating text information, and sending the text information to the first user side and the second user side for display.
Specifically, referring to fig. 3, when the server recognizes that a second user side is speaking, it does not play that user's sound but converts it into text information displayed to the other users, who can then learn from the text the content that the user wants to express.
In some embodiments, the textual information may be presented at a position below the corresponding video frame. In practical applications, after the text message is sent to the first user side and the second user side for display, the text message may be modified based on an editing operation of the second user side. It is understood that the user may modify the text message when the user finds that the text message is converted incorrectly.
In this embodiment, the speaking times of a plurality of user sides are collected, only the voice of the earliest-speaking user side is played, and the voices of the remaining speaking user sides are converted into text. That is, when several people speak at the same time, only the earliest speaker's voice is played, which avoids superposed sound and improves the voice environment during the video conference.
On the basis of the above embodiment, in order to make each user's sound state clear to the other users, the video frame corresponding to each user side is acquired, and the background of the video frame is processed with different colors to generate a state frame, where green indicates that the user's voice is being played, yellow indicates that the user is muted, and red indicates that the user is speaking but the voice is not being played.
It can be understood that users can judge each other's sound state from the background color of the video frame, i.e., a filter is applied to the background of the video frame according to preset conditions; users can then politely yield the floor to one another, which improves the video conference experience.
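The state-frame coloring rule can be written as a small lookup. The flag names and the precedence of the checks are assumptions, since the patent only associates the three colors with the three states:

```python
def state_frame_color(is_muted, is_speaking, is_playing):
    """Pick the background filter color for a user's video frame."""
    if is_playing:
        return "green"   # the user's voice is being played
    if is_muted:
        return "yellow"  # the user is in a muted state
    if is_speaking:
        return "red"     # speaking, but the voice is not being played
    return None          # idle: no filter applied to the background
```

The chosen color would then drive the background filter applied to that user's video frame before compositing.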
In order to facilitate management of the video conference, the scheme further includes, before the triggering time of the microphones of the plurality of user terminals is collected in real time: receiving the setting of the priority of each user side, and giving authority to each user side based on the priority, wherein the priority comprises a first priority and a second priority;
it can be understood that different permissions can be issued to different user terminals according to the content of the conference, for example, the conference is dominated by a design department, and then the user terminals of the design department can be given a first priority, and the other user terminals can be given a second priority, wherein the permission level of the first priority is higher than the permission level of the second priority.
After the acquiring the first user end based on the plurality of trigger times, further comprising: and receiving an interruption request of the user terminal with a first priority, and closing a microphone of the first user terminal.
It will be appreciated that a first-priority user side may interrupt the first user side; for example, clicking an interrupt button on that user side generates an interrupt request, and the microphone of the first user side is turned off.
In some embodiments, after the acquiring the first user side based on the plurality of trigger times, the method further includes: receiving multiple types of operation requests from a plurality of user sides with the second priority and generating a predicted value based on a preset rule, where the multiple types of operation requests include content supplement requests, content error correction requests, and content question requests; and turning off the microphone of the first user side based on a preset value and the predicted value.
It can be understood that during a conference the presenter explains continuously; in the process, some users may need to supplement the content, feel that the content is wrong, or have a question about it. If such situations cannot be fed back in time, the quality of the video conference suffers. A user side with the first priority can interrupt directly because its permission level is high, whereas a user side with the second priority has a low permission level and cannot interrupt directly.
Wherein, the preset rule comprises:
N=0.2x+0.5y+0.3z
where N represents the predicted value, x represents the number of content supplement requests sent by user sides with the second priority, y represents the number of content error correction requests sent by user sides with the second priority, and z represents the number of content question requests sent by user sides with the second priority.
It can be understood that the number of content error correction requests sent by second-priority user sides is weighted most heavily because such requests are more important than the other two types; the weights can also be adjusted according to the actual situation.
Illustratively, with x = 5, y = 6, and z = 5, the resulting value is N = 0.2×5 + 0.5×6 + 0.3×5 = 5.5; if the preset value is 3, N exceeds it, and the microphone of the first user side needs to be turned off for the interruption.
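The preset rule and the interruption decision translate directly to code. The weights and the preset value of 3 come from the text; the function names are illustrative:

```python
def predicted_value(x, y, z, weights=(0.2, 0.5, 0.3)):
    # x: content supplement requests, y: content error correction
    # requests, z: content question requests, all counted over the
    # second-priority user sides. N = 0.2x + 0.5y + 0.3z by default.
    wx, wy, wz = weights
    return wx * x + wy * y + wz * z

def should_interrupt(x, y, z, preset=3):
    # Turn off the first user side's microphone once the predicted
    # value reaches the preset value.
    return predicted_value(x, y, z) >= preset

n = predicted_value(5, 6, 5)  # the worked example from the text
```

Passing `weights` as a parameter mirrors the text's note that the relative importance of the three request types can be adjusted to the actual situation.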
In order to show the number of each type of request more intuitively, after receiving the multiple types of operation requests from the second-priority user sides, the method further includes: displaying, based on a preset curve model, how the numbers of content supplement requests, content error correction requests, and content question requests sent by second-priority user sides change over time. Referring to fig. 4, the vertical axis represents the count and the horizontal axis represents time, so the effect of the speech can be perceived intuitively through the curve.
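The data behind the curve of fig. 4 is just a per-interval count of each request type. A minimal sketch, assuming requests arrive as (timestamp, type) pairs and are bucketed per minute — the event format and bucket size are illustrative, not specified by the patent:

```python
from collections import defaultdict

def request_counts_over_time(events, bucket_seconds=60):
    """Aggregate (timestamp, request_type) events into per-bucket
    counts: the series a curve (or histogram) view would plot.

    Timestamps are seconds since the talk started; request_type is
    one of 'supplement', 'error_correction', 'question'.
    """
    series = defaultdict(lambda: defaultdict(int))
    for ts, kind in events:
        series[int(ts // bucket_seconds)][kind] += 1
    return {bucket: dict(kinds) for bucket, kinds in sorted(series.items())}

events = [(5, "question"), (42, "supplement"), (70, "question")]
counts = request_counts_over_time(events)
```

Each bucket's counts map directly to one x-position of the curve model (or one group of bars in the histogram variant mentioned below).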
In other embodiments, the counts can also be displayed as a histogram; through such a visual chart, the presenter can gauge the effect of the presentation and, in time, interact with the user sides or change the presentation style.
Referring to fig. 5, which is a schematic structural diagram of a video conference data synchronization apparatus according to an embodiment of the present invention, the video conference data synchronization apparatus 50 includes:
a time module 51, configured to collect trigger times of multiple user-side microphones in real time, obtain a first user side based on the multiple trigger times, and use the remaining user sides as second user sides, where the trigger time of the first user side is earlier than the trigger time of the second user side;
the playing module 52 is configured to obtain first voice information collected by the first user, and send the first voice information to the first user and the second user for playing;
and the text module 53 is configured to obtain the second voice information collected by the second user, perform conversion processing on the second voice information to generate text information, and send the text information to the first user and the second user for displaying.
The apparatus in the embodiment shown in fig. 5 can be correspondingly used to perform the steps in the method embodiment shown in fig. 2, and the implementation principle and technical effect are similar, which are not described herein again.
Referring to fig. 6, which is a schematic diagram of a hardware structure of a video conference data synchronization apparatus provided in an embodiment of the present invention, the video conference data synchronization apparatus 60 includes: a processor 61, memory 62 and computer programs; wherein
A memory 62 for storing the computer program; the memory may also be a flash memory. The computer program is, for example, an application program or a functional module that implements the above method.
A processor 61 for executing the computer program stored in the memory to implement the steps performed by the apparatus in the above method. Reference may be made in particular to the description relating to the preceding method embodiment.
Alternatively, the memory 62 may be separate or integrated with the processor 61.
When the memory 62 is a device separate from the processor 61, the apparatus may further include:
a bus 63 for connecting the memory 62 and the processor 61.
The present invention also provides a readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to implement the methods provided by the various embodiments described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the apparatus, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for synchronizing videoconference data, comprising:
acquiring, in real time, the trigger times of a plurality of user-side microphones, determining a first user side based on the plurality of trigger times, and treating the remaining user sides as second user sides, wherein the trigger time of the first user side is earlier than that of each second user side;
acquiring first voice information collected by the first user side, and sending the first voice information to the first user side and the second user sides for playing;
acquiring second voice information collected by the second user sides, converting it to generate text information, and sending the text information to the first user side and the second user sides for display;
wherein, before the acquiring of the trigger times of the plurality of user-side microphones in real time, the method further comprises:
receiving a priority setting for each user side, and granting authority to each user side based on its priority, wherein the priorities comprise a first priority and a second priority;
after the acquiring of the first user side based on the plurality of trigger times, the method further comprises:
receiving an interruption request from a user side with the first priority, and closing the microphone of the first user side;
after the acquiring of the first user side based on the plurality of trigger times, the method further comprises:
receiving multiple types of operation requests from a plurality of user sides with the second priority, and generating a predicted value based on a preset rule, wherein the multiple types of operation requests comprise content supplement requests, content error correction requests, and content question requests; and closing the microphone of the first user side based on a preset value and the predicted value.
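The speaker-selection and priority-interruption logic of claim 1 can be sketched as follows. This is a minimal illustrative Python sketch, not the patent's implementation; all class and function names, and the numeric priority encoding, are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class UserSide:
    """Hypothetical model of one conference endpoint."""
    user_id: str
    trigger_time: float   # when this user side's microphone was triggered
    priority: int         # 1 = first priority, 2 = second priority (assumed encoding)
    mic_open: bool = True

def select_first_user(users):
    """Pick the user side whose microphone was triggered earliest;
    all remaining user sides become second user sides."""
    first = min(users, key=lambda u: u.trigger_time)
    seconds = [u for u in users if u is not first]
    return first, seconds

def handle_interruption(first_user, requester):
    """Only a first-priority user side may interrupt: its request
    closes the first user side's microphone."""
    if requester.priority == 1:
        first_user.mic_open = False
    return first_user.mic_open

users = [
    UserSide("A", trigger_time=10.0, priority=2),
    UserSide("B", trigger_time=9.5, priority=1),
    UserSide("C", trigger_time=11.2, priority=2),
]
first, seconds = select_first_user(users)  # "B" triggered earliest
```

In a real system the trigger times would arrive over the network and be subject to clock skew; the claim does not specify how ties or skew are resolved.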
2. The method of claim 1, further comprising:
acquiring a video frame corresponding to each user side, and processing the background of the video frame with different colors to generate a state frame;
wherein green indicates that the user side's voice is being played, yellow indicates that the user side is muted, and red indicates that the user side is speaking but its voice is not being played.
3. The method of claim 1, further comprising, after the acquiring of the first user side based on the plurality of trigger times:
performing noise analysis on the first voice information, and if the first voice information is noise, closing the microphone of the first user side.
4. The method of claim 1, further comprising, after the sending of the text information to the first user side and the second user sides for display:
modifying the text information based on an editing operation of a second user side.
5. The method of claim 1, wherein the preset rules comprise:
N = 0.2x + 0.5y + 0.3z
wherein N represents the predicted value, x represents the number of content supplement requests sent by user sides with the second priority, y represents the number of content error correction requests sent by user sides with the second priority, and z represents the number of content question requests sent by user sides with the second priority.
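The preset rule of claim 5 reduces to a weighted count of the three request types. The sketch below, in Python, applies those weights and compares the result against the preset value; the comparison direction (close the microphone once the predicted value reaches the preset value) is an assumption, since the claim only says the decision is made "based on" the two values.

```python
def predicted_value(x, y, z):
    """Weighted request count from claim 5: N = 0.2x + 0.5y + 0.3z,
    where x = content supplement requests, y = content error correction
    requests, z = content question requests (all from second-priority
    user sides)."""
    return 0.2 * x + 0.5 * y + 0.3 * z

def should_close_first_mic(x, y, z, preset_value):
    """Assumed decision rule: close the first user side's microphone
    once the predicted value reaches the preset threshold."""
    return predicted_value(x, y, z) >= preset_value
```

Note that error correction requests carry the largest weight (0.5), so under this rule the current speaker is silenced soonest when listeners are flagging errors rather than merely asking questions.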
6. The method according to claim 1 or 5, further comprising, after the receiving of the multiple types of operation requests from the user sides with the second priority:
displaying, based on a preset curve model, how the number of content supplement requests, the number of content error correction requests, and the number of content question requests sent by the user sides with the second priority change over time.
7. The method of claim 1, wherein before the acquiring of the trigger times of the plurality of user-side microphones in real time and the acquiring of the first user side based on the plurality of trigger times, the method further comprises:
determining that the number of triggered user-side microphones is greater than a preset number.
8. A video conference data synchronization apparatus, comprising:
a time module, configured to acquire, in real time, the trigger times of a plurality of user-side microphones, determine a first user side based on the plurality of trigger times, and treat the remaining user sides as second user sides, wherein the trigger time of the first user side is earlier than that of each second user side;
a playing module, configured to acquire first voice information collected by the first user side, and send the first voice information to the first user side and the second user sides for playing;
a text module, configured to acquire second voice information collected by the second user sides, convert it to generate text information, and send the text information to the first user side and the second user sides for display;
wherein, before the trigger times of the plurality of user-side microphones are acquired in real time, the apparatus is further configured to:
receive a priority setting for each user side, and grant authority to each user side based on its priority, wherein the priorities comprise a first priority and a second priority;
after the first user side is acquired based on the plurality of trigger times, the apparatus is further configured to:
receive an interruption request from a user side with the first priority, and close the microphone of the first user side;
after the first user side is acquired based on the plurality of trigger times, the apparatus is further configured to:
receive multiple types of operation requests from a plurality of user sides with the second priority, and generate a predicted value based on a preset rule, wherein the multiple types of operation requests comprise content supplement requests, content error correction requests, and content question requests; and close the microphone of the first user side based on a preset value and the predicted value.
CN202110630970.XA 2021-06-07 2021-06-07 Video conference data synchronization method and device Active CN113259620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110630970.XA CN113259620B (en) 2021-06-07 2021-06-07 Video conference data synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110630970.XA CN113259620B (en) 2021-06-07 2021-06-07 Video conference data synchronization method and device

Publications (2)

Publication Number Publication Date
CN113259620A CN113259620A (en) 2021-08-13
CN113259620B true CN113259620B (en) 2021-12-03

Family

ID=77186776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110630970.XA Active CN113259620B (en) 2021-06-07 2021-06-07 Video conference data synchronization method and device

Country Status (1)

Country Link
CN (1) CN113259620B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116312564A (en) * 2023-05-22 2023-06-23 安徽谱图科技有限公司 Howling suppression equipment for video conference based on voiceprint technology

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1801968A (en) * 2005-07-23 2006-07-12 华为技术有限公司 Method and system for realizing real-time speaking in cluster system
CN108429898A (en) * 2018-02-08 2018-08-21 成都赫思维信科技有限责任公司 The Transmission system used for wireless session
CN111447223A (en) * 2020-03-26 2020-07-24 维沃移动通信有限公司 Call processing method and electronic equipment
CN111492638A (en) * 2017-12-22 2020-08-04 英国电讯有限公司 Managing streaming audio communication sessions

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080101575A1 (en) * 2006-10-31 2008-05-01 Erick Simon Amador Participant prioritization on telephone conference calls


Also Published As

Publication number Publication date
CN113259620A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
US11699456B2 (en) Automated transcript generation from multi-channel audio
CN110113316B (en) Conference access method, device, equipment and computer readable storage medium
US8249233B2 (en) Apparatus and system for representation of voices of participants to a conference call
US8972262B1 (en) Indexing and search of content in recorded group communications
EP4099709A1 (en) Data processing method and apparatus, device, and readable storage medium
CN111402900A (en) Voice interaction method, device and system
WO2014120291A1 (en) System and method for improving voice communication over a network
WO2016187910A1 (en) Voice-to-text conversion method and device, and storage medium
US20220131979A1 (en) Methods and systems for automatic queuing in conference calls
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN113259620B (en) Video conference data synchronization method and device
CN110188364B (en) Translation method, device and computer readable storage medium based on intelligent glasses
CN108540633A (en) The playback that automatically delaying is left a message at equipment
EP4068282A1 (en) Method for processing conference data and related device
US11783837B2 (en) Transcription generation technique selection
CN112866624B (en) Video conference type determining method and device, electronic equipment and storage medium
US20200184973A1 (en) Transcription of communications
CN115550595A (en) Online conference implementation method, device, equipment and readable storage medium
CN105592226B (en) Method and device for processing data in call process
CN111144287A (en) Audio-visual auxiliary communication method, device and readable storage medium
CN111355919A (en) Communication session control method and device
CN108924465A (en) Method, device, equipment and storage medium for determining speaker terminal in video conference
WO2024032111A9 (en) Data processing method and apparatus for online conference, and device, medium and product
CN116800917A (en) Teleconference processing method, device, system and storage medium
CN115272922A (en) Method and device for detecting attention of participants of teleconference and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510700 room 238, room 406, No. 1, Yichuang street, Huangpu District, Guangzhou City, Guangdong Province (Zhongxin Guangzhou Knowledge City)

Applicant after: Guangzhou langguo Electronic Technology Co., Ltd

Address before: 510700 room 238, room 406, No. 1, Yichuang street, Huangpu District, Guangzhou City, Guangdong Province (Zhongxin Guangzhou Knowledge City)

Applicant before: Guangzhou langguo Electronic Technology Co., Ltd

GR01 Patent grant