CN108111799B - Method and device for identifying speaker in video conference - Google Patents

Method and device for identifying speaker in video conference

Info

Publication number
CN108111799B
Authority
CN
China
Prior art keywords
brightness
color
meeting place
preset
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711339270.5A
Other languages
Chinese (zh)
Other versions
CN108111799A (en
Inventor
尚德建
胡小鹏
顾振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Keda Technology Co Ltd
Original Assignee
Suzhou Keda Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Keda Technology Co Ltd
Priority to CN201711339270.5A
Publication of CN108111799A
Application granted
Publication of CN108111799B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method and a device for identifying a speaker in a video conference. The method comprises the following steps: acquiring the audio volume of each meeting place; determining the meeting place picture where a speaker is located according to the audio volume; marking the frame of the meeting place picture where the speaker is located in a preset color with an initial brightness, wherein each meeting place corresponds to one meeting place picture; and controlling the brightness of the marked preset color to increase gradually according to a preset gradual change rule until it reaches the target color brightness. Because the brightness of the frame color of the meeting place picture where the speaker is located is adjusted gradually according to the preset gradual change rule, and because an abnormal sound is generally short, the frame of the affected meeting place picture is still dim when the abnormal sound ends and does not attract the attention of the participants. The speaker in the video conference is therefore identified more accurately.

Description

Method and device for identifying speaker in video conference
Technical Field
The invention relates to the technical field of video conferences, in particular to a speaker identification method and device in a video conference.
Background
With the increasing maturity of computer networks, convenient and fast modes of information exchange and collaboration have become a focus of network technology research and development. As a new communication tool, the video conference system overcomes geographical limits, provides more convenient, flexible and comprehensive transmission of audio and video signals and related services, and is widely used.
In a current large-scale video conference, several participants generally take part in the discussion. Picture synthesis and audio mixing then need to be enabled at the same time, so that each participant can see the other participants and hear what they say. That is, when a video conference is held, the terminals participating in the conference access a conference platform and send encoded audio and video data to it; the platform decodes, mixes and synthesizes the received data, re-encodes the result and sends it to the participating terminals.
Here, picture synthesis means displaying small pictures of all participants in the same video picture. When one or more participants speak, the other participants must browse the small pictures to locate that of the current speaker before they can clearly see the speaker's expression and body language; when the speaker changes, they must search and locate again. This happens frequently in multi-party discussions and is very inconvenient for the participants.
Chinese patent CN101080000A discloses a method for displaying a speaker in a video conference. Specifically, the current speaker is determined by a predetermined rule from the volume energy of the speakers, and the video picture of the current speaker is highlighted. The highlighting includes: mainly displaying the video picture of the current speaker; displaying the video picture of the current speaker at a larger scale than the video pictures of the other speakers; and displaying the video picture of the current speaker with a special mark, for example a frame around it. The problem with this solution is that when a sudden abnormal sound occurs in a meeting place (for example, a cup being put down on a table, an object falling to the floor, or a bump), the speaker mark switches instantly to the picture of that meeting place because the abnormal sound is loud, even though there is no speaker there. This reduces the accuracy of the speaker mark.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for identifying a speaker in a video conference, so as to solve the problem in the prior art that speaker marking is easily affected by abnormal sounds and is therefore inaccurate.
A first aspect of the present invention provides a method for identifying a speaker in a video conference, which comprises the following steps:
acquiring the audio volume of each meeting place;
determining a meeting place picture where a speaker is located according to the audio volume;
marking the meeting place picture frame of the speaker into a preset color with initial brightness, wherein each meeting place corresponds to one meeting place picture;
and controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to achieve the target color brightness, wherein the preset gradual change rule specifies the rate of the gradual change of the frame color and the target color brightness.
Optionally, controlling the brightness of the identified preset color to gradually increase according to the preset gradual change rule so as to reach the target color brightness includes:
judging whether the brightness of the preset color has reached the target color brightness;
when the target color brightness has not been reached, judging whether the time elapsed since the last change in the color brightness of the identified meeting place picture frame has reached a preset time interval;
and when the preset time interval has been reached, increasing the brightness of the preset color of the identified meeting place picture frame.
Optionally, determining the meeting place picture where the speaker is located according to the audio volume includes the following steps:
acquiring voice information of each meeting place;
identifying, according to the voice information, the meeting places that have valid voice information among all the meeting places;
extracting the audio volume of the meeting places that have valid voice information;
screening out the N meeting places with the largest audio volume from all the meeting places that have valid voice information;
and determining the meeting place pictures of the N meeting places as the meeting place pictures where the speakers are located.
Optionally, the target color brightness specified by the preset gradual change rule is divided into N brightness levels that increase in sequence, where the N brightness levels correspond to the N largest audio volumes: the larger the audio volume, the higher the corresponding target color brightness.
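The correspondence between volume rank and target brightness level can be sketched as follows. This is only an illustration under stated assumptions: the function name `target_levels`, the dictionary input and the volume values are hypothetical and do not come from the patent, which does not prescribe a data structure or units.

```python
def target_levels(volumes):
    """Map each selected meeting place to a target brightness level 1..N.

    `volumes` maps a meeting place name to its audio volume (hypothetical
    units). The quietest selected meeting place gets level 1 and the
    loudest gets level N, so a larger volume means a brighter target.
    """
    ranked = sorted(volumes, key=volumes.get)  # quietest first
    return {site: level for level, site in enumerate(ranked, start=1)}


# Three selected meeting places with volumes ordered A > B > C.
levels = target_levels({"A": 72.0, "B": 65.0, "C": 58.0})
print(levels)
```

With these inputs the loudest meeting place, A, is assigned level 3 and the quietest, C, level 1.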
Optionally, after controlling the brightness of the identified preset color to gradually increase according to the preset gradual change rule, the method further includes:
determining the target color brightness of the meeting place picture frame of each meeting place;
acquiring the current frame color brightness of the meeting place picture of each meeting place;
judging whether the current frame color brightness has reached the determined target color brightness;
when the current frame color brightness has not reached the determined target color brightness, judging whether the time elapsed since the last change in the frame color brightness of the meeting place picture has reached a preset time interval;
and when the preset time interval has been reached, adjusting the meeting place picture frame of each meeting place from its current frame color brightness toward the corresponding target color brightness.
Optionally, the adjusting of the meeting place picture frame to the corresponding target color brightness according to the current frame color brightness includes the following steps:
when the current frame color brightness is larger than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted downwards;
and when the current frame color brightness is smaller than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted upwards.
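The up/down adjustment described above can be sketched as a single step function. The name `step_toward` and the choice of moving exactly one brightness level per adjustment are assumptions made for illustration; the text only states the direction of the adjustment.

```python
def step_toward(current, target):
    """Move the current brightness level one level toward the target.

    Assumption: levels are integers and each adjustment changes the
    level by exactly one; the source only specifies the direction.
    """
    if current > target:
        return current - 1  # too bright: adjust the level downwards
    if current < target:
        return current + 1  # too dim: adjust the level upwards
    return current          # already at the target level


print(step_toward(3, 1), step_toward(1, 3), step_toward(2, 2))
```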
A second aspect of the present invention provides an apparatus for identifying a speaker in a video conference, including:
the acquisition unit is used for acquiring the audio volume of each meeting place;
the determining unit is used for determining a meeting place picture where a speaker is located according to the audio volume;
the identification unit is used for identifying the meeting place picture frame of the speaker into a preset color with initial brightness, wherein each meeting place corresponds to one meeting place picture;
and the brightness control unit is used for controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to achieve the target color brightness, wherein the preset gradual change rule specifies the rate of gradual change of the frame color and the target color brightness.
Optionally, the brightness control unit comprises:
the first judgment subunit is used for judging whether the brightness of the preset color reaches the target color brightness;
the second judgment subunit is used for judging, when the target color brightness has not been reached, whether the time elapsed since the last change in the color brightness of the identified meeting place picture frame has reached a preset time interval;
and the brightness increasing subunit is used for increasing the brightness of the preset color of the identified meeting place picture frame when the preset time interval has been reached.
A third aspect of the present invention provides a media platform comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for identifying a speaker in a video conference according to the first aspect of the present invention or any optional implementation of the first aspect.
A fourth aspect of the present invention provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for identifying a speaker in a video conference according to the first aspect of the present invention or any optional implementation of the first aspect.
The technical scheme provided by the invention has the following advantages:
1. According to the method for identifying a speaker in a video conference provided by the embodiment of the invention, the brightness of the frame color of the meeting place picture where the speaker is located is adjusted gradually according to the preset gradual change rule. An abnormal sound is generally short, so with gradual brightness adjustment the frame color of the affected meeting place picture is still dim when the abnormal sound ends, at which point the frame marking of that picture also ends. The marking therefore does not attract the attention of the participants, and the speaker in the video conference is identified more accurately.
2. According to the speaker identification method in the video conference, whether the brightness adjustment interval reaches the preset time interval or not is judged before the color brightness of the frame of the conference picture is adjusted, and the influence on the effect of the video conference caused by frequent adjustment of the brightness is avoided.
3. According to the identification method of the speaker in the video conference, the speaker of the video conference is determined by combining the voice information and the audio volume, so that the accuracy of determining the speaker can be improved, and the effect of the video conference is improved.
4. According to the speaker identification method in the video conference, provided by the embodiment of the invention, the brightness level of the target color brightness and the audio volume are correspondingly set, namely the higher the audio volume is, the higher the brightness of the corresponding conference picture frame of the speaker is, so that each participant can easily locate the main speaker.
5. According to the identification method of the speaker in the video conference, provided by the embodiment of the invention, the relationship between the current frame color brightness and the target color brightness of the conference picture is compared in real time, so that the conference system can adjust the frame color brightness of the speaker in time, and the effect of the video conference is improved.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are illustrative and not to be construed as limiting the invention in any way, and in which:
Fig. 1 shows a schematic flowchart of a method for identifying a speaker in a video conference according to embodiment 1 of the present invention;
Fig. 2 shows a schematic flowchart of a method for identifying a speaker in a video conference according to embodiment 2 of the present invention;
Fig. 3 shows a schematic flowchart of a method for identifying a speaker in a video conference according to embodiment 3 of the present invention;
Fig. 4 shows a schematic structural diagram of a device for identifying a speaker in a video conference according to embodiment 5 of the present invention;
Fig. 5 shows a schematic structural diagram of a media platform according to embodiment 6 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Those skilled in the art should understand that the conference system described in the embodiments of the present invention includes a media platform and a plurality of conference terminals. When a video conference is held, each conference terminal accesses the media platform and sends encoded audio and video data to it. After receiving the data, the media platform decodes, mixes and synthesizes it, re-encodes the result and sends it to each conference terminal. Each conference terminal corresponds to one meeting place picture; the media platform synthesizes the meeting place pictures of all the conference terminals, and during the video conference the pictures of the meeting places where speakers are located are highlighted.
Example 1
An embodiment of the present invention provides a method for identifying a speaker in a video conference, which can be used in an apparatus for identifying a speaker in a video conference, and as shown in fig. 1, the method includes the following steps:
Step S11, acquiring the audio volume of each meeting place.
The speaker identification device acquires the audio volume of all participating meeting places, that is, it captures every sound produced in those meeting places, including the speakers' voices, background noise and other interfering sounds.
Step S12, determining the meeting place picture where the speaker is located according to the audio volume.
The speaker identification device may determine the meeting place picture where the speaker is located from the audio volume alone, or from the audio volume combined with other characteristics of the meeting place (such as voice information).
Step S13, marking the meeting place picture frame of the speaker as a preset color with initial brightness, where each meeting place corresponds to one meeting place picture.
When the speaker identification device initializes the meeting place picture frames of all participating meeting places, it marks the frame of the meeting place picture where the speaker is located, as determined in step S12, in a preset color with an initial brightness. The initial brightness is low and may be a level that the human eye does not notice; the preset colors of the meeting place picture frames of different speakers may be the same or different.
That is, at initialization (when a meeting place changes from non-speaker to speaker), the frame of the meeting place picture where the speaker is located is marked in the preset color at low brightness.
Step S14, controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to reach the target color brightness, wherein the preset gradual change rule specifies the rate of the frame color gradual change and the target color brightness.
A preset gradual change rule is set in the speaker identification device; the rule specifies the rate at which the frame color changes and the target color brightness.
After the meeting place picture where the speaker is located is determined, the brightness of the corresponding frame color gradually increases according to the preset gradual change rule until the target color brightness is reached.
In this embodiment, when a video conference starts and the meeting place picture where a speaker is located is determined for the first time, the frame of that meeting place picture is set to the preset color at the initial brightness; the brightness then increases gradually according to the preset gradual change rule until the target color brightness is reached. With the gradual change rule, when an abnormal sound occurs the marked frame is still dim and its change is not noticed by the human eye. An abnormal sound is generally short; when it ends, the marked frame is still dim, the brightness stops increasing and the frame is no longer marked. The brightness change is too small to attract the attention of the participants, so from the participants' point of view the speaker in the video conference is identified accurately.
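The effect of the gradual change on a short abnormal sound versus sustained speech can be illustrated with a small simulation. It is only a sketch: the function name `brightness_after`, the integer brightness scale, and the values for the initial brightness, target brightness and step size are all hypothetical, since the patent leaves the rate and the levels to the preset rule.

```python
def brightness_after(active_ticks, initial=1, target=5, step=1):
    """Brightness reached after a meeting place has stayed the loudest
    for `active_ticks` adjustment intervals.

    All numeric defaults are illustrative assumptions, not values
    taken from the patent.
    """
    level = initial
    for _ in range(active_ticks):
        level = min(target, level + step)  # never exceed the target
    return level


bump = brightness_after(1)     # a short abnormal sound: one interval
speech = brightness_after(10)  # sustained speech: many intervals
print(bump, speech)
```

Under these assumed values the frame of the meeting place with the brief noise only reaches level 2 before the marking ends, while the frame of a real speaker climbs to the target level 5.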
Example 2
An embodiment of the present invention provides a method for identifying a speaker in a video conference, which can be used in an apparatus for identifying a speaker in a video conference, and as shown in fig. 2, the method includes the following steps:
Step S21, acquiring the audio volume of each meeting place.
Similar to step S11 in embodiment 1; the description is not repeated here.
Step S22, determining the meeting place picture where the speaker is located according to the audio volume.
The identification device of the speaker in the conference system determines the conference place picture of the speaker by using the audio volume and the voice information of the conference place. The method specifically comprises the following steps:
step S221, obtaining the voice information of each meeting place.
The speaker identification device in the conference system may obtain the voice information by acquiring the audio information of each meeting place and applying voice recognition to it; alternatively, it may extract the voice information of each meeting place directly.
Step S222, identifying, according to the voice information, the meeting places that have valid voice information among all the meeting places.
The speaker identification device in the conference system performs voice discrimination on the acquired voice information of all meeting places and identifies whether it contains valid voice. Valid voice is voice information that carries specific text content, that is, human speech rather than sound caused by an abnormal noise. In this way the identification device can find all participating meeting places that have valid voice information.
Step S223, extracting the audio volume of the meeting places that have valid voice information.
After finding the participating meeting places that have valid voice, the speaker identification device in the conference system extracts their audio volumes from the audio volumes of all participating meeting places acquired in step S21.
Step S224, screen out N meeting places with the largest audio volume from all meeting places with valid voice information.
The identification device of the speaker in the conference system can sort the audio volume of all the participating sites with effective voice information, and screen out the N participating sites with the maximum audio volume.
Step S225, determining the meeting place pictures of the N meeting places as the meeting place pictures where the speakers are located.
The identification device of the speaker in the conference system determines that the N meeting places with effective voice information and maximum audio volume are the meeting places where the speaker is located, and the corresponding meeting place picture is the meeting place picture where the speaker is located.
As an optional implementation of this embodiment, N is 1 to 4; that is, the 1 to 4 meeting place pictures with valid voice information and the largest audio volume are selected from the meeting place pictures of all participating meeting places, and these are the meeting place pictures where the speakers are located. This range of N ensures that the pictures of the meeting places where speakers are located are highlighted, while avoiding the cluttered overall picture that an excessively large N would cause.
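Steps S221 to S225 can be sketched as a filter followed by a top-N selection. The `Site` record, its field names and the function `select_speaker_sites` are hypothetical; in particular, the `has_valid_voice` flag stands in for the voice discrimination step, whose implementation the patent does not specify.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    volume: float          # audio volume, hypothetical units
    has_valid_voice: bool  # assumed result of an upstream voice check


def select_speaker_sites(sites, n):
    """Return the names of the n loudest meeting places with valid voice."""
    voiced = [s for s in sites if s.has_valid_voice]    # S222: keep valid voice
    voiced.sort(key=lambda s: s.volume, reverse=True)   # S223/S224: loudest first
    return [s.name for s in voiced[:n]]                 # S225: top n are speakers


sites = [
    Site("A", 70, True),
    Site("B", 90, False),  # loud, but only an abnormal sound
    Site("C", 60, True),
    Site("D", 65, True),
]
selected = select_speaker_sites(sites, n=2)
print(selected)
```

Meeting place B is the loudest but is excluded because it carries no valid voice, which is the point of combining voice information with audio volume.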
Step S23, marking the meeting place picture frame of the speaker as a preset color with initial brightness, where each meeting place corresponds to one meeting place picture.
The identification device of the speakers in the conference system can identify the colors of the conference room frames of all the speakers as the same color (for example, green) and the same initial brightness at the beginning.
Step S24, controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to reach the target color brightness, wherein the preset gradual change rule specifies the rate of the frame color gradual change and the target color brightness.
The preset gradual change rule specifies the rate at which the frame color changes and the target color brightness. The target color brightness is divided into N brightness levels that increase in sequence and correspond to the audio volume: the larger the audio volume, the brighter the frame of the meeting place picture where the corresponding speaker is located. For example, the brightness levels of the target color are numbered 1, 2, 3, …, N in order of increasing audio volume. Suppose meeting places A, B and C are determined to be the meeting places where speakers are located, with audio volumes ordered A > B > C. Initially, the frames of the 3 meeting places are set to green at brightness level 1.
Specifically, step S24 includes the following steps:
Step S241, judging whether the brightness of the preset color has reached the target color brightness. If not, step S242 is executed; otherwise, step S244 is executed.
The speaker identification device in the conference system judges in real time whether the brightness of the preset color has reached the target color brightness. If it has not, the brightness needs to be adjusted; if it has, the identification of the speaker is complete.
For example, the target brightness level of meeting place A is 3 and its current brightness level is 1, so the frame color brightness of meeting place A needs to be adjusted.
Step S242, judging whether the time elapsed since the last change in the color brightness of the identified meeting place picture frame has reached a preset time interval. If so, step S243 is executed; otherwise, step S241 is executed.
When the speaker identification device in the conference system judges that the brightness of the preset color has not reached the target color brightness, the brightness needs to be adjusted again. Before adjusting, the device judges whether the time elapsed since the last change in the color brightness of the identified meeting place picture frame has reached the preset time interval, and adjusts the brightness only once the interval has been reached.
The preset time interval may be a refresh time of the conference system. By setting the judgment of the preset time interval, the influence on the effect of the video conference caused by frequent adjustment of the brightness is avoided.
Step S243, increasing the brightness of the preset color of the identified meeting place picture frame.
When the time elapsed since the last change in the color brightness of the identified meeting place picture frame has reached the preset time interval, the speaker identification device in the conference system increases the brightness of the preset color of that frame according to the preset gradual change rule, until the target color brightness is reached.
For example, the current brightness level of meeting place a is 1, and the brightness level of the corresponding target color brightness is 3, the brightness level of meeting place a may be first adjusted to 2 and then adjusted to 3 according to the preset gradient rule.
Step S244, the identification of the speaker is completed.
Once the frame color brightness of the meeting place pictures of all speakers has been adjusted to the corresponding target color brightness, the speaker identification device in the conference system indicates that the identification of all speakers is complete.
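The loop formed by steps S241 to S244 can be sketched for a single meeting place picture frame as follows. The class name `FrameFader`, the one-level increment and the explicit `now` timestamps (passed in rather than read from a clock, so the example is deterministic) are assumptions made for illustration and are not part of the patent.

```python
class FrameFader:
    """Raises one frame's brightness level toward its target, at most
    once per preset time interval. Illustrative sketch only."""

    def __init__(self, target, interval, now, initial=1):
        self.level = initial      # current brightness level
        self.target = target      # target brightness level
        self.interval = interval  # preset time interval (seconds, assumed)
        self.last_change = now    # time of the last brightness change

    def tick(self, now):
        """Run one pass of S241-S243; return True once S244 is reached."""
        if self.level >= self.target:                # S241: target reached?
            return True                              # S244: identification done
        if now - self.last_change >= self.interval:  # S242: interval elapsed?
            self.level += 1                          # S243: raise brightness
            self.last_change = now
        return self.level >= self.target


# Meeting place A: initial level 1, target level 3, one-second interval.
fader = FrameFader(target=3, interval=1.0, now=0.0)
history = [(t, fader.tick(t), fader.level) for t in (0.5, 1.0, 1.5, 2.0)]
print(history)
```

In this run the level stays at 1 at t = 0.5 because the interval has not elapsed, rises to 2 at t = 1.0, is held at t = 1.5, and reaches the target level 3 at t = 2.0, matching the 1 to 2 to 3 progression in the example above.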
As an optional implementation of this embodiment, the target color brightness specified in the preset gradual change rule may be divided into M brightness levels that increase in sequence and correspond to the audio volume (where M > N). Specifically, after the meeting places with valid voice information are screened out, the frames of the meeting place pictures of all of those meeting places are marked with different brightness levels. This ensures that, among the meeting places with valid voice information, even the frames of meeting places with a lower audio volume have some brightness, so that every meeting place can clearly tell which meeting place pictures contain speakers, which improves the video conference.
Example 3
An embodiment of the present invention provides a method for identifying a speaker in a video conference, which can be used in an apparatus for identifying a speaker in a video conference, and as shown in fig. 3, the method includes the following steps:
Step S31, acquiring the audio volume of each meeting place.
Similar to step S21 in embodiment 2, the description is omitted here.
Step S32, determining the meeting place picture where the speaker is located according to the audio volume.
Similar to step S22 in embodiment 2, the description is omitted here.
Step S33, marking the meeting place picture frame of the speaker as a preset color with initial brightness, where each meeting place corresponds to one meeting place picture.
Similar to step S23 in embodiment 2, the description is omitted here.
Step S34, controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to reach the target color brightness, wherein the preset gradual change rule specifies the rate of the frame color gradual change and the target color brightness.
Similar to step S24 in embodiment 2, the description is omitted here.
Step S35, determining the target color brightness of the meeting place picture frame of each meeting place.
In the process of the video conference, the audio volume of each meeting place is constantly changed, so that the color brightness of the picture frame of each meeting place needs to be adjusted in real time. Before real-time adjustment, the target color brightness of the picture frame of each meeting place needs to be determined.
The speaker identification device in the conference system acquires the audio volume and the voice information of each meeting place, and determines the meeting place picture where the speaker is located according to the audio volume and the voice information. Please refer to step S22 in embodiment 2; the details are not repeated here.
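A minimal sketch of this screening, assuming each meeting place is described by a flag for effective voice information and a numeric audio volume. The function name `select_speakers` and the sample values are hypothetical.

```python
def select_speakers(venues, n):
    """Return the names of the N loudest meeting places with effective voice.

    venues maps a meeting place name to (has_effective_voice, audio_volume).
    """
    # Keep only meeting places whose voice information is effective.
    active = [(name, volume) for name, (valid, volume) in venues.items() if valid]
    # Sort by audio volume, loudest first, and keep the first N.
    active.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in active[:n]]


venues = {
    "A": (True, 70),
    "B": (False, 90),  # loud, but no effective voice (for example, noise only)
    "C": (True, 55),
    "D": (True, 80),
}
print(select_speakers(venues, 2))  # ['D', 'A']
```

Meeting place B is excluded even though it is the loudest, because only meeting places with effective voice information are candidates.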
After the speaker identification device in the conference system determines the speakers among the participating meeting places, it determines the target color brightness of the meeting place picture frame of each participating meeting place according to the correspondence, specified in the preset rule, between a speaker's audio volume and the color brightness of the meeting place picture frame: the larger the audio volume, the higher the color brightness of the corresponding frame.
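One way to realize "the larger the audio volume, the higher the color brightness" is to rank the speakers by volume and use the rank as the brightness level. The sketch below assumes N speakers mapped to levels 1 to N; the function name `target_levels` and the tie handling (ties keep input order) are assumptions, not part of the preset rule described here.

```python
def target_levels(speaker_volumes):
    """Map each speaker's meeting place to a target brightness level 1..N.

    speaker_volumes maps a meeting place name to its audio volume.
    The quietest speaker gets level 1 and the loudest gets level N.
    """
    ranked = sorted(speaker_volumes, key=speaker_volumes.get)  # quietest first
    return {name: level for level, name in enumerate(ranked, start=1)}


print(target_levels({"D": 80, "A": 70, "C": 55}))  # {'C': 1, 'A': 2, 'D': 3}
```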
Step S36, obtaining the color brightness of the current frame of the meeting place picture of each meeting place.
The identification device of the speaker in the conference system acquires the color brightness of the current frame of the conference site picture of each conference site in real time so as to adjust the color brightness of the current frame in real time in the video conference process.
Step S37, judging whether the color brightness of the current frame reaches the determined target color brightness. If not, step S38 is executed; otherwise, step S35 is executed.
Similar to step S241 in embodiment 2, the description is omitted here.
Step S38, determine whether the difference between the time of the last meeting place picture frame color brightness change and the current time reaches a preset time interval. If yes, go to step S39; otherwise, step S35 is executed.
After determining the target color brightness of the meeting place picture frame of each meeting place, the speaker identification device in the conference system compares the difference between the time of the last color brightness change of the meeting place picture frame and the current time with the preset time interval.
Please refer to step S242 in embodiment 2, which is not described herein again.
Step S39, controlling the picture frame of each meeting place to adjust to the corresponding target color brightness according to the current frame color brightness.
Step S391, when the current frame color brightness is greater than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted downwards.
For example, in meeting place A, if the current frame color brightness is 3 and the corresponding target color brightness is 1, the brightness level is adjusted down step by step: first to 2, then to 1.
Step S392, when the current frame color brightness is less than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted upwards.
For example, in meeting place A, if the current frame color brightness is 1 and the corresponding target color brightness is 3, the brightness level is adjusted up step by step: first to 2, then to 3.
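The two branches of step S39 amount to moving one brightness level toward the target on each adjustment. A short sketch, assuming integer brightness levels and a step of one level (the helper names `step_toward` and `path` are hypothetical):

```python
def step_toward(current: int, target: int) -> int:
    """Move the brightness level one step toward the target."""
    if current > target:   # S391: adjust the level down
        return current - 1
    if current < target:   # S392: adjust the level up
        return current + 1
    return current         # already at the target


def path(current: int, target: int) -> list:
    """List every level visited on the way from current to target."""
    levels = [current]
    while levels[-1] != target:
        levels.append(step_toward(levels[-1], target))
    return levels


print(path(3, 1))  # [3, 2, 1]
print(path(1, 3))  # [1, 2, 3]
```

In the method itself, each step would additionally wait for the preset time interval checked in step S38; that timing is left out here to keep the direction logic visible.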
For details of the steps not described in this embodiment, please refer to embodiment 2; they are not repeated here.
Example 4
The embodiment of the invention provides a specific application example of a speaker identification method in a video conference, which can be used in a speaker identification device in the video conference, and the method comprises the following steps:
(1) Acquire the channel voice information from the audio mixing module.
(2) Screen out the channels that contain voice and sort them by channel volume.
(3) According to the sorting, identify the N loudest channels as speakers and the other channels as non-speakers.
(4) Assign the target color Caim according to the identification.
(5) Caim of a channel identified as a speaker is set to Cend; Caim of a channel identified as a non-speaker is set to Cbegin.
(6) Periodically check whether the current color Ccur equals the target color Caim.
(7) If the current color Ccur differs from the target color Caim, determine whether the difference Telaspe between the last color change time Tlast and the current time Tcur exceeds the change time Tgap. If it does, the color is changed toward the target color; otherwise the color is not changed.
(8) The color is changed toward the target color, i.e., if Caim < Ccur, then Ccur--; if Caim > Ccur, then Ccur++.
(9) For a frame whose identification is unchanged, neither the state nor the color is changed.
(10) The synthesized data is sent to an encoder for encoding and then sent to the terminal.
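Steps (2) to (8) above can be sketched as one periodic update pass. The sketch is illustrative only: the numeric values chosen for Cbegin, Cend and Tgap, the `Channel` class, the `update` function, and the convention that a volume of 0 means "no voice" are all assumptions. Picture synthesis and encoding (step (10)) are not modeled.

```python
C_BEGIN, C_END = 0, 3   # assumed numeric values for Cbegin and Cend
T_GAP = 0.5             # assumed change interval Tgap, in seconds


class Channel:
    def __init__(self, name):
        self.name = name
        self.c_cur = C_BEGIN  # Ccur: current frame color
        self.c_aim = C_BEGIN  # Caim: target frame color
        self.t_last = 0.0     # Tlast: time of the last color change


def update(channels, volumes, n, t_cur):
    """Run one periodic check. volumes maps channel name to volume (0 = no voice)."""
    # Steps (2)-(3): keep channels with voice, sort by volume, take the N loudest.
    voiced = [c for c in channels if volumes.get(c.name, 0) > 0]
    voiced.sort(key=lambda c: volumes[c.name], reverse=True)
    speakers = {c.name for c in voiced[:n]}
    for c in channels:
        # Steps (4)-(5): speakers aim for Cend, non-speakers for Cbegin.
        c.c_aim = C_END if c.name in speakers else C_BEGIN
        if c.c_cur == c.c_aim:            # step (6): nothing to change
            continue
        if t_cur - c.t_last <= T_GAP:     # step (7): elapsed time must exceed Tgap
            continue
        c.c_cur += 1 if c.c_aim > c.c_cur else -1  # step (8): Ccur++ or Ccur--
        c.t_last = t_cur


channels = [Channel("A"), Channel("B"), Channel("C")]
for t in (1.0, 2.0, 3.0, 4.0):            # A is the loudest channel with voice
    update(channels, {"A": 60, "B": 0, "C": 40}, n=1, t_cur=t)
after_a = [c.c_cur for c in channels]
for t in (5.0, 6.0):                      # B takes over as speaker
    update(channels, {"A": 0, "B": 70, "C": 40}, n=1, t_cur=t)
after_b = [c.c_cur for c in channels]
print(after_a, after_b)  # [3, 0, 0] [1, 2, 0]
```

When the speaker changes from A to B, the frame of A fades down while the frame of B fades up, one level per pass, which is the gradual transition the method aims for.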
Example 5
An embodiment of the present invention provides an apparatus for identifying a speaker in a video conference, which can be used to implement the method for identifying a speaker in a video conference described in any one of embodiments 1 to 4, and as shown in fig. 4, the apparatus includes:
An obtaining unit 41, configured to obtain the audio volume of each meeting place.
A determining unit 42, configured to determine the meeting place picture where the speaker is located according to the audio volume.
An identification unit 43, configured to identify the meeting place picture frame of the speaker as a preset color with initial brightness, where each meeting place corresponds to one meeting place picture.
A brightness control unit 44, configured to control the brightness of the identified preset color to gradually increase according to a preset gradient rule to reach the target color brightness, where the preset gradient rule specifies the rate of the frame color gradient and the target color brightness.
As an optional implementation of this embodiment, the brightness control unit includes:
A first judging subunit, configured to judge whether the brightness of the preset color reaches the target color brightness.
A second judging subunit, configured to judge, when the target color brightness is not reached, whether the difference between the time of the last color brightness change of the identified meeting place picture frame and the current time reaches the preset time interval.
A brightness control subunit, configured to control the color brightness of the identified meeting place picture frame to increase the brightness of the preset color when the preset time interval is reached.
Example 6
Fig. 5 is a schematic diagram of a hardware structure of a media platform according to an embodiment of the present invention. As shown in fig. 5, the device includes one or more processors 51 and a memory 52; one processor 51 is taken as an example in fig. 5.
The media platform may further comprise an image display (not shown) for displaying the processed picture image of the video conference. The processor 51, the memory 52 and the image display may be connected by a bus or by other means; a bus connection is taken as an example in fig. 5.
The processor 51 may be a Central Processing Unit (CPU). The processor 51 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 52 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the identification method of the speaker in the video conference in the embodiment of the present application. The processor 51 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 52, that is, implements the speaker identification method in the video conference in the above-described embodiment.
The memory 52 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of an identification device of a speaker in the video conference, and the like. Further, the memory 52 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected over a network to the identification means of the speaker in the videoconference. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 52 and, when executed by the one or more processors 51, perform the method for identifying a speaker in a video conference of embodiments 1-4.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For details of the technology not described in detail in this embodiment, reference may be made to the description of embodiments 1 to 4.
Example 7
The embodiment of the invention also provides a non-transitory computer storage medium, wherein the computer storage medium stores computer executable instructions which can execute the method for identifying a speaker in a video conference in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kinds described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for identifying a speaker in a video conference, comprising the steps of:
acquiring the audio volume of each meeting place;
determining a meeting place picture where a speaker is located according to the audio volume;
marking the meeting place picture frame of the speaker into a preset color with initial brightness, wherein each meeting place corresponds to one meeting place picture;
and controlling the brightness of the marked preset color to gradually increase according to a preset gradual change rule so as to achieve the target color brightness, wherein the preset gradual change rule specifies the frame color gradual change rate and the target color brightness, and the preset gradual change rule is used so that, when an abnormal sound ends before the color brightness of the meeting place picture frame has reached the target color brightness, the brightness increase ends and the frame is no longer marked.
2. The method according to claim 1, wherein controlling the brightness of the identified preset color to gradually increase according to a preset gradient rule to achieve a target color brightness comprises:
judging whether the brightness of the preset color reaches the target color brightness;
when the target color brightness is not reached, judging whether the difference between the time of changing the color brightness of the picture frame of the meeting place identified last time and the current time reaches a preset time interval or not;
and when the preset time interval is reached, controlling the color brightness of the identified meeting place picture frame to increase the brightness of the preset color.
3. The identification method according to claim 1, wherein the step of determining the meeting place picture where the speaker is located according to the audio volume comprises the following steps:
acquiring voice information of each meeting place;
inquiring the conference places with effective voice information in all the conference places according to the voice information;
extracting the audio volume of the meeting place with the effective voice information;
screening out N meeting places with the maximum audio volume from all the meeting places with the effective voice information;
and determining the meeting place pictures of the N meeting places as the meeting place pictures of the speaker.
4. The method according to claim 3, wherein the brightness level of the target color brightness specified by the preset gradient rule is divided into N brightness levels which are sequentially increased, and the N brightness levels are set corresponding to N maximum audio volumes, wherein the larger the audio volume is, the higher the target color brightness is.
5. The method according to claim 4, wherein after controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule, the method further comprises:
determining the target color brightness of the meeting place picture frame of each meeting place;
acquiring the color brightness of the current frame of the meeting place picture of each meeting place;
judging whether the color brightness of the current frame reaches the determined target color brightness;
when the color brightness of the current frame does not reach the preset color brightness, judging whether the difference between the time of the change of the color brightness of the frame of the last meeting place picture and the current time reaches a preset time interval or not;
and when the preset time interval is reached, controlling the conference place picture frame of each conference place to adjust to the corresponding target color brightness according to the current frame color brightness.
6. The method according to claim 5, wherein the adjustment of the meeting place picture frame to the corresponding target color brightness according to the current frame color brightness comprises the following steps:
when the current frame color brightness is larger than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted downwards;
and when the current frame color brightness is smaller than the corresponding target color brightness, the brightness level of the current frame color brightness is adjusted upwards.
7. An apparatus for identifying a speaker in a video conference, comprising:
the acquisition unit is used for acquiring the audio volume of each meeting place;
the determining unit is used for determining a meeting place picture where a speaker is located according to the audio volume;
the identification unit is used for identifying the meeting place picture frame of the speaker into a preset color with initial brightness, wherein each meeting place corresponds to one meeting place picture;
and the brightness control unit is used for controlling the brightness of the identified preset color to gradually increase according to a preset gradual change rule so as to achieve the target color brightness, wherein the preset gradual change rule specifies the frame color gradual change rate and the target color brightness, and the preset gradual change rule is used so that, when an abnormal sound ends before the color brightness of the meeting place picture frame has reached the target color brightness, the brightness increase ends and the frame is no longer identified.
8. The identification apparatus according to claim 7, wherein the brightness control unit comprises:
the first judgment subunit is used for judging whether the brightness of the preset color reaches the target color brightness;
the second judgment subunit is configured to, when the target color brightness is not reached, judge whether a difference between time of color brightness change of the meeting place picture frame identified last time and current time reaches a preset time interval;
and the brightness increasing subunit is used for controlling the color brightness of the identified meeting place picture frame to increase the brightness of the preset color when the preset time interval is reached.
9. A media platform comprising at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of identifying a speaker in a video conference of any of claims 1 to 6.
10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for identifying a speaker in a video conference according to any one of claims 1 to 6.
CN201711339270.5A 2017-12-14 2017-12-14 Method and device for identifying speaker in video conference Active CN108111799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711339270.5A CN108111799B (en) 2017-12-14 2017-12-14 Method and device for identifying speaker in video conference

Publications (2)

Publication Number Publication Date
CN108111799A CN108111799A (en) 2018-06-01
CN108111799B true CN108111799B (en) 2020-12-18

Family

ID=62216005

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711339270.5A Active CN108111799B (en) 2017-12-14 2017-12-14 Method and device for identifying speaker in video conference

Country Status (1)

Country Link
CN (1) CN108111799B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109873973B (en) 2019-04-02 2021-08-27 京东方科技集团股份有限公司 Conference terminal and conference system
CN111556277B (en) * 2020-05-19 2022-07-26 安徽听见科技有限公司 Method, device and equipment for processing identifiers of participants in video conference and storage medium
CN111447400B (en) * 2020-05-19 2021-08-17 科大讯飞股份有限公司 Method, device, equipment and storage medium for processing participant identification of video conference
CN115460371A (en) * 2021-06-09 2022-12-09 苏州译牛智能科技有限公司 Simultaneous interpretation method in video conference, server and readable storage medium
CN113286114A (en) * 2021-07-20 2021-08-20 北京微吼时代科技有限公司 Video mixed-flow live broadcast technology-based video picture marking method, device and equipment
CN113992968B (en) * 2021-10-27 2023-03-31 深圳市宝泽科技有限公司 Method and device for gradually changing colors of frames of screen projection pictures based on wireless screen projection field
CN115052126B (en) * 2022-08-12 2022-10-28 深圳市稻兴实业有限公司 Ultra-high definition video conference analysis management system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080000A (en) * 2007-07-17 2007-11-28 华为技术有限公司 Method, system, server and terminal for displaying speaker in video conference
CN101212751A (en) * 2006-12-26 2008-07-02 鸿富锦精密工业(深圳)有限公司 Mobile communication terminal capable of displaying multi-party video call and the display method
CN102857409A (en) * 2012-09-04 2013-01-02 上海量明科技发展有限公司 Display method, client and system for local sound effect conversion in instant communication
CN105681582A (en) * 2016-03-18 2016-06-15 努比亚技术有限公司 Control color adjusting method and terminal
CN105744208A (en) * 2014-12-11 2016-07-06 北京视联动力国际信息技术有限公司 Video conference control system and control method
CN106162046A (en) * 2015-04-24 2016-11-23 中兴通讯股份有限公司 A kind of video conference image rendering method and device thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336587B1 (en) * 1998-10-19 2002-01-08 Symbol Technologies, Inc. Optical code reader for producing video displays and measuring physical parameters of objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant