WO2016095244A1 - Method and device for adjusting video window in video conference - Google Patents

Method and device for adjusting video window in video conference

Info

Publication number
WO2016095244A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
frequency band
current speaker
video window
participant
Prior art date
Application number
PCT/CN2014/094598
Other languages
French (fr)
Chinese (zh)
Inventor
王云华
Original Assignee
深圳Tcl新技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司
Publication of WO2016095244A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Definitions

  • The present invention relates to the field of video conference technologies, and in particular to a method and an apparatus for adjusting a video window in a video conference.
  • Video conferencing is a commonly used modern means of communication. Through video conferencing, participants can communicate by voice and video on a video conference client wherever they are, which is very convenient.
  • When a video conference is held, each video conference client either displays all participant screens without distinction, or an administrator manually switches the display screen of each video conference client, or each participant manually switches the display screen of their own video conference client.
  • Switching the display screen includes changing the number of participant screens shown in the display screen, the size of the display screen, and the like.
  • The main purpose of the present invention is to solve the problem that switching the display screen of a video conference client is complicated and not intelligent enough, so that the current speaker cannot be determined and shown in the display screen of the video conference client effectively and quickly, which degrades the experience of the video conference client.
  • The present invention provides a method for adjusting a video window in a video conference, which includes the following steps:
  • The acquired audio data of each participant is sampled to generate audio sample data, and the number of times the audio sample data of each participant appears in the preset frequency band is counted separately;
  • The current speaker is determined according to the number of times the audio sample data of each participant appears in the preset frequency band, and the video window corresponding to the current speaker is highlighted.
  • The step of sampling the acquired audio data of each participant to generate audio sample data includes:
  • The step of determining the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band comprises:
  • The participant corresponding to the audio sample data with the highest number of times is determined as the current speaker.
  • The step of determining the participant corresponding to the audio sample data with the highest number of times as the current speaker includes:
  • The frequency band with the highest frequency among the frequency bands corresponding to the highest number of times is determined;
  • The participant corresponding to the audio sample data in the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times is determined as the current speaker.
  • The step of determining, as the current speaker, the participant corresponding to the audio sample data in the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times includes:
  • The frequency of the background noise in the frequency band with the highest frequency is determined;
  • The participant corresponding to the audio data with the highest background noise frequency is used as the current speaker.
  • The manner of highlighting comprises:
  • The present invention further provides an apparatus for adjusting a video window in a video conference, which includes:
  • an acquisition module configured to acquire audio data of all participants;
  • a processing module configured to sample the acquired audio data of each participant to generate audio sample data, and to separately count the number of times the audio sample data of each participant appears in the preset frequency band; and further configured to determine the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band;
  • a display module configured to highlight the video window corresponding to the current speaker.
  • The processing module is further configured to extract audio data of a preset length from each piece of acquired audio data to generate the audio sample data.
  • The processing module comprises a determining unit and a processing unit;
  • the determining unit is configured to determine the audio sample data with the highest number of occurrences in the preset frequency band;
  • the processing unit is configured to determine, as the current speaker, the participant corresponding to the audio sample data with the highest number of times.
  • The determining unit is further configured to determine, when two or more participants share the highest number of times, the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times;
  • the processing unit is further configured to determine, as the current speaker, the participant corresponding to the audio sample data in the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times.
  • The determining unit is further configured to determine, when the frequency bands with the highest frequency among those corresponding to the highest number of times are the same, the frequency of the background noise in that frequency band;
  • the processing unit is further configured to use the participant corresponding to the audio data with the highest background noise frequency as the current speaker.
  • The display module is further configured to display only the video window of the current speaker.
  • The invention samples the acquired audio data, determines for each sample the number of occurrences in the preset frequency band, and highlights the video data of the participant whose audio data has the highest number of occurrences.
  • Automatic switching of the display screen of the video conference client is thereby realized, and the current speaker is determined and shown in the display screen of the video conference client effectively and quickly, which improves the experience of the video conference client.
  • FIG. 1 is a schematic flowchart of a first embodiment of a method for adjusting a video window in a video conference according to the present invention;
  • FIG. 2 is a schematic flowchart of an embodiment of step S30 of FIG. 1;
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for adjusting a video window in a video conference according to the present invention;
  • FIG. 4 is a schematic flowchart of a third embodiment of a method for adjusting a video window in a video conference according to the present invention;
  • FIG. 5 is a schematic diagram of functional modules of a preferred embodiment of a video window adjusting apparatus in a video conference according to the present invention;
  • FIG. 6 is a schematic diagram of the detailed functional modules of an embodiment of the processing module of FIG. 5.
  • The invention provides a method for adjusting a video window in a video conference.
  • FIG. 1 is a schematic flowchart of a first embodiment of a method for adjusting a video window in a video conference according to the present invention.
  • The method for adjusting a video window in the video conference includes:
  • Step S10: acquiring audio data of all participants;
  • When a user needs to hold a video conference with multiple other users, the video conference client is started and the other users who need to participate are invited to join the video conference, that is, a session communication environment with the other users is established. After the video conference is successfully created, the audio data of all participants is acquired.
  • The audio data includes the voice of the participant and/or the environmental noise of the environment in which the participant is located. Of course, not every piece of audio data includes all of the above content; it may include one or more of them.
  • The audio data is a segment of audio, that is, the audio produced by a video conference client user together with the audio noise generated by the environment over a period of time.
  • The method for adjusting the video window in the video conference of the present invention may be executed by a management terminal of the video conference, and more specifically by video conference window adjustment software installed on the management terminal. The management terminal may be a server, a desktop computer, a notebook computer, a pad, or another electronic terminal.
  • Step S20: sampling the acquired audio data of each participant to generate audio sample data, and separately counting the number of times the audio sample data of each participant appears in the preset frequency band;
  • The process of sampling the acquired audio data of each participant to generate audio sample data includes: extracting audio data of a preset length from each piece of acquired audio data to generate the audio sample data. For example, audio data with a length of 600 ms is extracted.
  • The audio data of the preset length may be extracted starting from the beginning of the audio data, extracted from the end of the audio data, or extracted at a random position within the audio data.
  • The preset length may also be 1000 ms or 500 ms.
  • The calculation baseline is the same for every participant: the audio data used as the sample has the same length for each participant, which ensures calculation accuracy.
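
The extraction of a preset-length sample can be sketched as follows. This is only an illustration under stated assumptions: the audio is taken to be a sequence of PCM samples with a known sample rate, and the function name `extract_sample` and its parameters are hypothetical, not part of the patent.

```python
import random


def extract_sample(audio, sample_rate_hz, length_ms=600, mode="start", rng=None):
    """Return a slice of `audio` that is exactly `length_ms` long.

    `audio` is assumed to be a sequence of PCM samples (hypothetical input
    format). `mode` selects where the slice is taken: "start", "end", or
    "random", matching the three options described in the text.
    """
    n = sample_rate_hz * length_ms // 1000  # number of samples in the preset length
    if n > len(audio):
        raise ValueError("audio is shorter than the preset length")
    if mode == "start":
        begin = 0
    elif mode == "end":
        begin = len(audio) - n
    elif mode == "random":
        # Any start offset that still leaves room for n samples.
        begin = (rng or random).randint(0, len(audio) - n)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return audio[begin:begin + n]


# One second of dummy audio at 8 kHz; every participant gets the same length.
one_second = list(range(8000))
sample = extract_sample(one_second, 8000, length_ms=600, mode="end")
```

Using the same `length_ms` for every participant is what keeps the later counts comparable.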
  • After the audio sample data corresponding to each piece of audio data is generated, the number of occurrences of each audio sample data in a preset frequency band is determined. The preset frequency band may be one frequency band or multiple frequency bands. If the audio data is a sound made by a person, the range of the preset frequency band is 250 Hz to 2000 Hz, and the range may be adjusted appropriately for different voices.
  • When the preset frequency band is a single frequency band, it may be 250 Hz to 600 Hz, 600 Hz to 1000 Hz, or 1500 Hz to 2000 Hz, in each case including both endpoints.
  • When the preset frequency band is multiple frequency bands, it may be divided by frequency into a high frequency band of 850 Hz to 2000 Hz (including the endpoints 850 Hz and 2000 Hz), a middle frequency band of 550 Hz to 850 Hz (excluding the endpoints 550 Hz and 850 Hz), and a low frequency band of 250 Hz to 550 Hz (including the endpoints 250 Hz and 550 Hz).
  • For example, the preset frequency band is 250 Hz to 600 Hz, including the endpoints 250 Hz and 600 Hz; the audio sample data corresponding to participants A, B, and C is a, b, and c respectively; and the number of occurrences in the preset frequency band is counted from a, b, and c respectively, giving n1 and n2 for a and b, and likewise a count for c.
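
The counting step can be sketched as follows. The text does not say how an "occurrence" in a band is measured, so this sketch assumes each audio sample has already been reduced to a list of per-frame dominant frequencies in Hz. That input format, the example values, and the names `in_band`, `classify_band`, and `count_in_band` are all hypothetical.

```python
def in_band(freq_hz, low_hz, high_hz, inclusive=True):
    """True if freq_hz lies in the band; endpoints count only when inclusive."""
    if inclusive:
        return low_hz <= freq_hz <= high_hz
    return low_hz < freq_hz < high_hz


def classify_band(freq_hz):
    """Map a frequency to the three-band split given in the text, else None."""
    if in_band(freq_hz, 250, 550):
        return "low"
    if in_band(freq_hz, 550, 850, inclusive=False):
        return "middle"
    if in_band(freq_hz, 850, 2000):
        return "high"
    return None


def count_in_band(frame_freqs_hz, low_hz=250, high_hz=600):
    """Count how many frame frequencies fall inside the preset band."""
    return sum(1 for f in frame_freqs_hz if in_band(f, low_hz, high_hz))


# Hypothetical per-frame dominant frequencies for participants A, B, and C.
frame_freqs = {
    "A": [250, 300, 600, 900, 100],
    "B": [260, 400, 500, 599, 700],
    "C": [120, 300, 450, 2000],
}
counts = {name: count_in_band(freqs) for name, freqs in frame_freqs.items()}
```

With these made-up inputs the counts for A, B, and C come out as 3, 4, and 2.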
  • Step S30: determining the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band, and highlighting the video window corresponding to the current speaker.
  • The process of determining the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band includes:
  • Step S31: determining the audio sample data with the highest number of occurrences in the preset frequency band;
  • Step S32: determining, as the current speaker, the participant corresponding to the audio sample data with the highest number of times.
  • The audio data with the highest number of occurrences is found from the determined counts. For example, with three participants A, B, and C whose determined numbers of occurrences are 3, 4, and 2 respectively, the participant with the highest number of occurrences is B, so participant B is the current speaker and the video data corresponding to participant B is highlighted.
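
Steps S31 and S32 amount to taking the participant with the largest count. A minimal sketch, assuming the counts are already available as a dictionary (the name `current_speaker` is hypothetical):

```python
def current_speaker(counts):
    """Return the participant with the highest in-band count, or None if empty.

    Ties are not resolved here; max() simply keeps the first maximum it sees.
    """
    if not counts:
        return None
    return max(counts, key=counts.get)


# The example from the text: counts of 3, 4, and 2 for A, B, and C.
speaker = current_speaker({"A": 3, "B": 4, "C": 2})
```

For the example counts this selects participant B.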
  • The video data corresponding to the current speaker may be highlighted in any of the following ways: displaying only the video window of the current speaker; displaying the video of the current speaker at a larger proportion than the video data screens of the other participants; or displaying the video data of the current speaker with a preset identifier, for example by setting the screen of the displayed video data to green, yellow, red, and the like.
  • When there are several current speakers, all of them are displayed in order according to a preset display rule. For example, if there are three current speakers, 2/3 of the video window displays the first speaker, 2/3 of the remaining area displays the second speaker, the last remaining area displays the third speaker, and so on.
  • The first, second, and third speakers may be determined by the length of the acquired audio data, or by the order of the number of occurrences (the participant with the highest number is the first speaker, the second highest is the second speaker, and the remaining one is the third speaker).
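
The display rule can be worked through numerically. The sketch below assumes one reading of the example: each speaker in turn receives 2/3 of whatever area is still free, and the last speaker receives the remainder. The function names `rank_speakers` and `window_shares` are hypothetical.

```python
from fractions import Fraction


def rank_speakers(counts):
    """Order participants from highest to lowest in-band count."""
    return sorted(counts, key=counts.get, reverse=True)


def window_shares(ordered_speakers, share=Fraction(2, 3)):
    """Give each speaker `share` of the remaining area; the last takes the rest."""
    remaining = Fraction(1)
    shares = {}
    for i, name in enumerate(ordered_speakers):
        if i == len(ordered_speakers) - 1:
            shares[name] = remaining
        else:
            shares[name] = remaining * share
            remaining -= shares[name]
    return shares


order = rank_speakers({"A": 3, "B": 4, "C": 2})
shares = window_shares(order)
```

Under this reading, three speakers receive 2/3, 2/9, and 1/9 of the window, which together cover the whole area.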
  • In this embodiment, the acquired audio data is sampled, the number of occurrences in the preset frequency band is determined separately for each sample, and the video data of the participant whose audio data has the highest number of occurrences is highlighted.
  • Automatic switching of the display screen of the video conference client is thereby realized, and the current speaker is determined and shown in the display screen of the video conference client effectively and quickly, which improves the experience of the video conference client.
  • FIG. 3 is a schematic flowchart of a second embodiment of a method for adjusting a video window in a video conference according to the present invention. Based on the first embodiment of the method, step S32 may include:
  • Step S321: when two or more participants share the highest number of times, determining the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times;
  • Step S322: determining, as the current speaker, the participant corresponding to the audio sample data in the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times.
  • The frequency band with the most occurrences is determined from each sample. For example, there are three participants A, B, and C. The band that occurs most for participant A is the high frequency band, for participant B the middle frequency band, and for participant C the low frequency band. Suppose the high frequency band of participant A occurs at most three times, the middle frequency band of participant B at most four times, and the low frequency band of participant C at most four times, so that the highest number of occurrences is shared by two participants.
  • The participant whose audio data lies in the highest frequency band among the frequency bands with the highest number of occurrences is regarded as the current speaker.
  • Participant B is therefore taken as the current speaker, and participant B's video window is highlighted. If the frequency band with the highest number of times does not appear in the determined frequency bands, the video window of the participant corresponding to the frequency band with the highest number of occurrences is highlighted.
  • The preset frequency band may also be two frequency bands, four frequency bands, and so on, and the specific frequency allocation can be set freely according to the effect the user expects, for example 250 Hz to 500 Hz and 600 Hz to 1500 Hz.
  • Alternatively, the preset frequency band is set to four frequency bands: 250 Hz to 500 Hz, 550 Hz to 700 Hz, 750 Hz to 1500 Hz, and 1600 Hz to 2000 Hz.
  • In this embodiment, when two or more participants share the determined maximum number of occurrences in their sampled audio data, the participant with the highest frequency band among those with the same number of occurrences is used as the current speaker, and the corresponding video data is highlighted.
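
The tie-break of steps S321 and S322 can be sketched as follows, assuming each participant has been summarized as a pair of (most frequent band, number of occurrences). The band ordering table `BAND_RANK` and the function name are hypothetical helpers for illustration.

```python
# Higher rank means a higher frequency band.
BAND_RANK = {"low": 0, "middle": 1, "high": 2}


def speaker_with_band_tiebreak(top_bands):
    """top_bands maps participant -> (most frequent band, occurrence count).

    The highest count wins; if several participants share it, the one whose
    band has the highest frequency is chosen.
    """
    best = max(count for _, count in top_bands.values())
    tied = [name for name, (_, count) in top_bands.items() if count == best]
    return max(tied, key=lambda name: BAND_RANK[top_bands[name][0]])


# A: high band, 3 times; B: middle band, 4 times; C: low band, 4 times.
speaker = speaker_with_band_tiebreak(
    {"A": ("high", 3), "B": ("middle", 4), "C": ("low", 4)}
)
```

B and C tie on four occurrences, and B wins because the middle band lies above the low band. Participant A's high band does not matter, because A is not among the tied participants.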
  • FIG. 4 is a schematic flowchart of a third embodiment of a method for adjusting a video window in a video conference according to the present invention. Based on the second embodiment of the method, step S322 may further include:
  • Step S3221: when the frequency bands corresponding to the highest number of times are the same, determining the frequency of the background noise in that frequency band;
  • Step S3222: using, as the current speaker, the participant corresponding to the audio data with the highest background noise frequency.
  • For example, the band that occurs most for participant A is the high frequency band, for participant B the low frequency band, and for participant C the low frequency band. It is determined that the high frequency band of participant A occurs three times, the low frequency band of participant B four times, and the low frequency band of participant C four times, so the participants with the highest number of occurrences share the same frequency band.
  • The frequency of the background noise is then obtained for the audio data in the frequency band with the highest number of occurrences, that is, for the audio data of participant B and of participant C respectively.
  • Participant C, whose audio data has the highest background noise frequency, is used as the current speaker, and the video data of participant C is highlighted.
  • In this embodiment, among the audio data in the same frequency band, the participant corresponding to the audio data with the highest background noise frequency is used as the current speaker, and the corresponding video data is highlighted. This realizes automatic switching of the display screen of the video conference client, determines and shows the current speaker in the display screen of the video conference client effectively and quickly, improves the experience of the video conference client, and makes locking onto the current speaker more accurate.
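
The three criteria (count, then band, then background noise frequency) form a cascade, which can be expressed as a single lexicographic comparison. This is a sketch under assumptions: the background noise frequency is taken as an already measured number per participant, and the values and names used here (`BAND_RANK`, `pick_speaker`) are hypothetical.

```python
# Higher rank means a higher frequency band.
BAND_RANK = {"low": 0, "middle": 1, "high": 2}


def pick_speaker(candidates):
    """candidates maps participant -> (top band, count, noise frequency in Hz).

    Tuples compare element by element, so the count decides first, the band
    rank breaks count ties, and the noise frequency breaks band ties.
    """
    def sort_key(name):
        band, count, noise_hz = candidates[name]
        return (count, BAND_RANK[band], noise_hz)

    return max(candidates, key=sort_key)


# B and C tie on count (4) and on band (low); C has the higher noise frequency.
speaker = pick_speaker(
    {"A": ("high", 3, 50.0), "B": ("low", 4, 60.0), "C": ("low", 4, 120.0)}
)
```

With these illustrative noise values the result is participant C, as in the example above.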
  • The present invention further provides an apparatus for adjusting a video window in a video conference.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a video window adjusting apparatus in a video conference according to the present invention.
  • The apparatus for adjusting the video window in the video conference comprises an acquisition module 10, a processing module 20, and a display module 30.
  • The acquisition module 10 is configured to acquire audio data of all participants;
  • When a user needs to hold a video conference with multiple other users, the video conference client is started and the other users who need to participate are invited to join the video conference, that is, a session communication environment with the other users is established. After the video conference is successfully created, the audio data of all participants is acquired.
  • The audio data includes the voice of the participant and/or the environmental noise of the environment in which the participant is located. Of course, not every piece of audio data includes all of the above content; it may include one or more of them.
  • The audio data is a segment of audio, that is, the audio produced by a video conference client user together with the audio noise generated by the environment over a period of time.
  • The processing module 20 is configured to sample the acquired audio data of each participant to generate audio sample data, and to separately count the number of times the audio sample data of each participant appears in the preset frequency band;
  • The processing module 20 samples the acquired audio data of each participant to generate audio sample data by extracting audio data of a preset length from each piece of acquired audio data. For example, audio data with a length of 600 ms is extracted.
  • The audio data of the preset length may be extracted starting from the beginning of the audio data, extracted from the end of the audio data, or extracted at a random position within the audio data.
  • The preset length may also be 1000 ms or 500 ms.
  • The calculation baseline is the same for every participant: the audio data used as the sample has the same length for each participant, which ensures calculation accuracy.
  • After the audio sample data corresponding to each piece of audio data is generated, the number of occurrences of each audio sample data in a preset frequency band is determined. The preset frequency band may be one frequency band or multiple frequency bands. If the audio data is a sound made by a person, the range of the preset frequency band is 250 Hz to 2000 Hz, and the range may be adjusted appropriately for different voices.
  • When the preset frequency band is a single frequency band, it may be 250 Hz to 600 Hz, 600 Hz to 1000 Hz, or 1500 Hz to 2000 Hz, in each case including both endpoints.
  • When the preset frequency band is multiple frequency bands, it may be divided by frequency into a high frequency band of 850 Hz to 2000 Hz (including the endpoints 850 Hz and 2000 Hz), a middle frequency band of 550 Hz to 850 Hz (excluding the endpoints 550 Hz and 850 Hz), and a low frequency band of 250 Hz to 550 Hz (including the endpoints 250 Hz and 550 Hz).
  • For example, the preset frequency band is 250 Hz to 600 Hz, including the endpoints 250 Hz and 600 Hz; the audio sample data corresponding to participants A, B, and C is a, b, and c respectively; and the number of occurrences in the preset frequency band is counted from a, b, and c respectively, giving n1 and n2 for a and b, and likewise a count for c.
  • The processing module 20 is further configured to determine the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band;
  • The display module 30 is configured to highlight the video window corresponding to the current speaker.
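
The division into three modules can be sketched as a small class that wires together an acquisition callable, a counting callable, and a display callable. This is an illustrative structure only; the class name `WindowAdjuster`, its constructor parameters, and the stub functions are hypothetical and not taken from the patent.

```python
class WindowAdjuster:
    """Wires together the three roles: acquisition, processing, display."""

    def __init__(self, acquire, count_in_band, highlight):
        self.acquire = acquire              # role of the acquisition module 10
        self.count_in_band = count_in_band  # counting part of the processing module 20
        self.highlight = highlight          # role of the display module 30

    def update(self):
        """Run one pass: acquire audio, count per participant, highlight the top one."""
        audio = self.acquire()  # participant -> audio sample data
        counts = {name: self.count_in_band(data) for name, data in audio.items()}
        speaker = max(counts, key=counts.get) if counts else None
        if speaker is not None:
            self.highlight(speaker)
        return speaker


# Stub modules for illustration: frame frequencies in Hz and a 250-600 Hz band.
highlighted = []
adjuster = WindowAdjuster(
    acquire=lambda: {"A": [300, 900], "B": [300, 400]},
    count_in_band=lambda freqs: sum(1 for f in freqs if 250 <= f <= 600),
    highlight=highlighted.append,
)
speaker = adjuster.update()
```

Passing the three roles in as callables keeps the selection logic independent of how audio is captured or how the window is drawn.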
  • The processing module 20 includes a determining unit 21 and a processing unit 22;
  • the determining unit 21 is configured to determine the audio sample data with the highest number of occurrences in the preset frequency band;
  • the processing unit 22 is configured to determine, as the current speaker, the participant corresponding to the audio sample data with the highest number of times.
  • The audio data with the highest number of occurrences is found from the determined counts. For example, with three participants A, B, and C whose determined numbers of occurrences are 3, 4, and 2 respectively, the participant with the highest number of occurrences is B, so participant B is the current speaker and the video data corresponding to participant B is highlighted.
  • The video data corresponding to the current speaker may be highlighted in any of the following ways: displaying only the video window of the current speaker; displaying the video of the current speaker at a larger proportion than the video data screens of the other participants; or displaying the video data of the current speaker with a preset identifier, for example by setting the screen of the displayed video data to green, yellow, red, and the like.
  • When there are several current speakers, all of them are displayed in order according to a preset display rule. For example, if there are three current speakers, 2/3 of the video window displays the first speaker, 2/3 of the remaining area displays the second speaker, the last remaining area displays the third speaker, and so on.
  • The first, second, and third speakers may be determined by the length of the acquired audio data, or by the order of the number of occurrences (the participant with the highest number is the first speaker, the second highest is the second speaker, and the remaining one is the third speaker).
  • In this embodiment, the acquired audio data is sampled, the number of occurrences in the preset frequency band is determined separately for each sample, and the video data of the participant whose audio data has the highest number of occurrences is highlighted.
  • Automatic switching of the display screen of the video conference client is thereby realized, and the current speaker is determined and shown in the display screen of the video conference client effectively and quickly, which improves the experience of the video conference client.
  • The determining unit 21 is further configured to determine, when two or more participants share the highest number of times, the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times;
  • the processing unit 22 is further configured to determine, as the current speaker, the participant corresponding to the audio sample data in the frequency band with the highest frequency among the frequency bands corresponding to the highest number of times.
  • The frequency band with the most occurrences is determined from each sample. For example, there are three participants A, B, and C. The band that occurs most for participant A is the high frequency band, for participant B the middle frequency band, and for participant C the low frequency band. Suppose the high frequency band of participant A occurs at most three times, the middle frequency band of participant B at most four times, and the low frequency band of participant C at most four times, so that the highest number of occurrences is shared by two participants.
  • The participant whose audio data lies in the highest frequency band among the frequency bands with the highest number of occurrences is regarded as the current speaker.
  • Participant B is therefore taken as the current speaker, and participant B's video window is highlighted. If the frequency band with the highest number of times does not appear in the determined frequency bands, the video window of the participant corresponding to the frequency band with the highest number of occurrences is highlighted.
  • The preset frequency band may also be two frequency bands, four frequency bands, and so on, and the specific frequency allocation can be set freely according to the effect the user expects, for example 250 Hz to 500 Hz and 600 Hz to 1500 Hz.
  • Alternatively, the preset frequency band is set to four frequency bands: 250 Hz to 500 Hz, 550 Hz to 700 Hz, 750 Hz to 1500 Hz, and 1600 Hz to 2000 Hz.
  • In this embodiment, when two or more participants share the determined maximum number of occurrences in their sampled audio data, the participant with the highest frequency band among those with the same number of occurrences is used as the current speaker, and the corresponding video data is highlighted.
  • The determining unit 21 is further configured to determine the frequency of the background noise in the shared frequency band when the frequency bands corresponding to the highest number of times are the same;
  • the processing unit 22 is further configured to use the participant corresponding to the audio data with the highest background noise frequency as the current speaker.
  • For example, the band that occurs most for participant A is the high frequency band, for participant B the low frequency band, and for participant C the low frequency band. It is determined that the high frequency band of participant A occurs three times, the low frequency band of participant B four times, and the low frequency band of participant C four times, so the participants with the highest number of occurrences share the same frequency band.
  • The frequency of the background noise is then obtained for the audio data in the frequency band with the highest number of occurrences, that is, for the audio data of participant B and of participant C respectively.
  • Participant C, whose audio data has the highest background noise frequency, is used as the current speaker, and the video data of participant C is highlighted.
  • In this embodiment, among the audio data in the same frequency band, the participant corresponding to the audio data with the highest background noise frequency is used as the current speaker, and the corresponding video data is highlighted. This realizes automatic switching of the display screen of the video conference client, determines and shows the current speaker in the display screen of the video conference client effectively and quickly, improves the experience of the video conference client, and makes locking onto the current speaker more accurate.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Disclosed is a method for adjusting a video window in a video conference. The method for adjusting a video window in a video conference comprises the following steps: acquiring audio data of all the participants; sampling the acquired audio data of each participant to generate audio sampling data, and respectively counting the number of appearance times of the audio sampling data of each participant in a pre-set frequency band; and according to the number of appearance times of the audio sampling data of each participant in the pre-set frequency band, determining a current speaker, and highlighting a video window corresponding to the current speaker. Also disclosed is a device for adjusting a video window in a video conference. The present invention realizes automatic switching of display pictures of a video conference client, which effectively and rapidly determines and displays a current speaker in the display picture of the video conference client, thereby improving the experience of the video conference client.

Description

Method and device for adjusting video window in video conference
Technical field
The present invention relates to the field of video conferencing technology, and in particular to a method and a device for adjusting a video window in a video conference.
Background
Video conferencing is a common means of modern communication. Through a video conference, participants can conveniently communicate by voice and video on their video conference clients regardless of where they are.
During a video conference, each video conference client either displays the pictures of all participants without distinction, or an administrator manually switches the display of each client, or each participant manually switches the display of their own client. Switching the display includes changing the number of participant pictures shown and the size of the displayed pictures.
However, when a video conference has many participants and many of them speak during the conference, the display of the video conference client must be switched manually again and again. The switching process is therefore complicated and not intelligent enough: the current speaker cannot be determined and shown on the client's display effectively and quickly, which degrades the experience of the video conference client.
The above content is only intended to assist in understanding the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main object of the present invention is to solve the problem that switching the display of a video conference client is complicated and not intelligent enough, so that the current speaker cannot be determined and shown on the client's display effectively and quickly, which degrades the experience of the video conference client.
To achieve the above object, the present invention provides a method for adjusting a video window in a video conference, the method comprising the following steps:
acquiring audio data of all participants;
sampling the acquired audio data of each participant to generate audio sample data, and counting, for each participant, the number of times the audio sample data appears in a preset frequency band;
determining a current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band, and highlighting the video window corresponding to the current speaker.
Preferably, the step of sampling the acquired audio data of each participant to generate audio sample data comprises:
extracting audio data of a preset length from each piece of acquired audio data to generate the audio sample data.
Preferably, the step of determining the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band comprises:
determining the audio sample data with the highest number of occurrences in the preset frequency band;
determining the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
Preferably, the step of determining the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker comprises:
when two or more pieces of audio sample data share the highest number of occurrences, determining the band with the highest frequency among the frequency bands corresponding to that number;
determining the participant whose audio sample data corresponds to that highest-frequency band as the current speaker.
Preferably, the step of determining the participant whose audio sample data corresponds to the highest-frequency band as the current speaker comprises:
when the highest-frequency bands among the bands corresponding to the highest number of occurrences are the same, determining the frequency of the background noise in that band;
taking the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
Preferably, the manner of highlighting comprises:
displaying only the video window of the current speaker;
or displaying the video window of the current speaker at a larger scale than the video pictures of the other speakers;
or displaying the video window of the current speaker with a preset mark.
In addition, to achieve the above object, the present invention further provides a device for adjusting a video window in a video conference, the device comprising:
an acquisition module, configured to acquire audio data of all participants;
a processing module, configured to sample the acquired audio data of each participant to generate audio sample data and to count, for each participant, the number of times the audio sample data appears in a preset frequency band, and further configured to determine a current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band;
a display module, configured to highlight the video window corresponding to the current speaker.
Preferably, the processing module is further configured to extract audio data of a preset length from each piece of acquired audio data to generate the audio sample data.
Preferably, the processing module comprises a determining unit and a processing unit;
the determining unit is configured to determine the audio sample data with the highest number of occurrences in the preset frequency band;
the processing unit is configured to determine the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
Preferably, the determining unit is further configured to determine, when two or more pieces of audio sample data share the highest number of occurrences, the band with the highest frequency among the frequency bands corresponding to that number;
the processing unit is further configured to determine the participant whose audio sample data corresponds to that highest-frequency band as the current speaker.
Preferably, the determining unit is further configured to determine, when the highest-frequency bands among the bands corresponding to the highest number of occurrences are the same, the frequency of the background noise in that band;
the processing unit is further configured to take the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
Preferably, the display module is further configured to display only the video window of the current speaker;
or to display the video window of the current speaker at a larger scale than the video pictures of the other speakers;
or to display the video window of the current speaker with a preset mark.
The present invention samples each piece of acquired audio data, determines the number of occurrences of the preset frequency band in each piece of sample data, and highlights the video data of the participant whose audio data has the highest number of occurrences. This realizes automatic switching of the display of the video conference client, determines the current speaker effectively and quickly and shows that speaker on the client's display, and improves the experience of the video conference client.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a first embodiment of the method for adjusting a video window in a video conference according to the present invention;
FIG. 2 is a detailed schematic flowchart of an embodiment of step S30 in FIG. 1;
FIG. 3 is a schematic flowchart of a second embodiment of the method for adjusting a video window in a video conference according to the present invention;
FIG. 4 is a schematic flowchart of a third embodiment of the method for adjusting a video window in a video conference according to the present invention;
FIG. 5 is a schematic diagram of the functional modules of a preferred embodiment of the device for adjusting a video window in a video conference according to the present invention;
FIG. 6 is a detailed schematic diagram of the functional modules of an embodiment of the processing module in FIG. 5.
The realization of the objects, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
The present invention provides a method for adjusting a video window in a video conference.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the method for adjusting a video window in a video conference according to the present invention.
In an embodiment, the method for adjusting a video window in a video conference comprises:
Step S10: acquiring audio data of all participants;
When a user needs to hold a video conference with several other users, the user starts the video conference client and invites the other users who need to attend to join the conference, that is, a session communication environment with the other users is established. After the video conference has been created successfully, the audio data of all participants is acquired. The audio data includes the participant's speech and/or the ambient noise of the participant's environment. Not every piece of audio data contains all of these; it may contain one or several of them. The audio data is a segment of audio, namely the audio produced by the user of a video conference client and/or the noise produced by the environment over a period of time. The method may be executed by the management terminal of the video conference or, more specifically, by video conference window adjustment software installed on the management terminal; the management terminal may be an electronic terminal such as a server, a desktop computer, a notebook computer or a tablet.
It can be understood that, in order to create the video conference more quickly, a detection data packet may be sent together with the invitation to the other users. When a response data packet based on the detection data packet is received from another user, it is determined that the session communication environment with the video client that returned the response has been created successfully. When no response data packet is received, the user is prompted that creation of the video conference has failed, so that the user can contact the video conference client user who did not respond by other means, such as SMS, telephone or e-mail, and establish the session communication environment as soon as possible.
Step S20: sampling the acquired audio data of each participant to generate audio sample data, and counting, for each participant, the number of times the audio sample data appears in a preset frequency band;
After the audio data of all participants has been acquired, each piece of acquired audio data is sampled to generate audio sample data. The sampling consists of extracting audio data of a preset length from each piece of acquired audio data, for example 600 ms of audio. The preset-length audio may be extracted starting from the beginning of the audio data, from its end, or from a random position. The preset length may also be 1000 ms or 500 ms. Extracting audio of a preset length gives every participant the same calculation basis, that is, the audio used as each participant's sample has the same length, which ensures the accuracy of the calculation.
After the audio sample data corresponding to each piece of audio data has been generated, the number of occurrences of each piece of audio sample data in the preset frequency band is determined. The preset frequency band may be one band or several bands. If the audio data is a human voice, the preset frequency band lies within 250 Hz to 2000 Hz, and this range may be adjusted as appropriate for different voices. When a single band is used, it may be 250 Hz to 600 Hz, 600 Hz to 1000 Hz, or 1500 Hz to 2000 Hz, each including its endpoints. When several bands are used, they may be divided by frequency into a high band of 850 Hz to 2000 Hz (endpoints included), a middle band of 550 Hz to 850 Hz (endpoints excluded) and a low band of 250 Hz to 550 Hz (endpoints included).
For example, suppose a video conference has three participants A, B and C, whose audio sample data are a, b and c respectively. If the preset frequency band is 250 Hz to 600 Hz (endpoints included), the numbers of occurrences determined from a, b and c in that band are m, n and s respectively. If there are three preset bands, namely a high band of 850 Hz to 2000 Hz, a middle band of 550 Hz to 850 Hz and a low band of 250 Hz to 550 Hz, the numbers of occurrences are m1, m2 and m3 for a, n1, n2 and n3 for b, and s1, s2 and s3 for c, listed in the order high band, middle band, low band.
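The counting in step S20 can be sketched as follows. The text does not say how an "occurrence" in a band is measured, so this sketch assumes one possible reading: the sample is cut into short frames, a frequency is estimated for each frame from its zero crossings, and each frame whose estimate falls in a band counts as one occurrence. The frame length, the zero-crossing estimator and all function names are illustrative assumptions; only the band limits come from the text.

```python
import math

# Band limits (Hz) from the three-band example in the text.
# The middle band excludes its endpoints; the other two include them.
BANDS = {
    "high": (850.0, 2000.0, True),
    "mid": (550.0, 850.0, False),
    "low": (250.0, 550.0, True),
}


def band_of(freq_hz):
    """Return the name of the preset band containing freq_hz, or None."""
    for name, (lo, hi, inclusive) in BANDS.items():
        if inclusive and lo <= freq_hz <= hi:
            return name
        if not inclusive and lo < freq_hz < hi:
            return name
    return None


def frame_frequency(frame, sample_rate):
    """Estimate a frame's frequency from its zero crossings (assumed estimator)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    duration_s = len(frame) / sample_rate
    return crossings / (2.0 * duration_s)


def count_band_occurrences(samples, sample_rate, frame_ms=20):
    """Count, per band, the frames whose estimated frequency lies in that band."""
    frame_len = int(sample_rate * frame_ms / 1000)
    counts = {name: 0 for name in BANDS}
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        band = band_of(frame_frequency(frame, sample_rate))
        if band is not None:
            counts[band] += 1
    return counts


def sine(freq_hz, duration_ms, sample_rate):
    """Generate a test tone standing in for a participant's audio sample."""
    n = int(sample_rate * duration_ms / 1000)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    # A 600 ms sample, matching the preset length used as an example in the text.
    print(count_band_occurrences(sine(400.0, 600, 8000), 8000))
```

A real client would more likely estimate the spectrum with an FFT; the zero-crossing estimate is used here only to keep the sketch free of third-party dependencies.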
Step S30: determining a current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band, and highlighting the video window corresponding to the current speaker.
Specifically, referring to FIG. 2, the process of determining the current speaker according to the number of times the audio sample data of each participant appears in the preset frequency band comprises:
Step S31: determining the audio sample data with the highest number of occurrences in the preset frequency band;
Step S32: determining the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
After the number of occurrences in the preset frequency band has been determined for each piece of audio sample data, the audio data with the highest number of occurrences is selected. For example, with three participants A, B and C whose numbers of occurrences are 3, 4 and 2 respectively, participant B has the highest number of occurrences; participant B is taken as the current speaker and the video window corresponding to participant B is highlighted.
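Steps S31 and S32 amount to taking the maximum over the per-participant counts. A minimal sketch using the numbers from the example above; the function name and the dictionary layout are illustrative assumptions:

```python
def pick_current_speaker(occurrences):
    """Return the participant with the highest number of occurrences.

    occurrences maps a participant name to the count obtained in step S20.
    Ties are not handled here; the later embodiments describe tie-breaking.
    """
    return max(occurrences, key=occurrences.get)


if __name__ == "__main__":
    print(pick_current_speaker({"A": 3, "B": 4, "C": 2}))
```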
The video window corresponding to the current speaker may be highlighted by displaying only the video window of the current speaker; by displaying it at a larger scale than the video pictures of the other speakers; or by displaying it with a preset mark, for example by setting the picture of the displayed video to a color such as green, yellow or red.
When there are several current speakers, all of them are displayed in turn according to a preset display rule. For example, with three current speakers, 2/3 of the video window shows the first speaker, 2/3 of the remaining area shows the second speaker, and the area left over shows the third speaker. The first, second and third speakers may be ranked by the length of the acquired audio data, or by their numbers of occurrences in descending order (the highest is the first speaker, the next highest the second speaker, and the remaining one the third speaker).
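The display rule in this example can be worked through numerically: the first speaker receives 2/3 of the window, the second 2/3 of the remaining 1/3 (that is, 2/9), and the third the remaining 1/9. The sketch below generalizes this to any number of ranked speakers, which is an assumption beyond the three-speaker case given in the text; the function name is illustrative.

```python
from fractions import Fraction


def window_shares(ranked_speakers):
    """Split the display area among speakers ranked from first to last.

    Each speaker except the last takes 2/3 of the area still free;
    the last speaker takes whatever is left.
    """
    if not ranked_speakers:
        return {}
    remaining = Fraction(1)
    shares = {}
    for name in ranked_speakers[:-1]:
        shares[name] = remaining * Fraction(2, 3)
        remaining -= shares[name]
    shares[ranked_speakers[-1]] = remaining
    return shares


if __name__ == "__main__":
    for name, share in window_shares(["first", "second", "third"]).items():
        print(name, share)
```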
This embodiment samples each piece of acquired audio data, determines the number of occurrences of the preset frequency band in each piece of sample data, and highlights the video data of the participant whose audio data has the highest number of occurrences. This realizes automatic switching of the display of the video conference client, determines the current speaker effectively and quickly and shows that speaker on the client's display, and improves the experience of the video conference client.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a second embodiment of the method for adjusting a video window in a video conference according to the present invention. Based on the first embodiment of the method, step S32 may comprise:
Step S321: when two or more pieces of audio sample data share the highest number of occurrences, determining the band with the highest frequency among the frequency bands corresponding to that number;
Step S322: determining the participant whose audio sample data corresponds to that highest-frequency band as the current speaker.
Suppose there are three preset frequency bands: a high band, a middle band and a low band. For each piece of sample data, the band with the most occurrences is determined. For example, with three participants A, B and C, the band with the most occurrences is the high band for participant A, the middle band for participant B and the low band for participant C. Suppose the high band of participant A occurs 3 times, the middle band of participant B occurs 4 times and the low band of participant C occurs 4 times. The highest number of occurrences is then shared by two bands. In that case, among the bands that share the highest number of occurrences, the participant whose audio data lies in the band with the highest frequency is taken as the current speaker; here participant B is taken as the current speaker and participant B's video window is highlighted. If no two bands share the highest number of occurrences, the video window of the participant whose band has the highest number of occurrences is highlighted.
It can be understood that the number of preset frequency bands may also be two, four and so on, and the specific frequency allocation can be set freely according to the effect the user expects, for example two bands of 250 Hz to 500 Hz and 600 Hz to 1500 Hz, or four bands of 250 Hz to 500 Hz, 550 Hz to 700 Hz, 750 Hz to 1500 Hz and 1600 Hz to 2000 Hz.
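The tie-break of steps S321 and S322 can be expressed as a comparison on a pair: first the number of occurrences, then the height of the band. The sketch below assumes each participant has already been reduced to their most frequent band and its count; the names and the data layout are illustrative assumptions.

```python
# Higher rank means a higher-frequency band.
BAND_RANK = {"low": 0, "mid": 1, "high": 2}


def pick_with_band_tiebreak(top_bands):
    """Pick the current speaker, breaking count ties by the higher band.

    top_bands maps a participant name to (band name, number of occurrences)
    for that participant's most frequent band.
    """
    def sort_key(name):
        band, count = top_bands[name]
        return (count, BAND_RANK[band])

    return max(top_bands, key=sort_key)


if __name__ == "__main__":
    # A: high band, 3 times; B: middle band, 4 times; C: low band, 4 times.
    example = {"A": ("high", 3), "B": ("mid", 4), "C": ("low", 4)}
    print(pick_with_band_tiebreak(example))
```

B and C tie on the count, and B's middle band ranks above C's low band, so B is selected even though A occupies the highest band with a lower count.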
In this embodiment, when two or more participants' sampled audio data share the highest number of occurrences, the participant with the highest-frequency band among those tied bands is taken as the current speaker, and the corresponding video data is highlighted. This realizes automatic switching of the display of the video conference client, determines the current speaker effectively and quickly and shows that speaker on the client's display, improves the experience of the video conference client, and makes the identification of the current speaker more accurate.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a third embodiment of the method for adjusting a video window in a video conference according to the present invention. Based on the second embodiment of the method, step S322 may further comprise:
Step S3221: when the frequency bands corresponding to the highest number of occurrences are the same, determining the frequency of the background noise in that band;
Step S3222: taking the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
For example, with three participants A, B and C, the band with the most occurrences is the high band for participant A, the low band for participant B and the low band for participant C. Suppose the high band of participant A occurs 3 times, the low band of participant B occurs 4 times and the low band of participant C occurs 4 times. The highest number of occurrences is then shared, and the tied bands are the same band. In that case, the background-noise frequency of the audio data in the tied band is obtained, for example for participant B and participant C. If these are 100 Hz and 120 Hz respectively, participant C, whose audio data has the highest background-noise frequency, is taken as the current speaker, and the video window of participant C is highlighted.
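The third embodiment adds one more level to the comparison: number of occurrences, then band height, then background-noise frequency. A minimal sketch under the same assumptions as before (illustrative names and data layout); the text does not describe how the background-noise frequency is measured, so it is passed in as a given value, and the value used for participant A is a placeholder that does not affect the result.

```python
# Higher rank means a higher-frequency band.
BAND_RANK = {"low": 0, "mid": 1, "high": 2}


def pick_with_noise_tiebreak(candidates):
    """Pick the current speaker using count, then band, then noise frequency.

    candidates maps a participant name to
    (band name, number of occurrences, background-noise frequency in Hz).
    """
    def sort_key(name):
        band, count, noise_hz = candidates[name]
        return (count, BAND_RANK[band], noise_hz)

    return max(candidates, key=sort_key)


if __name__ == "__main__":
    example = {
        "A": ("high", 3, 90.0),   # noise value for A is a placeholder
        "B": ("low", 4, 100.0),
        "C": ("low", 4, 120.0),
    }
    print(pick_with_noise_tiebreak(example))
```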
本实施例通过确定的各个与会者的采样音频数据的最高出现次数相同时,且最高的频段相同时,将频段相同的音频数据中背景噪音频率最大的音频数据对应的与会者作为当前发言人,并突出显示其对应的视频数据。实现视频会议客户端的显示画面的自动切换,有效地、快速地确定并在视频会议客户端的显示画面中显示当前发言人,提高了视频会议客户端的体验,并使得当前发言人的锁定更加准确。In this embodiment, when the highest number of occurrences of the sampled audio data of each participant is the same, and the highest frequency band is the same, the participant corresponding to the audio data with the highest background noise frequency in the audio data with the same frequency band is used as the current speaker. And highlight its corresponding video data. Realizing automatic switching of the display screen of the video conference client, effectively and quickly determining and displaying the current speaker in the display screen of the video conference client, improves the experience of the video conference client, and makes the current speaker's locking more accurate.
本发明进一步提供一种视频会议中视频窗口的调整装置。The present invention further provides an apparatus for adjusting a video window in a video conference.
参照图5,图5为本发明视频会议中视频窗口的调整装置的第一实施例的功能模块示意图。Referring to FIG. 5, FIG. 5 is a schematic diagram of functional modules of a first embodiment of a video window adjusting apparatus in a video conference according to the present invention.
在一实施例中,所述视频会议中视频窗口的调整装置包括:获取模块10、处理模块20及显示模块30。In an embodiment, the adjusting device of the video window in the video conference comprises: an obtaining module 10, a processing module 20, and a display module 30.
所述获取10,用于获取所有与会者的音频数据;The obtaining 10 is used to acquire audio data of all participants;
在用户需要与多个其他用户进行视频会议时,开启视频会议客户端,并邀请需要参加的其他用户加入视频会议中,即建立与其他用户之间的会话通信环境。在成功创建视频会议后,获取所有与会者的音频数据。所述音频数据包括与会者的说话声音及/后与会者所处环境的环境噪音等。当然并不是每一个音频数据中都包括上述内容,可以是包括其中的一种或者几种。所述音频数据为一段音频数据,即为一段时间内视频会议客户端用户发出的音频数据及或环境产生的噪音音频数据。When a user needs to perform a video conference with multiple other users, the video conference client is started, and other users who need to participate are invited to join the video conference, that is, a session communication environment with other users is established. After the video conference is successfully created, the audio data of all participants is obtained. The audio data includes the voice of the participant and/or the environmental noise of the environment in which the participant is located. Of course, not all of the audio data includes the above content, and may include one or more of them. The audio data is a piece of audio data, that is, audio data sent by a video conference client user and audio noise data generated by the environment for a period of time.
可以理解的是,为了能更快的创建视频会议,在向其他用户发出邀请时,同时发送一个检测数据包,在接收到其他用户发送的基于检测数据包的响应数据包时,判定成功创建与接收到响应数据包的视频客户端的会话通信环境;在未接收到响应数据包时,提示用户视频会议创建失败,以供视频会议客户端通过其他方式联系未接收到响应数据包的视频会议客户端用户尽快建立会话通信环境,其他方式可以是短信、电话、邮件等。It can be understood that, in order to create a video conference faster, when an invitation is sent to other users, a detection data packet is simultaneously sent, and when a response packet based on the detection data packet sent by another user is received, the determination is successfully created. The session communication environment of the video client receiving the response packet; prompting the user that the video conference creation fails when the response packet is not received, so that the video conference client contacts the video conference client that does not receive the response packet by other means The user establishes a session communication environment as soon as possible, and other methods can be short messages, telephone calls, emails, and the like.
The processing module 20 is configured to sample the acquired audio data of each participant to generate audio sample data, and to count, for each participant, the number of times the audio sample data appears in the preset frequency band;
After the audio data of all participants is acquired, each piece of acquired audio data is sampled to generate audio sample data. The process by which the processing module 20 samples the acquired audio data of each participant includes: extracting audio data of a preset length from each piece of acquired audio data to generate the audio sample data. For example, audio data with a duration of 600 ms is extracted. The preset-length audio data may be extracted starting from the beginning of the audio data, extracted from the end of the audio data, or extracted at random from within it. The preset length may also be 1000 ms or 500 ms. Extracting audio data of a preset length gives every participant the same basis for calculation, that is, the audio data used as the sample has the same length for every participant, which ensures the accuracy of the calculation.
After the audio sample data corresponding to each piece of audio data is generated, the number of occurrences of each audio sample data in the preset frequency band is determined. The preset frequency band may be a single band or multiple bands. If the audio data is a human voice, the preset frequency band ranges from 250 Hz to 2000 Hz, and the range may be adjusted appropriately for different voices. When the preset frequency band is a single band, it may be 250 Hz to 600 Hz, 600 Hz to 1000 Hz, or 1500 Hz to 2000 Hz, in each case including both endpoints. When there are multiple preset frequency bands, they may be divided by frequency into a high band of 850 Hz to 2000 Hz (including the endpoints 850 Hz and 2000 Hz), a mid band of 550 Hz to 850 Hz (excluding the endpoints 550 Hz and 850 Hz), and a low band of 250 Hz to 550 Hz (including the endpoints 250 Hz and 550 Hz).
For example, suppose a video conference has three participants A, B, and C, whose audio sample data are a, b, and c respectively. If the preset frequency band is 250 Hz to 600 Hz (including both endpoints), the numbers of occurrences determined from a, b, and c in that band are m, n, and s respectively. If there are three preset frequency bands, namely the high band of 850 Hz to 2000 Hz, the mid band of 550 Hz to 850 Hz, and the low band of 250 Hz to 550 Hz as defined above, the numbers of occurrences are m1, m2, and m3 for a, n1, n2, and n3 for b, and s1, s2, and s3 for c, listed in the order high band, mid band, low band.
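The sampling and counting steps above can be sketched as follows. The description does not define precisely what counts as one "occurrence" in a band, so this sketch assumes that the sample has already been reduced to a list of per-frame frequency estimates in Hz, and that each estimate falling inside a band is one occurrence. The names `extract_segment`, `count_band_hits`, and `BANDS` are hypothetical.

```python
import random

# Band edges from the description: (low Hz, high Hz, endpoints included?)
BANDS = {
    "high": (850.0, 2000.0, True),
    "mid": (550.0, 850.0, False),
    "low": (250.0, 550.0, True),
}


def extract_segment(samples, sample_rate, length_ms=600, mode="start", rng=None):
    """Return a preset-length slice taken from the start, the end, or a random offset."""
    n = sample_rate * length_ms // 1000
    if n > len(samples):
        raise ValueError("audio data is shorter than the preset length")
    if mode == "start":
        begin = 0
    elif mode == "end":
        begin = len(samples) - n
    elif mode == "random":
        begin = (rng or random).randrange(len(samples) - n + 1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return samples[begin:begin + n]


def count_band_hits(frame_freqs_hz, bands=BANDS):
    """Count how many frequency estimates fall inside each preset band."""
    counts = {name: 0 for name in bands}
    for freq in frame_freqs_hz:
        for name, (low, high, inclusive) in bands.items():
            hit = low <= freq <= high if inclusive else low < freq < high
            if hit:
                counts[name] += 1
                break
    return counts


# 550 Hz and 850 Hz land in the low and high bands because the mid band excludes its endpoints.
print(count_band_hits([300, 550, 600, 850, 900, 100]))
# {'high': 2, 'mid': 1, 'low': 2}
```

Taking every participant's segment with the same `length_ms` keeps the counts comparable, which is the stated reason for using a preset length.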
The processing module 20 is further configured to determine the current speaker according to the number of times each participant's audio sample data appears in the preset frequency band;
The display module 30 is configured to highlight the video window corresponding to the current speaker.
Specifically, referring to FIG. 6, the processing module 20 includes a determining unit 21 and a processing unit 22:
The determining unit 21 is configured to determine the audio sample data with the highest number of occurrences in the preset frequency band;
The processing unit 22 is configured to determine the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
After the number of occurrences in the preset frequency band has been determined for each audio sample data, the audio data with the highest number of occurrences is selected. For example, with three participants A, B, and C whose numbers of occurrences are 3, 4, and 2 respectively, the participant with the highest number of occurrences is B; participant B is taken as the current speaker, and the video window corresponding to participant B is highlighted.
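The selection in this example amounts to taking the maximum over per-participant counts. A minimal sketch, assuming the counts are already available as a dictionary (the function name `top_participants` is hypothetical), returns every participant that shares the highest count so that ties remain visible to the caller:

```python
def top_participants(counts):
    """Return all participants whose occurrence count equals the highest count."""
    if not counts:
        return []
    highest = max(counts.values())
    return [name for name, count in counts.items() if count == highest]


# Counts from the example in the description: A=3, B=4, C=2.
print(top_participants({"A": 3, "B": 4, "C": 2}))  # ['B']
```

When the returned list has one entry, that participant is the current speaker; a longer list corresponds to the tie cases handled by the further embodiments.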
The video window corresponding to the current speaker may be highlighted in any of the following ways: displaying only the video window of the current speaker; displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers; or displaying the video window of the current speaker with a preset marker, for example by setting the displayed picture to green, yellow, red, or the like.
When there are multiple current speakers, all of them are displayed in turn according to a preset display rule. For example, if there are three current speakers, 2/3 of the video window displays the first speaker, 2/3 of the remaining video window displays the second speaker, and the rest of the window displays the third speaker. The first, second, and third speakers may be determined by the length of the acquired audio data, or by ranking the numbers of occurrences (the highest count is the first speaker, the second highest is the second speaker, and the remaining one is the third speaker).
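The 2/3 display rule in this example can be worked out numerically. The sketch below assumes the rule generalizes as "each speaker except the last takes 2/3 of whatever area is left, and the last takes the remainder"; the description only gives the three-speaker case, so the generalization and the name `window_shares` are assumptions.

```python
from fractions import Fraction


def window_shares(n_speakers, ratio=Fraction(2, 3)):
    """Return the fraction of the display area given to each ranked speaker."""
    shares = []
    remaining = Fraction(1)
    for index in range(n_speakers):
        is_last = index == n_speakers - 1
        share = remaining if is_last else remaining * ratio
        shares.append(share)
        remaining -= share
    return shares


print(window_shares(3))  # [Fraction(2, 3), Fraction(2, 9), Fraction(1, 9)]
```

For three speakers the shares are 2/3, 2/9, and 1/9 of the display area, which sum to the whole window.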
In this embodiment, each piece of acquired audio data is sampled, the number of occurrences in the preset frequency band is determined for each sample, and the video window of the participant corresponding to the audio data with the highest number of occurrences is highlighted. The display picture of the video conference client is thus switched automatically, and the current speaker is determined efficiently and quickly and shown in the display picture of the video conference client, which improves the user experience of the video conference client.
Further, the determining unit 21 is also configured to determine, when two or more audio sample data share the highest number of occurrences, the highest-frequency band among the bands corresponding to that highest number;
The processing unit 22 is also configured to determine, as the current speaker, the participant corresponding to the audio sample data of the highest-frequency band among the bands corresponding to the highest number of occurrences.
Suppose there are three preset frequency bands: a high band, a mid band, and a low band. For each participant's sample data, the band with the most occurrences is determined. For example, with three participants A, B, and C, the band with the most occurrences is the high band for participant A, the mid band for participant B, and the low band for participant C. If participant A's high-band count is 3, participant B's mid-band count is 4, and participant C's low-band count is 4, then the highest count is shared by two bands. When bands share the highest count, the participant whose audio data corresponds to the highest-frequency band among them is taken as the current speaker. Here that is participant B, whose video window is highlighted. If no bands share the highest count, the video window of the participant corresponding to the band with the highest count is highlighted.
It can be understood that there may also be two preset frequency bands, four preset frequency bands, and so on, and the frequency allocation can be set freely according to the effect the user expects. For example, two bands of 250 Hz to 500 Hz and 600 Hz to 1500 Hz may be set, or four bands of 250 Hz to 500 Hz, 550 Hz to 700 Hz, 750 Hz to 1500 Hz, and 1600 Hz to 2000 Hz.
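The band-based tie-break can be sketched as follows. Each participant is assumed to be summarized by a pair of (most frequent band, count in that band); this data shape, the `BAND_RANK` ordering table, and the function name are illustrative assumptions rather than details from the application.

```python
# Higher rank means a higher-frequency band.
BAND_RANK = {"low": 0, "mid": 1, "high": 2}


def pick_speaker_by_band(peaks):
    """Pick the speaker with the highest count; break ties by the higher band.

    peaks maps participant -> (most frequent band, occurrence count in that band).
    """
    highest = max(count for _, count in peaks.values())
    tied = [name for name, (_, count) in peaks.items() if count == highest]
    return max(tied, key=lambda name: BAND_RANK[peaks[name][0]])


# A: high band x3, B: mid band x4, C: low band x4 -> B and C tie, mid beats low.
print(pick_speaker_by_band({"A": ("high", 3), "B": ("mid", 4), "C": ("low", 4)}))  # B
```

If two tied participants also share the same band, this function simply returns the first of them; that case needs a further rule.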
In this embodiment, when two or more participants' sampled audio data share the highest number of occurrences, the participant whose band has the highest frequency among the tied bands is taken as the current speaker, and that participant's video window is highlighted. The display picture of the video conference client is thus switched automatically, and the current speaker is determined efficiently and quickly and shown in the display picture of the video conference client, which improves the user experience of the video conference client and makes identification of the current speaker more accurate.
Further, the determining unit 21 is also configured to determine, when the bands corresponding to the highest number of occurrences are the same, the frequency of the background noise in that band;
The processing unit 22 is also configured to take the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
For example, with three participants A, B, and C, the band with the most occurrences is the high band for participant A, the low band for participant B, and the low band for participant C. If participant A's high-band count is 3, participant B's low-band count is 4, and participant C's low-band count is 4, then the highest count is shared and corresponds to the same band. In that case, the frequency of the background noise of the tied audio data is obtained. For example, if the background-noise frequencies of the audio data of participants B and C are 100 Hz and 120 Hz respectively, participant C, whose audio data has the higher background-noise frequency, is taken as the current speaker, and participant C's video window is highlighted.
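The background-noise tie-break in this example can be sketched as below. The description does not say how the background-noise frequency is measured, so the sketch takes it as a given number per participant; the data shapes and the name `pick_speaker_by_noise` are assumptions for illustration.

```python
def pick_speaker_by_noise(peaks, noise_hz):
    """Resolve a tie in both count and band by the higher background-noise frequency.

    peaks maps participant -> (most frequent band, occurrence count in that band).
    noise_hz maps participant -> background-noise frequency in Hz (assumed given).
    Returns None when the tied participants are in different bands, because the
    band-frequency rule applies there instead.
    """
    highest = max(count for _, count in peaks.values())
    tied = [name for name, (_, count) in peaks.items() if count == highest]
    if len(tied) == 1:
        return tied[0]
    tied_bands = {peaks[name][0] for name in tied}
    if len(tied_bands) > 1:
        return None
    return max(tied, key=lambda name: noise_hz[name])


peaks = {"A": ("high", 3), "B": ("low", 4), "C": ("low", 4)}
print(pick_speaker_by_noise(peaks, {"B": 100, "C": 120}))  # C
```

Only the tied participants need an entry in `noise_hz`, which matches the example where noise is obtained for B and C only.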
In this embodiment, when the participants' sampled audio data share the highest number of occurrences and the corresponding band is also the same, the participant whose audio data has the highest background-noise frequency among those tied is taken as the current speaker, and that participant's video window is highlighted. The display picture of the video conference client is thus switched automatically, and the current speaker is determined efficiently and quickly and shown in the display picture of the video conference client, which improves the user experience of the video conference client and makes identification of the current speaker more accurate.
The above are only preferred embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (16)

  1. A method for adjusting a video window in a video conference, characterized in that the method comprises the following steps:
    acquiring the audio data of all participants;
    sampling the acquired audio data of each participant to generate audio sample data, and counting, for each participant, the number of times the audio sample data appears in a preset frequency band;
    determining the current speaker according to the number of times each participant's audio sample data appears in the preset frequency band, and highlighting the video window corresponding to the current speaker.
  2. The method for adjusting a video window in a video conference according to claim 1, characterized in that the step of sampling the acquired audio data of each participant to generate audio sample data comprises:
    extracting audio data of a preset length from each piece of acquired audio data to generate the audio sample data.
  3. The method for adjusting a video window in a video conference according to claim 1, characterized in that the step of determining the current speaker according to the number of times each participant's audio sample data appears in the preset frequency band comprises:
    determining the audio sample data with the highest number of occurrences in the preset frequency band;
    determining the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
  4. The method for adjusting a video window in a video conference according to claim 3, characterized in that the manner of highlighting comprises:
    displaying only the video window of the current speaker;
    or displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or displaying the video window of the current speaker with a preset marker.
  5. The method for adjusting a video window in a video conference according to claim 3, characterized in that the step of determining the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker comprises:
    when two or more audio sample data share the highest number of occurrences, determining the highest-frequency band among the bands corresponding to that highest number;
    determining, as the current speaker, the participant corresponding to the audio sample data of the highest-frequency band among the bands corresponding to the highest number of occurrences.
  6. The method for adjusting a video window in a video conference according to claim 5, characterized in that the manner of highlighting comprises:
    displaying only the video window of the current speaker;
    or displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or displaying the video window of the current speaker with a preset marker.
  7. The method for adjusting a video window in a video conference according to claim 5, characterized in that the step of determining, as the current speaker, the participant corresponding to the audio sample data of the highest-frequency band among the bands corresponding to the highest number of occurrences comprises:
    when the bands corresponding to the highest number of occurrences are the same, determining the frequency of the background noise in that band;
    taking the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
  8. The method for adjusting a video window in a video conference according to claim 1, characterized in that the manner of highlighting comprises:
    displaying only the video window of the current speaker;
    or displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or displaying the video window of the current speaker with a preset marker.
  9. An apparatus for adjusting a video window in a video conference, characterized in that the apparatus comprises:
    an acquisition module, configured to acquire the audio data of all participants;
    a processing module, configured to sample the acquired audio data of each participant to generate audio sample data and to count, for each participant, the number of times the audio sample data appears in a preset frequency band, and further configured to determine the current speaker according to the number of times each participant's audio sample data appears in the preset frequency band;
    a display module, configured to highlight the video window corresponding to the current speaker.
  10. The apparatus for adjusting a video window in a video conference according to claim 9, characterized in that the processing module is further configured to extract audio data of a preset length from each piece of acquired audio data to generate the audio sample data.
  11. The apparatus for adjusting a video window in a video conference according to claim 9, characterized in that the processing module comprises a determining unit and a processing unit:
    the determining unit is configured to determine the audio sample data with the highest number of occurrences in the preset frequency band;
    the processing unit is further configured to determine the participant corresponding to the audio sample data with the highest number of occurrences as the current speaker.
  12. The apparatus for adjusting a video window in a video conference according to claim 11, characterized in that the manner of highlighting comprises:
    displaying only the video window of the current speaker;
    or displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or displaying the video window of the current speaker with a preset marker.
  13. The apparatus for adjusting a video window in a video conference according to claim 11, characterized in that the determining unit is further configured to determine, when two or more audio sample data share the highest number of occurrences, the highest-frequency band among the bands corresponding to that highest number;
    the processing unit is further configured to determine, as the current speaker, the participant corresponding to the audio sample data of the highest-frequency band among the bands corresponding to the highest number of occurrences.
  14. The apparatus for adjusting a video window in a video conference according to claim 13, characterized in that the manner of highlighting comprises:
    displaying only the video window of the current speaker;
    or displaying the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or displaying the video window of the current speaker with a preset marker.
  15. The apparatus for adjusting a video window in a video conference according to claim 13, characterized in that the determining unit is further configured to determine, when the highest-frequency bands among the bands corresponding to the highest number of occurrences are the same, the frequency of the background noise in that highest-frequency band;
    the processing unit is further configured to take the participant corresponding to the audio data with the highest background-noise frequency as the current speaker.
  16. The apparatus for adjusting a video window in a video conference according to claim 15, characterized in that the display module is further configured to display only the video window of the current speaker;
    or to display the video window of the current speaker at a larger proportion than the video pictures of the other speakers;
    or to display the video window of the current speaker with a preset marker.
PCT/CN2014/094598 2014-12-15 2014-12-23 Method and device for adjusting video window in video conference WO2016095244A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410776179.X 2014-12-15
CN201410776179.XA CN105791738B (en) 2014-12-15 2014-12-15 The method of adjustment and device of video window in video conference

Publications (1)

Publication Number Publication Date
WO2016095244A1 true WO2016095244A1 (en) 2016-06-23

Family

ID=56125700

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/094598 WO2016095244A1 (en) 2014-12-15 2014-12-23 Method and device for adjusting video window in video conference

Country Status (2)

Country Link
CN (1) CN105791738B (en)
WO (1) WO2016095244A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651632A (en) * 2020-04-23 2020-09-11 深圳英飞拓智能技术有限公司 Method and device for outputting voice and video of speaker in video conference

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107734286B (en) 2016-08-12 2021-05-04 阿里巴巴集团控股有限公司 Video window display method and device
CN107396036A (en) * 2017-09-07 2017-11-24 北京小米移动软件有限公司 Method for processing video frequency and terminal in video conference
CN107682752B (en) * 2017-10-12 2020-07-28 广州视源电子科技股份有限公司 Method, device and system for displaying video picture, terminal equipment and storage medium
CN111596985B (en) * 2020-04-24 2023-03-14 腾讯科技(深圳)有限公司 Interface display method, device, terminal and medium in multimedia conference scene
CN112380234B (en) * 2020-11-03 2024-05-14 广州迈聆信息科技有限公司 Video conference window searching and displaying method and device and video conference system
CN112351237A (en) * 2020-11-05 2021-02-09 安徽马钢和菱实业有限公司 Automatic switching decision algorithm for main video of video conference
CN113596349B (en) * 2021-07-26 2024-06-04 世邦通信股份有限公司 Conference method, system, device and storage medium for automatic linkage video of speaking position

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101080000A (en) * 2007-07-17 2007-11-28 华为技术有限公司 Method, system, server and terminal for displaying speaker in video conference
CN101371244A (en) * 2006-01-13 2009-02-18 微软公司 Sorting speakers in a network-enabled conference
CN101478642A (en) * 2009-01-14 2009-07-08 镇江畅联通信科技有限公司 Multi-picture mixing method and apparatus for video meeting system
CN102647578A (en) * 2011-02-17 2012-08-22 鸿富锦精密工业(深圳)有限公司 Video switching system and method
CN103297743A (en) * 2012-03-05 2013-09-11 联想(北京)有限公司 Video conference display window adjusting method and video conference service equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7355623B2 (en) * 2004-04-30 2008-04-08 Microsoft Corporation System and process for adding high frame-rate current speaker data to a low frame-rate video using audio watermarking techniques
US8395653B2 (en) * 2010-05-18 2013-03-12 Polycom, Inc. Videoconferencing endpoint having multiple voice-tracking cameras
US9030520B2 (en) * 2011-06-20 2015-05-12 Polycom, Inc. Automatic camera selection for videoconferencing
EP2766901B1 (en) * 2011-10-17 2016-09-21 Nuance Communications, Inc. Speech signal enhancement using visual information
US9491404B2 (en) * 2011-10-27 2016-11-08 Polycom, Inc. Compensating for different audio clocks between devices using ultrasonic beacon
CN102857732B (en) * 2012-05-25 2015-12-09 华为技术有限公司 Menu control method, equipment and system in a kind of many pictures video conference

Also Published As

Publication number Publication date
CN105791738B (en) 2019-03-12
CN105791738A (en) 2016-07-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14908279

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14908279

Country of ref document: EP

Kind code of ref document: A1