KR20150087017A - Audio control device based on eye-tracking and method for visual communications using the device - Google Patents
- Publication number
- KR20150087017A (application number KR1020140007373A)
- Authority
- KR
- South Korea
- Prior art keywords
- audio
- image
- talker
- applying
- virtual
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/142—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
- H04N7/144—Constructional details of the terminal equipment, e.g. arrangements of the camera and the display camera and display on the same optical axis, e.g. optically multiplexing the camera and display for eye to eye contact
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
BACKGROUND OF THE INVENTION
The present invention relates to a technique for applying gaze-tracking information to the audio signal of a talker shown on a display, so that the talker's voice appears to come from the talker's virtual position on the display even though no physical sound source exists in that direction.
In the first generation of computers, vacuum tubes (and some relays) were used as logic devices. Computers have since made remarkable progress and have now reached the fourth generation, built on VLSI (Very Large Scale Integration), a more advanced form of integrated circuit (IC).
Computers have rapidly developed not only as personal computers (PCs) but also as servers and portable terminals, and as these technologies spread, the computer has become a personal necessity. As computers have become commonplace, various services have appeared to satisfy users' needs.
One of the most important of these services is data communication. In particular, as the application fields of communication widen, demand for transmitting visual information on a screen, that is, for video communication, is also increasing. Video communication is roughly classified into two types: hard-copy image communication, in which image information is transmitted on paper, as with a fax, and soft-copy image communication using a monitor. Video communication in the ordinary sense refers to soft-copy video communication.
Conventional voice communication transmits only a voice signal, whereas video communication transmits a voice signal and an image signal simultaneously. When image information is also transmitted to the other party, the amount and quality of the information exchanged are enriched. Since vision accounts for roughly 60-80% of human sensory information, exchanging information through both sight and hearing can be far more effective than through hearing alone.
Demand for video communication is increasing in today's globalized environment because it enables people far apart to communicate while seeing each other. Beyond private conversations with distant family or friends, there is real demand for business meetings, telemedicine, and similar uses.
In an actual conversation, the parties talk while looking at each other, so each can directly sense the other's feelings and the atmosphere. In particular, the voices exchanged between the parties have directionality, which makes the conversation feel realistic. In video conferencing, however, it is not easy to reproduce these feelings, this atmosphere, and this sense of reality. In video communication, the factor that most makes a conversation feel like a real one is the sense of presence of the audio.
Video conferencing technology has been developed for reasons of time saving, efficient use of space, and convenience, but this sense of audio presence cannot be felt in a simple system that merely exchanges video and audio. There is therefore a need to raise the realism of the videoconference and thereby increase users' immersion in it.
Meanwhile, gaze-position tracking is a technique for determining where a user is looking on a screen device such as a computer monitor. Gaze tracking can serve as an input device, for example for handicapped users, by pointing at the user's gaze point much as a mouse pointer does, and it provides a strong sense of immersion in virtual-reality environments.
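As an illustrative sketch only (not the patent's disclosed algorithm), gaze tracking of this kind reduces to mapping detected pupil coordinates to a point on the screen. The two-point linear calibration and all names below are assumptions for illustration.

```python
# Hypothetical sketch: linear calibration from pupil-center coordinates
# (from an eye camera) to screen coordinates, one axis at a time.

def calibrate(pupil_a, pupil_b, screen_a, screen_b):
    """Return per-axis linear maps from pupil coords to screen coords,
    fitted from two calibration fixations."""
    def axis_map(p0, p1, s0, s1):
        scale = (s1 - s0) / (p1 - p0)
        return lambda p: s0 + (p - p0) * scale
    fx = axis_map(pupil_a[0], pupil_b[0], screen_a[0], screen_b[0])
    fy = axis_map(pupil_a[1], pupil_b[1], screen_a[1], screen_b[1])
    return fx, fy

def gaze_point(pupil, fx, fy):
    """Map a detected pupil center to a screen-space gaze point."""
    return (fx(pupil[0]), fy(pupil[1]))

# Calibration: pupil (10, 8) while fixating (0, 0); pupil (30, 20) at (1920, 1080)
fx, fy = calibrate((10, 8), (30, 20), (0, 0), (1920, 1080))
print(gaze_point((20, 14), fx, fy))  # midway pupil position -> screen center
```

A real tracker would use a denser calibration grid and compensate for head movement; this only shows the coordinate mapping idea.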
Audio realism can be improved to some extent by recording and editing technology that edits audio signals recorded from multiple microphone channels and outputs them through multi-channel speakers. Beyond this, research is needed into technology that realizes still more realistic audio by adding the eye-tracking information of the users of a video conferencing system.
The present invention has been made to solve the above-mentioned problems, and it is an object of the present invention to improve the sense of presence of the video as well as the sound by using a sound-image technique and a line-of-sight correction technique.
Another object of the present invention is to enhance the sense of presence by using the sound-image technique so that the participants feel as if they were actually gathered together. It is a further object of the present invention to provide more realistic audio and to eliminate the sense of disconnection by using a line-of-sight correction technique that resolves the visual mismatch between meeting participants.
According to one aspect of the present invention, there is provided an apparatus for controlling the audio of a communication device having a display, the apparatus comprising: a gaze tracking unit for detecting the position coordinates of the pupil region from an image of a user and generating gaze tracking information based on the detected position coordinates; a talker identification unit for identifying one talker among a plurality of talkers on the display based on the gaze tracking information; a threshold coefficient application unit for applying a predetermined threshold coefficient to the audio signals transmitted from the microphones installed on the talker sides other than the identified talker; and an acoustic filter application unit for applying an acoustic filter to each audio signal based on whether the threshold coefficient has been applied.
Here, the gaze tracking unit may generate gaze correction information by combining images photographed by cameras at the upper and lower ends of the display, and may transmit gaze-corrected images, corrected based on the gaze correction information, to the communication equipment of the plurality of communicators.
Further, the gaze correction image may be corrected by applying an alpha blending technique to the eye region detected from the image captured by the camera at the upper and lower ends of the display.
Here, the line-of-sight-tracking-based audio control apparatus may further comprise a virtual position determination unit that determines virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display, and controls a speaker installed on the user side so that each determined virtual position becomes a virtual sound source.
Further, the virtual position determination unit may control a speaker by generating a virtual sound source by applying a sound image technique.
According to another aspect of the present invention, there is provided a video communication system for realistic audio, comprising: a photographing apparatus having cameras and photographing a user of the video communication system; and a line-of-sight-tracking-based audio control device installed on the user side, which generates gaze tracking information from the images photographed by the photographing apparatus, identifies one talker among a plurality of talkers on the display based on the gaze tracking information, and applies a predetermined threshold coefficient and an acoustic filter to each talker audio signal transmitted from the microphones installed on the plurality of talker sides, based on the identification result.
Here, the line-of-sight-tracking-based audio control apparatus may generate gaze correction information by combining the images photographed by the cameras at the upper and lower ends, and may transmit gaze-corrected images, corrected based on the gaze correction information, to the communication equipment of the communicators.
Further, the gaze correction image may be corrected by applying an alpha blending technique to the eye region detected from the images captured by the camera at the upper and lower ends.
Here, the line-of-sight-tracking-based audio control apparatus may determine virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display, and may control a speaker so that each determined virtual position becomes a virtual sound source.
Further, the line-of-sight-tracking-based audio control apparatus can control a speaker by generating a virtual sound source by applying a sound image technique.
Here, the video communication system for realistic audio may further include a stereo microphone for receiving the audio generated by the user and outputting the user audio signal to the eye-tracking-based audio control device.
Here, the image communication system for realistic audio may further include a stereo speaker for receiving and outputting an audio signal to which a threshold coefficient and an acoustic filter are applied, from an eye-tracking-based audio control apparatus.
According to another aspect of the present invention, there is provided a method for realizing realistic audio in a video communication system having a visual-tracking-based audio control apparatus, the method comprising the steps of: detecting the position coordinates of the pupil region from an image of the user and generating gaze tracking information based on the detected position coordinates; identifying one talker among a plurality of talkers on the display based on the gaze tracking information; applying a predetermined threshold coefficient to the audio signals transmitted from the microphones installed on the talker sides other than the identified talker; and applying an acoustic filter to each audio signal based on whether the threshold coefficient has been applied.
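The claimed processing chain can be sketched as follows. This is a minimal illustration under assumed data shapes (per-talker sample lists, talker regions as horizontal display spans); every name and the coefficient value are assumptions, not part of the disclosure.

```python
# Hypothetical sketch of the chain: gaze point -> identified talker ->
# threshold coefficient applied to all *other* talkers' audio.

THRESHOLD_COEFF = 0.3  # assumed attenuation for non-focused talkers

def identify_talker(gaze_x, talker_regions):
    """Return the id of the talker whose display region contains the gaze x."""
    for talker_id, (left, right) in talker_regions.items():
        if left <= gaze_x < right:
            return talker_id
    return None

def apply_threshold(audio_by_talker, focused_id, coeff=THRESHOLD_COEFF):
    """Scale every non-focused talker's samples by the threshold coefficient."""
    return {
        tid: samples if tid == focused_id else [s * coeff for s in samples]
        for tid, samples in audio_by_talker.items()
    }

regions = {"A": (0, 640), "B": (640, 1280), "C": (1280, 1920)}
audio = {"A": [1.0, -1.0], "B": [0.5, 0.5], "C": [0.2, -0.2]}

focused = identify_talker(700, regions)  # gaze falls inside B's region
mixed = apply_threshold(audio, focused)  # A and C attenuated, B untouched
```

An acoustic filter stage (see the filter discussion below in the description) would then run over each attenuated signal before output.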
Here, in the video communication method for realistic audio, before the step of generating gaze tracking information, gaze correction information may be generated by combining the images photographed by the upper and lower cameras of the display, and gaze-corrected images, corrected based on the gaze correction information, may be transmitted to the communication equipment of the plurality of communicators.
Further, the gaze correction image may be corrected by applying an alpha blending technique to an eye region detected from an image captured by a camera at the upper and lower ends of the display.
Here, the video communication method for realistic audio may further include, between the step of identifying a talker and the step of applying a threshold coefficient, a virtual positioning step of determining virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display, and controlling the speaker installed on the user side so that each determined virtual position becomes a virtual sound source.
Further, the virtual positioning step may control the speaker by applying a sound image technique to generate a virtual sound source.
Here, the video communication method for realistic audio may further include a step of receiving the audio generated by the user and delivering the user audio signal to the eye-tracking-based audio control device so that it is output through the speakers installed on the plurality of talker sides.
Here, the video communication method for realistic audio may further include, after the step of applying the acoustic filter, receiving and outputting from the eye-tracking-based audio control apparatus the audio signal to which the threshold coefficient and the acoustic filter have been applied.
Using the eye-tracking-based audio control device and the video communication method according to the present invention as described above heightens the sense of presence of the conference, so that conference quality improves as participants become more immersed in a more realistic meeting. Eye-tracking technology also has the advantage of clearer communication. The sense of presence is increased by varying the threshold coefficient for each conference participant and reproducing each voice after multiplying it by that coefficient.
In addition, the line-of-sight correction technique can increase the sense of presence of the video as well as the voice, and the line-of-sight tracking technique can identify the conference participant the user is looking at so that this participant's voice is heard more clearly.
FIG. 1 is a diagram illustrating the overall configuration and a preferred embodiment of a video communication system according to an embodiment of the present invention.
FIG. 2 is a block diagram for explaining an eye-tracking-based audio control apparatus and its detailed components according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of setting a virtual location on a three-dimensional space for a user and a talker in the video conferencing system according to an exemplary embodiment of the present invention.
FIG. 4 is a block diagram illustrating an image communication system for realistic audio and its detailed components according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a video communication method for realistic audio using a visual tracking-based audio control apparatus according to an exemplary embodiment of the present invention, and detailed steps thereof.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the invention is not intended to be limited to the particular embodiments, but includes all modifications, equivalents, and alternatives falling within the spirit and scope of the invention. Like reference numerals are used for like elements in describing each drawing.
The terms first, second, A, B, etc. may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items, or any one of the plurality of related listed items.
It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that there are no intervening elements.
The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise. In the present application, terms such as "comprises" or "having" specify the presence of a feature, number, step, operation, element, component, or combination thereof described in the specification, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations thereof.
Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries are to be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present application.
First, the terms used in the present application will be briefly described as follows.
In the present application, the threshold coefficient refers to a coefficient by which an audio signal is multiplied, before the audio is output to the audio output device, in order to adjust the strength of the signal. For example, the output value delivered to the audio output device may be attenuated so that it does not exceed a limit value. An amplifier in an audio output apparatus generally has a built-in mechanism for attenuating the output so that it does not exceed a threshold level. The threshold coefficient may also be used to reflect human auditory characteristics.
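The "attenuate so as not to exceed a limit value" behaviour mentioned above can be illustrated with a short sketch. The function name and the uniform-scaling strategy are assumptions for illustration, not the patent's mechanism.

```python
# Hypothetical limiter sketch: compute a single coefficient that scales the
# whole signal down just enough that no sample exceeds the limit value.

def limit_output(samples, limit):
    """Return (scaled_samples, coefficient); coefficient is 1.0 when no
    sample exceeds the limit."""
    peak = max(abs(s) for s in samples)
    coeff = min(1.0, limit / peak) if peak else 1.0
    return [s * coeff for s in samples], coeff

out, coeff = limit_output([0.5, -2.0, 1.0], 1.0)  # peak 2.0 -> coeff 0.5
```

Scaling the whole block by one coefficient preserves the waveform shape, unlike per-sample clipping, which distorts it.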
A filter is a circuit that passes some frequency bands easily while attenuating others. The frequency at the boundary of the passed region is called the cutoff frequency, abbreviated fc. Filters are divided into several kinds according to the shape of their frequency-response curves; representative examples follow.
- A high-pass filter (HPF) passes the frequency band above the cutoff frequency fc.
- A low-pass filter (LPF) passes the frequency band below the cutoff frequency fc.
- A band-pass filter (BPF) passes only one frequency region. It has two cutoff frequencies, fcH and fcL, the upper and lower limits of the passed band; when the band is narrow, only the center frequency may be stated.
- A band-reject filter (BRF) blocks only a relatively narrow frequency region. Its cutoff frequencies are stated as for the BPF: sometimes the upper and lower limits of the blocked band are given separately, and sometimes only the center frequency of the blocked band appears. It is also called a band-eliminate filter (BEF).
- A notch filter blocks only a specific frequency point; it can be thought of as an extremely narrow BRF. Its cutoff frequency is stated as the frequency point that is blocked.
- An all-pass filter changes the phase characteristic rather than the frequency characteristic: the magnitude response is flat, but the phase changes by 180 degrees (inverts) across one frequency boundary. Although unusual, it is also a kind of filter.
Many other types of filter exist for various circuit designs. Besides the cutoff frequency, the blocking (roll-off) characteristic of a filter also matters: how quickly the response falls beyond fc. Taking a high-pass filter as an example, the level one octave below fc is compared with the level at fc, and the slope is usually expressed in dB per octave: if the lower level is half, the slope is 6 dB/oct; if one quarter, 12 dB/oct. Alternatively, the slope may be expressed as the ratio of the level at one tenth of fc to the level at fc, written in dB/dec ("dec" abbreviates decade).
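The low-pass case above can be demonstrated with a minimal first-order IIR filter. This is a generic textbook sketch, not the acoustic filter disclosed in the patent; all names are illustrative.

```python
# Hypothetical sketch: first-order IIR low-pass filter. Low-frequency
# content passes nearly unchanged; content near the Nyquist rate is
# strongly attenuated.
import math

def low_pass(samples, fc, fs):
    """First-order low-pass with cutoff fc (Hz) at sample rate fs (Hz)."""
    w = 2 * math.pi * fc / fs      # normalized angular cutoff
    alpha = w / (w + 1)            # smoothing factor in (0, 1)
    out, prev = [], 0.0
    for s in samples:
        prev += alpha * (s - prev) # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out.append(prev)
    return out

dc = low_pass([1.0] * 200, 100.0, 8000.0)        # constant (DC) input
hf = low_pass([1.0, -1.0] * 100, 100.0, 8000.0)  # alternating (Nyquist) input
```

After the transient, the constant input emerges at almost full level while the alternating input is reduced to a small fraction of its amplitude, which is exactly the HPF/LPF distinction described above, seen from the low-pass side.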
The alpha blending technique expresses a translucent image by adding a transparency variable α to ordinary image data. It is especially effective for enhancing expressiveness in 3D scenes, for example when rendering smoke.
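The standard per-pixel alpha-blend formula, out = α·foreground + (1-α)·background, can be sketched as follows; the function name and integer RGB convention are illustrative assumptions.

```python
# Illustrative alpha blending of one RGB pixel over another.

def alpha_blend(fg, bg, alpha):
    """Blend a foreground pixel over a background pixel;
    alpha=1.0 means a fully opaque foreground."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))

print(alpha_blend((255, 0, 0), (0, 0, 255), 0.5))  # red over blue -> purple
```

In the gaze-correction use described here, such blending would merge the corrected eye region smoothly into the surrounding face image rather than pasting it with a hard edge.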
A sound image is sound that, in stereo reproduction, appears to exist in the direction of a sound source even where no physical source is present. The sound-image technique is a technique for creating that impression.
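One common way to place a sound image between two stereo speakers is constant-power panning. This is a generic sketch of that technique, not the specific sound-image method claimed here; the azimuth convention is an assumption.

```python
# Illustrative constant-power stereo panning: the left/right gains follow
# cos/sin of a pan angle, so total power gl^2 + gr^2 stays 1 at any azimuth.
import math

def pan_stereo(samples, azimuth):
    """azimuth -1.0 is full left, 0.0 is centre, +1.0 is full right."""
    angle = (azimuth + 1.0) * math.pi / 4.0   # maps azimuth to 0 .. pi/2
    gl, gr = math.cos(angle), math.sin(angle)
    return [s * gl for s in samples], [s * gr for s in samples]

left, right = pan_stereo([1.0, -0.5], 0.0)  # centred: equal gains both sides
```

Driving the pan position from the identified talker's position on the display would make that talker's voice appear to come from where their image is shown.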
Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is a diagram illustrating the overall configuration and a preferred embodiment of a video communication system 10 according to an embodiment of the present invention.
The line-of-sight-tracking-based audio control device 200 may include a gaze tracking unit 210, a talker identification unit 220, a virtual positioning unit 230, a threshold coefficient application unit 240, and an acoustic filter application unit 250.
The user-side video communication system 10 may include the photographing apparatus 100, the display 300, the microphone 400, and the speaker 500, and may be connected to the communicator-side video communication systems 20.
Referring to FIG. 1, it can be assumed that the user of the video communication system 10 is holding a video conference with a plurality of talkers whose images are shown on the display 300.
Each of the conference attendees may be provided with the eye-tracking-based audio control device 200, so that every participant can obtain realistic audio.
FIG. 3 is a diagram illustrating an example of setting a virtual location on a three-dimensional space for a user and a talker in the video conferencing system according to an exemplary embodiment of the present invention. A process of setting a virtual location on a three-dimensional space for a user and a talker of a video conference system will be described with reference to FIG.
Virtual positions in three-dimensional space for the plurality of talkers may be determined based on their positions on the display 300, and the speaker 500 installed on the user side may be controlled so that each determined virtual position becomes a virtual sound source.
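The mapping from a talker's on-screen position to a virtual 3-D position can be sketched as below. The coordinate convention, the fixed depth, and all names are assumptions for illustration; the patent does not specify this formula.

```python
# Hypothetical sketch: map a talker's display position (pixels) to an
# assumed 3-D position relative to the viewer, for use as a virtual
# sound-source location.

def virtual_position(x_px, y_px, width, height, depth=2.0):
    """x: left/right in [-1, 1], y: bottom/top in [-1, 1], z: assumed
    forward distance (e.g. metres)."""
    x = (x_px / width - 0.5) * 2.0    # -1.0 at left edge, +1.0 at right
    y = (0.5 - y_px / height) * 2.0   # -1.0 at bottom,   +1.0 at top
    return (x, y, depth)

print(virtual_position(960, 540, 1920, 1080))  # screen centre -> straight ahead
```

The resulting coordinates could then drive a panner or a multi-channel speaker renderer so that each talker's voice originates at their apparent screen position.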
The virtual positions can be set as shown in FIG. 3, according to where the images of the talkers are displayed on the display 300.
At this time, in order to raise the quality of presence and increase the efficiency of the conversation, the eye-tracking technology is used to identify the talker that the user is looking at.
FIG. 4 is a block diagram illustrating a video communication system 10 for realistic audio and its detailed components according to an embodiment of the present invention.
A video communication system 10 for realistic audio may include a photographing apparatus 100, a line-of-sight-tracking-based audio control device 200, a display 300, a microphone 400, and a speaker 500.
The line-of-sight-tracking-based audio control device 200 generates gaze correction information by combining the images photographed by the cameras at the upper and lower ends of the display 300, and transmits gaze-corrected images, corrected based on the gaze correction information, to the communication equipment of the communicators.
The line-of-sight-tracking-based audio control device 200 determines virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display 300, and controls the speaker 500 so that each determined virtual position becomes a virtual sound source.
A detailed description of the video communication system 10 for realistic audio overlaps the foregoing description of the eye-tracking-based audio control device 200 and is therefore not repeated.
The voice from a talker is picked up by the microphone 400 on the talker side, processed by the audio control device 200, and output through the speaker 500 on the user side.
FIG. 5 is a flowchart for explaining a video communication method for realistic audio using the visual-tracking-based audio control device 200 according to an exemplary embodiment of the present invention, and its detailed steps.
A video communication method for realistic audio is a method for realizing realistic audio in a video communication system 10 provided with a visual-tracking-based audio control device 200, and may comprise: a step (S520) of detecting the position coordinates of the pupil region from an image of the user and generating gaze tracking information based on the detected position coordinates; a step (S525) of identifying one talker among a plurality of talkers on the display based on the gaze tracking information; a step (S550) of applying a predetermined threshold coefficient to the audio signals transmitted from the microphones installed on the talker sides other than the identified talker; and a step (S555) of applying an acoustic filter to each audio signal based on whether the threshold coefficient has been applied.
In the video communication method for realistic audio, before the step (S520) of generating gaze tracking information, gaze correction information may be generated by combining the images captured by the upper and lower cameras of the display 300, and gaze-corrected images, corrected based on the gaze correction information, may be transmitted to the communication equipment of the communicators.
A video communication method for realistic audio may further include, between the step (S525) of identifying a talker and the step (S550) of applying a threshold coefficient, a virtual positioning step (S545) of determining virtual positions in three-dimensional space for the plurality of talkers and controlling the speaker 500 installed on the user side so that each determined virtual position becomes a virtual sound source.
The video communication method for realistic audio may further include a step (S535) of receiving the user audio generated by the user and delivering the user audio signal so that it is output through the speakers installed on the plurality of talker sides, and a step (S560) of receiving and outputting, from the visual-tracking-based audio control device 200, the audio signal to which the threshold coefficient and the acoustic filter have been applied.
A detailed description of the video communication method for realistic audio is omitted here because it overlaps the foregoing description of the eye-tracking-based audio control device 200.
Although some aspects have been described in terms of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where the steps of the method correspond to the components of the apparatus. Depending on implementation requirements, embodiments of the invention may be implemented in hardware or in software. Embodiments of the present invention may be implemented as program code, that is, as a computer program product having program code operative to perform one of the methods.
The foregoing description merely illustrates the technical idea of the present invention, and various changes and modifications may be made by those skilled in the art without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed in the present invention are intended not to limit but to illustrate the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.
10: (user-side) video communication system 20: (communicator-side) video communication system
100: photographing apparatus
200: eye-tracking-based audio control device
210: gaze tracking unit 220: talker identification unit
230: virtual positioning unit 240: threshold coefficient application unit
250: acoustic filter application unit 300: display
400: microphone 500: speaker
Claims (12)
A gaze tracking unit for detecting coordinates of a pupil area from an image of a user of the communication device and generating gaze tracking information based on the coordinates of the detected pupil area; And
A talker identifying unit for identifying a talker of one of the plurality of talkers in the display based on the gaze tracking information;
A threshold coefficient application unit for applying a predetermined threshold coefficient to an audio signal transmitted from a microphone installed on a talker side other than the one identified talker, among the audio signals transmitted from microphones installed on the plurality of talker sides; And
And an acoustic filter application unit for applying an acoustic filter to each of the audio signals based on whether or not the predetermined threshold coefficient is applied.
The eye-tracking-based audio control apparatus, wherein:
The gaze correction information is generated by combining the images captured by the cameras at the upper and lower ends of the display, and the gaze correction image corrected with the gaze correction information is transmitted to the communication devices of the plurality of communicators.
The line-of-sight-tracking-based audio control apparatus, wherein:
Wherein the image is corrected by applying an alpha blending technique to an eye region detected from an image captured by a camera at the upper and lower ends of the display.
Further comprising a virtual position determination unit for determining virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display, and controlling a speaker installed on the user side so that each determined virtual position becomes a virtual sound source.
The virtual position determination unit may determine,
Wherein the speaker is controlled by generating a virtual sound source by applying a sound image technique.
Detecting position coordinates of an eyeball from an image of a user of the communication device and generating gaze tracking information based on the detected position coordinates of the eyeball;
Identifying one of the plurality of talkers in the display based on the line of sight tracking information;
Applying a predetermined threshold coefficient to an audio signal transmitted from a microphone installed on a talker side other than the one talker identified among the respective audio signals transmitted from a microphone installed on the talker side; And
And applying an acoustic filter to each of the audio signals based on whether the predetermined threshold coefficient is applied or not.
Further comprising the step of generating gaze correction information by combining the images captured by the cameras at the upper and lower ends of the display, and transmitting the gaze correction image corrected with the gaze correction information to the communication devices of the plurality of communicators.
The line-of-sight-tracking-based audio control method, wherein:
Wherein the image is corrected by applying an alpha blending technique to an eye region detected from an image captured by a camera at the upper and lower ends of the display.
Between identifying the talker and applying the threshold coefficient,
Further comprising a virtual positioning step of determining virtual positions in three-dimensional space for the plurality of talkers based on their positions on the display, and controlling the speaker installed on the user side so that the determined virtual positions become virtual sound sources.
Wherein the virtual positioning step comprises:
And a speaker is controlled by generating a virtual sound source by applying a sound image technique.
Further comprising the step of receiving the user audio generated from the user and delivering the user audio signal so that it is output through the speakers installed on the plurality of talker sides.
After applying the acoustic filter,
Further comprising the step of receiving and outputting, from the line-of-sight-tracking-based audio control apparatus, the audio signal to which the threshold coefficient and the acoustic filter have been applied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140007373A KR20150087017A (en) | 2014-01-21 | 2014-01-21 | Audio control device based on eye-tracking and method for visual communications using the device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020140007373A KR20150087017A (en) | 2014-01-21 | 2014-01-21 | Audio control device based on eye-tracking and method for visual communications using the device |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20150087017A true KR20150087017A (en) | 2015-07-29 |
Family
ID=53876422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020140007373A KR20150087017A (en) | 2014-01-21 | 2014-01-21 | Audio control device based on eye-tracking and method for visual communications using the device |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20150087017A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110573995A (en) * | 2017-04-20 | 2019-12-13 | 韩国电子通信研究院 | Space audio control device and method based on sight tracking |
WO2021249586A1 (en) * | 2020-06-12 | 2021-12-16 | Ohligs Jochen | Device for displaying images, and use of a device of this type |
WO2023027308A1 (en) * | 2021-08-27 | 2023-03-02 | 삼성전자 주식회사 | Method and device for processing speech by distinguishing speakers |
-
2014
- 2014-01-21 KR KR1020140007373A patent/KR20150087017A/en not_active Application Discontinuation
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110573995A (en) * | 2017-04-20 | 2019-12-13 | 韩国电子通信研究院 | Space audio control device and method based on sight tracking |
CN110573995B (en) * | 2017-04-20 | 2023-11-24 | 韩国电子通信研究院 | Spatial audio control device and method based on sight tracking |
WO2021249586A1 (en) * | 2020-06-12 | 2021-12-16 | Ohligs Jochen | Device for displaying images, and use of a device of this type |
WO2023027308A1 (en) * | 2021-08-27 | 2023-03-02 | 삼성전자 주식회사 | Method and device for processing speech by distinguishing speakers |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6466250B1 (en) | System for electronically-mediated collaboration including eye-contact collaboratory | |
CN103220491B (en) | For operating the method for conference system and for the device of conference system | |
US8411126B2 (en) | Methods and systems for close proximity spatial audio rendering | |
US8848028B2 (en) | Audio cues for multi-party videoconferencing on an information handling system | |
US9438859B2 (en) | Method and device for controlling a conference | |
US9445050B2 (en) | Teleconferencing environment having auditory and visual cues | |
EP2566194A1 (en) | Method and device for processing audio in video communication | |
EP3331240A1 (en) | Method and device for setting up a virtual meeting scene | |
JP2017523632A (en) | Method, apparatus and system for visual presentation | |
CN111064919A (en) | VR (virtual reality) teleconference method and device | |
US20110267421A1 (en) | Method and Apparatus for Two-Way Multimedia Communications | |
US11563855B1 (en) | Customized audio mixing for users in virtual conference calls | |
CN117321984A (en) | Spatial audio in video conference calls based on content type or participant roles | |
JP2006254064A (en) | Remote conference system, sound image position allocating method, and sound quality setting method | |
KR20150087017A (en) | Audio control device based on eye-tracking and method for visual communications using the device | |
US20220343934A1 (en) | Compensation for face coverings in captured audio | |
JP2012213013A (en) | Tv conference system | |
US10469800B2 (en) | Always-on telepresence device | |
JP7095356B2 (en) | Communication terminal and conference system | |
US11310465B1 (en) | Video conference teminal and system there of | |
JP2006339869A (en) | Apparatus for integrating video signal and voice signal | |
JP4632132B2 (en) | Language learning system | |
Johanson | The turing test for telepresence | |
WO2022130798A1 (en) | Display device, display control method, and program | |
JP2005110103A (en) | Voice normalizing method in video conference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |