WO2023048631A1

WO2023048631A1 - A videoconferencing method and system with focus detection of the presenter

Info

Publication number: WO2023048631A1
Application number: PCT/SE2022/050847
Authority: WO
Inventors: Gunnar Weibull; Tomas Christiansson
Original assignee: Flatfrog Laboratories Ab
Priority date: 2021-09-24
Filing date: 2022-09-23
Publication date: 2023-03-30

Abstract

A method of videoconferencing between a presenter and a remote user comprises outputting a display signal configured to display an image on a presenter display and on a remote user display. The method also comprises receiving one or more images of the presenter. The method further comprises determining a gaze direction of the presenter with respect to the displayed image on the presenter display based on the received images of the presenter. The method comprises determining a presenter gaze object in the displayed image based on the determined gaze direction. The method comprises modifying the output display signal configured to display a modified image on the remote user display based on the determined presenter gaze object.

Description

A VIDEOCONFERENCING METHOD AND SYSTEM WITH FOCUS DETECTION OF THE PRESENTER

Technical Field

The present disclosure relates to a method of videoconferencing and a videoconferencing system. In particular, the present disclosure relates a method of videoconferencing which automatically detects the focus of the presenter.

Remote working is becoming increasingly important to employers and employees. For example, there is an increasing demand not to travel and face to face meetings are being replaced with alternatives such as videoconferencing.

One issue with videoconferencing is that the remote participants receive less cues such as body language from the presenter. Remote participants cannot see what the presenter is looking at and therefore this can make participation in a videoconference harder.

Examples of the present disclosure aim to address the aforementioned problems.

According to an aspect of the present disclosure there is a method of videoconferencing between a presenter and a remote user comprising: outputting a display signal configured to display an image on a presenter display and on a remote user display; receiving one or more images of the presenter; determining a gaze direction of the presenter with respect to the displayed image on the presenter display based on the received images of the presenter; determining a presenter gaze object in the displayed image based on the determined gaze direction; and modifying the output display signal configured to display a modified image on the remote user display based on the determined presenter gaze object.

Optionally, the method comprises modifying the determined presenter gaze object displayed on the present display. Optionally, the modifying comprises highlighting the presenter gaze object in the image.

Optionally, the highlighting comprises increasing the brightness, increasing the contrast, changing the colour, sharpen, or distorting the determined presenter gaze object.

Optionally, the highlighting is removed when the determined gaze direction moves away from the determined presenter gaze object for more than a predetermined period of time.

Optionally, the highlighting fades away or snaps away if the determined gaze direction moves away from the determined presenter gaze object.

Optionally, the modifying comprises lowlighting one or more parts of the displayed image other than the determined presenter gaze object.

Optionally, the lowlighting comprises decreasing the brightness, reducing the contrast the colour, removing the colour, blur, defocus, or distorting the one or more parts of the displayed image other than the determined presenter gaze object.

Optionally, the presenter gaze object is one or more of an application window, an application tab, a user selection on the displayed image, an area of the displayed image, a delimited object in the displayed image, presenter annotation on the displayed image. Optionally, the gaze object is an image of person in a video conference.

Optionally, the method comprises determining an area of interest for selecting the presenter gaze object on the displayed image based on the determined gaze direction.

Optionally, the determining the presenter gaze object comprises determining a time period for one or more candidate presenter gaze objects from the last presenter interaction and / or a time period from initialization and selecting the presenter gaze object from the candidate presenter gaze objects based on the shortest time period. Optionally, the determining a gaze direction comprises detecting a face position and a face orientation with respect to the presenter display.

Optionally, the determining a gaze direction comprises tracking the eyes of the presenter.

According to another aspect of the present disclosure there is a method according to any of the preceding claims wherein the method comprises receiving one or more images of the remote user; determining a gaze direction of the remote user with respect to the displayed image on the remote user display based on the captured images of the remote user; determining a remote user gaze object in the displayed image based on the determined gaze direction; and modifying the output display signal configured to display a modified image on the presenter user display based on the determined remote user gaze object.

Optionally, an alert is issued to the presenter if the determined remote user gaze object is not the same as the determined presenter gaze object.

Optionally, modifying the output display signal comprises modifying the output display signal to display a modified image on the remote user display to indicate that that presenter is not looking at the presenter display.

According to yet another an aspect of the present disclosure there is a method according to any of the preceding claims wherein the method comprises: determining presenter image data comprising spatial position information of the position of the presenter relative to a touch sensing apparatus having a touch surface connected to the presenter display; generate a presenter representation based on the spatial position information; and output the display signal to the remote user display to display the presenter representation.

Optionally, the presenter representation is a silhouette of a hand, pointer, stylus, or other indicator object. Optionally, the modifying comprising modifying the output display signal configured to display a modified image on the remote user display when the presenter representation is near the determined presenter gaze object.

Optionally, the method comprises: receiving a touch input from the presenter on the touch surface; determine touch coordinates on the touch surface based on the touch input; generate a touch input representation based on the touch input coordinates; output the display signal to the remote user display to display the touch input representation.

According to a further aspect of the present disclosure there is a videoconferencing system comprising: a presenter terminal having: a presenter display configured to display an image; and at least one camera configured to capture an image of the presenter; a remote user terminal having: a remote user display configured to display the image; and a controller configured to determine a gaze direction of the presenter with respect to the displayed image on the presenter display, determine a presenter gaze object in the displayed image based on the determined gaze direction, and modify the output display signal configured to display a modified image on the remote user display based on the determined presenter gaze object.

Brief Description of the Drawings

Various other aspects and further examples are also described in the following detailed description and in the attached claims with reference to the accompanying drawings, in which:

Figure 1 shows a schematic representation of a videoconferencing terminal according to an example;

Figures 2 to 6 show schematic representations of a videoconferencing system according to an example;

Figures 7 to 9 show flow diagrams of the videoconferencing method according to an example;

Figures 10a, 10b and 11 show a schematic representation of a videoconferencing terminal comprising a touch sensing apparatus according to an example; and Figure 12 shows show schematic representations of a videoconferencing system according to an example.

Detailed Description

Figure 1 shows a schematic view of a videoconferencing terminal 100 according to some examples.

The videoconferencing terminal 100 comprises a camera module 102 and a presenter display 104. The videoconferencing terminal 100 selectively controls the activation of the camera module 102 and the presenter display 104. As shown in Figure 1 , the camera module 102 and the presenter display 104 are controlled by a camera controller 106 and a display controller 108 respectively. As discussed in more detail below, the camera module 102 comprises one or more cameras.

The videoconferencing terminal 100 comprises a videoconferencing controller 110. The videoconferencing controller 110, the camera controller 106 and the display controller 108 may be configured as separate units, or they may be incorporated in a single unit.

The videoconferencing controller 110 comprises a plurality of modules for processing the videos and images received from a remotely from an interface 112 and videos and images captured locally. The interface 112 and the method of transmitted and receiving videoconferencing data is known and will not be discussed any further.

In some examples, the videoconferencing controller 110 comprises a face detection module 114 for detecting facial features and an image processing module 116 for modifying a displayed image 220 (as shown in Figure 2) to be displayed on the presenter display 104. In some examples, the videoconferencing controller 110 comprises an eye tracking module 118. The eye tracking module 118 can be part of the face detection module 114 or alternatively, the eye tracking module 118 can be a separate module from the face detection module 114. The face detection module 114, the image processing module 116, and the eye tracking module 118 will be discussed in further detail below. One or all of the videoconferencing controller 110, the camera controller 106 and the display controller 108 may be at least partially implemented by software executed by a processing unit 120. The face detection module 114, the image processing module 116, and the eye-tracking module 118, may be configured as separate units, or they may be incorporated in a single unit. One or all of the face detection module 114, the image processing module 116, and the eye-tracking module 118, may be at least partially implemented by software executed by the processing unit 120.

The processing unit 120 may be implemented by special-purpose software (or firmware) run on one or more general-purpose or special-purpose computing devices. In this context, it is to be understood that each "element" or "means" of such a computing device refers to a conceptual equivalent of a method step; there is not always a one-to-one correspondence between elements/means and particular pieces of hardware or software routines. One piece of hardware sometimes comprises different means/elements. For example, a processing unit 120 may serve as one element/means when executing one instruction but serve as another element/means when executing another instruction. In addition, one element/means may be implemented by one instruction in some cases, but by a plurality of instructions in some other cases. Naturally, it is conceivable that one or more elements (means) are implemented entirely by analogue hardware components.

The processing unit 120 may include one or more processing units, e.g. a CPU ("Central Processing Unit"), a DSP ("Digital Signal Processor"), an ASIC ("Application- Specific Integrated Circuit"), discrete analogue and/or digital components, or some other programmable logical device, such as an FPGA ("Field Programmable Gate Array"). The processing unit 120 may further include a system memory and a system bus that couples various system components including the system memory to the processing unit. The system bus may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. The system memory may include computer storage media in the form of volatile and/or non-volatile memory such as read only memory (ROM), random access memory (RAM) and flash memory. The specialpurpose software and associated control parameter values may be stored in the system memory, or on other removable/non-removable volatile/non-volatile computer storage media which is included in or accessible to the computing device, such as magnetic media, optical media, flash memory cards, digital tape, solid state RAM, solid state ROM, etc. The processing unit 120 may include one or more communication interfaces, such as a serial interface, a parallel interface, a USB interface, a wireless interface, a network adapter, etc, as well as one or more data acquisition devices, such as an A/D converter. The special-purpose software may be provided to the processing unit 120 on any suitable computer-readable medium, including a record medium, and a read-only memory.

The videoconferencing terminal 100 will be used together with other remote videoconferencing terminals 202 as shown in Figure 2. In this way, the videoconferencing terminal 100 is a presenter videoconferencing terminal 100. Figure 2 shows a schematic representation of a videoconferencing system 200. The videoconferencing terminal 100 as shown in Figure 2 is the same as described in reference to Figure 1 . In some examples, the remote videoconferencing terminal 202 is the same as described in reference to Figure 1. In this way, a presenter 204 can present to one or more remote users 206. For the purposes of clarity, Figure 2 only shows one remote user 206. However, in some examples there can be one or more remote users 206. For example, there can be any number of remote users 206. In other examples, there can also be any number of remote conferencing remote videoconferencing terminals 202.

Furthermore, the presenter videoconferencing terminal 100 is videoconferencing terminal from which the presentation is being made. However, since the remote videoconferencing terminal 202 is identical to the presenter videoconferencing terminal 100, the remote videoconferencing terminal 202 can also present material to other users on the videoconference. For example, the remote user 206 can present material to the original presenter 204. The videoconferencing controller 110 is configured to selectively provide presentation authorisation to the presenter 204, or the remote user 206 as required.

Optionally, the presenter videoconferencing terminal 100 comprises additional functionality to the remote videoconferencing terminals 202. For example, the presenter videoconferencing terminal 100 can be a large touch screen e.g. a presenter display 104 comprising a touch sensing apparatus 1000 (as shown in Figures 10a, 10b, 11 ). In some examples, the remote videoconferencing terminals 202 can be a laptop, desktop computer, tablet, smartphone, or any other suitable device. In some other examples, the presenter videoconferencing terminal 100 does not comprise a touch sensing apparatus 1000 and is e.g. a laptop.

If the presenter videoconferencing terminal 100 is a large touch screen, the presenter 204 can present to both local participants 208 in the same room as the presenter 204 and the remote users 206 not in the same room as the presenter 204. As mentioned above, the touch sensing apparatus 1000 is optional, and the presenter 204 may present to both local participants 208 and remote users 206 without the touch sensing apparatus 1000.

The example where the presenter videoconferencing terminal 100 is a touch screen will now be discussed in further detail in reference to Figures 10a, 10b and 11 . Figures 10a and 10b illustrate an optional example of a touch sensing apparatus 1000 known as ‘above surface optical touch systems’. In some examples, the presenter videoconferencing terminal 100 comprises the touch sensing apparatus 1000. Whilst the touch sensing apparatus 1000 as shown and discussed in reference to Figures 10a and 10b can be an above surface optical touch system, alternative touch sensing technology can be used.

For example, the examples discussed with reference to the Figures 10a, 10b and 11 can be applied to any other above surface optical touch system configuration as well as non-above surface optical touch system types which perform touch detection in frames.

In some examples the touch sensing apparatus 1000 can use one or more of the following including: frustrated total internal reflection (FTIR), resistive, surface acoustic wave, capacitive, surface capacitance, projected capacitance, above surface optical touch, dispersive signal technology and acoustic pulse recognition type touch systems. The touch sensing apparatus 1000 can be any suitable apparatus for detecting touch input from a human interface device. The touch sensing apparatus 1000 will now be discussed in reference to Figure 10a and Figure 10b. Figure 10a shows a schematic side view of a touch sensing apparatus 1000. Figure 10b shows a schematic top view of a touch sensing apparatus 1000.

The touch sensing apparatus 1000 comprises a set of optical emitters 1004 which are arranged around the periphery of a touch surface 1008. The optical emitters 1004 are configured to emit light that is reflected to travel above a touch surface 1008. A set of optical detectors 1006 are also arranged around the periphery of the touch surface 1008 to receive light from the set of optical emitters 1004 from above the touch surface 1008. An object 1012 that touches the touch surface 1008 will attenuate the light on one or more propagation paths D of the light and cause a change in the light received by one or more of the optical detectors 1006. The location (coordinates), shape or area of the object 1012 may be determined by analysing the received light at the detectors.

In some examples, the optical emitters 1004 are optionally arranged on a substrate 1034 such as a printed circuit board, and light from the optical emitters 1004 travel above the touch surface 1008 of a touch panel 1002 via reflection or scattering on an edge reflector I diffusor 1020. The emitted light may propagate through an optional light transmissive sealing window 1024.

The optional light transmissive sealing window 1024 allows light to propagate therethrough but prevents ingress of dirt into a frame 1036 where the electronics and other components are mounted. The light will then continue until deflected by a corresponding edge reflector I diffuser 1020 at an opposing edge of the touch panel 1002, where the light will be scattered back down around the touch panel 1002 and onto the optical detectors 1006. The touch panel 1002 can be a light transmissive panel for allowing light from the presenter display 104 propagating therethrough.

In some examples the touch panel 1002 is a sheet of glass. Alternatively, in some other examples, the touch panel 1002 is a sheet of any suitable light transmissive material such as polymethyl methacrylate, or any other suitable light transmissive plastic material. In this way, the touch sensing apparatus 1000 comprising the light transmissive touch panel 1002 may be designed to be overlaid on or integrated into the presenter display 104. This means that the presenter display 104 can be viewed through the touch panel 1002 when the touch panel 1002 is overlaid on the presenter display 104.

The touch sensing apparatus 1000 allows an object 1012 that is brought into close vicinity of, or in contact with, the touch surface 1008 to interact with the propagating light at the point of touch. In Figure 10a, the object 1012 is a user’s hand, but additionally or alternatively is e.g. a pen (not shown). In this interaction, part of the light may be scattered by the object 1012, part of the light may be absorbed by the object 1012, and part of the light may continue to propagate in its original direction over the touch panel 1002.

The optical detectors 1006 collectively provide an output signal, which is received and sampled by the processor unit 120. The output signal may contain a number of subsignals, also denoted "projection signals", each representing the energy of light emitted by a certain optical emitter 1004 and received by a certain optical detector 1006. It is realized that the touching object 1012 results in a decrease (attenuation) of the received energy on one or more detection lines D as determined by the processor unit 120.

In addition to the processes mentioned above in reference to Figures 1 and 2, the processor unit 120 may be configured to process the projection signals so as to determine a distribution of signal strength values (for simplicity, referred to as a "touch surface pattern") across the touch surface 1008, where each signal strength value represents a local attenuation of light. The processor unit 120 is configured to carry out a plurality of different signal processing steps in order to extract touch data for at least one object. Additional signal processing steps may involve filtering, back projection, smoothing, and other post-processing techniques as described in WO 2011/139213, which is incorporated herein by reference. In some examples the filtering and smoothing of the reconstructed touch data is carried out by a filtering module 1120 as shown in Figure 11. The signal processing is known and will not be discussed in any further detail for the purposes of brevity. Turning back to Figure 10b, in the illustrated example the touch sensing apparatus 1000 also includes a controller 1016 which is connected to selectively control the activation of the optical emitters 1004 and, possibly, the readout of data from the optical detectors 1006. The processor unit 120 and the controller 1016 may be configured as separate units, or they may be incorporated in a single unit. In some examples the processing unit 120 can be a touch controller. The reconstruction and filtering modules 1118, 1120 of the processor unit 120 may be configured as separate units, or they may be incorporated in a single unit. One or both of the reconstruction and filtering modules 1118, 1120 may be at least partially implemented by software executed by the processing unit 120.

The relationship between the touch sensing apparatus 1000 and the presenter display 104 will now be discussed in reference to Figure 11. Figure 11 shows a schematic representation of a videoconferencing terminal 100 comprising the touch sensing apparatus 1000.

In some examples, the display controller 108 can be separate from the presenter display 104. In some examples, the display controller 108 can be incorporated into the processing unit 120.

The presenter display 104 can be any suitable device for visual output for a user such as a monitor. The presenter display 104 is controlled by the display controller 108. Presenter displays 104 and display controllers 108 are known and will not be discussed in any further depth for the purposes of expediency.

In some examples, the presenter display 104 comprises a plurality of layers such as filters, diffusers, backlights, and liquid crystals. Additional or alternative components can be provided in the plurality of layers depending on the type of presenter display 104. In some examples, the display device is an LCD, a quantum dot display, an LED backlit LCD, a WLCD, an OLCD, a plasma display, an OLED, a transparent OLED, a POLED, an AMOLED and / or a Micro LED. In other examples, any other suitable presenter display 104 can be used in the videoconferencing terminal 100. The host control device 1102 may be connectively coupled to the touch sensing apparatus 1000. The host control device 1102 receives output from the touch sensing apparatus 1000. In some examples the host control device 1102 and the touch sensing apparatus 1000 are connectively coupled via a data connection 1112 such as a USB connection. In other examples other wired or wireless data connection 1112 can be provided to permit data transfer between the host control device 1102 and the touch sensing apparatus 1000. For example, the data connection 1112 can be ethernet, firewire, Bluetooth, Wi-Fi, universal asynchronous receiver-transmitter (UART), or any other suitable data connection. In some examples there can be a plurality of data connections between the host control device 1102 and the touch sensing apparatus 1000 for transmitting different types of data. The touch sensing apparatus 1000 detects a touch object when a physical object is brought in sufficient proximity to, a touch surface 1008 so as to be detected by one or more optical detector 1006 in the touch sensing apparatus 1000. The physical object may be animate or inanimate. In preferred examples the data connection 1112 is a human interface device (HID) USB channel. The data connection 1112 can be a logical or physical connection.

In some examples the touch sensing apparatus 1000, the host control device 1102 and the presenter display 104 are integrated into the same videoconferencing terminal 100 such as a laptop, tablet, smart phone, monitor or screen. In other examples, the touch sensing apparatus 1000, the host control device 1102 and the presenter display 104 are separate components. For example, the touch sensing apparatus 1000 can be a separate component mountable on a display screen.

The host control device 1102 may comprise an operating system 1108 and one or more applications 1110 that are operable on the operating system 1108. The one or more applications 1110 are configured to allow the user to interact with the touch sensing apparatus 1000 and the presenter display 104. The operating system 1108 is configured to run one or more applications 1110 and send output information to the display controller 108 for displaying on the presenter display 104. The applications 1110 can be drawing applications or whiteboards applications for visualising user input. In other examples the applications 1110 can be any suitable application or software for receiving and displaying user input. Turning back to Figure 2, the face detection module 114, the image processing module 116, and the eye tracking module 118 will now be discussed in further detail. The videoconferencing terminal 100 comprises at least one first camera 210 for capturing image data of the presenter 204. In some examples the camera module 102 comprises a first camera 210 and a second camera 212. As shown in Figure 2, the videoconferencing terminal 100 comprises a first camera 210 and a second camera 212 mounted to the presented display 104. The first camera 210 and the second camera 212 are connected to the videoconferencing controller 110 and are configured to send image data of the presenter 204 to the videoconferencing controller 110.

The first and second cameras 210, 212 are configured to capture images of the presenter 204. The captured images of the presenter 204 are used for determining a gaze direction G of the presenter 204 with respect to the displayed image 220 on the presenter display 104. The captured images of the presenter 204 can also be used for providing a video stream of the presenter 204 to the remote videoconferencing terminal 202 during the video conference. Optionally, the first and second cameras 210, 212 are RGB cameras and configured to capture colour images of the presenter 204. Alternatively, the first and second cameras 210, 212 can be near-infrared cameras. In other examples, the first and second cameras 210, 212 can be any suitable camera.

Optionally the camera module 102 further comprises a third camera 224 for capturing images for the video stream of the presenter 204. The third camera 224 as shown in Figure 2 is mounted on the top of the presenter display 104. In other examples, the third camera 224 can be mounted in any suitable position or orientation with respect to the presenter display 104 and the presenter 204.

In some examples the first and second cameras 210, 212 are solely used for determining the gaze direction G of the presenter 204 and a third camera 224 is used for capturing the video stream of the presenter 204. The third camera 224 is a camera configured to capture colour image data (e.g. a RGB camera) of the presenter 204. Whilst Figure 2 shows a plurality (e.g. two) cameras 210, 212 for determining a gaze direction G of the presenter 204, the videoconferencing terminal 100 can comprise a single camera 210 for determining a gaze direction G of the presenter 204. In some examples the first and second cameras 210, 212 are mounted on opposite sides of the presenter display 104. Figure 2 shows the first and second cameras 210, 212 mounted on the top two corners of the presenter display 104, but the first and second cameras 210, 212 can be mounted at any position on the presenter display 104. In other examples the first and second cameras 210, 212 can be mounted remote from the presenter display 104. For example, the first and second cameras 210, 212 can be mounted on the ceiling or the wall near the presenter display 104. By separating the first and second cameras 210, 212 by a large distance, the determination of the presenter gaze direction G is more accurate.

In some other examples, the first and second cameras 210, 212 are mounted behind the presenter display 104. In this case, the first and second cameras 210, 212 are near-infrared cameras (NIR) and the presenter display 104 is optically transmissive to the near-infrared light. In some examples, the first and second cameras 210, 212 comprises a first illumination source 222 of near-infrared light for illuminating the presenter 204. As shown in Figure 2 the first illumination source 222 is mounted on the top of the presenter display 104, but the first illumination source 222 can be mounted in any suitable position for example along a centre line (not shown) of the presenter display 104. The first illumination source 222 can be a near-infrared light source such as an LED mounted to the first and I or the second camera 210, 212. Alternatively, the first illumination source 222 is mounted on the presenter display 104 remote from the first and second cameras 210, 212.

The presenter display 104 as shown in Figure 2 is showing a displayed image 220. The displayed image 220 is duplicated on a remote user display 218 on the remote videoconferencing terminal 202 with a duplicated remote image 216. The videoconferencing controller 110 is configured to display an image on the presenter display 104 and the remote user display 218 as shown in step 700 in Figure 7. Figure 7 shows a flow diagram of a method according to an example.

In this way, the remote user 206 views the same presented material as the local participants 208. Whilst the remote videoconferencing terminal 202 is shown displaying the duplicated remote image 216, the remote videoconferencing terminal 202 can also display other information as well such as an application window comprising a video stream of the presenter 204. Although not shown in Figures 2 to

6, the displayed image 220 may comprise a window comprising a video stream of one or more of the remote users 206. Similarly, the remote image 216 may comprise an application window comprises a video stream of the presenter 204.

During the videoconference, in some examples, the videoconferencing controller 110 detects whether the presenter 204 is looking at the presenter display 104 and this will be discussed further below.

The videoconferencing controller 110 receives one or more images from the first and or the second cameras 210, 212 of the presenter 204 as shown in step 702 in Figure

7. The videoconferencing controller 110 then sends the one or more images to the face detection module 114. The face detection module 114 determines the orientation and position of the face of the presenter 204 based on feature detection.

The face detection module 114 detects the position of the eyes 214 of the presenter 204 in a received image. In this way, the face detection module 114 determines the gaze direction of the presenter 204 as shown in step 704 of Figure 7. The face detection module 114 uses feature detection on an image of the presenter 204 to detect where the eyes 214 and the face of the presenter 204 are with respect to the presenter display 104. For example, the face detection module 114 may determine that only one eye or no eyes 214 of the presenter 204 are observable to the first or second camera 210, 212.

The face detection module 114 then sends a face detection signal or face position and or face orientation information of the presenter 204 to the videoconferencing controller 110. The face detection signal or face position and or face orientation information can comprise information whether no face is detected or whether the direction of the face of the presenter 204 is away from the presenter display 104.

The videoconferencing controller 110 then determines whether the presenter 204 is looking at the presenter display 104 based on the received signal from the face detection module 114. If the videoconference controller 110 does not receive a face detection signal from the face detection module 114, then the videoconference controller 110 determines that the presenter 204 is not looking at the presenter display 104. In this way, the videoconferencing controller 110 is able to determine the “general” gaze direction G of the presenter 204 based on a detection of the face of the presenter 204. In other words, the videoconferencing controller 110 determines that the gaze direction G in that the presenter 204 is looking at the presenter display 104 or not. If the videoconferencing controller 110 determines that the presenter gaze direction G is at the presenter display 104, the videoconferencing controller 110 determines that the whole displayed image 220 on the presenter display 104 constitutes a first presenter gaze object 226 as shown in step 706 of Figure 7.

The first presenter gaze object 226 is the object on the presenter display 104 that the presenter 204 is currently looking at. This may be the whole displayed image 220 on the presenter display 104. The label 226 is used to indicate that the first presenter gaze object 226 is the whole displayed image 220. Alternatively, another e.g. a second presenter gaze object 228 may be a specific part of the displayed image 220 e.g. one or more of an application window, an application tab, a user selection on the displayed image 220, an area of the displayed image 220, a delimited object in the displayed image 220, presenter annotation on the displayed image 220. Here the label 228 is used to indicate that the other presenter gaze object 228 is a specific element within the whole displayed image 220. The videoconferencing controller 110 can contextually determine the first and second presenter gaze objects 226, 228 based on one or more presenter 204 interactions with the videoconferencing terminal 100.

When the videoconferencing controller 110 determines that the presenter 204 is not looking at the presenter display 104, in some examples, the videoconferencing controller 110 can issue a signal to image processing module 116. The image processing module 116 can modify a display signal sent to the remote videoconferencing terminal 202 indicating that the presenter 204 is not looking at the presenter display 104 as shown in step 706 of Figure 7.

This may be helpful when the presenter 204 has turned away from the videoconferencing terminal 100 and is speaking directly to the local participants 208. This information may be helpful for the remote user 206 to follow the discussion during a presentation during a videoconference. In some, examples the remote videoconferencing terminal 202 issues an alert to the remote user 206 that the presenter 204 is not looking at the presenter display 104. The remote videoconferencing terminal 202 may modify one or more elements of the duplicate remote image 216 on the remote user display 218 to indicate the presenter 204 engagement with the presenter display 104. For example, the duplicate remote image 216 can change colour e.g. turn the one or more application windows black and white when the presenter 204 is not looking at the presenter display 104.

In some examples, the videoconferencing controller 110 determines a more precise presenter gaze direction G. A more precise determination of the presenter gaze direction G can be useful for showing the remote user 206 which part of the displayed image 220 the presenter 204 is looking at. This will now be discussed in further detail.

As mentioned previously, the videoconferencing terminal 100 comprises a first illumination source 222 of near-infrared light configured to illuminate the presenter 204. The infrared light is transmitted to the presenter 204 and the infrared light is reflected from the presenter eyes 214. The first and second cameras 210, 212 detect the reflected light from the presenter eyes 214.

The first and second cameras 210, 212 are configured to send one or more image signals to the videoconferencing controller 110 as shown in step 702. The videoconferencing controller 110 sends the image signals to the eye tracking module 118. Since the placement of the first and second cameras 210, 212 and the first illumination source 222 are known to the videoconferencing controller 110, the eye tracking module 118 determines through trigonometry the gaze direction G of the presenter 204 as shown in step 704. Determining the presenter gaze direction G from detection of reflected light from the eye 214 of the presenter 204 is known e.g. as discussed in US 6,659,661 which is incorporated by reference herein.

Alternatively, in some examples, the videoconferencing controller 110 determines the direction of the face of the presenter 204 based on feature detection. For example, the eye tracking module 118 determines the location of eyes 214 of the presenter 204 with respect to the nose 230 from the received image signals. In this way, the eye tracking module 118 determines the presenter gaze direction G as shown in step 704. Determining the presenter gaze direction G from facial features is known e.g. as discussed in DETERMINING THE GAZE OF FACES IN IMAGES A. H. Gee and R. Cipolla, 1994 which is incorporated by reference herein.

Alternatively, in some other examples, the eye tracking module 118 determines the presenter gaze direction G based on a trained neural network classifying the direction of the presenter eyes 214 processing the received one or more image signals from the first and second cameras 210, 212 as shown in step 704. Classifying the presenter gaze direction G from a convolutional neural network is known e.g. as discussed in Real-time Eye Gaze Direction Classification Using Convolutional Neural Network Anjith George, and Aurobinda Routray 2016 wh ch is incorporated herein by reference.

The eye tracking module 118 determines the presenter gaze direction G and sends a signal to the videoconferencing controller 110 comprising information relating to the presenter gaze direction G.

Once the videoconferencing controller 110 receives the information relating to the presenter gaze direction G, the videoconferencing controller 110 determines which part e.g. an intersection point 236 of the displayed image 220 on the presenter display 104 that the presenter gaze direction G intersects.

One or more image objects on the presenter display 104 that are near the intersection point 236 of presenter gaze direction G with the presenter display 104 or intersect with the presenter gaze direction G may be selected by the videoconferencing controller 110 as a presenter gaze object 228 as shown in step 706. As mentioned above, the presenter gaze object 228 may be one or more of an application window, an application tab, a user selection on the displayed presenter image 220, an area of the displayed image 220, a delimited object in the displayed image 220, presenter annotation on the displayed image 220.

For example, the application window 240 as shown in Figure 2 is selected by the videoconferencing controller 110 as the presenter gaze object 228 because the determined presenter gaze direction G intersects with the application window 240. In some examples, the videoconferencing controller 110 determines that the presenter gaze direction G intersects with the displayed image 220 at an intersection point 236 close to a number of different image elements in the displayed image 220. The videoconferencing controller 110 determines an area of interest 600 (best shown in Figure 6) in step 800 of Figure 8. Figure 12 shows an example of the presenter gaze direction G intersecting with the image. As can be seen from Figure 12, the displayed image 220 comprises a plurality of different image elements 230, 232, 234, 242 within a single application window 204 close to an intersection point 236 of the presenter gaze direction G with the presenter image 220.

The plurality of different image elements 230, 232, 234, 242 as shown in Figure 12 are some text 234, an image 242, an area of the image 232 and some presenter annotations 230 and the application window 240. The different image elements 230, 232, 234, 240, 242 can be any other part of the presenter image 220 as required.

Since the plurality of different image elements 230, 232, 234, 240, 242 close to an intersection point 236 of the presenter gaze direction G with the presenter image 220, the videoconferencing controller 110 determines that there are a plurality of candidate presenter gaze objects 230, 232, 234, 240, 242 as shown in step 802 of Figure 8. In other words, the videoconferencing controller 110 determines the plurality of candidate presenter gaze objects 230, 232, 234, 240, 242 have a likelihood that the presenter 204 is looking at one of the candidate presenter gaze objects 230, 232, 234, 240, 242. In some examples, the presenter gaze object 230, 232, 234, 240, 242 is a person in a video conference application window 240. For example, the presenter 204 is asking a question to a specific person in the video conference application window 240. In this case, the video conference application window 240 comprises one or more participants e.g. a remote user 206 and the presenter gaze object 230, 232, 234, 240, 242 is the image of one or more remote users 206 in the application window 204.

In some examples, the videoconferencing controller 110 optionally selects the presenter gaze object 228 from one of the candidate presenter gaze objects 230, 232, 234, 240, 242. This is indicated in Figure 7 by the steps in labelled “A” and shown in detail in Figure 8. The videoconferencing controller 110 selects the presenter gaze object 228 based on one or more predetermined selection algorithms.

In a first example, the videoconferencing controller 110 selects the presenter gaze object 228 based on the most recent element of the displayed image 220 that was previously selected as the presenter gaze object 228 as shown in step 804. For example in Figure 12, the presenter 204 was previously looking at an image 242 comprising a thinking bubble image and the videoconferencing controller 110 then selects the thinking bubble image 242 as the presenter gaze object 228.

Whilst time elapsed since initialization can be a criterium for selecting the presenter gaze object 228 from the candidate presenter gaze objects 230, 232, 234, 240, 242, one or more other criteria can be used for selection.

In another example, the videoconferencing controller 110 selected the presenter gaze object 228 based on the most recent image element to receive focus e.g. with an input device such as a mouse. For example, the presenter 204 has just moved the mouse cursor 238 over the application window 240 comprising thinking bubble image 242 and the videoconferencing controller 110 then selects this as the presenter gaze object 228. In some examples, the presenter 204 interaction with the displayed image 220 and presenter display 104 can be with a touch event. The touch event will be detected by the touch sensing apparatus 1000 described above. Accordingly, the videoconferencing controller 110 then selects the closest candidate presenter gaze objects 230, 232, 234, 240, 242 to the touch event as the presenter gaze object 228 as shown in step 806.

In another example, the videoconferencing controller 110 selects the presenter gaze object 228 based on the most recently drawn, initiated, or created element in the displayed image 220. In this case the presenter 204 has drawn the thinking bubble image 242 and the videoconferencing controller 110 then selects this as the presenter gaze object 228. In another example, the videoconferencing controller 110 selects the presenter gaze object 228 based on a time period for the candidate presenter gaze objects 230, 232, 234, 240, 242 from the last presenter interaction and / or a time period from initialization and selecting the presenter gaze object 228 from the candidate presenter gaze objects 230, 232, 234, 240, 242 based on the shortest time period.

In another example, the videoconferencing controller 110 selects the presenter gaze object 228 based on received manual input from the presenter 204. For example, the presenter 204 can manually select an element of the displayed image 220 to become the presenter gaze object 228. The received manual input can be from a mouse, keyboard, or any other user input such as a touch event from the touch sensing apparatus 1000.

In another example as shown in Figure 6, the videoconferencing controller 110 selects the presenter gaze object 228 from all the candidate presenter gaze objects 230, 232, 234, 240, 242 within an area of interest 600 having a distance D from the intersection point 236. This means that multiple presenter gaze objects 602, 604 may be selected at the same time.

Referring back to Figures 2 and 7, once the videoconferencing controller 110 selects the presenter gaze object 228, the videoconferencing controller 110 sends a signal to the image processing module 116 to modify the displayed image 220 on the presenter display 104 and I or the remote image 216 on the remote terminal 202. The videoconferencing controller 110 sends a signal to the image processing module 116 to modify the presenter gaze object 228. In this way, the presenter 204 and I or the remote user 206 can identify the presenter gaze object 228. This means that the remote user 206 can identify which part of the displayed image 220, the presenter 204 is currently looking at.

In some examples, the image processing module 116 to modifies only the remote image 216 (this is illustrated in Figure 6). This may be helpful so that the local participants 208 and the presenter 204 are not distracted by the modified displayed image 220. In some other examples, the image processing module 116 modifies the displayed image 220 as well as the remote image 216 (this is illustrated in Figures 3, 4, and 5). This may be advantageous because the presenter 204 receives feedback from the videoconferencing controller 110 that the correct presenter gaze object 228 is selected. If the wrong part of the displayed image 220 has been highlighted, the presenter 204 can optionally manually override the highlighting in some examples by e.g. moving the cursor 238 to a particular part of the displayed image 220.

The image processing module 116 modifies the remote image 216 and the displayed image 220 by highlighting the presenter gaze object 228.

Turning to Figure 3, the presenter gaze object 228 and the remote presenter gaze object 300 on the remote image 216 are illustrated as being highlighted with a thicker outline. The image processing module 116 can highlight the presenter gaze object 228 and the remote presenter gaze object 300 in one or more different ways.

For example, the image processing module 116 is configured to modify the brightness, the contrast, the colour, sharpen, or distorting the presenter gaze object 228 and remote presenter gaze object 300 to increase the interest of the remote user 206 in the presenter gaze object 2281 the remote presenter gaze object 300.

The videoconferencing controller 110 may determine that the presenter gaze object 228 is no longer in line with the presenter gaze direction G. The videoconferencing controller 110 may receive a signal from the eye tracking module 118 that the presenter gaze direction G has moved if the presenter gaze direction G is not in line with the presenter gaze object 228 after a predetermined period of time.

When the videoconferencing controller 110 receives a signal that the presenter gaze direction G has shifted away from the presenter gaze object 228, the videoconferencing controller 110 issues a signal to the image processing module 116 to modify the displayed image 220. Accordingly, the image processing module 116 removes the modification to the presenter gaze object 228. For example, highlighting of the presenter gaze object 228 fades away or snaps away if the determined presenter gaze direction G moves away from the determined presenter gaze object 228. The videoconferencing controller 110 can then determine a new presenter gaze object 228 and repeat the steps as shown in Figures 7 and 8 to highlight a new presenter gaze object 228. Accordingly, the videoconferencing controller 110 dynamically selects and deselects parts of the displayed image 220 which are highlighted to the remote user 206 as the presenter 204 looks at different parts of the displayed image 220 during the videoconference.

In some examples as shown in Figure 5, the image processing module 116 can modify other parts 500, 502 of the displayed image 220 and the remote image 216 alternatively or additionally to modifying the presenter gaze object 228. In this case the image processing module 116 modifies the rest of the displayed image 220 is lowlighted e.g. so that the remote user 206 is less interested in other parts 500, 502 of the remote image 216. Whilst Figure 5 shows both highlighting and lowlighting of the displayed image 220 and the remote image 216, the image processing module 116 may be configured to only perform lowlighting and not highlighting of the displayed image 220 and the remote image 216.

For example, the image processing module 116 is configured to modify the brightness, the contrast, the colour, blur, or distorting the presenter gaze object 228 and remote presenter gaze object 300 to decrease the interest of the remote user 206 in the rest of the remote image 216.

In some further examples the remote videoconferencing terminal 202 is identical to the videoconferencing terminal 100. In this way the remote videoconferencing terminal 202 comprises similar cameras to determine a remote user gaze direction and a remote user gaze object. The videoconferencing controller 110 is configured to determine the gaze direction of the remote user 206 with respect to the displayed remote image 216 on the remote user display 218 based on the captured images of the remote user 206. The videoconferencing controller 110 is configured to determining the remote user gaze object in the displayed remote image 216 based on the determined remote user gaze direction. The videoconferencing controller 110 is configured to modify the output display signal to the presenter display 104 configured to display a modified displayed image 220 on the presenter display 104 based on the determined remote user gaze object. This means that the presenter 204 can receive real time feedback of how the remote user 206 is engaging with the subject matter displayed on the remote user display 218. For example, if the presenter 204 determines that the remote user 206 is not looking at the presenter gaze object 228 on the remote user display 218 enough, the presenter 204 can adapt their presentation accordingly.

In some examples as mentioned above with reference to Figures 10a, 10b and 11 , the videoconferencing terminal 100 also comprises a touch sensing apparatus 1000. The touch sensing apparatus 1000 is configured to determine displayed image data comprising spatial position information of the position of the presenter 204 relative to the touch sensing apparatus 1000 as shown in step 900 of Figure 9.

In this case, the touch sensing apparatus 1000 is configured to issue one or more signals to the videoconferencing controller 110 relating to a touch event and the spatial position information. The videoconferencing controller 110 is configured to issue a control signal to the image processing module 116 to modify the image to create a silhouette 400 of a hand 402 presenter 204 on the displayed image 220. The image processing module 116 is configured to generate a presenter representation 400 based on the spatial position information as shown in step 902. The image processing module 116 is configured to modify the remote image 216 to also provide the remote silhouette 404. The videoconferencing controller 110 is then configured to output a display signal to the remote user display 218 to display the presenter representation 400 as shown in step 906 of Figure 9. Additionally or alternatively, videoconferencing controller 110 is configured to output a display signal to modify the presenter display 104 to display the presenter representation 400 as shown in step 904 of Figure 9.

In some examples, the presenter representation 400 is a silhouette of a hand, pointer, stylus, or other indicator object.

In some examples, the videoconferencing controller 110 determine the distance between the presenter gaze object 228 and the presenter representation 400 as shown in step 908. The videoconferencing controller 110 determines that presenter gaze object 228 and the presenter representation 400 are directed to the same image element on the displayed image 220 as shown in step 910. For example, the videoconferencing controller 110 determines that presenter gaze object 228 and the presenter representation 400 are directed to the same image element when the presenter gaze object 228 and the presenter representation 400 are within a predetermined distance e.g. 2cm - 5cm on the displayed image 220.

The videoconferencing controller 110 then sends a signal to the image processing module 116 is configured to modify the output display signal on the remote user display 218 to highlight the present gaze object 228 when the presenter representation 400 is near the determined presenter gaze object 228 as shown in step 906.

Determining the silhouette 400 is described in detail in SE2130042-1 which is incorporated herein by reference.

In some other examples, any of the examples described with reference to Figures 1 to 12 can be used together with a multi-display setups. That is, the videoconferencing terminal 100 and I or the other remote videoconferencing terminals 202 each comprise a plurality of displays 104. In this scenario, the presenter 204 is using the videoconferencing terminal 100 with a plurality of displays 104 in a conference room. The plurality of displays 104 of the videoconferencing terminal 100 are arranged side by side. In this case, the videoconferencing controller 110 is configured to determine the gaze direction of the presenter 204 and which of the plurality of displays 104 the presenter 204 is looking at. In this case, the videoconferencing controller 110 determines which display 104 of the plurality of displays 104 to show to remote participants 104. This may be advantageous if the remote users 206 only have one remote user display 218.

In some other examples, videoconferencing controller 110 is configured to highlight the presenter gaze objects 226, 228 in various ways as described above. The remote users 206 optionally have the ability to zoom in on the presenter gaze objects 226, 228. This may be advantageous if the remote videoconferencing terminals 202 comprise a small remote user display 218 e.g. remote videoconferencing terminals 202 is a smartphone. In another example, two or more examples are combined. Features of one example can be combined with features of other examples.

Examples of the present disclosure have been discussed with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the disclosure.

Claims

1. A method of videoconferencing between a presenter and a remote user comprising: outputting a display signal configured to display an image on a presenter display and on a remote user display; receiving one or more images of the presenter; determining a gaze direction of the presenter with respect to the displayed image on the presenter display based on the received images of the presenter; determining a presenter gaze object in the displayed image based on the determined gaze direction; and modifying the output display signal configured to display a modified image on the remote user display based on the determined presenter gaze object.

2. A method of videoconferencing according to claim 1 wherein the method comprises modifying the determined presenter gaze object displayed on the present display.

3. A method according to claims 1 or 2 wherein the modifying comprises highlighting the presenter gaze object in the image.

4. A method according to claim 3 wherein the highlighting comprises increasing the brightness, increasing the contrast, changing the colour, sharpen, or distorting the determined presenter gaze object.

5. A method according to claims 3 or 4 wherein the highlighting is removed when the determined gaze direction moves away from the determined presenter gaze object for more than a predetermined period of time.

6. A method according to any of claims 3 to 5 wherein the highlighting fades away or snaps away if the determined gaze direction moves away from the determined presenter gaze object.

27

7. A method according to any of the preceding claims wherein the modifying comprises lowlighting one or more parts of the displayed image other than the determined presenter gaze object.

8. A method according to claim 7 wherein the lowlighting comprises decreasing the brightness, reducing the contrast the colour, removing the colour, blur, defocus, or distorting the one or more parts of the displayed image other than the determined presenter gaze object.

9. A method according to any of the preceding claims wherein the presenter gaze object is one or more of an application window, an image of a remote user, an application tab, a user selection on the displayed image, an area of the displayed image, a delimited object in the displayed image, presenter annotation on the displayed image.

10. A method according to any of the preceding claims wherein the method comprises determining an area of interest for selecting the presenter gaze object on the displayed image based on the determined gaze direction.

11. A method according to any of the preceding claims wherein the determining the presenter gaze object comprises determining a time period for one or more candidate presenter gaze objects from the last presenter interaction and / or a time period from initialization and selecting the presenter gaze object from the candidate presenter gaze objects based on the shortest time period.

12. A method according to any of the preceding claims wherein the determining a gaze direction comprises detecting a face position and a face orientation with respect to the presenter display.

13. A method according to any of the preceding claims wherein the determining a gaze direction comprises tracking the eyes of the presenter.

14. A method according to any of the preceding claims wherein the method comprises receiving one or more images of the remote user; determining a gaze direction of the remote user with respect to the displayed image on the remote user display based on the captured images of the remote user; determining a remote user gaze object in the displayed image based on the determined gaze direction; and modifying the output display signal configured to display a modified image on the presenter user display based on the determined remote user gaze object.

15. A method according to claim 14 wherein an alert is issued to the presenter if the determined remote user gaze object is not the same as the determined presenter gaze object.

16. A method according any of the preceding claims wherein modifying the output display signal comprises modifying the output display signal to display a modified image on the remote user display to indicate that that presenter is not looking at the presenter display.

17. A method according to any of the preceding claims wherein the method comprises: determining presenter image data comprising spatial position information of the position of the presenter relative to a touch sensing apparatus having a touch surface connected to the presenter display; generate a presenter representation based on the spatial position information; and output the display signal to the remote user display to display the presenter representation.

18. A method according to claim 17 wherein the presenter representation is a silhouette of a hand, pointer, stylus, or other indicator object.

19. A method according to claims 17 or 18 wherein the modifying comprising modifying the output display signal configured to display a modified image on the remote user display when the presenter representation is near the determined presenter gaze object.

20. A method according to any of claims 17 to 19 wherein the method comprises: receiving a touch input from the presenter on the touch surface; determine touch coordinates on the touch surface based on the touch input; generate a touch input representation based on the touch input coordinates; output the display signal to the remote user display to display the touch input representation.

21 . A videoconferencing system comprising: a presenter terminal having: a presenter display configured to display an image; and at least one camera configured to capture an image of the presenter; a remote user terminal having: a remote user display configured to display the image; and a controller configured to determine a gaze direction of the presenter with respect to the displayed image on the presenter display, determine a presenter gaze object in the displayed image based on the determined gaze direction, and modify the output display signal configured to display a modified image on the remote user display based on the determined presenter gaze object.