WO2021097600A1 - Inter-air interaction method and apparatus, and device - Google Patents

Inter-air interaction method and apparatus, and device Download PDF

Info

Publication number
WO2021097600A1
WO2021097600A1 (PCT/CN2019/119129)
Authority
WO
WIPO (PCT)
Prior art keywords
coordinates
image
screen
manipulator
user
Prior art date
Application number
PCT/CN2019/119129
Other languages
French (fr)
Chinese (zh)
Inventor
夏璐
陆勤
张建顺
甘启
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2019/119129
Priority to CN201980006422.0A (publication CN111527468A)
Publication of WO2021097600A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser

Definitions

  • The present invention relates to the technical field of human-computer interaction, and in particular to an air interaction method, apparatus, and device.
  • The embodiments of the present application provide an air interaction method, apparatus, and device.
  • In a first aspect, the present application provides an air interaction method, including: acquiring a first depth image of a user, where the first depth image includes a first red-green-blue (RGB) image and first depth information; identifying, in the first depth image, eye coordinates of the user's eyes and first manipulator coordinates of a manipulator; and determining, according to the eye coordinates and the first manipulator coordinates, a click position at which the user clicks on the screen.
  • In another possible implementation, when the eye coordinates and the first manipulator coordinates belong to a depth image coordinate system, the method further includes: converting the eye coordinates and the first manipulator coordinates from the depth image coordinate system to a spatial three-dimensional coordinate system.
  • In another possible implementation, determining the click position at which the user clicks on the screen according to the eye coordinates and the first manipulator coordinates includes: determining, as the click position, the intersection of the straight line passing through the eye coordinates and the first manipulator coordinates in the spatial three-dimensional coordinate system with the plane in which the screen lies.
  • In another possible implementation, after the click position at which the user clicks on the screen is determined, the method further includes: converting the click position from the spatial three-dimensional coordinate system to the screen coordinate system.
  • In another possible implementation, the method further includes: acquiring a second depth image of the user, where the second depth image includes a second RGB image and second depth information, and the first depth image and the second depth image are images at different times in the time domain; identifying second manipulator coordinates of the manipulator in the second depth image; and judging whether the magnitude of change between the first manipulator coordinates and the second manipulator coordinates exceeds a preset threshold. Determining the click position at which the user clicks on the screen according to the dominant-eye coordinates and the first manipulator coordinates includes: when the magnitude of change exceeds the preset threshold, determining, according to the dominant-eye coordinates and the first manipulator coordinates, the click position at which the user clicks on the screen.
  • By judging whether the user's hand, or the object held in the hand, is performing an operation, this application obtains only the coordinates of a finger or handheld object that is actually operating, so that images of the user's hand in a static state are filtered out and the processing burden on the processor is reduced.
  • In another possible implementation, before the first depth image or the second depth image of the user is acquired, the method includes: acquiring a first image and a second image of the user through at least one camera, where the first image includes the first RGB information and the second image includes second RGB information or the first depth information; and calculating the first depth image or the second depth image according to the first image and the second image.
  • For example, this process is executed by a device including the at least one camera, or by a device, or a processor within a device, that executes the method of the first aspect or any one of its possible implementations.
  • In a second aspect, the present application also provides an air interaction device, including a screen, at least one camera, and a processor that executes the first aspect or any of its possible implementations.
  • In a third aspect, the present application also provides an air interaction device, including a processor and a memory, where the memory stores one or more programs, the one or more programs include instructions, and the processor is configured to execute the instructions so that the device performs any one of the possible implementations of the first aspect.
  • In a fourth aspect, the present application also provides a readable storage medium for storing instructions, where, when the instructions are executed, any one of the possible implementations of the first aspect is performed.
  • In a fifth aspect, the present application also provides a computer program device containing instructions, where, when it runs on a device or a processor, the device performs any one of the possible implementations of the first aspect.
  • In a sixth aspect, the present application also provides an air interaction apparatus, which performs any one of the possible implementations of the first aspect.
  • FIG. 1 is a schematic diagram of a human-computer interaction scenario provided by an embodiment of this application;
  • FIG. 2(a) is a schematic diagram of a scene in which a binocular camera is shooting;
  • FIG. 2(b) is a schematic structural diagram of the geometric model of binocular-camera shooting;
  • FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of this application;
  • FIG. 4 is a schematic flowchart of an air interaction method provided by an embodiment of this application;
  • FIG. 5 is a schematic diagram of a scene in which a camera provided by an embodiment of this application obtains a face image;
  • FIG. 6 is a schematic diagram of the coordinates of various positions in an image obtained by the main camera provided by an embodiment of this application;
  • FIG. 7 is a schematic diagram of the coordinates of various positions in a Cartesian coordinate system with the main camera as the origin during the air interaction process provided by an embodiment of this application;
  • FIG. 8 is a schematic diagram of the coordinates of the user's click position in the screen coordinate system provided by an embodiment of this application;
  • FIG. 9 is a schematic structural diagram of an air interaction apparatus provided by an embodiment of this application.
  • The air interaction method provided by the embodiments of this application can be applied to terminal devices with screens, such as mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), televisions, projection devices, virtual reality devices, billboards, or large-screen devices; the embodiments of the present application do not impose any restrictions on this.
  • FIG. 1 is a schematic diagram of a human-computer interaction scenario provided by an embodiment of the application.
  • the terminal device provided by the present application includes: a screen 10, which is a device for displaying an image to be viewed by a user.
  • the screen 10 may include, but is not limited to, a television, or a billboard, etc., and may also include, but is not limited to, a wall, curtain, or glass that is used as a screen during projection by a projection device.
  • Terminal devices include but are not limited to the aforementioned mobile phones.
  • The terminal device also includes at least one main camera 20 for acquiring RGB images of people in a certain area directly in front of the screen 10.
  • the main camera 20 may be built into the screen 10 or connected to the screen 10 as an independent device.
  • In the embodiment of the present application, the main camera 20 includes, but is not limited to, a visible-light camera, an infrared camera, or another type of camera. An infrared camera is preferred in the present application: because infrared light is invisible to human eyes, the main camera 20 does not disturb the people directly in front of the screen 10 while it collects images of them.
  • the terminal device also includes at least one auxiliary camera 30 for acquiring image depth information of a person image in a certain area directly in front of the screen 10.
  • the auxiliary camera 30 may be built into the screen 10 or connected to the screen 10 as an independent device.
  • the auxiliary camera 30 includes but is not limited to a structured light camera, a time of flight (TOF) camera, or other types of cameras.
  • the terminal device also includes a processor (refer to the description of the processor 303 in FIG. 3 for details), which has general computing capabilities for processing the RGB image obtained by the main camera 20 and the depth information obtained by the auxiliary camera 30.
  • It should be noted that the auxiliary camera 30 may be a camera of the same type as the main camera 20.
  • the main camera 20 and the auxiliary camera 30 constitute a binocular camera, and the image depth information can be calculated by the principle of the binocular camera.
  • the baseline b is the distance between the apertures of the main camera 20 and the auxiliary camera 30;
  • the focal length f is the distance between the imaging points of the main camera 20 and the auxiliary camera 30 and their aperture centers;
  • u L is the horizontal distance between the aperture center of the auxiliary camera 30 and the imaging point of the auxiliary camera 30;
  • u R is the horizontal distance between the aperture center of the main camera 20 and the imaging point of the main camera 20;
  • the distance z is the distance between the scene point and the aperture center.
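  • The depth relation itself appears only as an equation image in the original publication. The snippet below is a minimal sketch of the standard binocular-disparity relationship implied by the variable definitions above (baseline b, focal length f in pixel units, horizontal imaging offsets u_L and u_R); the function name and the pixel-unit assumption are illustrative, not taken from the patent.

```python
def depth_from_disparity(b: float, f: float, u_l: float, u_r: float) -> float:
    """Depth z of a scene point from a rectified stereo pair.

    b   : baseline, distance between the two apertures (e.g. in metres)
    f   : focal length expressed in pixels
    u_l : horizontal offset of the imaging point in the auxiliary camera
    u_r : horizontal offset of the imaging point in the main camera
    """
    disparity = u_l - u_r          # from the similar triangles PP_L P_R and PO_L O_R
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity or cameras not rectified")
    return f * b / disparity       # z = f * b / (u_L - u_R)
```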
  • the processor controls the main camera 20 to shoot to obtain images or videos in a certain area directly in front of the screen 10, including but not limited to RGB images . Then the main camera 20 sends the captured image or video to the processor. At the same time, the processor controls the auxiliary camera 30 to shoot to obtain image depth information in a certain area directly in front of the screen 10. Then the auxiliary camera 30 sends the collected image depth information to the processor.
  • After receiving the image or video sent by the main camera 20, the processor recognizes the face image in it through the face recognition module and recognizes the hand image in it through the gesture recognition module. It then calculates the position of the eyes in the face image, such as the position of the pupil of the dominant eye in the image or video, according to an existing face recognition algorithm, and calculates the position in the image or video of the fingertip, nail, or another part of a finger in the hand image according to a gesture recognition algorithm.
  • the face recognition module and the gesture recognition module may be software modules or neural network models preset in the terminal device, and may be executed by the processor or preset in the processor, which is not limited in this embodiment.
  • the processor may obtain the position of the tip of the manipulation object instead of obtaining the position of the fingertip, nail or other parts of the finger.
  • After receiving the image depth information sent by the auxiliary camera 30, the processor first converts the positions of the pupil and of the fingertip, nail, or other finger part in the image or video into coordinate points in spatial coordinates; it then calculates the straight line formed by the pupil and the fingertip (or nail or other finger part) in those spatial coordinates; finally, it calculates the coordinates of the intersection point at which this straight line meets the plane of the screen 10, obtaining the click position that the user wants to click on the screen 10. Subsequently, if the user performs an operation on the content displayed on the screen 10, the processor recognizes the user's gesture and then executes the instruction corresponding to that gesture, to implement operations such as tapping, zooming in, zooming out, and moving on the screen.
  • In this way, the main camera 20 and the auxiliary camera 30 obtain the three-dimensional coordinate point of a part of the user's eye, such as the dominant eye, and the three-dimensional coordinate point of a part of the finger when the finger clicks; the intersection on the preset screen of the projected straight line connecting these two three-dimensional coordinate points is the position the user wants to click, so that air interaction with the terminal device is realized without the help of any tool.
  • In a possible embodiment, the terminal device may also include at least one fill light 50 for supplementing light in a certain area directly in front of the screen 10.
  • The fill light 50 may be built into the screen 10 or connected to the screen 10 as an independent device.
  • The fill light 50 includes, but is not limited to, visible-light lighting equipment, infrared lighting equipment, or other types of lighting equipment.
  • The type of light emitted by the fill light 50 is the same as the type of light used by the main camera 20 for shooting, so that the light supplemented by the fill light 50 better supports the shooting of the main camera 20.
  • the eyes involved in the processed image may also be non-dominant eyes, which is not limited in this embodiment.
  • FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention.
  • a terminal device 300 includes a sensor 301, a display 302, a processor 303, a memory 304, a communication interface 305, and an interface 306.
  • the processor 303, the memory 304, and the communication interface 305 in the terminal device 300 can establish a communication connection through the interface 306.
  • the sensor 301 is used to obtain information including RGB-D images, RGB images, and image depth.
  • the sensor 301 may include a main camera 20 and an auxiliary camera 30.
  • the display 302 is used to display processed data, such as video, and a virtual operation interface.
  • the display 302 may be the screen 10.
  • the processor 303 may be a central processing unit (CPU).
  • The processor 303 is configured to identify the dominant-eye coordinates of the user's dominant eye according to at least one red-green-blue-depth (RGB-D) image and, when it detects from at least one RGB-D image that the manipulator is operating, to determine the manipulator coordinates of the manipulator. The processor 303 is also used to convert the dominant-eye coordinates and the manipulator coordinates from the depth image coordinate system to the spatial three-dimensional coordinate system, and to determine, as the click position, the intersection of the line between the dominant-eye coordinates and the first manipulator coordinates in that coordinate system with the plane in which the display 302 lies. The processor 303 is further configured to convert the click position from the spatial three-dimensional coordinate system to the screen coordinate system.
  • the RGB-D image is also called a depth image in the embodiment of this application.
  • The memory 304 may include a volatile memory, such as a random-access memory (RAM); the memory may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); the memory 304 may also include a combination of the foregoing types of memory. Data such as face RGB images, hand RGB images, and image depth information are stored in the memory 304. In addition, the memory 304 also stores the program instructions, executed by the processor 303, that correspond to the air interaction and the like for realizing the above-mentioned embodiments.
  • the communication interface 305 can implement external communication of the terminal device, including but not limited to cellular communication, short-distance communication, or wired communication.
  • the interface 306 may be an interface or channel for the processor 303 to interact with other components.
  • the interface may be a bus or other interface.
  • the interface 306 can be used to connect to the sensor 301 and to transmit various types of image information collected by the sensor 301 to the processor 303.
  • FIG. 4 is a schematic flowchart of an air interaction method provided by an embodiment of the application.
  • the specific implementation process of the air-space interaction method provided in the embodiment of the present application is as follows.
  • the processor 303 obtains a first depth image of the user.
  • The processor 303 controls the main camera 20 and the auxiliary camera 30 to shoot, in real time, a certain area in front of the screen 10, and the cameras send the acquired image information and depth information to the processor 303.
  • the processor 303 may obtain the depth image through calculation.
  • Alternatively, the main camera 20 and the auxiliary camera 30 may belong to a separate shooting device or module that has its own internal processing unit; the acquired image information and depth information are processed there, and the depth image is then generated and sent to the processor 303.
  • In that case the processor 303 directly receives the depth image sent by the shooting device or module, which is not limited in this embodiment.
  • In the first way of obtaining the depth image, the main camera 20 is used to obtain the RGB image of a person in a certain area in front of the screen 10,
  • and the auxiliary camera 30 is used to obtain the image depth of the person in that area.
  • The shooting device or module, or the processor 303, then combines the RGB image of the person with the image depth information of the person to obtain an RGB-D image.
  • The second way is to obtain two RGB images through two identical cameras (the main camera 20 and the auxiliary camera 30 being of the same type); the shooting device or module, or the processor 303, then calculates the RGB-D image according to the binocular camera principle.
  • No matter which method is adopted, the RGB-D image can be divided into an RGB image and image depth information.
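  • As an illustrative sketch (not part of the patent text), an RGB-D image can be represented by stacking the RGB image and the depth map into one array, which also makes the reverse split into an RGB image and image depth information trivial:

```python
import numpy as np

def make_rgbd(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Stack an H x W x 3 RGB image and an H x W depth map into an H x W x 4 RGB-D image.
    Note: numpy upcasts the result to a common dtype (e.g. float if the depth map is float)."""
    return np.dstack([rgb, depth])

def split_rgbd(rgbd: np.ndarray):
    """Split an RGB-D image back into its RGB image and image depth information."""
    return rgbd[..., :3], rgbd[..., 3]
```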
  • the face recognition module performs face recognition on the image captured by the main camera 20 in real time to capture the face image in the image.
  • the processor 303 can calculate the position of the face image recognized by the face recognition module in real time.
  • For the principle by which the processor 303 obtains a face image, reference may be made to the picture-taking principle of existing devices such as mobile phones and cameras, as shown in FIG. 5.
  • While the main camera 20 shoots in real time, after the face recognition module recognizes the face images in the frame, each face image is framed by a box, and each face box represents one user's face image, which facilitates the later calculation of the position of that face image.
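  • A face recognition module of this kind can be realized with any off-the-shelf detector. The snippet below is only an illustrative sketch, using OpenCV's bundled Haar-cascade face detector (a choice made here for illustration, not specified by the patent) to frame each detected face with a box, mirroring the behaviour described above.

```python
import cv2

def frame_faces(frame):
    """Detect faces in a BGR frame and draw a box around each one."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:                      # one box per detected user face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    return faces                                    # (x, y, w, h) per face, for later position calculations
```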
  • In step S403, the processor 303 recognizes the eye coordinates of the user's eyes and the first manipulator coordinates of the manipulator in the first depth image.
  • the follow-up introduction takes the eye as the dominant eye as an example.
  • the processor 303 may selectively retain part or all of the face images recognized by the face recognition module according to the application scenario.
  • For example, the processor 303 retains only the face image of a user who has raised a hand. In the air interaction process, the terminal device realizes the interaction by recognizing the user's finger movement, so it only needs to obtain the face image of a user making a hand-raising gesture.
  • For other users, the processor 303 does not need to obtain their face images, which reduces the workload of the processor 303.
  • The processor 303 can recognize the first depth image by running a face recognition module and a gesture recognition module, such as a face neural network model and a gesture neural network model, to obtain the coordinates of the dominant eye in space and the first manipulator coordinates of the manipulator.
  • The relevant neural network models use artificial intelligence recognition technology; for the specific related technology, refer to the description of the prior art, which is not expanded here.
  • In another possible embodiment, the processor 303 retains only the face image of a preset user.
  • When two or more users raise their hands during the air interaction, the processor 303 may be unable to determine who the controller is. Therefore, in the embodiment of the present application, preset face images of one or more users can be stored in the memory in advance as the face images of controllers, and when the face recognition module identifies multiple users including a preset controller, the processor 303 preferentially determines that preset controller stored in the memory as the controller of the air interaction.
  • When the processor 303 finds, in the images processed by the face recognition module, a face image of a user that meets these requirements, it uses that face image as the face image of the controller performing the air interaction.
  • The processor 303 controls the face recognition module to recognize the eyeball, pupil, or another part of the dominant eye in the controller's face image, and then calculates the position of that eyeball, pupil, or other part in the image obtained by the main camera 20.
  • Specifically, the processor 303 calculates the position A1 (Xp, Yp) of the eyeball, pupil, or other part of the controller's dominant eye in the image obtained by the main camera 20 according to the resolution H (Height) × W (Width) of that image, as shown in FIG. 6. Subsequently, the processor 303 combines the image depth information of that part of the controller's dominant eye, obtained by the auxiliary camera 30, to calculate the position A2 (Xp, Yp, Zp) of the controller's dominant eye in the RGB-D image.
  • The user's eye that is obtained is generally the user's dominant eye, but this is not a limitation; processing can also be performed based on the non-dominant eye or on both eyes, and the number of eyes used in the processing is not limited.
  • The dominant eye referred to in this article is also called the fixation eye. From the perspective of human physiology, everyone has a dominant eye, which may be the left eye or the right eye; what the dominant eye sees is given priority by the brain. For most people the right eye is the dominant eye, so the system defaults to the right eye as the dominant eye.
  • In a possible embodiment, the system can determine the user's dominant eye based on whether the direction of the line connecting an eye and the finger points at the screen 10. If the line connecting the user's left eye and the finger points to a point within the screen 10 while the line connecting the right eye and the finger points to a point outside the screen 10, the dominant eye is considered to be the left eye; if the line connecting the right eye and the finger points to a point within the screen 10 while the line connecting the left eye and the finger points to a point outside the screen 10, the dominant eye is considered to be the right eye; and if both lines point to points within the screen 10, the dominant eye defaults to the right eye.
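  • The rule above can be expressed as a small decision routine. The sketch below assumes that, for each eye, the intersection of the eye-to-fingertip line with the screen plane has already been computed (for example with the ray-plane routine shown later) and only checks whether that intersection falls inside the screen bounds; the helper names and the right-eye default are illustrative choices, not the patent's exact implementation.

```python
def pick_dominant_eye(left_hit, right_hit, screen_w, screen_h):
    """Choose the dominant eye from the two eye-to-finger screen intersections.

    left_hit / right_hit : (x, y) intersection of each eye-finger line with the
                           screen plane, in screen coordinates, or None if the
                           line does not reach the screen plane.
    Returns "left" or "right"; the right eye is the default, since most people
    are right-eye dominant.
    """
    def on_screen(hit):
        return hit is not None and 0 <= hit[0] <= screen_w and 0 <= hit[1] <= screen_h

    if on_screen(left_hit) and not on_screen(right_hit):
        return "left"
    # right-eye line points at the screen, or both (or neither) do: default to the right eye
    return "right"
```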
  • Similarly, the processor 303 may selectively retain part or all of the hand images recognized by the gesture recognition module according to the application scenario. In a possible embodiment, the processor 303 determines whether the user's hand is operating according to whether the position of the hand has changed between at least two consecutive pictures, that is, according to the magnitude of the change, and retains only the images of hands that are performing an operation (see the sketch below). The image or video captured by the main camera 20 includes not only the hand image of the user performing the operation but also the hand images of onlookers; for users whose hands are static, the processor 303 does not need to obtain their hand images, which reduces its workload.
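  • As a minimal sketch of this filtering step (the names, the Euclidean-distance measure, and the example threshold are assumptions for illustration, not taken from the patent), the change between the manipulator coordinates in two depth images taken at different times can be compared against a preset threshold, and static hands skipped:

```python
import numpy as np

def hand_is_operating(first_coords, second_coords, threshold=0.02):
    """Return True if the manipulator moved more than `threshold` (e.g. metres)
    between two depth images taken at different times in the time domain."""
    change = np.linalg.norm(np.asarray(second_coords) - np.asarray(first_coords))
    return change > threshold   # static hands are filtered out to reduce the processor's load
```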
  • In another possible embodiment, the processor 303 retains only hand images showing specific actions. Besides operating the terminal device, the user's hand also performs actions such as scratching or picking things up, which are not intended to operate the terminal device. Therefore, in the embodiment of the present application, preset operation actions, such as raising a hand or clicking, may be stored in the memory in advance as the actions used to detect whether the user is operating the terminal device.
  • the processor 303 When the processor 303 obtains the hand image of the user that meets the requirements in the image captured by the gesture recognition module, the hand image is used as the hand image of the controller performing the air interaction.
  • the processor 303 controls the gesture recognition module to recognize the fingertips, nails or other parts of the fingers of the controller's hand image, and then calculates the positions of the fingertips, nails or other parts of the fingers in the image captured by the main camera 20.
  • The manipulation object mentioned in the above embodiment is a human hand. It should be noted that, when the processor 303 obtains a user's hand image that meets the requirements from the image processed by the gesture recognition module, it may detect a remote-control pen or a signature pen in the hand in that image, which indicates that the user's hand is holding another object as the manipulation object. In this case, the processor 303 can calculate the position of the protruding tip of the manipulation object in the image obtained by the main camera 20, in place of the position of the fingertip, nail, or other finger part.
  • the manipulation objects in the embodiments of the present application include hands, handheld devices, handheld objects, or other facilities that replace hands to achieve manipulation, which is not limited in this embodiment.
  • Specifically, the processor 303 calculates the position B1 (Xf, Yf) of the fingertip, nail, or other part of the controller's finger in the image obtained by the main camera 20 according to the resolution H (Height) × W (Width) of that image, as shown in FIG. 6.
  • The processor 303 then combines the image depth information of the fingertip, nail, or other part of the controller's finger acquired by the auxiliary camera 30 to calculate the position B2 (Xf, Yf, Zf) of the controller's finger in the RGB-D image.
  • These coordinates include the coordinates B1 (Xf, Yf) of the fingertip, nail, or other part in the two-dimensional RGB image and the depth information Zf.
  • In step S405, the processor 303 determines the click position at which the user clicks on the screen according to the eye coordinates and the first manipulator coordinates.
  • After the processor 303 of the above embodiment obtains the coordinates A2 (Xp, Yp, Zp) of the pupil of the controller's dominant eye and the coordinates B2 (Xf, Yf, Zf) of the finger in the RGB-D image, the processor 303 needs to convert these coordinate points in the RGB-D coordinates into coordinate points in the Cartesian coordinate system in space.
  • Cx, Cy, Fx, and Fy are the intrinsic parameters of the main camera 20:
  • Cx and Cy are the horizontal and vertical offsets (unit: pixels) of the image origin relative to the imaging point of the aperture center;
  • Fx = f/dx, where f is the focal length of the camera and dx is the length occupied by one pixel in the x direction;
  • Fy = f/dy, where dy is the length occupied by one pixel in the y direction.
  • The three-dimensional coordinates A3 (Xsp, Ysp, Zsp) of the controller's dominant eye and the three-dimensional coordinates B3 (Xsf, Ysf, Zsf) of the finger in the Cartesian coordinate system with the main camera 20 as the origin are calculated from the above formulas (2) and (3).
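  • Formulas (2) and (3) are not reproduced in this text. The following is a minimal sketch of the standard pinhole back-projection that matches the intrinsic parameters defined above (Cx, Cy, Fx, Fy); the function name is illustrative.

```python
def image_to_camera_coords(xp, yp, zp, cx, cy, fx, fy):
    """Convert a pixel position (xp, yp) with depth zp into 3-D coordinates in the
    Cartesian coordinate system whose origin is the main camera.

    cx, cy : principal-point offsets in pixels
    fx, fy : focal length divided by the pixel size in x / y (f/dx, f/dy)
    """
    xs = (xp - cx) * zp / fx
    ys = (yp - cy) * zp / fy
    zs = zp
    return xs, ys, zs   # e.g. A3 for the dominant eye, B3 for the fingertip
```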
  • the processor 303 calculates the coordinate point M1 where the finger points to the screen according to the intersection point of the straight line formed by the two coordinate points of the dominant eye coordinate A3 and the finger coordinate B3 projected on the screen, as follows:
  • The above formula (4) yields the coordinate point M1 (Xpoint, Ypoint, Zpoint) on the screen at which the controller's finger points, expressed in the Cartesian coordinate system.
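  • Formula (4) itself is not reproduced in this text. A minimal sketch of the projection is shown below, under the added assumption that the screen lies in the plane Z = 0 of the camera-centred coordinate system (the patent only states that the intersection with the plane of the screen is taken); the function name is illustrative.

```python
def project_to_screen_plane(eye, finger):
    """Intersect the line through the dominant-eye point A3 and the fingertip point B3
    with the screen plane, here assumed to be Z = 0.

    eye, finger : (x, y, z) points in the camera-centred Cartesian system.
    Returns the intersection M1 = (Xpoint, Ypoint, 0), or None if the line is
    parallel to the screen plane.
    """
    ex, ey, ez = eye
    bx, by, bz = finger
    if ez == bz:                      # line parallel to the screen plane
        return None
    t = ez / (ez - bz)                # parameter at which the line reaches Z = 0
    x_point = ex + t * (bx - ex)
    y_point = ey + t * (by - ey)
    return (x_point, y_point, 0.0)
```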
  • After the processor 303 obtains the point M1 pointed to by the controller's finger in the spatial Cartesian coordinate system, and since the screen must display the point indicated by the controller, it needs to convert the coordinate point M1 in the spatial Cartesian coordinate system into the coordinate M2 in the screen coordinate system.
  • the X axis is positive to the right (take the screen 10 shown in Figure 1 as the reference), and the Y axis is positive upwards.
  • Assume the screen resolution is H (Height) × W (Width) and the position of the main camera in the screen coordinate system is (Xc, Yc); the intersection coordinate M2 indicated by the controller in the screen coordinate system is then calculated as follows:
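  • The publication's conversion formula is not reproduced in this text. The sketch below shows one plausible conversion under the stated conventions (X positive to the right, Y positive upward, main camera located at (Xc, Yc) in the screen coordinate system); the pixel-pitch parameters used to turn physical offsets into pixels, and the assumption that the camera axes are aligned with the screen axes, are added for illustration only.

```python
def camera_point_to_screen_coords(m1, xc, yc, px_per_unit_x, px_per_unit_y):
    """Convert the intersection M1 (camera-centred Cartesian coordinates, in the same
    physical units as the pixel pitch) into screen coordinates M2 = (Xs, Ys).

    xc, yc        : position of the main camera in the screen coordinate system (pixels)
    px_per_unit_* : screen pixels per physical unit along X and Y (assumed known)
    """
    x_point, y_point, _ = m1
    xs = xc + x_point * px_per_unit_x     # X axis positive to the right
    ys = yc + y_point * px_per_unit_y     # Y axis positive upward
    return xs, ys
```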
  • From this, the coordinate point M2 (Xs, Ys) on the screen 10 to which the controller's finger points, expressed in the coordinate system of the screen 10, is obtained. Then, according to the coordinate point M2, a mark such as an arrow, a small hand, or a dot is displayed on the screen 10 at the position Xs to the right and Ys upward from the origin at the lower-left corner of the screen 10, so that during the air interaction the controller can see the point on the screen 10 that they intend to indicate.
  • the processor 303 performs operations such as clicking, moving, and zooming in on the text in the position where the marking point is located according to the gesture of the hand of the controller.
  • In a possible embodiment, the processor 303 obtains multiple consecutive RGB-D images, calculates the position of the user's hand in each RGB-D image, and then determines the posture of the user's finger according to whether the position of the user's finger has changed.
  • In a possible embodiment, when the processor 303 detects, based on the multiple RGB-D images acquired within a specified time, that the user's finger has moved further than the preset distance in the direction perpendicular to the screen 10 and that the movement points toward the screen 10, the processor 303 determines that the gesture of the controller's hand is a click operation or a pen-down operation. The processor 303 then opens the file, APP icon, or other application at the coordinate point M2 on the screen 10.
  • In a possible embodiment, when the processor 303 detects, based on the multiple RGB-D images acquired within a specified time, that the finger has moved further than the preset distance in the direction perpendicular to the screen 10 and that the movement points away from the screen 10, the processor 303 determines that the gesture of the controller's hand is a hand-raising operation or a pen-up operation. The processor 303 then stops modifying the content of the file or page at the coordinate point M2 on the screen 10.
  • In a possible embodiment, when the processor 303 detects, based on the multiple RGB-D images acquired within a specified time, that the user's finger has moved less than the preset distance in the direction perpendicular to the screen 10 and that the movement includes both a direction pointing toward the screen 10 and a direction facing away from it, the processor 303 determines that the gesture of the controller's hand is a combo operation. The processor 303 then displays the content of the file or page at the coordinate point M2 on the screen 10 with emphasis.
  • the display methods include displaying the text around the coordinate point M2 in red, magnifying the icon, or brightening the background, etc.
  • In a possible embodiment, when the processor 303 detects, based on the multiple RGB-D images acquired within a specified time, that the user's finger has moved further than the preset distance in a plane parallel to the screen 10 while no movement, or movement of less than the preset distance, is detected in the direction perpendicular to the screen 10, the processor 303 determines that the gesture of the controller's hand is a sliding operation. The processor 303 then moves the file, APP icon, or other content at the coordinate point M2 on the screen 10 to the coordinate point on the screen 10 corresponding to the last pointing position.
  • In a possible embodiment, when the processor 303 detects, according to the multiple RGB-D images acquired within a specified time, that more than one finger appears in the image acquired by the main camera 20, the processor 303 combines the image depth information of the controller's fingers acquired by the auxiliary camera 30 to determine whether the distance between the fingers has changed.
  • When the processor 303 detects that the distance between the fingers keeps increasing, it determines that the controller is zooming in on the selected target on the screen, and it enlarges the file or APP icon at the coordinate point M2 on the screen 10, or opens the selected file; when the processor 303 detects that the distance between the fingers keeps decreasing, it determines that the controller is zooming out on the selected target on the screen, and it shrinks the file or APP icon at the coordinate point M2 on the screen 10, or closes the opened file.
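  • The single-finger gesture decisions above can be summarised as a simple classifier over the fingertip trajectory. The sketch below assumes the fingertip positions are expressed in the camera-centred coordinate system with z the distance perpendicular to the screen (decreasing z meaning movement toward the screen); the threshold value, the priority order of the checks, and the omission of the multi-finger zoom case are illustrative simplifications, not the patent's exact decision logic.

```python
import numpy as np

def classify_gesture(fingertips, move_thresh=0.03):
    """Classify the controller's gesture from fingertip positions (x, y, z) sampled
    within a specified time window (at least two samples)."""
    pts = np.asarray(fingertips, dtype=float)
    dz = pts[-1, 2] - pts[0, 2]                       # net motion perpendicular to the screen
    dxy = np.linalg.norm(pts[-1, :2] - pts[0, :2])    # motion parallel to the screen
    steps = np.diff(pts[:, 2])
    moved_both_ways = (steps < 0).any() and (steps > 0).any()

    if dz < -move_thresh:
        return "click"   # pen-down: moved toward the screen by more than the preset distance
    if dz > move_thresh:
        return "raise"   # pen-up: moved away from the screen by more than the preset distance
    if moved_both_ways:
        return "combo"   # small back-and-forth perpendicular motion
    if dxy > move_thresh:
        return "slide"   # motion parallel to the screen only
    return "idle"
```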
  • The air interaction method provided by the embodiments of the present application obtains the three-dimensional coordinates of a part of the user's dominant eye and the three-dimensional coordinates of a part of the finger, determines the position at which the user clicks on the screen from the intersection point at which the straight line formed by the two coordinate points, projected onto the screen, meets the screen, and performs various operations on files according to the posture of the user's finger, thereby achieving air interaction.
  • FIG. 9 is a schematic structural diagram of an air interaction apparatus provided by an embodiment of this application.
  • The air interaction apparatus 900 provided in this embodiment of the present application includes an acquisition unit 901 and a processing unit 903.
  • The acquisition unit 901 is used to acquire the RGB-D image of the user.
  • The processing unit 903 is used to identify, in the RGB-D image, the eye coordinates of one of the user's eyes and the manipulator coordinates of the manipulator, and to determine, according to the eye coordinates and the manipulator coordinates, the click position at which the user clicks on the screen.
  • the processing unit 903 may also perform other operations mentioned in the above embodiment, and for details, refer to the introduction of the previous embodiment.
  • various aspects or features of the embodiments of the present application can be implemented as methods, devices, or products using standard programming and/or engineering techniques.
  • The term "article of manufacture" used in this application encompasses a computer program accessible from any computer-readable device, carrier, or medium.
  • For example, computer-readable media may include, but are not limited to: magnetic storage devices (for example, hard disks, floppy disks, or tapes), optical discs (for example, compact discs (CD) or digital versatile discs (DVD)), smart cards, and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
  • various storage media described herein may represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • The air interaction apparatus 900 in FIG. 9 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented by software, it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
  • It should be understood that the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present application, or the part that contributes to the prior art, or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or an access network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An inter-air interaction method and apparatus, and a terminal device, applied to the technical field of human-computer interaction. The method comprises: acquiring a first depth image of a user (S401); identifying, in the first depth image, eye coordinates of the eyes of the user and first operated-object coordinates of an operated object (S403); and determining, according to the eye coordinates and the first operated-object coordinates, a click position at which the user clicks on a screen (S405). By acquiring the three-dimensional coordinates of the eyes or dominant eye of the user and the three-dimensional coordinates of a finger or an object held by the hand, and then determining the user's click position on the screen from the point at which the straight line formed by the two coordinate points, projected onto the screen, crosses the screen, inter-air interaction is realized.

Description

Air interaction method, apparatus, and device
Technical Field
The present invention relates to the technical field of human-computer interaction, and in particular to an air interaction method, apparatus, and device.
Background
Large-screen displays are now increasingly common; devices ranging from home televisions to outdoor billboards use various types of displays to present image information. In the traditional ways in which users interact with such devices, however, touch screens and remote controls are the main means. In some scenarios the user, that is, the operator, cannot directly touch the screen, so a touch screen cannot be used; for example, a touch screen is unsuitable for a home television or for the display of an outdoor high-rise building. Controlling with a remote control also has many problems: the remote control is easily lost, operation is inconvenient, and the interface display can appear unnatural. The existing human-computer interaction methods therefore have shortcomings, and how to provide a more convenient interaction method to control a display has become a problem.
Summary of the Invention
In order to overcome the above-mentioned problems, the embodiments of the present application provide an air interaction method, apparatus, and device.
In a first aspect, the present application provides an air interaction method, including: acquiring a first depth image of a user, where the first depth image includes a first red-green-blue (RGB) image and first depth information; identifying, in the first depth image, eye coordinates of the user's eyes and first manipulator coordinates of a manipulator; and determining, according to the eye coordinates and the first manipulator coordinates, a click position at which the user clicks on the screen.
In the air interaction method provided by the embodiments of the present application, after the three-dimensional coordinates of the user's eyes and the three-dimensional coordinates of a finger or an object held in the hand are obtained, the position at which the user clicks on the screen is determined from the intersection with the screen of the straight line formed by the two coordinate points, projected onto the screen, so that air interaction is realized.
In another possible implementation, when the eye coordinates and the first manipulator coordinates belong to a depth image coordinate system, the method further includes: converting the eye coordinates and the first manipulator coordinates from the depth image coordinate system to a spatial three-dimensional coordinate system.
In another possible implementation, determining the click position at which the user clicks on the screen according to the eye coordinates and the first manipulator coordinates includes: determining, as the click position, the intersection of the straight line passing through the eye coordinates and the first manipulator coordinates in the spatial three-dimensional coordinate system with the plane in which the screen lies.
In another possible implementation, after the click position at which the user clicks on the screen is determined, the method further includes: converting the click position from the spatial three-dimensional coordinate system to the screen coordinate system.
In another possible implementation, the method further includes: acquiring a second depth image of the user, where the second depth image includes a second RGB image and second depth information, and the first depth image and the second depth image are images at different times in the time domain; identifying second manipulator coordinates of the manipulator in the second depth image; and judging whether the magnitude of change between the first manipulator coordinates and the second manipulator coordinates exceeds a preset threshold. Determining the click position at which the user clicks on the screen according to the dominant-eye coordinates and the first manipulator coordinates includes: when the magnitude of change exceeds the preset threshold, determining, according to the dominant-eye coordinates and the first manipulator coordinates, the click position at which the user clicks on the screen.
By judging whether the user's hand, or the object held in the hand, is performing an operation, this application obtains only the coordinates of a finger or handheld object that is actually operating, so that images of the user's hand in a static state are filtered out and the processing burden on the processor is reduced.
In another possible implementation, before the first depth image or the second depth image of the user is acquired, the method includes: acquiring a first image and a second image of the user through at least one camera, where the first image includes the first RGB information and the second image includes second RGB information or the first depth information; and calculating the first depth image or the second depth image according to the first image and the second image. For example, this process is executed by a device including the at least one camera, or by a device, or a processor within a device, that executes the method of the first aspect or any one of its possible implementations.
In a second aspect, the present application also provides an air interaction device, including a screen, at least one camera, and a processor that executes the first aspect or any of its possible implementations.
In a third aspect, the present application also provides an air interaction device, including a processor and a memory, where the memory stores one or more programs, the one or more programs include instructions, and the processor is configured to execute the instructions so that the device performs any one of the possible implementations of the first aspect.
In a fourth aspect, the present application also provides a readable storage medium for storing instructions, where, when the instructions are executed, any one of the possible implementations of the first aspect is performed.
In a fifth aspect, the present application also provides a computer program device containing instructions, where, when it runs on a device or a processor, the device performs any one of the possible implementations of the first aspect.
In a sixth aspect, the present application also provides an air interaction apparatus, which performs any one of the possible implementations of the first aspect.
Brief Description of the Drawings
The following briefly introduces the drawings needed in the description of the embodiments or the prior art.
FIG. 1 is a schematic diagram of a human-computer interaction scenario provided by an embodiment of this application;
FIG. 2(a) is a schematic diagram of a scene in which a binocular camera is shooting;
FIG. 2(b) is a schematic structural diagram of the geometric model of binocular-camera shooting;
FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of an air interaction method provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a scene in which a camera provided by an embodiment of this application obtains a face image;
FIG. 6 is a schematic diagram of the coordinates of various positions in an image obtained by the main camera provided by an embodiment of this application;
FIG. 7 is a schematic diagram of the coordinates of various positions in a Cartesian coordinate system with the main camera as the origin during the air interaction process provided by an embodiment of this application;
FIG. 8 is a schematic diagram of the coordinates of the user's click position in the screen coordinate system provided by an embodiment of this application;
FIG. 9 is a schematic structural diagram of an air interaction apparatus provided by an embodiment of this application.
具体实施方式Detailed ways
下面将结合附图对本实施例的实施方式进行详细描述。The implementation of this embodiment will be described in detail below in conjunction with the accompanying drawings.
本申请实施例提供的一种隔空交互方法,可应用于手机、平板电脑、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、手持计算机、上网本、个人数字助理(personal digital assistant,PDA)、电视机、投影设备、虚拟现实设备、广告牌或 大屏幕设备等具有屏幕的终端设备中,本申请实施例对此不做任何限制。An air interaction method provided by the embodiments of this application can be applied to mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, and personal digital assistants (personal digital assistants). , PDA), televisions, projection devices, virtual reality devices, billboards or large-screen devices and other terminal devices with screens, the embodiments of the present application do not make any restrictions on this.
图1为本申请实施例提供的一种人机交互的场景示意图。如图1所示,本申请提供的终端设备包括:屏幕10,用于显示用户所要观看图像的设备。在本申请实施例中,屏幕10可以包括但不限于电视机、或广告牌等屏幕,也可以包括但不限于投影设备投影时作为屏幕的墙面、幕布、或玻璃等设备。终端设备包括但不限于之前提到的手机。FIG. 1 is a schematic diagram of a human-computer interaction scenario provided by an embodiment of the application. As shown in Fig. 1, the terminal device provided by the present application includes: a screen 10, which is a device for displaying an image to be viewed by a user. In the embodiment of the present application, the screen 10 may include, but is not limited to, a television, or a billboard, etc., and may also include, but is not limited to, a wall, curtain, or glass that is used as a screen during projection by a projection device. Terminal devices include but are not limited to the aforementioned mobile phones.
所述终端设备还包括至少一个主摄像头20，用于获取屏幕10正前方一定区域内的人物的RGB图像。主摄像头20可以内置在屏幕10内部，也可以作为独立的装置连接在屏幕10上。在本申请实施例中，主摄像头20包括但不限于可见光摄像头、红外光摄像头或其它类型的摄像头。其中，在本申请中优选采用红外光摄像头，由于红外线对于人眼来说是不可见的，这样避免主摄像头20在采集人物图像过程中，对屏幕10正前方人员产生影响。The terminal device also includes at least one main camera 20 for acquiring RGB images of people in a certain area directly in front of the screen 10. The main camera 20 may be built into the screen 10 or connected to the screen 10 as an independent device. In the embodiments of the present application, the main camera 20 includes, but is not limited to, a visible light camera, an infrared camera, or another type of camera. An infrared camera is preferred in the present application: since infrared light is invisible to the human eye, the main camera 20 does not disturb the people directly in front of the screen 10 while capturing images of them.
所述终端设备还包括至少一个辅摄像头30,用于获取屏幕10正前方一定区域内的人物图像的图像深度信息。辅摄像头30可以内置在屏幕10内部,也可以作为独立的装置连接在屏幕10上。在本申请实施例中,辅摄像头30包括但不限于结构光摄像头、时间飞行(time of flight,TOF)摄像头或其它类型的摄像头。The terminal device also includes at least one auxiliary camera 30 for acquiring image depth information of a person image in a certain area directly in front of the screen 10. The auxiliary camera 30 may be built into the screen 10 or connected to the screen 10 as an independent device. In the embodiment of the present application, the auxiliary camera 30 includes but is not limited to a structured light camera, a time of flight (TOF) camera, or other types of cameras.
所述终端设备还包括处理器(具体参照图3中处理器303的描述),其具有通用计算能力,用于处理主摄像头20获取的RGB图像和辅摄像头30获取的深度信息。The terminal device also includes a processor (refer to the description of the processor 303 in FIG. 3 for details), which has general computing capabilities for processing the RGB image obtained by the main camera 20 and the depth information obtained by the auxiliary camera 30.
需要特别说明的是，辅摄像头30可以为与主摄像头20类型相同的摄像头，此时主摄像头20和辅摄像头30构成双目摄像头，可以通过双目摄像头原理计算图像深度信息。在一个可能的实施例中，如图2(a)所示，当主摄像头20和辅摄像头30对空间中的一个景物点P进行拍摄时，将拍摄过程简化为几何模型，如图2(b)所示，O_L为辅摄像头30的光圈中心点、O_R为主摄像头20的光圈中心点、P_L为辅摄像头30的成像点、P_R为主摄像头20的成像点，根据△PP_LP_R和△PO_LO_R的相似关系，计算景物点P与摄像头之间的距离如下：It should be noted that the auxiliary camera 30 may be a camera of the same type as the main camera 20; in this case the main camera 20 and the auxiliary camera 30 form a binocular camera, and the image depth information can be calculated based on the binocular camera principle. In a possible embodiment, as shown in Figure 2(a), when the main camera 20 and the auxiliary camera 30 capture a scene point P in space, the shooting process is simplified into the geometric model shown in Figure 2(b), where O_L is the aperture center of the auxiliary camera 30, O_R is the aperture center of the main camera 20, P_L is the imaging point of the auxiliary camera 30, and P_R is the imaging point of the main camera 20. From the similarity of triangles PP_LP_R and PO_LO_R, the distance between the scene point P and the cameras is calculated as follows:
z = f × b / (u_L − u_R)        (1)
其中，基线b为主摄像头20和辅摄像头30的光圈之间的距离，焦距f为主摄像头20和辅摄像头30的成像点与光圈中心的距离，u_L为辅摄像头30光圈中心与辅摄像头30成像点在水平方向上的距离，u_R为主摄像头20光圈中心与主摄像头20成像点在水平方向上的距离，距离z为景物点与光圈中心的距离。Here, the baseline b is the distance between the apertures of the main camera 20 and the auxiliary camera 30, the focal length f is the distance from the imaging points of the main camera 20 and the auxiliary camera 30 to the aperture centers, u_L is the horizontal distance between the aperture center of the auxiliary camera 30 and its imaging point, u_R is the horizontal distance between the aperture center of the main camera 20 and its imaging point, and z is the distance from the scene point to the aperture centers.
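The following is a minimal Python sketch of the depth calculation described above; the function name, argument names, and the zero-disparity check are assumptions made here for illustration and are not prescribed by this application.

```python
# Illustrative sketch: depth of scene point P from binocular disparity,
# following the similar-triangle relationship between triangles PP_LP_R and PO_LO_R.
def depth_from_disparity(f: float, b: float, u_l: float, u_r: float) -> float:
    """f: focal length, b: baseline between the two apertures,
    u_l / u_r: horizontal offsets of the imaging points P_L / P_R.
    Returns z, the distance from scene point P to the aperture centers."""
    disparity = u_l - u_r
    if disparity == 0:
        raise ValueError("zero disparity: the scene point is effectively at infinity")
    return f * b / disparity
```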
在操控员(图中为观众D)与终端设备进行隔空交互过程中,处理器通过控制主摄像头20进行拍摄,以获取屏幕10正前方一定区域内的图像或视频,包括但不限于RGB图像。然后主摄像头20将采集的图像或视频发送给处理器。与此同时,处理器通过控制辅摄像头30进行拍摄,以获取屏幕10正前方一定区域内的图像深度信息。然后辅摄像头30将采集的图像深度信息发送给处理器。During the air interaction between the operator (viewer D in the figure) and the terminal device, the processor controls the main camera 20 to shoot to obtain images or videos in a certain area directly in front of the screen 10, including but not limited to RGB images . Then the main camera 20 sends the captured image or video to the processor. At the same time, the processor controls the auxiliary camera 30 to shoot to obtain image depth information in a certain area directly in front of the screen 10. Then the auxiliary camera 30 sends the collected image depth information to the processor.
处理器在接收到主摄像头20发送的图像或视频后，通过人脸识别模块识别出图像或视频中的人脸图像，并通过手势识别模块识别出图像或视频中的手图像。然后根据已有的人脸识别算法计算出人脸图像中眼睛（例如主视眼的瞳孔）在图像或视频中的位置，并根据手势识别算法计算出手图像中手指的指尖、指甲或其它部位在图像或视频中的位置。其中人脸识别模块和手势识别模块可以是预置在终端设备中的软件模块或神经网络模型，可以被处理器执行或预置在处理器中，本实施例不做限定。After receiving the image or video sent by the main camera 20, the processor recognizes the face image in it through the face recognition module and the hand image in it through the gesture recognition module. It then calculates, with an existing face recognition algorithm, the position in the image or video of an eye in the face image, for example the pupil of the dominant eye, and calculates, with a gesture recognition algorithm, the position in the image or video of the fingertip, nail, or another part of a finger in the hand image. The face recognition module and the gesture recognition module may be software modules or neural network models preset in the terminal device, and may be executed by the processor or preset in the processor; this embodiment does not limit this.
本申请实施例中,如果用户手持遥控笔、签字笔等操控物时,处理器可以获取操控物尖部位置,来替代获取手指的指尖、指甲或其它部位的位置。In the embodiment of the present application, if the user holds a remote control pen, a signature pen and other manipulation objects, the processor may obtain the position of the tip of the manipulation object instead of obtaining the position of the fingertip, nail or other parts of the finger.
处理器在接收到辅摄像头30发送的图像深度信息后，先将得到的瞳孔和手指的指尖、指甲或其它部位在图像或视频中的位置转换为空间坐标中的坐标点；然后计算由瞳孔和手指的指尖、指甲或其它部位在空间坐标中的坐标点构成的直线；最后计算该直线与屏幕10在空间坐标中所在平面相交的交点的坐标，得到用户所要对屏幕10进行点击的点击位置。后续，如果用户对屏幕10上显示的内容进行操作，处理器通过识别用户的手势，然后执行该手势对应的指令，实现在屏幕上进行点击、放大、缩小、移动等操作。After receiving the image depth information sent by the auxiliary camera 30, the processor first converts the obtained positions of the pupil and of the fingertip, nail, or other finger part in the image or video into coordinate points in the spatial coordinate system; it then computes the straight line through the spatial coordinate points of the pupil and of the fingertip, nail, or other finger part; finally it computes the coordinates of the intersection of this line with the plane of the screen 10 in the spatial coordinate system, obtaining the click position at which the user intends to click on the screen 10. Subsequently, when the user operates on the content displayed on the screen 10, the processor recognizes the user's gesture and executes the instruction corresponding to that gesture, implementing operations such as tapping, zooming in, zooming out, and moving on the screen.
本申请实施例通过主摄像头20和辅摄像头30获取用户的眼睛（例如主视眼）的某个部位的三维坐标点和手指在进行点击时手指某个部位的三维坐标点后，根据这两个三维坐标点连接的直线投射到预设屏幕上的交点，即为用户所要点击的位置，实现在不借助任何工具的情况下与终端设备进行隔空交互。In the embodiments of the present application, the main camera 20 and the auxiliary camera 30 obtain the three-dimensional coordinate point of a part of the user's eye, for example the dominant eye, and the three-dimensional coordinate point of a part of the finger at the moment of clicking; the intersection of the straight line through these two three-dimensional points with the preset screen is the position the user intends to click, so that the user can interact with the terminal device at a distance without any tool.
另外,所述终端设备还可以有至少一个补光灯50,用于向屏幕10正前方一定区域内的进行补光。补光灯50可以内置在屏幕10内部,也可以作为独立的装置连接在屏幕10上。在本申请实施例中,补光灯50包括但不限于可见光照明设备、红外光照明设备等其它类型的照明设备。其中,补光灯50的进行补光的种类和主摄像头20的进行拍摄获取的光源的种类相同,使得补光灯50补充的光更好的为主摄像头20进行拍摄。In addition, the terminal device may also have at least one supplemental light 50 for supplementing light in a certain area directly in front of the screen 10. The fill light 50 may be built into the screen 10 or connected to the screen 10 as an independent device. In the embodiment of the present application, the fill light 50 includes, but is not limited to, other types of lighting equipment such as visible light lighting equipment and infrared light lighting equipment. Wherein, the type of the fill light of the fill light 50 is the same as the type of the light source acquired by the main camera 20 for shooting, so that the light supplemented by the fill light 50 is better for the main camera 20 to shoot.
需要说明的是,后续实施例以用户的主视眼为例进行说明,但实际应用中,被处理的图像中涉及的眼睛也可以是非主视眼,本实施例对此不限定。It should be noted that the following embodiments take the user's dominant eye as an example for description, but in practical applications, the eyes involved in the processed image may also be non-dominant eyes, which is not limited in this embodiment.
图3为本发明实施例提供的一种终端设备的结构示意图。如图3所示的一种终端设备300,该终端设备300包括传感器301,显示器302,处理器303、存储器304、通信接口305以及接口306。终端设备300中的处理器303、存储器304和通信接口305可以通过接口306建立通信连接。传感器301,用于获取包括RGB-D图像、RGB图像和图像深度信息。传感器301可包括主摄像头20和辅摄像头30。显示器302,用于显示处理后的数据,如视频、及虚拟操作界面。显示器302可以为屏幕10。FIG. 3 is a schematic structural diagram of a terminal device provided by an embodiment of the present invention. As shown in FIG. 3, a terminal device 300 includes a sensor 301, a display 302, a processor 303, a memory 304, a communication interface 305, and an interface 306. The processor 303, the memory 304, and the communication interface 305 in the terminal device 300 can establish a communication connection through the interface 306. The sensor 301 is used to obtain information including RGB-D images, RGB images, and image depth. The sensor 301 may include a main camera 20 and an auxiliary camera 30. The display 302 is used to display processed data, such as video, and a virtual operation interface. The display 302 may be the screen 10.
处理器303可以为中央处理器(central processing unit,CPU)。处理器303用于根据至少一张红绿蓝-深度(red green blue-depth map,RGB-D)图像,识别出用户的主视眼的主视眼坐标,以及根据至少一张RGB-D图像,检测操控物在进行操作时,确定操控物的操控物坐标;处理器303还用于将主视眼坐标和操控物坐标从深度图像坐标系转换至空间三维坐标系;然后将穿过空间三维坐标系中的主视眼坐标和第一操控物坐标的直线与显示器302所在平面的交点,确定为点击位置;处理器303还用于将点击位置从空间三维坐标系转换至屏幕坐标系。RGB-D图像在本申请实施例中也叫深度图像。The processor 303 may be a central processing unit (CPU). The processor 303 is configured to identify the dominant eye coordinates of the user's dominant eye according to at least one red green blue-depth map (RGB-D) image, and according to at least one RGB-D image , To detect when the manipulator is operating, determine the manipulator coordinates of the manipulator; the processor 303 is also used to convert the dominant eye coordinates and the manipulator coordinates from the depth image coordinate system to the spatial three-dimensional coordinate system; The intersection of the line between the dominant eye coordinates and the coordinates of the first manipulator in the coordinate system and the plane where the display 302 is located is determined as the click position; the processor 303 is also configured to convert the click position from the spatial three-dimensional coordinate system to the screen coordinate system. The RGB-D image is also called a depth image in the embodiment of this application.
存储器304可以包括易失性存储器（volatile memory），例如随机存取存储器（random-access memory，RAM）；存储器也可以包括非易失性存储器（non-volatile memory），例如只读存储器（read-only memory，ROM）、快闪存储器、硬盘（hard disk drive，HDD）或固态硬盘（solid state drive，SSD）；存储器304还可以包括上述种类的存储器的组合。其中，人脸RGB图像、手RGB图像和图像深度信息等数据将存储在存储器304中。另外，存储器304还将用于存储处理器303执行的、用于实现上述实施例的隔空交互对应的程序指令等等。The memory 304 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid state drive (SSD); the memory 304 may also include a combination of the foregoing types of memory. Data such as the face RGB image, the hand RGB image, and the image depth information are stored in the memory 304. In addition, the memory 304 is also used to store the program instructions executed by the processor 303 for implementing the air interaction of the above embodiments, and the like.
在图3中,通信接口305可以实现终端设备的对外通信,包括但不限于蜂窝通信、短距离通信或有线通信等。接口306可以是处理器303与其他部件交互的接口或通道。例如,该接口可以是总线或其他接口。接口306可用于连接至传感器301,并用于传递传感器301采集到的各类图像信息至处理器303。In FIG. 3, the communication interface 305 can implement external communication of the terminal device, including but not limited to cellular communication, short-distance communication, or wired communication. The interface 306 may be an interface or channel for the processor 303 to interact with other components. For example, the interface may be a bus or other interface. The interface 306 can be used to connect to the sensor 301 and to transmit various types of image information collected by the sensor 301 to the processor 303.
图4为本申请实施例提供的一种隔空交互方法的流程示意图。可结合图1示例的终端设备，如图4所示，本申请实施例提供的隔空交互方法具体实现的过程如下。步骤S401，处理器303获取用户的第一深度图像。在图1中的终端设备开始进行隔空交互时，处理器303控制主摄像头20和辅摄像头30对屏幕10前方一定区域进行实时拍摄，将获取的图像信息和深度信息发送到处理器303，处理器303可以通过计算得到深度图像。或者，如果主摄像头20和辅摄像头30作为单独的拍摄设备或模组，其内部有一个单独的处理单元，则将获取的图像信息和深度信息进行处理后，生成深度图像发送给处理器303，此时的处理器303直接接收该拍摄设备或模组发送的深度图像，本实施例对此不限定。FIG. 4 is a schematic flowchart of an air interaction method provided by an embodiment of the application. With reference to the terminal device illustrated in FIG. 1 and as shown in FIG. 4, the air interaction method provided by the embodiments of the present application is implemented as follows. In step S401, the processor 303 obtains a first depth image of the user. When the terminal device in FIG. 1 starts the air interaction, the processor 303 controls the main camera 20 and the auxiliary camera 30 to shoot a certain area in front of the screen 10 in real time and to send the acquired image information and depth information to the processor 303, and the processor 303 can obtain the depth image through calculation. Alternatively, if the main camera 20 and the auxiliary camera 30 form a separate shooting device or module with its own processing unit, that unit processes the acquired image information and depth information, generates the depth image, and sends it to the processor 303; in this case the processor 303 directly receives the depth image sent by the shooting device or module. This embodiment does not limit this.
其中，RGB-D图像获取的方法有两种：第一种是，主摄像头20用于获取屏幕10前方一定区域内人物的RGB图像，辅摄像头30用于获取屏幕10前方一定区域内人物的图像深度信息，然后该拍摄设备或模组或处理器303将人物RGB图像和人物的图像深度信息结合，得到RGB-D图像。第二种是，通过两个完全相同的摄像头（主摄像头20和辅摄像头30相同）获取RGB图像后，根据双目摄像头原理，由该拍摄设备或模组或处理器303计算出RGB-D图像。不管采用哪种方法，我们都可以将RGB-D图像分为RGB图像和图像深度信息。下面我们以第一种获取RGB-D图像的方法为例讲述本申请实施例的方案。There are two methods of acquiring the RGB-D image. In the first, the main camera 20 acquires an RGB image of the people in a certain area in front of the screen 10 and the auxiliary camera 30 acquires the image depth information of those people; the shooting device or module, or the processor 303, then combines the RGB image with the image depth information to obtain the RGB-D image. In the second, after two identical cameras (the main camera 20 and the auxiliary camera 30 being the same) acquire RGB images, the shooting device or module, or the processor 303, calculates the RGB-D image based on the binocular camera principle. Whichever method is used, the RGB-D image can be split into an RGB image and image depth information. The following describes the solution of the embodiments of the present application using the first method of acquiring the RGB-D image as an example.
人脸识别模块对主摄像头20拍摄获取的图像实时进行人脸识别,以捕捉图像内的人脸图像。同时,处理器303可以实时计算人脸识别模块识别出的人脸图像的位置。对于处理器303获取人脸图像的原理,可以参考现有手机、摄像机等设备拍照原理,如图5所示。在主摄像头20进行实时拍摄时,通过人脸识别模块识别出在镜头中人脸图像后,将每个人脸图像通过一个个方框给框出来,每一个人脸框表示一个用户的人脸图像,以方便后期计算人脸图像的位置。The face recognition module performs face recognition on the image captured by the main camera 20 in real time to capture the face image in the image. At the same time, the processor 303 can calculate the position of the face image recognized by the face recognition module in real time. For the principle of the processor 303 to obtain a face image, reference may be made to the principle of taking pictures of devices such as existing mobile phones and cameras, as shown in FIG. 5. When the main camera 20 performs real-time shooting, after the face recognition module recognizes the face image in the lens, each face image is framed by a box, and each face box represents a user's face image , In order to facilitate the later calculation of the position of the face image.
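As one possible way to obtain the per-user face boxes described above, the following sketch uses OpenCV's bundled Haar cascade detector; the application does not mandate any particular face recognition algorithm, so the library choice, file name, and parameters here are assumptions for illustration only.

```python
# Illustrative sketch: frame each detected face with a box, one box per user.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_boxes(bgr_frame):
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    # Each (x, y, w, h) rectangle corresponds to one user's face image and can
    # later be used to locate the eye position within the main camera's image.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```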
步骤S403,处理器303识别第一深度图像中的用户的眼睛的眼睛坐标和操控物的第一操控物坐标。后续介绍以眼睛是主视眼为例。在人脸识别模块识别人脸图像后,处理器303可以根据应用场景,有选择性的保留部分或全部的人脸识别模块识别出的人脸图像。在一种可能的实施例中,处理器303仅保留有抬手动作的用户的人脸图像。由于在进行隔空交互过程中,终端设备需要通过识别用户的手指动作来实现隔空交互,因此只需要获取有抬手动作的用户的人脸图像即可。对于静止状态的用户或没有抬手动作的用户,处理器303就不用获取其人脸图像,以减少处理器303的工作负担。处理器303可通过运行人脸识别模块和手势识别模块,例如运行人脸神经网络模型和手势神经网络模型,对第一深度图像进行识别,以得到主视眼在空间中的坐标和第一手图像中的至少一个手指在空间中的坐标,相关神经网络模型采用人工智能识别技术,相关技术具体可参照现有技术的描述,此处不做展开。In step S403, the processor 303 recognizes the eye coordinates of the user's eyes and the first manipulator coordinates of the manipulator in the first depth image. The follow-up introduction takes the eye as the dominant eye as an example. After the face recognition module recognizes the face image, the processor 303 may selectively retain part or all of the face images recognized by the face recognition module according to the application scenario. In a possible embodiment, the processor 303 only retains the face image of the user who has raised his hand. Since in the air interaction process, the terminal device needs to realize the air interaction by recognizing the user's finger movement, so it only needs to obtain the face image of the user who has the gesture of raising the hand. For a user in a static state or a user who does not raise his hand, the processor 303 does not need to obtain the face image, so as to reduce the workload of the processor 303. The processor 303 can recognize the first depth image by running a face recognition module and a gesture recognition module, such as a face neural network model and a gesture neural network model, to obtain the coordinates of the main eye in space and the first hand For the coordinates of at least one finger in the image in space, the relevant neural network model adopts artificial intelligence recognition technology. For the specific relevant technology, please refer to the description of the prior art, which will not be expanded here.
在另一种可能的实施例中，处理器303仅保留设定的用户的人脸图像。在本申请实施例进行隔空交互的过程中，如果有两个或两个以上的用户有抬手动作，处理器303在判定控制者的时候会出现紊乱。因此本申请实施例可以在存储器中预先存储设定的一个或多个用户的人脸图像作为控制者的人脸图像，在人脸识别模块识别出包括设定的控制者在内的多个用户有抬手动作时，处理器303优先判定存储器存储的设定的控制者作为进行隔空交互的控制者。In another possible embodiment, the processor 303 only retains the face images of preset users. During the air interaction of the embodiments of the present application, if two or more users raise their hands, the processor 303 may become confused when determining the controller. Therefore, in the embodiments of the present application, the face images of one or more preset users may be stored in the memory in advance as the controller's face images; when the face recognition module recognizes that multiple users, including the preset controller, raise their hands, the processor 303 preferentially determines the preset controller stored in the memory as the controller of the air interaction.
处理器303得到在人脸识别模块对拍摄获取的图像中识别出符合要求的用户的人脸图像时,将该人脸图像作为进行隔空交互的控制者的人脸图像。处理器303控制人脸识别模块识别出控制者的人脸图像的主视眼的眼球、瞳孔或其它部位,然后计算主视眼的眼球、瞳孔或其它部位在主摄像头20获取的图像中的位置。When the processor 303 obtains the face image of the user who meets the requirements in the image obtained by the face recognition module, it uses the face image as the face image of the controller performing the air interaction. The processor 303 controls the face recognition module to recognize the eyeball, pupil or other parts of the main eye of the controller's face image, and then calculates the position of the eyeball, pupil or other parts of the main eye in the image obtained by the main camera 20 .
其中，处理器303根据主摄像头20获取的图像的分辨率H(Height)×W(Width)，计算控制者的主视眼的眼球、瞳孔或其它部位在主摄像头20获取的图像中的位置A1(Xp,Yp)，如图6所示。随后，处理器303结合辅摄像头30获取的控制者的主视眼的眼球、瞳孔或其它部位的图像深度信息，计算出控制者的主视眼在RGB-D图像中的位置A2(Xp,Yp,Zp)。The processor 303 calculates, based on the resolution H (Height) × W (Width) of the image acquired by the main camera 20, the position A1 (Xp, Yp) of the eyeball, pupil, or other part of the controller's dominant eye in that image, as shown in Figure 6. Subsequently, combining the image depth information of the eyeball, pupil, or other part of the controller's dominant eye acquired by the auxiliary camera 30, the processor 303 calculates the position A2 (Xp, Yp, Zp) of the controller's dominant eye in the RGB-D image.
在本申请实施例中，获取的用户的眼睛一般为用户的主视眼，但是不限于此，也可以基于非主视眼或基于双眼执行处理，对于处理所针对的眼睛数量也不限定。本文涉及的主视眼也叫注视眼、优势眼。从人的生理角度讲，每个人都有一个主视眼，可能是左眼，也可能是右眼。主视眼所看到的东西会被大脑优先接受。对于大多数人来说，右眼为主视眼，所以在此系统默认右眼为主视眼。In the embodiments of the present application, the user's eye that is obtained is generally the user's dominant eye, but it is not limited to this; processing may also be performed based on a non-dominant eye or on both eyes, and the number of eyes targeted by the processing is not limited. The dominant eye referred to herein is also called the fixation eye or the sighting eye. Physiologically, everyone has a dominant eye, which may be the left eye or the right eye; what the dominant eye sees is preferentially accepted by the brain. For most people the right eye is the dominant eye, so the system here defaults to the right eye as the dominant eye.
另外，系统可以通过眼睛部位和手指部位的连线方向是否指向屏幕10来判断用户的主视眼。如果用户的左眼部位和手指部位的连线方向指向屏幕10内的某点，而右眼部位和手指部位的连线方向指向屏幕10外的某点，则认为主视眼为左眼；如果用户的右眼部位和手指部位的连线方向指向屏幕10内的某点，而左眼部位和手指部位的连线方向指向屏幕10外的某点，则认为主视眼为右眼；如果用户的左眼部位和手指部位的连线方向指向屏幕10内的某点，而右眼部位和手指部位的连线方向也指向屏幕10内的某点，则默认主视眼为右眼。In addition, the system can determine the user's dominant eye based on whether the line connecting an eye and the finger points at the screen 10. If the line through the user's left eye and finger points at a point inside the screen 10 while the line through the right eye and finger points at a point outside the screen 10, the dominant eye is considered to be the left eye; if the line through the user's right eye and finger points at a point inside the screen 10 while the line through the left eye and finger points at a point outside the screen 10, the dominant eye is considered to be the right eye; if the lines through both the left eye and the right eye and the finger point at points inside the screen 10, the right eye is taken as the dominant eye by default.
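A minimal sketch of this dominant-eye decision is given below, assuming the screen-plane intersection points of the left-eye/finger line and the right-eye/finger line have already been computed and converted to screen coordinates; the helper names and the bounds check are assumptions for illustration.

```python
# Illustrative sketch: decide the dominant eye from where each eye-finger line hits the screen plane.
def on_screen(point_xy, screen_w, screen_h):
    x, y = point_xy
    return 0.0 <= x <= screen_w and 0.0 <= y <= screen_h

def pick_dominant_eye(left_hit_xy, right_hit_xy, screen_w, screen_h):
    """left_hit_xy / right_hit_xy: intersection of the line through the left / right
    eye and the fingertip with the screen plane, in screen coordinates."""
    left_in = on_screen(left_hit_xy, screen_w, screen_h)
    right_in = on_screen(right_hit_xy, screen_w, screen_h)
    if left_in and not right_in:
        return "left"
    if right_in and not left_in:
        return "right"
    return "right"  # both lines point inside the screen: default to the right eye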
在手势识别模块识别出用户的手图像后,处理器303可以根据应用场景,有选择性的保留部分或全部的手势识别模块识别出的手图像。在一个可能的实施例中,处理器303根据连续的至少两张图片中的手的位置是否变化,即变化幅度,来判断用户的手是否进行操作,处理器303仅保留有操作动作的手图像。由于主摄像头20获取的图像或视频中不仅包括进行操作的用户的手图像,还包括围观的用户的手图像。但是对静止状态的用户,处理器303就不用获取其手图像,以减少处理器303的工作负担。After the gesture recognition module recognizes the user's hand image, the processor 303 may selectively retain part or all of the hand image recognized by the gesture recognition module according to the application scenario. In a possible embodiment, the processor 303 determines whether the user's hand is operating according to whether the position of the hand in at least two consecutive pictures has changed, that is, the magnitude of the change, and the processor 303 only retains the image of the hand with the operation action. . Because the image or video captured by the main camera 20 includes not only the hand image of the user performing the operation, but also the hand image of the onlooker user. However, for users in a static state, the processor 303 does not need to obtain the image of their hands, so as to reduce the workload of the processor 303.
在一个可能的实施例中,处理器303仅保留特定的动作的手图像。由于用户的手除对终端设备进行操作的动作外,还进行如挠痒、拿东西等动作,但是这些动作并不是对终端设备进行操作的。因此本申请实施例可以在存储器中预先存储设定的操作动作,如抬手、点击等动作,作为检测用户是否对终端设备进行操作的动作。In a possible embodiment, the processor 303 only retains the hand image of a specific action. Since the user's hand performs actions such as tickling and taking things in addition to operating the terminal device, these actions are not for operating the terminal device. Therefore, in the embodiment of the present application, preset operation actions, such as raising a hand, clicking, etc., may be stored in the memory in advance, as actions for detecting whether the user operates the terminal device.
处理器303得到在手势识别模块对拍摄获取的图像中识别出符合要求的用户的手图像时,将该手图像作为进行隔空交互的控制者的手图像。处理器303控制手势识别模块识别出控制者的手图像的手指的指尖、指甲或其它部位,然后计算手指的指尖、指甲或其它部 位在主摄像头20获取的图像中的位置。When the processor 303 obtains the hand image of the user that meets the requirements in the image captured by the gesture recognition module, the hand image is used as the hand image of the controller performing the air interaction. The processor 303 controls the gesture recognition module to recognize the fingertips, nails or other parts of the fingers of the controller's hand image, and then calculates the positions of the fingertips, nails or other parts of the fingers in the image captured by the main camera 20.
以上实施例中提到的操控物是人的手。需要特别说明的是,处理器303得到在手势识别模块对拍摄获取的图像中识别出符合要求的用户的手图像时,可能检测到手图像中的手中有遥控笔或签字笔等操控物,则表明用户的手中握持有其他物体作为操控物。此时处理器303可以计算操控物突出的尖部在主摄像头20获取的图像中的位置,来替代手指的指尖、指甲或其它部位的位置。综上,本申请实施例的操控物包括手、手持设备、手持物体或其他代替手实现操控的设施,本实施例对此不限定。The manipulation object mentioned in the above embodiment is a human hand. It should be noted that when the processor 303 obtains the user's hand image that meets the requirements in the image captured by the gesture recognition module, it may detect that there is a remote control pen or a signature pen in the hand in the hand image, which indicates The user's hand holds other objects as manipulation objects. At this time, the processor 303 can calculate the position of the protruding tip of the manipulator in the image obtained by the main camera 20 to replace the position of the fingertip, nail or other parts of the finger. In summary, the manipulation objects in the embodiments of the present application include hands, handheld devices, handheld objects, or other facilities that replace hands to achieve manipulation, which is not limited in this embodiment.
在本申请实施例中，处理器303根据主摄像头20获取的图像的分辨率H(Height)×W(Width)，计算控制者的手指的指尖、指甲或其它部位在主摄像头20获取的图像中的位置B1(Xf,Yf)，如图6所示。随后，处理器303结合辅摄像头30获取的控制者的手指的指尖、指甲或其它部位的图像深度信息，计算出控制者的手指在RGB-D图像中的位置B2(Xf,Yf,Zf)。也即是说，本坐标包括了指尖、指甲或其它部位在二维的RGB图像中的坐标B1(Xf,Yf)和深度信息Zf。其中，分辨率中H(Height)表示图像在垂直方向上占的点数，W(Width)表示图像在水平方向上占的点数。In the embodiments of the present application, the processor 303 calculates, based on the resolution H (Height) × W (Width) of the image acquired by the main camera 20, the position B1 (Xf, Yf) of the fingertip, nail, or other part of the controller's finger in that image, as shown in Figure 6. Subsequently, combining the image depth information of the fingertip, nail, or other part of the controller's finger acquired by the auxiliary camera 30, the processor 303 calculates the position B2 (Xf, Yf, Zf) of the controller's finger in the RGB-D image. In other words, this coordinate includes the coordinates B1 (Xf, Yf) of the fingertip, nail, or other part in the two-dimensional RGB image and the depth information Zf. In the resolution, H (Height) is the number of pixels the image occupies in the vertical direction, and W (Width) is the number of pixels it occupies in the horizontal direction.
步骤S405，处理器303根据眼睛坐标和第一操控物坐标，确定用户对屏幕进行点击的点击位置。上述本申请实施例的处理器303得到在RGB-D图像中的控制者的主视眼的瞳孔的坐标A2(Xp,Yp,Zp)和手指的坐标B2(Xf,Yf,Zf)后，处理器303需要将RGB-D坐标下的坐标点转换为空间中的笛卡尔坐标系下的坐标点。如图7所示，以主摄像头20为原点，以屏幕10所在的平面作为xoy平面，以与屏幕10垂直的方向为z轴方向，其中，以主摄像头20到辅摄像头30的方向为x轴方向，分别与x轴和z轴相垂直的方向为y轴方向。将RGB-D图像中的主视眼和手指的坐标转换为空间中的笛卡尔坐标系下坐标的计算过程具体如下：In step S405, the processor 303 determines, based on the eye coordinates and the first manipulator coordinates, the click position at which the user clicks on the screen. After obtaining the coordinates A2 (Xp, Yp, Zp) of the pupil of the controller's dominant eye and the coordinates B2 (Xf, Yf, Zf) of the finger in the RGB-D image, the processor 303 needs to convert these points from the RGB-D coordinates into points in the Cartesian coordinate system in space. As shown in Figure 7, the main camera 20 is taken as the origin, the plane of the screen 10 is taken as the xoy plane, the direction perpendicular to the screen 10 is the z-axis, the direction from the main camera 20 to the auxiliary camera 30 is the x-axis, and the direction perpendicular to both the x-axis and the z-axis is the y-axis. The calculation for converting the coordinates of the dominant eye and the finger in the RGB-D image into coordinates in the Cartesian coordinate system in space is as follows:
Xsp = (Xp − Cx) × Zp / Fx
Ysp = (Yp − Cy) × Zp / Fy
Zsp = Zp        (2)
Xsf = (Xf − Cx) × Zf / Fx
Ysf = (Yf − Cy) × Zf / Fy
Zsf = Zf        (3)
其中，Cx、Cy、Fx和Fy为主摄像头20的内参数据，Cx、Cy为图像原点相对于光圈中心成像点的纵横偏移量（单位：像素），Fx=f/dx，其中f为相机的焦距，dx为x方向的一个像素占多少长度单位；Fy=f/dy，其中dy为y方向的一个像素占多少长度单位。Here, Cx, Cy, Fx, and Fy are the intrinsic parameters of the main camera 20; Cx and Cy are the horizontal and vertical offsets (in pixels) of the image origin relative to the imaging point of the aperture center; Fx = f/dx, where f is the focal length of the camera and dx is the physical length of one pixel in the x direction; Fy = f/dy, where dy is the physical length of one pixel in the y direction.
由上述公式(2)和公式(3)计算得到空间中以主摄像头20为原点的笛卡尔坐标系下的控制者的主视眼的三维坐标A3(Xsp,Ysp,Zsp)和手指的三维坐标B3(Xsf,Ysf,Zsf)。需要特别说明的是，上述实施例是以主摄像头20的光圈中心点与屏幕10的显示层在一个平面上进行举例说明的，如果主摄像头20的光圈中心点与屏幕10的显示层不在一个平面上，此时计算的Zsp和Zsf需要考虑主摄像头20的光圈中心点与屏幕10的显示层之间的距离。From formulas (2) and (3) above, the three-dimensional coordinates A3 (Xsp, Ysp, Zsp) of the controller's dominant eye and the three-dimensional coordinates B3 (Xsf, Ysf, Zsf) of the finger are obtained in the Cartesian coordinate system in space with the main camera 20 as the origin. It should be noted that the above embodiment assumes that the aperture center of the main camera 20 and the display layer of the screen 10 lie in the same plane; if they do not, the calculated Zsp and Zsf need to take into account the distance between the aperture center of the main camera 20 and the display layer of the screen 10.
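A sketch of formulas (2) and (3) is given below, assuming the intrinsic parameters Cx, Cy, Fx, and Fy of the main camera 20 are known; the function and argument names are assumptions for illustration.

```python
# Illustrative sketch of formulas (2) and (3): back-project an image position plus
# its depth into the Cartesian system whose origin is main camera 20.
def pixel_to_camera_xyz(x_px, y_px, z_depth, fx, fy, cx, cy):
    x_s = (x_px - cx) * z_depth / fx
    y_s = (y_px - cy) * z_depth / fy
    return x_s, y_s, z_depth

# Example: A2 (dominant eye) and B2 (fingertip) converted to A3 and B3.
# a3 = pixel_to_camera_xyz(xp, yp, zp, fx, fy, cx, cy)
# b3 = pixel_to_camera_xyz(xf, yf, zf, fx, fy, cx, cy)
```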
然后,处理器303根据主视眼坐标A3和手指坐标B3这两个坐标点构成的直线投射到屏幕上的交点,计算手指指向屏幕上的坐标点M1,具体如下:Then, the processor 303 calculates the coordinate point M1 where the finger points to the screen according to the intersection point of the straight line formed by the two coordinate points of the dominant eye coordinate A3 and the finger coordinate B3 projected on the screen, as follows:
Xpoint = Xsp + (Xsf − Xsp) × Zsp / (Zsp − Zsf)
Ypoint = Ysp + (Ysf − Ysp) × Zsp / (Zsp − Zsf)
Zpoint = 0        (4)
由上述公式(4)计算得到在空间中笛卡尔坐标系下的控制者的手指指向屏幕上的坐标点M1(Xpoint,Ypoint,Zpoint)。The above formula (4) is calculated to obtain the coordinate point M1 (Xpoint, Ypoint, Zpoint) on the screen where the finger of the controller in the Cartesian coordinate system points to the screen.
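Formula (4) corresponds to intersecting the line through A3 and B3 with the plane z = 0; a minimal sketch, with function and variable names chosen here for illustration, is:

```python
# Illustrative sketch of formula (4): intersect the eye-fingertip line with the
# screen plane z = 0 in the camera-origin Cartesian system.
def intersect_with_screen_plane(eye_xyz, finger_xyz):
    xe, ye, ze = eye_xyz          # A3 (Xsp, Ysp, Zsp)
    xf, yf, zf = finger_xyz       # B3 (Xsf, Ysf, Zsf)
    if ze == zf:
        raise ValueError("eye and fingertip at the same depth: the line never reaches the screen")
    t = ze / (ze - zf)            # parameter at which the line reaches z = 0
    return xe + t * (xf - xe), ye + t * (yf - ye), 0.0   # M1 (Xpoint, Ypoint, 0)
```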
最后,处理器303在得到空间中笛卡尔坐标系下的控制者手指指向的屏幕坐标M1后,由于屏幕要显示控制者所指示的点,所以需要将空间中笛卡尔坐标系下的坐标点M1转化成屏幕坐标系下的坐标M2。Finally, after the processor 303 obtains the screen coordinates M1 pointed to by the controller's finger in the Cartesian coordinate system in the space, since the screen wants to display the point indicated by the controller, it needs to change the coordinate point M1 in the Cartesian coordinate system in the space. Converted to coordinate M2 in the screen coordinate system.
如图8所示，假设屏幕坐标系的坐标原点为屏幕左下角处的直角点，X轴向右为正（以图1中屏幕10显示为基准），Y轴向上为正，屏幕分辨率为H(Height)×W(Width)，主摄像头在屏幕坐标系下的位置为(Xc,Yc)，则在屏幕坐标系下的控制者所指示的交点坐标M2的计算过程具体如下：As shown in Figure 8, assume that the origin of the screen coordinate system is the right-angle point at the lower left corner of the screen, the X axis is positive to the right (with the display of the screen 10 in Figure 1 as the reference), the Y axis is positive upward, the screen resolution is H (Height) × W (Width), and the position of the main camera in the screen coordinate system is (Xc, Yc); the intersection coordinate M2 indicated by the controller in the screen coordinate system is then calculated as follows:
Xs = Xc + Xpoint
Ys = Yc + Ypoint        (5)
由上述公式(5)计算得到在屏幕10坐标系下的控制者的手指指向屏幕10上的坐标点M2(Xs,Ys)。然后，控制器根据坐标点M2，在屏幕10上以屏幕10左下角处的直角点为原点、向右距Xs且向上距Ys的位置上显示一个如箭头、小手、或圆点等标识，以提示控制者看到自己进行隔空交互时在屏幕10上所要指示的坐标点。From formula (5) above, the coordinate point M2 (Xs, Ys) on the screen 10 at which the controller's finger points is obtained in the screen coordinate system. Then, based on the coordinate point M2, the controller displays a marker such as an arrow, a small hand, or a dot at the position on the screen 10 that is Xs to the right of and Ys above the right-angle point at the lower left corner of the screen 10, so that the controller can see the coordinate point he or she is indicating on the screen 10 during the air interaction.
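The conversion of formula (5) and the drawing of the marker can be sketched as follows, under the assumption that the x and y axes of the camera-origin system are parallel to, and point the same way as, the X and Y axes of the screen coordinate system; if the axes in Figure 7 are oriented differently, the signs of Xpoint and Ypoint must be flipped accordingly.

```python
# Illustrative sketch of formula (5): from the camera-origin Cartesian point M1
# to the screen-coordinate point M2, given the camera position (xc, yc) on screen 10.
def camera_to_screen_xy(x_point, y_point, xc, yc):
    return xc + x_point, yc + y_point   # M2 (Xs, Ys); flip signs if the axes are mirrored

# The terminal then draws a marker (arrow, small hand, or dot) at (Xs, Ys) so the
# controller can see which point on screen 10 is currently being indicated.
```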
同时，处理器303根据控制者的手的姿态，对标识点所处位置中的文本进行点击、移动、放大等操作。在本申请实施例中，处理器303通过获取连续多张RGB-D图像，计算用户的手在RGB-D图像中的位置，然后根据用户的手指的位置是否发生变化，来判断用户的手指的姿势。At the same time, the processor 303 performs operations such as clicking, moving, and zooming on the text at the position of the marker according to the posture of the controller's hand. In the embodiments of the present application, the processor 303 acquires multiple consecutive RGB-D images, calculates the position of the user's hand in each RGB-D image, and then judges the posture of the user's finger according to whether the position of the finger has changed.
在一种情况下，处理器303根据在规定的时间内获取的多张RGB-D图像，检测到用户的手指在垂直于屏幕10的方向上移动的距离大于预设距离，且移动的方向指向屏幕10的方向，则处理器303判定控制者的手的姿态为点击操作或落笔操作。然后处理器303打开在屏幕10上的坐标点M2处的文件、APP图标等应用。In one case, based on multiple RGB-D images acquired within a specified period, the processor 303 detects that the user's finger has moved in the direction perpendicular to the screen 10 by more than a preset distance and that the movement points toward the screen 10; the processor 303 then determines that the gesture of the controller's hand is a click operation or a pen-down operation, and opens the file, APP icon, or other application at the coordinate point M2 on the screen 10.
在第二种情况下，处理器303根据在规定的时间内获取的多张RGB-D图像，检测到用户的手指在垂直于屏幕10的方向上移动的距离大于预设距离，且移动的方向背向屏幕10的方向，则处理器303判定控制者的手的姿态为抬手操作或抬笔操作。然后处理器303停止对在屏幕10上的坐标点M2处的文件、或页面等内容的修改。In the second case, based on multiple RGB-D images acquired within a specified period, the processor 303 detects that the user's finger has moved in the direction perpendicular to the screen 10 by more than a preset distance and that the movement points away from the screen 10; the processor 303 then determines that the gesture of the controller's hand is a hand-raising operation or a pen-up operation, and stops modifying the content such as the file or page at the coordinate point M2 on the screen 10.
在第三种情况下，处理器303根据在规定的时间内获取的多张RGB-D图像，检测到用户的手指在垂直于屏幕10的方向上移动的距离小于预设距离，且移动的方向既有指向屏幕10的方向，也有背向屏幕10的方向，则处理器303判定控制者的手的姿态为连击操作。然后处理器303对在屏幕10上的坐标点M2处的文件、或页面等内容进行重点显示。显示的方法有将坐标点M2周围的文字显示为红色、将图标放大、或将背景变亮等等。In the third case, based on multiple RGB-D images acquired within the specified period, the processor 303 detects that the user's finger has moved in the direction perpendicular to the screen 10 by less than the preset distance and that the movement includes both motion toward the screen 10 and motion away from it; the processor 303 then determines that the gesture of the controller's hand is a multi-tap operation. The processor 303 then highlights the content such as the file or page at the coordinate point M2 on the screen 10, for example by displaying the text around the coordinate point M2 in red, enlarging the icon, or brightening the background.
在第四种情况下，处理器303根据在规定的时间内获取的多张RGB-D图像，检测到用户的手指在与屏幕10平行的平面上移动的距离大于预设距离，且在垂直于屏幕10的方向上并未检测到移动，或移动的距离小于预设距离，则处理器303判定控制者的手的姿态为滑动操作。然后处理器303将在屏幕10上的坐标点M2处的文件、或APP图标等内容移动到最后落手对应的屏幕10上的坐标点位置上。In the fourth case, based on multiple RGB-D images acquired within the specified period, the processor 303 detects that the user's finger has moved on a plane parallel to the screen 10 by more than the preset distance, while no movement is detected in the direction perpendicular to the screen 10, or the movement distance in that direction is less than the preset distance; the processor 303 then determines that the gesture of the controller's hand is a sliding operation. The processor 303 then moves the content such as the file or APP icon at the coordinate point M2 on the screen 10 to the coordinate point on the screen 10 corresponding to where the hand finally comes to rest.
在第五种情况下，处理器303根据在规定的时间内获取的多张RGB-D图像，检测到主摄像头20获取的图像中的手指数量超过一个时，处理器303再结合辅摄像头30获取的控制者的手指的图像深度信息，确定各个手指之间的距离是否发生变化。当处理器303检测到多个手指之间的距离不断地放大时，则判定控制者对屏幕上选定的目标进行放大，将在屏幕10上的坐标点M2处的文件、或APP图标等内容进行放大、或将文件打开；当处理器303检测到多个手指之间的距离不断地缩小时，则判定控制者对屏幕上选定的目标进行缩小，将在屏幕10上的坐标点M2处的文件、或APP图标等内容进行缩小、或将已打开的文件关闭。In the fifth case, when the processor 303 detects, based on the multiple RGB-D images acquired within the specified period, that more than one finger appears in the image acquired by the main camera 20, the processor 303 further combines the image depth information of the controller's fingers acquired by the auxiliary camera 30 to determine whether the distance between the fingers has changed. When the processor 303 detects that the distance between the fingers keeps increasing, it determines that the controller is zooming in on the selected target on the screen, and zooms in on the content such as the file or APP icon at the coordinate point M2 on the screen 10, or opens the file; when the processor 303 detects that the distance between the fingers keeps decreasing, it determines that the controller is zooming out on the selected target on the screen, and zooms out on the content such as the file or APP icon at the coordinate point M2 on the screen 10, or closes the opened file.
需要说明的是,操作动作不仅限上述五种情况,还可以为其它操作动作,本申请在此不再一一举例。It should be noted that the operation actions are not limited to the above five cases, but may also be other operation actions, and this application will not give examples one by one here.
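A simplified sketch of how such gestures can be told apart from the fingertip trajectory across consecutive RGB-D frames is given below; it covers only the first, second, and fourth cases, and the threshold value and all names are assumptions for illustration.

```python
# Illustrative sketch: classify a gesture from fingertip coordinates (x, y, z)
# over a short window of consecutive RGB-D frames, z being the distance to screen 10.
def classify_gesture(fingertips_xyz, threshold=0.03):
    dz = fingertips_xyz[0][2] - fingertips_xyz[-1][2]   # positive: moved toward the screen
    dx = fingertips_xyz[-1][0] - fingertips_xyz[0][0]
    dy = fingertips_xyz[-1][1] - fingertips_xyz[0][1]
    lateral = (dx ** 2 + dy ** 2) ** 0.5                # movement parallel to the screen
    if dz > threshold:
        return "click / pen down"       # first case
    if dz < -threshold:
        return "raise hand / pen up"    # second case
    if lateral > threshold:
        return "slide"                  # fourth case
    return "no operation"
```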
本申请实施例提供的隔空交互方法，在获取用户的主视眼中某个部位的三维坐标和手指中某个部位的三维坐标后，根据这两个坐标点构成的直线投射到屏幕上的交点，确定用户在屏幕上点击的位置，并根据用户的手指的姿态对文件进行各种操作，以实现隔空交互。In the air interaction method provided by the embodiments of the present application, after the three-dimensional coordinates of a part of the user's dominant eye and of a part of the finger are obtained, the position at which the user clicks on the screen is determined from the intersection of the straight line through these two coordinate points with the screen, and various operations are performed on files according to the posture of the user's finger, thereby achieving air interaction.
图9为本申请实施例提供的一种隔空交互装置的结构示意图。如图9所示,本申请实施例提供的隔空交互装置900包括:获取单元901和处理单元903。其中,获取单元901用于获取用户的RGB-D图像。处理单元903用于识别RGB-D图像中的用户的一个眼睛的眼睛坐标和操控物的操控物坐标;以及根据眼睛坐标和第一操控物坐标,确定用户对屏幕进行点击的点击位置。每个单元具体的实现方案可参考之前实施例的介绍。除了获取深度图像外,处理单元903还可执行以上实施例提到的其他操作,具体参照之前实施例的介绍。FIG. 9 is a schematic structural diagram of an air-space interaction device provided by an embodiment of the application. As shown in FIG. 9, the air-space interaction device 900 provided in this embodiment of the present application includes: an acquisition unit 901 and a processing unit 903. Wherein, the acquiring unit 901 is used to acquire the RGB-D image of the user. The processing unit 903 is used to identify the eye coordinates of one eye of the user and the manipulator coordinates of the manipulator in the RGB-D image; and determine the click position where the user clicks on the screen according to the eye coordinates and the first manipulator coordinates. For the specific implementation scheme of each unit, refer to the introduction of the previous embodiment. In addition to acquiring a depth image, the processing unit 903 may also perform other operations mentioned in the above embodiment, and for details, refer to the introduction of the previous embodiment.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered as going beyond the scope of the embodiments of the present application.
此外,本申请实施例的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本申请中使用的术语“制品”涵盖可从任何计算机可读器件、载体或介质 访问的计算机程序。例如,计算机可读介质可以包括,但不限于:磁存储器件(例如,硬盘、软盘或磁带等),光盘(例如,压缩盘(compact disc,CD)、数字通用盘(digital versatile disc,DVD)等),智能卡和闪存器件(例如,可擦写可编程只读存储器(erasable programmable read-only memory,EPROM)、卡、棒或钥匙驱动器等)。另外,本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。In addition, various aspects or features of the embodiments of the present application can be implemented as methods, devices, or products using standard programming and/or engineering techniques. The term "article of manufacture" used in this application encompasses a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to: magnetic storage devices (for example, hard disks, floppy disks, or tapes, etc.), optical disks (for example, compact discs (CD), digital versatile discs (DVD)) Etc.), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks or key drives, etc.). In addition, various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
在上述实施例中,图9中隔空交互装置900可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。In the foregoing embodiment, the space-interactive device 900 in FIG. 9 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, it can be implemented in the form of a computer program product in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center. Transmission to another website site, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
应当理解的是,在本申请实施例的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that in the various embodiments of the embodiments of the present application, the size of the sequence number of the above-mentioned processes does not mean the order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not be dealt with. The implementation process of the embodiments of the present application constitutes any limitation.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的***、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which is not repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者接入网设备等)执行本申请实施例各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以 存储程序代码的介质。If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application are essentially or the part that contributes to the prior art or the part of the technical solutions can be embodied in the form of a software product, and the computer software product is stored in a storage medium. , Including several instructions to enable a computer device (which may be a personal computer, a server, or an access network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks and other media that can store program codes. .
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请实施例揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请实施例的保护范围之内。The above are only specific implementations of the embodiments of the present application, but the protection scope of the embodiments of the present application is not limited to this. Any person skilled in the art can easily fall within the technical scope disclosed in the embodiments of the present application. Any change or replacement should be covered within the protection scope of the embodiments of the present application.

Claims (11)

  1. 一种隔空交互方法,其特征在于,包括:An airborne interaction method, characterized in that it comprises:
    获取用户的第一深度图像,所述第一深度图像包括第一红绿蓝RGB信息和第一深度信息;Acquiring a first depth image of the user, where the first depth image includes first red, green, and blue RGB information and first depth information;
    识别所述第一深度图像中的所述用户的眼睛的眼睛坐标和操控物的第一操控物坐标;Identifying the eye coordinates of the user's eyes and the first manipulator coordinates of the manipulator in the first depth image;
    根据所述眼睛坐标和所述第一操控物坐标,确定所述用户对所述屏幕进行点击的点击位置。According to the eye coordinates and the coordinates of the first manipulator, the click position where the user clicks on the screen is determined.
  2. 根据权利要求1所述的方法,其特征在于,所述眼睛坐标和所述第一操控物坐标属于深度图像坐标系时,所述方法还包括:The method according to claim 1, wherein when the eye coordinates and the first manipulator coordinates belong to a depth image coordinate system, the method further comprises:
    将所述眼睛坐标和所述第一操控物坐标从所述深度图像坐标系转换至空间三维坐标系。The eye coordinates and the first manipulator coordinates are converted from the depth image coordinate system to a spatial three-dimensional coordinate system.
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述眼睛坐标和所述第一操控物坐标,确定所述用户对所述屏幕进行点击的点击位置,包括:The method according to claim 2, wherein the determining the click position where the user clicks on the screen according to the eye coordinates and the first manipulator coordinates comprises:
    将穿过所述空间三维坐标系中的所述眼睛坐标和所述第一操控物坐标的直线与所述屏幕所在平面的交点,确定为所述点击位置。The point of intersection of a straight line passing through the eye coordinates and the coordinates of the first manipulator in the three-dimensional coordinate system of the space and the plane where the screen is located is determined as the click position.
  4. 根据权利要求1-3中任一项所述的方法,其特征在于,在确定所述用户对所述屏幕进行点击的点击位置之后,还包括:The method according to any one of claims 1 to 3, wherein after determining the click position where the user clicks on the screen, the method further comprises:
    将所述点击位置从所述空间三维坐标系转换至所述屏幕坐标系。The click position is converted from the spatial three-dimensional coordinate system to the screen coordinate system.
  5. 根据权利要求1-4中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-4, wherein the method further comprises:
    获取所述用户的第二深度图像,所述第二深度图像包括第二RGB图像和第二深度信息,所述第一深度图像和所述第二深度图像是时域上不同时刻的图像;Acquiring a second depth image of the user, where the second depth image includes a second RGB image and second depth information, and the first depth image and the second depth image are images at different moments in the time domain;
    识别所述第二深度图像中的所述操控物的第二操控物坐标;Identifying the second manipulator coordinates of the manipulator in the second depth image;
    判断所述第一操控物坐标和所述第二操控物坐标的变化幅度是否超过预设阈值;Judging whether the variation range of the coordinates of the first manipulator and the coordinates of the second manipulator exceeds a preset threshold;
    所述根据所述眼睛坐标和所述第一操控物坐标,确定所述用户对所述屏幕进行点击的点击位置,包括:当所述变化幅度超过所述预设阈值时,根据所述眼睛坐标和所述第一操控物坐标,确定所述用户对屏幕进行点击的点击位置。The determining the click position at which the user clicks on the screen according to the eye coordinates and the coordinates of the first manipulator includes: when the magnitude of change exceeds the preset threshold, according to the eye coordinates And the coordinates of the first manipulator to determine the click position where the user clicks on the screen.
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,所述获取用户的第一深度图像或第二深度图像之前,包括:The method according to any one of claims 1 to 5, wherein before the acquiring the first depth image or the second depth image of the user, the method comprises:
    通过至少一个摄像头获取所述用户的第一图像和第二图像,所述第一图像包括所述第一RGB信息,所述第二图像包括第二RGB信息或所述第一深度信息;Acquiring a first image and a second image of the user through at least one camera, the first image including the first RGB information, and the second image including the second RGB information or the first depth information;
    根据所述第一图像和所述第二图像,计算出所述第一深度图像或所述第二深度图像。According to the first image and the second image, the first depth image or the second depth image is calculated.
  7. 一种隔空交互设备,包括屏幕、至少一个摄像头和执行如权利要求1-6所述的处理器。An air-space interaction device, comprising a screen, at least one camera, and a processor that executes according to claims 1-6.
  8. 一种隔空交互设备,包括:处理器和存储器;An airborne interactive device, including: a processor and a memory;
    所述存储器存储有一个或多个程序,所述一个或多个程序包括指令,The memory stores one or more programs, and the one or more programs include instructions,
    所述处理器,用于执行所述指令,使得所述设备执行根据权利要求1-6中的任意项所述的方法。The processor is configured to execute the instructions so that the device executes the method according to any of claims 1-6.
  9. 一种可读存储介质,用于存储指令,当所述指令被执行时,使得如权利要求1-6中的任一项所述的方法被实现。A readable storage medium for storing instructions. When the instructions are executed, the method according to any one of claims 1-6 is realized.
  10. 一种包含指令的计算机程序设备,当其在终端上运行时,使得所述终端执行如权利要求1-6中的任一项所述的方法。A computer program device containing instructions, when it runs on a terminal, causes the terminal to execute the method according to any one of claims 1-6.
  11. 一种隔空交互装置,包括:An air-space interaction device, including:
    获取单元,用于获取用户的第一深度图像,所述第一深度图像包括第一RGB图像和第一深度信息;An acquiring unit, configured to acquire a first depth image of a user, where the first depth image includes a first RGB image and first depth information;
    处理单元,用于识别所述第一深度图像中的所述用户的眼睛的眼睛坐标和操控物的第一操控物坐标;以及A processing unit for identifying the eye coordinates of the user's eyes and the first manipulator coordinates of the manipulator in the first depth image; and
    根据所述眼睛坐标和所述第一操控物坐标,确定所述用户对所述屏幕进行点击的点击位置。According to the eye coordinates and the coordinates of the first manipulator, the click position where the user clicks on the screen is determined.
PCT/CN2019/119129 2019-11-18 2019-11-18 Inter-air interaction method and apparatus, and device WO2021097600A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/119129 WO2021097600A1 (en) 2019-11-18 2019-11-18 Inter-air interaction method and apparatus, and device
CN201980006422.0A CN111527468A (en) 2019-11-18 2019-11-18 Air-to-air interaction method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/119129 WO2021097600A1 (en) 2019-11-18 2019-11-18 Inter-air interaction method and apparatus, and device

Publications (1)

Publication Number Publication Date
WO2021097600A1 true WO2021097600A1 (en) 2021-05-27

Family

ID=71900759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/119129 WO2021097600A1 (en) 2019-11-18 2019-11-18 Inter-air interaction method and apparatus, and device

Country Status (2)

Country Link
CN (1) CN111527468A (en)
WO (1) WO2021097600A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560572A (en) * 2020-10-24 2021-03-26 北京博睿维讯科技有限公司 Camera shooting and large screen interaction processing method, device and system
CN112363626B (en) * 2020-11-25 2021-10-01 广东魅视科技股份有限公司 Large screen interaction control method based on human body posture and gesture posture visual recognition
CN112799574A (en) * 2021-02-23 2021-05-14 京东方科技集团股份有限公司 Display control method and display device
CN113448443A (en) * 2021-07-12 2021-09-28 交互未来(北京)科技有限公司 Large screen interaction method, device and equipment based on hardware combination
TWI823740B (en) * 2022-01-05 2023-11-21 財團法人工業技術研究院 Active interactive navigation system and active interactive navigation method
CN114527922A (en) * 2022-01-13 2022-05-24 珠海视熙科技有限公司 Method for realizing touch control based on screen identification and screen control equipment
CN114816145A (en) * 2022-03-08 2022-07-29 联想(北京)有限公司 Equipment management and control method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375542A (en) * 2011-10-27 2012-03-14 Tcl集团股份有限公司 Method for remotely controlling television by limbs and television remote control device
CN103809733A (en) * 2012-11-07 2014-05-21 北京三星通信技术研究有限公司 Man-machine interactive system and method
CN106774850A (en) * 2016-11-24 2017-05-31 深圳奥比中光科技有限公司 A kind of mobile terminal and its interaction control method
CN109445593A (en) * 2018-10-31 2019-03-08 贵州火星探索科技有限公司 Control method of electronic device and device
US20190250792A1 (en) * 2013-10-16 2019-08-15 Atheer, Inc. Method and apparatus for addressing obstruction in an interface

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000032420A (en) * 1998-05-01 2000-01-28 Sumitomo Electric Ind Ltd Image-pickup device for two-way interactive system
CN101344816B (en) * 2008-08-15 2010-08-11 华南理工大学 Human-machine interaction method and device based on sight tracing and gesture discriminating
CN103793060B (en) * 2014-02-14 2017-07-28 杨智 A kind of user interactive system and method
CN107292295B (en) * 2017-08-03 2019-12-24 华中师范大学 Gesture segmentation method and device

Also Published As

Publication number Publication date
CN111527468A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
WO2021097600A1 (en) Inter-air interaction method and apparatus, and device
US11546505B2 (en) Touchless photo capture in response to detected hand gestures
US11636660B2 (en) Object creation with physical manipulation
TWI540461B (en) Gesture input method and system
US9268410B2 (en) Image processing device, image processing method, and program
KR101196291B1 (en) Terminal providing 3d interface by recognizing motion of fingers and method thereof
JP2017531227A (en) Interface providing method and apparatus for recognizing operation in consideration of user's viewpoint
US11954268B2 (en) Augmented reality eyewear 3D painting
JP7026825B2 (en) Image processing methods and devices, electronic devices and storage media
CN110968187B (en) Remote touch detection enabled by a peripheral device
US11869156B2 (en) Augmented reality eyewear with speech bubbles and translation
WO2021004412A1 (en) Handheld input device, and method and apparatus for controlling display position of indication icon thereof
JP2012238293A (en) Input device
TWI726252B (en) Operation method for multi-monitor and electronic system using the same
CN111176425A (en) Multi-screen operation method and electronic system using same
WO2021004413A1 (en) Handheld input device and blanking control method and apparatus for indication icon of handheld input device
US20210406542A1 (en) Augmented reality eyewear with mood sharing
JP2018005663A (en) Information processing unit, display system, and program
CN104020843A (en) Information processing method and electronic device
WO2011096571A1 (en) Input device
JP6971788B2 (en) Screen display control method and screen display control system
WO2019100547A1 (en) Projection control method, apparatus, projection interaction system, and storage medium
US12013985B1 (en) Single-handed gestures for reviewing virtual content
US11863860B2 (en) Image capture eyewear with context-based sending
JP6007496B2 (en) Display system, display program, and display method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19953140

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19953140

Country of ref document: EP

Kind code of ref document: A1