WO2022088072A1 - Visual tracking method and apparatus, movable platform, and computer-readable storage medium - Google Patents

Visual tracking method and apparatus, movable platform, and computer-readable storage medium

Info

Publication number
WO2022088072A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
tracking object
tracking
movable platform
cameras
Application number
PCT/CN2020/125402
Other languages
French (fr)
Chinese (zh)
Inventor
吴博 (Wu Bo)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Application filed by SZ DJI Technology Co., Ltd.
Priority to PCT/CN2020/125402
Publication of WO2022088072A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 — Television systems
    • H04N 7/18 — Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • The present application relates to the technical field of visual tracking, and in particular to a visual tracking method and apparatus, a movable platform, and a computer-readable storage medium.
  • Visual tracking is an algorithm that continuously determines the location of a tracked object in an image. After the tracking object is specified in the initial frame of a captured video frame sequence, the visual tracking algorithm can automatically determine the object's position in each subsequent frame.
  • A movable platform is usually equipped with a camera through which a video frame sequence can be captured, so that a tracking object can be specified in the sequence for visual tracking.
  • However, the field of view of the movable platform's camera is limited; once the tracked object leaves that field of view, tracking is lost.
  • In addition, existing multi-camera visual tracking solutions suffer from wasted computing resources and poor correlation between cameras, which degrades tracking results.
  • In view of this, the embodiments of the present application provide a visual tracking method and apparatus, a movable platform, and a computer-readable storage medium. One purpose is to solve the technical problem of tracking loss when the tracked object leaves the camera's field of view; the provided scheme can also save computing power and energy consumption when multiple cameras perform visual tracking, and can correlate the visual tracking results of different cameras to improve tracking quality.
  • A first aspect of the embodiments of the present application provides a visual tracking method, including: acquiring first images captured by at least two cameras on a movable platform; stitching the first images acquired by the at least two cameras to obtain a second image; predicting the region where a tracking object is located, and cropping the target image corresponding to that region from the second image; and visually tracking the tracking object based on the target image.
  • A second aspect of the embodiments of the present application provides a visual tracking apparatus, including a processor and a memory storing computer instructions;
  • the processor implements the following steps when executing the computer instructions: acquiring first images captured by at least two cameras on a movable platform; stitching the first images to obtain a second image; predicting the region where a tracking object is located and cropping the corresponding target image from the second image; and visually tracking the tracking object based on the target image.
  • A third aspect of the embodiments of the present application provides a movable platform, including:
  • a body;
  • a driving device connected to the body, for powering the movable platform;
  • a plurality of cameras arranged on the body, the cameras corresponding to different fields of view with overlapping areas between the fields of view;
  • a processor and a memory storing computer instructions, where the processor implements the following steps when executing the computer instructions: acquiring first images captured by at least two of the cameras; stitching the first images to obtain a second image; predicting the region where a tracking object is located and cropping the corresponding target image from the second image; and visually tracking the tracking object based on the target image.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the visual tracking method provided by the embodiments of the present application.
  • With the visual tracking method provided by the embodiments of the present application, first images captured by at least two cameras can be stitched to obtain a second image with a large field of view.
  • The field of view of the second image equals the sum of the fields of view of the at least two cameras. Therefore, even if the tracking object moves out of the field of view of one camera, as long as it remains within the combined field of view of the at least two cameras, it can still be found in the second image and its position determined, reducing tracking loss.
  • Moreover, since the second image is large and tracking over it in full would consume substantial computing power, the second image can be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
  • FIG. 1 is a schematic diagram of a scene in which a tracking object is lost during visual tracking of a UAV according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the field of view distribution of an unmanned aerial vehicle provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a visual tracking method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scenario in which one tracking object loses tracking according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a scenario in which two tracking objects lose tracking according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a visual tracking device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an unmanned aerial vehicle provided by an embodiment of the present application.
  • Visual tracking is an algorithm that continuously determines the location of a tracked object in an image. Specifically, after the tracking object is specified in the initial frame of a video frame sequence, the visual tracking algorithm can, based on that specification, automatically determine the object's position in subsequent frames.
  • Movable platforms such as drones, unmanned vehicles, robots, etc. are usually equipped with cameras. Through the camera, the movable platform can shoot a video frame sequence, so that a tracking object can be designated on the captured video frame sequence, and the tracking object can be visually tracked.
  • the tracking object can be a creature such as a person or an animal, or an object such as a car or an airplane.
  • The purpose of the visual tracking algorithm is to determine the specific position of the tracking object in the captured image; for that, the object must at least be present in the captured image.
  • But the camera's field of view is limited: when the tracking object moves beyond it, the captured image no longer contains the object, so tracking fails and the object is lost.
  • FIG. 1 shows a schematic diagram of a scene in which a tracking object is lost during visual tracking by a UAV.
  • The dotted rectangle is the field of view of the drone's camera; when the tracking object moves outside this rectangle, the visual tracking algorithm loses it.
  • the embodiment of the present application provides a visual tracking method, which can be applied to a movable platform.
  • The movable platform can be a drone, an unmanned vehicle, an unmanned ship, a robot, etc.; a drone is used here as an example.
  • the drone can be equipped with at least two cameras, and different cameras have different fields of view.
  • the fields of view between adjacent cameras may have overlapping regions.
  • In one embodiment, the UAV may include multiple cameras covering an omnidirectional field of view. For example, referring to FIG. 2, the UAV shown there may include cameras facing front, rear, left, right, up, and down (the up and down cameras are not shown), with overlapping fields of view between adjacent cameras.
  • As an example, these six cameras may be auxiliary cameras, distinct from the main camera; for instance, the image quality of an auxiliary camera may be lower than that of the main camera. Through the auxiliary cameras, the UAV can implement functions such as environment perception and intelligent obstacle avoidance.
  • FIG. 3 is a flowchart of a visual tracking method provided by an embodiment of the present application. The method may include the following steps:
  • S302: Acquire first images captured by at least two cameras on the movable platform.
  • the first image may be an image captured by a camera on the movable platform, such as an image captured by a camera on a drone.
  • the movable platform may include a plurality of cameras, and after at least two of the cameras are photographed, the first images respectively photographed by the at least two cameras may be acquired.
  • the at least two cameras may have overlapping fields of view, so that there is also an overlapping area between the captured first images.
  • The first images captured by different cameras correspond to different fields of view, and the stitched second image can cover the sum of the fields of view of the individual first images. Therefore, in one example, if the movable platform includes omnidirectional cameras, the second image stitched from their first images may be a panoramic image.
  • During stitching, since the first images overlap, in one embodiment the stitching between the first images may be performed by feature point matching to obtain the second image.
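
As a concrete illustration of feature-point-based stitching — a sketch, not code from the patent, with OpenCV's ORB detector and a RANSAC homography standing in for whatever matcher an embodiment actually uses:

```python
import cv2
import numpy as np

def stitch_pair(img_a, img_b):
    """Stitch two overlapping first images into one second image
    via ORB feature matching and a RANSAC-estimated homography."""
    orb = cv2.ORB_create(2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:200]

    # Estimate the homography that maps img_b's points onto img_a's frame.
    src = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp img_b into img_a's frame on a canvas wide enough for both.
    h, w = img_a.shape[:2]
    canvas = cv2.warpPerspective(img_b, H, (w * 2, h))
    canvas[0:h, 0:w] = img_a  # overlap handling is simplified here
    return canvas
```

Overlap handling in this sketch simply overwrites; the fusion strategies described below would replace that last assignment.
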
  • In one embodiment, the first images can also be back-projected onto a specified surface model, so that the second image is formed on the model's surface; that is, the second image can be a curved-surface image.
  • The specified surface model can be any of various solid shapes, such as a cylinder model or a sphere model.
  • For the overlapping regions between first images, fusion processing may be performed, and there are several possible fusion methods.
  • In one embodiment, the gray value of a pixel in the overlapping region is taken from one of the two overlapping images: the selected image's gray value at the same position is used as the fused value.
  • In another embodiment, the fused gray value is a weighted combination of the gray values of the co-located pixels in the two overlapping images.
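
A minimal sketch of the two fusion strategies above, assuming the overlap patches are already aligned and the same size; the helper names are illustrative, not from the patent:

```python
import numpy as np

def fuse_select(patch_a, patch_b, use_a=True):
    """Select-one fusion: take every overlap pixel from one of the images."""
    return patch_a if use_a else patch_b

def fuse_weighted(patch_a, patch_b):
    """Weighted fusion with a linear ramp: the weight of patch_a slides from
    1 to 0 across the overlap width, so the seam fades smoothly."""
    h, w = patch_a.shape[:2]
    ramp = np.linspace(1.0, 0.0, w).reshape(1, w)
    if patch_a.ndim == 3:               # broadcast over color channels
        ramp = ramp[..., None]
    out = ramp * patch_a.astype(np.float32) + (1.0 - ramp) * patch_b.astype(np.float32)
    return out.astype(patch_a.dtype)
```
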
  • The second image obtained by stitching corresponds to a field of view equal to the sum of the fields of view of the at least two cameras. Therefore, even if the tracking object leaves the field of view of any one of those cameras, as long as it remains within their combined field of view, it can be found in the second image and its position determined by the visual tracking algorithm.
  • In one embodiment, visual tracking of the tracking object may be performed on the entire second image to determine the object's location.
  • However, because the second image is stitched from multiple first images and is therefore large, running visual tracking on the whole of it would consume substantial computing power. Hence, in one embodiment, the target image corresponding to the predicted region is cropped from the second image, and visual tracking is performed on this smaller cropped image, saving computing power.
  • As noted above, in one embodiment the second image may be a curved-surface image.
  • When cropping the target image for the predicted region from such a second image, the image content of that region can be projected from the curved surface onto a two-dimensional plane to obtain the target image, which is then used for visually tracking the tracking object.
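
One way to realize this crop-and-flatten step, assuming (as one possible embodiment) that the second image is stored as an equirectangular panorama on a sphere model; the projection math below is a standard perspective reprojection sketch, not taken from the patent:

```python
import cv2
import numpy as np

def crop_perspective(pano, yaw, pitch, fov_deg, out_w, out_h):
    """Project the predicted region of an equirectangular second image onto
    a flat target image. yaw/pitch (radians) point at the region's centre;
    fov_deg is the horizontal field of view of the flat crop."""
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))
    xs, ys = np.meshgrid(np.arange(out_w), np.arange(out_h))
    # Camera-frame ray for every target pixel (z forward, y down).
    x = xs - 0.5 * out_w
    y = ys - 0.5 * out_h
    z = np.full_like(x, f, dtype=np.float64)
    norm = np.sqrt(x * x + y * y + z * z)
    vx, vy, vz = x / norm, y / norm, z / norm

    # Rotate the rays by pitch (about x) then yaw (about y).
    cy_, sy_, cp_, sp_ = np.cos(yaw), np.sin(yaw), np.cos(pitch), np.sin(pitch)
    ry = cp_ * vy - sp_ * vz
    rz1 = sp_ * vy + cp_ * vz
    rx = cy_ * vx + sy_ * rz1
    rz = -sy_ * vx + cy_ * rz1

    # Convert rays to longitude/latitude and sample the panorama.
    lon = np.arctan2(rx, rz)                          # [-pi, pi]
    lat = np.arcsin(np.clip(ry, -1.0, 1.0))           # [-pi/2, pi/2]
    ph, pw = pano.shape[:2]
    map_x = ((lon / np.pi + 1.0) * 0.5 * pw).astype(np.float32)
    map_y = ((lat / (0.5 * np.pi) + 1.0) * 0.5 * ph).astype(np.float32)
    return cv2.remap(pano, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_WRAP)
```
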
  • There are several ways to predict the region of the second image in which the tracking object lies. In one embodiment, the region where the tracking object currently is can be predicted from historical visual tracking results.
  • A visual tracking result includes the position of the tracking object in the image. Therefore, the object's motion trajectory can be predicted from the historical position information contained in past tracking results, and from the predicted trajectory the current region of the object can be inferred. For example, during continuous tracking over a video frame sequence, if the results of the previous three frames show the object moving to the right, the object's region in the current frame can be predicted to lie to the right of its position in the previous frame.
  • In one embodiment, the region in the current frame can also be predicted directly as the region corresponding to the object's position in the previous frame; for example, a rectangular area centered on the previous-frame position can serve as the predicted region for the current frame.
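
A minimal sketch of this search-region prediction, assuming axis-aligned (x, y, w, h) boxes in second-image pixel coordinates; the constant-velocity shift and the expansion factor are illustrative assumptions:

```python
def predict_search_region(prev_boxes, img_w, img_h, expand=2.0):
    """Predict where the tracking object will be in the current frame from
    its last known boxes: shift the last box by the recent 2-D velocity,
    then expand it into a larger search window, clipped to the image."""
    (x0, y0, w0, h0), (x1, y1, w1, h1) = prev_boxes[-2], prev_boxes[-1]
    vx, vy = x1 - x0, y1 - y0            # simple constant-velocity estimate
    cx, cy = x1 + vx + w1 / 2, y1 + vy + h1 / 2
    sw, sh = w1 * expand, h1 * expand
    left = max(0, int(cx - sw / 2))
    top = max(0, int(cy - sh / 2))
    right = min(img_w, int(cx + sw / 2))
    bottom = min(img_h, int(cy + sh / 2))
    return left, top, right, bottom

# target = second_image[top:bottom, left:right] would then be the cropped
# target image on which the tracker runs.
```
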
  • In one embodiment, the historical position information of the tracked object may include its two-dimensional position in the image, for example two-dimensional coordinates; the trajectory predicted from it is then a two-dimensional trajectory reflecting the object's motion in the image plane.
  • In another embodiment, the historical position information may include both the two-dimensional position information and the depth information of the tracked object, so that a three-dimensional motion trajectory can be predicted. This captures the object's movement in three-dimensional space, making the region predicted from the trajectory more accurate.
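
A sketch of the three-dimensional variant under a pinhole-camera assumption: back-project past 2-D positions with their depths into 3-D, extrapolate with constant velocity, and reproject. The intrinsics fx, fy, cx, cy are calibration parameters, not values from the patent:

```python
import numpy as np

def predict_3d(track_2d, track_depth, fx, fy, cx, cy):
    """Predict the object's next 2-D position from 2-D pixel positions plus
    depth, assuming a pinhole model and constant 3-D velocity."""
    pts = []
    for (u, v), z in zip(track_2d, track_depth):
        pts.append(((u - cx) * z / fx, (v - cy) * z / fy, z))  # back-project
    p_prev, p_last = np.array(pts[-2]), np.array(pts[-1])
    p_next = p_last + (p_last - p_prev)        # constant-velocity step in 3-D
    x, y, z = p_next
    return fx * x / z + cx, fy * y / z + cy    # reproject to the image plane
```
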
  • Since the movable platform is equipped with at least two cameras, the parallax between them can be used to obtain depth information within the field of view.
  • Specifically, after the first images captured by the at least two cameras are acquired, a disparity map can be computed from these first images.
  • There are several ways to obtain the disparity map: in one embodiment, it is obtained by stereo matching the first images captured by the at least two cameras; in another embodiment, the first images are used as input to a machine learning algorithm that computes the disparity map.
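
For the stereo-matching route, a sketch using OpenCV's semi-global block matcher on a rectified image pair; the parameter values are common defaults, not values specified by the patent:

```python
import cv2

def disparity_map(left_gray, right_gray):
    """Dense disparity by semi-global block matching on a rectified pair of
    first images from two cameras with overlapping fields of view."""
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,          # must be divisible by 16
        blockSize=5,
        P1=8 * 5 * 5,
        P2=32 * 5 * 5,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2,
    )
    # StereoSGBM returns fixed-point disparities scaled by 16.
    return sgbm.compute(left_gray, right_gray).astype("float32") / 16.0
```
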
  • In one embodiment, the acquired disparity map may correspond to the field of view of any one of the at least two cameras.
  • For example, if the movable platform includes six cameras, the first images captured by any two cameras with overlapping fields of view can be used to compute the disparity map corresponding to either of those cameras' fields of view.
  • In another embodiment, the acquired disparity map may correspond to the sum of the fields of view of all the cameras.
  • For example, the first images captured by the six cameras can be used to compute the disparity map for each camera's field of view, and these maps can be stitched into a large-field-of-view disparity map whose coverage equals the sum of the individual fields of view and corresponds to the second image. Thus the depth information of the tracking object can be obtained from the disparity map corresponding to any first image, from the one corresponding to the second image, or from the one corresponding to the target image.
  • Although the disparity map contains depth information for every pixel, the tracked object usually corresponds to only some of those pixels. Therefore, once the disparity map is obtained, the depth information at the object's two-dimensional position in the map can be read out and used as the depth information of the tracking object.
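
A sketch of that lookup, assuming a rectified stereo pair so that depth = focal_px * baseline_m / disparity; taking the median over the object's box is one robust choice (names and parameters are illustrative):

```python
import numpy as np

def object_depth(disparity, box, focal_px, baseline_m):
    """Look up the tracked object's depth: take the disparity pixels inside
    its 2-D box, use a robust median, and convert with depth = f*B/d."""
    left, top, right, bottom = box
    patch = disparity[top:bottom, left:right]
    d = np.median(patch[patch > 0])      # ignore invalid (non-positive) values
    return focal_px * baseline_m / d
```
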
  • In this way, first images captured by at least two cameras can be stitched to obtain a second image with a large field of view.
  • The field of view of the second image equals the sum of the fields of view of the at least two cameras; therefore, even if the tracking object moves out of one camera's field of view, as long as it remains within the combined field of view it can still be found in the second image and its position determined, reducing tracking loss.
  • And since the second image is large, it can be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
  • In one embodiment, there may be multiple tracking objects; that is, several objects can be visually tracked at the same time. For each designated tracking object, the visual tracking algorithm can track it continuously, so multiple objects are tracked without confusion between them.
  • When a tracking object is lost, target detection may be performed on the area where the loss occurred, and when a target is detected there, that target can be determined to be the previously lost tracking object.
  • In the scene shown in FIG. 4, when tracking object A runs from left to right past a big tree, it is occluded by the tree and tracking is lost. Target detection can then be performed on the area where object A was lost, and when a target is detected appearing there, it can be taken as object A and tracking continues.
  • Besides performing detection on the exact area where the loss occurred, in one embodiment the detection area may be predicted from the motion trajectory of the tracking object as the area where it is likely to reappear.
  • Continuing the example of object A: since it was moving from left to right before being lost at the big tree, the area where it reappears can be predicted to be the region to the right of the tree.
  • When several tracking objects are lost in the same area, depth information can be combined to accurately distinguish them on reappearance.
  • In the scene shown in FIG. 5, tracking objects A and B both run from left to right past a big tree, and both are occluded by it and lose tracking.
  • When a target is later detected in that area, its depth information can be acquired (for how depth is obtained, see the disparity-map discussion above).
  • The target's depth information is matched against the historical depth information of the lost objects A and B: if the gap between the target's depth and A's historical depth is within a preset range, the target is determined to be the lost object A; if the gap to B's historical depth is within the preset range, the target is determined to be the lost object B.
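
A minimal sketch of this depth-based re-identification; the tolerance value and the dictionary of lost objects are illustrative assumptions, not from the patent:

```python
def match_lost_object(target_depth, lost_objects, tolerance=0.5):
    """Decide which lost tracking object a newly detected target is, by
    comparing its depth with each lost object's last known depth."""
    best_id, best_gap = None, tolerance
    for obj_id, hist_depth in lost_objects.items():
        gap = abs(target_depth - hist_depth)
        if gap <= best_gap:              # within the preset range, and closest
            best_id, best_gap = obj_id, gap
    return best_id                       # None if no lost object matches

# e.g. match_lost_object(8.2, {"A": 8.0, "B": 12.5}) -> "A"
```
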
  • After visual tracking produces results, they can be applied in various ways.
  • A visual tracking result may include the position information of the tracked object: its two-dimensional position in the image and, optionally, its depth information.
  • The depth information can be read from the disparity map at the object's two-dimensional position, as described above.
  • In one embodiment, the tracking object may be marked according to the two-dimensional position information contained in the visual tracking result.
  • For example, the tracking object can be marked with a rectangular frame in the picture captured by the camera (the picture can be displayed, via image transmission, on a mobile terminal or remote controller that communicates with the movable platform).
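
For instance, a sketch of such marking with OpenCV drawing calls (the label text and colors are illustrative):

```python
import cv2

def mark_object(frame, box, label="target"):
    """Mark the tracked object in the displayed picture with a rectangle
    and a label, using its 2-D position from the visual tracking result."""
    left, top, right, bottom = box
    cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
    cv2.putText(frame, label, (left, max(0, top - 6)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return frame
```
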
  • In one embodiment, the position information of the tracked object can also be used to determine a movement trajectory for the movable platform and/or an attitude-change mode for at least one camera, where the movement trajectory can be used to control the platform to move along it, and the attitude-change mode can be used to drive the camera to a target attitude.
  • As mentioned above, the movable platform can be a drone.
  • Based on visual tracking, the drone can follow the tracked object in real space. In one embodiment, the visual tracking result of the tracked object is therefore used to guide the drone's following flight.
  • Specifically, the flight trajectory of the drone can be planned from the position information of the tracking object, and the drone's flight is controlled along that trajectory, so that the drone follows the tracking object.
  • In one embodiment, the movable platform can be equipped with a gimbal, and the gimbal can carry the main camera.
  • The main camera can follow and shoot the tracking object: driven by the gimbal, it adjusts its shooting angle so that the tracked object remains the main subject of the captured picture. The visual tracking result of the tracked object can therefore be used to guide this follow shooting.
  • Specifically, the attitude-change mode of the main camera can be determined from the position information of the tracking object, and the gimbal is controlled accordingly, so that the main camera is driven to the target attitude and keeps shooting the tracking object.
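
A simplified sketch of turning the object's pixel offset into gimbal yaw/pitch corrections, under a small-angle (linear) approximation; gains and sign conventions would depend on the actual gimbal and are assumptions here:

```python
def gimbal_correction(u, v, img_w, img_h, hfov_deg, vfov_deg):
    """Turn the object's pixel offset from the image centre into yaw/pitch
    corrections that re-centre it, so the gimbal-driven main camera keeps
    the tracked object as the main subject of the picture."""
    yaw = (u - img_w / 2) / img_w * hfov_deg     # degrees to pan right
    pitch = (v - img_h / 2) / img_h * vfov_deg   # degrees to tilt down
    return yaw, pitch
```
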
  • In one embodiment, the movement speed of the movable platform can be determined from changes in the tracked object's depth information. For example, when the object's depth (distance) increases rapidly, the platform's movement speed can be increased correspondingly so that it keeps up with the object closely; when the depth decreases, the speed can be reduced so that the platform keeps a suitable distance from the object.
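
A hedged sketch of such a speed law as a proportional-derivative controller on range; the gains and the target distance are assumptions, not values from the patent:

```python
def follow_speed(depth_now, depth_prev, dt, target_dist, kp=0.8, kd=0.5):
    """Proportional-derivative follow-speed controller: speed up when the
    tracked object pulls away (depth grows), slow down as it closes in,
    holding roughly target_dist metres of separation."""
    range_error = depth_now - target_dist        # too far -> positive
    range_rate = (depth_now - depth_prev) / dt   # object pulling away -> positive
    return kp * range_error + kd * range_rate    # commanded forward speed (m/s)
```
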
  • In one embodiment, the depth information can also be used by the main camera to adjust its focus, so that the tracked object is imaged more sharply in the picture.
  • In summary, with the visual tracking method provided by the embodiments of the present application, first images captured by at least two cameras are stitched into a second image with a large field of view, so that as long as the tracking object remains within the combined field of view of the at least two cameras, it can be found in the second image, reducing the risk of losing it.
  • The second image can also be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
  • Moreover, the object's three-dimensional motion trajectory can be predicted using its depth information, locating its region more accurately and improving the accuracy and robustness of visual tracking.
  • FIG. 6 is a schematic structural diagram of a visual tracking device provided by an embodiment of the present application.
  • the visual tracking apparatus may include: a processor 610 and a memory 620 storing computer instructions;
  • the processor implements the following steps when executing the computer instructions: acquiring first images captured by at least two cameras on a movable platform; stitching the first images to obtain a second image; predicting the region where a tracking object is located and cropping the corresponding target image from the second image; and visually tracking the tracking object based on the target image.
  • when predicting the region where the tracking object is located, the processor is configured to predict the object's motion trajectory from the object's historical position information, and to predict the region where the object is located based on that trajectory.
  • the historical position information of the tracking object includes two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.
  • when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.
  • when determining the depth information corresponding to the tracking object according to the disparity map, the processor is configured to read the depth information of the tracking object from the disparity map at the object's two-dimensional position.
  • the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.
  • the disparity map is computed from the first images captured by the at least two cameras by a machine learning algorithm.
  • the depth information is included in a disparity map corresponding to any one of the first image, the second image or the target image.
  • there may be multiple tracking objects.
  • the processor is further configured to: when at least two tracking objects are lost in the same area, perform target detection on that area; when a target is detected there, obtain the target's depth information; and match the target's depth information against the historical depth information of the lost tracking objects, determining from the matching result which lost tracking object the target corresponds to.
  • the visual tracking result corresponding to the tracking object includes position information of the tracking object.
  • the position information of the tracking object includes two-dimensional position information
  • the processor is further configured to mark the tracking object according to the two-dimensional position information.
  • the processor is further configured to determine the movement trajectory of the movable platform and/or the attitude change mode of at least one camera according to the position information of the tracking object.
  • the processor is further configured to control the movement of the movable platform and/or the at least one camera according to the movement trajectory of the movable platform and/or the attitude-change mode of the at least one camera, so as to visually track the tracking object.
  • the second image is a panoramic image covering an omnidirectional field of view.
  • the second image is a curved surface image
  • when stitching the acquired first images, the processor is configured to back-project each first image onto a specified surface model and to fuse the overlapping regions between the first images.
  • the specified surface model includes a cylinder model or a sphere model.
  • the processor is configured to, when cropping out the target image corresponding to the area from the second image, project the image corresponding to the area in the second image to a plane to obtain the target image.
  • the fields of view between the at least two cameras have an overlapping area.
  • the visual tracking apparatus can stitch the first images captured by at least two cameras to obtain a second image with a large field of view.
  • The field of view of the second image equals the sum of the fields of view of the at least two cameras; therefore, even if the tracking object moves out of one camera's field of view, as long as it remains within the combined field of view it can still be found in the second image and its position determined, reducing tracking loss.
  • And the second image can be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
  • FIG. 7 is a schematic structural diagram of an unmanned aerial vehicle provided by an embodiment of the present application.
  • the movable platform may include:
  • a body;
  • a driving device 720 connected with the body, for providing power for the movable platform
  • a plurality of cameras 730 arranged on the body, the cameras corresponding to different fields of view with overlapping areas between the fields of view;
  • a processor and a memory storing computer instructions, where the processor implements the following steps when executing the computer instructions: acquiring first images captured by at least two of the cameras; stitching the first images to obtain a second image; predicting the region where a tracking object is located and cropping the corresponding target image from the second image; and visually tracking the tracking object based on the target image.
  • when predicting the region where the tracking object is located, the processor is configured to predict the object's motion trajectory from the object's historical position information, and to predict the region where the object is located based on that trajectory.
  • the historical position information of the tracking object includes two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.
  • when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.
  • when determining the depth information corresponding to the tracking object according to the disparity map, the processor is configured to read the depth information of the tracking object from the disparity map at the object's two-dimensional position.
  • the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.
  • the disparity map is computed from the first images captured by the at least two cameras by a machine learning algorithm.
  • the depth information is included in a disparity map corresponding to any one of the first image, the second image or the target image.
  • there may be multiple tracking objects.
  • the processor is further configured to: when at least two tracking objects are lost in the same area, perform target detection on that area; when a target is detected there, obtain the target's depth information; and match the target's depth information against the historical depth information of the lost tracking objects, determining from the matching result which lost tracking object the target corresponds to.
  • the visual tracking result corresponding to the tracking object includes position information of the tracking object.
  • the position information of the tracking object includes two-dimensional position information
  • the processor is further configured to mark the tracking object according to the two-dimensional position information.
  • the processor is further configured to determine the movement trajectory of the movable platform and/or the attitude change mode of at least one camera according to the position information of the tracking object.
  • the processor is further configured to control the movement of the movable platform and/or the at least one camera according to the movement trajectory of the movable platform and/or the attitude-change mode of the at least one camera, so as to visually track the tracking object.
  • the second image is a panoramic image covering an omnidirectional field of view.
  • the second image is a curved surface image
  • when stitching the acquired first images, the processor is configured to back-project each first image onto a specified surface model and to fuse the overlapping regions between the first images.
  • the specified surface model includes a cylinder model or a sphere model.
  • the processor is configured to, when cropping out the target image corresponding to the area from the second image, project the image corresponding to the area in the second image to a plane to obtain the target image.
  • the movable platform provided by the embodiments of the present application can stitch the first images captured by at least two cameras to obtain a second image with a large field of view.
  • The field of view of the second image equals the sum of the fields of view of the at least two cameras; therefore, even if the tracking object moves out of one camera's field of view, as long as it remains within the combined field of view it can still be found in the second image and its position determined, reducing tracking loss.
  • And the second image can be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
  • Embodiments of the present application further provide a computer-readable storage medium storing computer instructions that, when executed by a processor, implement any of the visual tracking methods provided by the embodiments of the present application.
  • Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Computer-usable storage media includes permanent and non-permanent, removable and non-removable media, and storage of information can be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

Disclosed in the embodiments of the present application is a visual tracking method, comprising: acquiring first images captured by at least two cameras on a movable platform; splicing the first images acquired by the at least two cameras to obtain a second image; predicting a region where a tracking object is located and cutting out a target image corresponding to the region from the second image; and visually tracking the tracking object on the basis of the target image.

Description

Visual tracking method and apparatus, movable platform, and computer-readable storage medium
TECHNICAL FIELD
The present application relates to the technical field of visual tracking, and in particular to a visual tracking method and apparatus, a movable platform, and a computer-readable storage medium.
BACKGROUND
Visual tracking is an algorithm that continuously determines the location of a tracked object in an image. After the tracking object is specified in the initial frame of a captured video frame sequence, the visual tracking algorithm can automatically determine the object's position in each subsequent frame.
A movable platform is usually equipped with a camera through which a video frame sequence can be captured, so that a tracking object can be specified in the sequence for visual tracking. However, the camera's field of view is limited; once the tracked object leaves it, tracking is lost. In addition, existing multi-camera visual tracking solutions suffer from wasted computing resources and poor correlation between cameras, which degrades tracking results.
SUMMARY
In view of this, the embodiments of the present application provide a visual tracking method and apparatus, a movable platform, and a computer-readable storage medium. One purpose is to solve the technical problem of tracking loss when the tracked object leaves the camera's field of view; the provided scheme can also save computing power and energy consumption when multiple cameras perform visual tracking, and can correlate the visual tracking results of different cameras to improve tracking quality.
A first aspect of the embodiments of the present application provides a visual tracking method, including:
acquiring first images captured by at least two cameras on a movable platform;
stitching the first images acquired by the at least two cameras to obtain a second image;
predicting the region where a tracking object is located, and cropping the target image corresponding to that region from the second image; and
visually tracking the tracking object based on the target image.
A second aspect of the embodiments of the present application provides a visual tracking apparatus, including a processor and a memory storing computer instructions;
the processor implements the following steps when executing the computer instructions:
acquiring first images captured by at least two cameras on a movable platform;
stitching the first images acquired by the at least two cameras to obtain a second image;
predicting the region where a tracking object is located, and cropping the target image corresponding to that region from the second image; and
visually tracking the tracking object based on the target image.
A third aspect of the embodiments of the present application provides a movable platform, including:
a body;
a driving device connected to the body, for powering the movable platform;
a plurality of cameras arranged on the body, the cameras corresponding to different fields of view with overlapping areas between the fields of view; and
a processor and a memory storing computer instructions, where the processor implements the following steps when executing the computer instructions:
acquiring first images captured by at least two of the cameras;
stitching the first images acquired by the at least two cameras to obtain a second image;
predicting the region where a tracking object is located, and cropping the target image corresponding to that region from the second image; and
visually tracking the tracking object based on the target image.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the visual tracking method provided by the embodiments of the present application.
With the visual tracking method provided by the embodiments of the present application, first images captured by at least two cameras can be stitched to obtain a second image with a large field of view. The field of view of the second image equals the sum of the fields of view of the at least two cameras; therefore, even if the tracking object moves out of the field of view of one camera, as long as it remains within the combined field of view it can still be found in the second image and its position determined, reducing tracking loss. Moreover, since the second image is large and tracking over it in full would consume substantial computing power, the second image can be cropped to the predicted region of the tracking object, so that visual tracking runs on a smaller target image, saving computing power.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the drawings used in describing the embodiments. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scene in which a tracking object is lost during visual tracking by a UAV according to an embodiment of the present application.
FIG. 2 is a schematic diagram of the field-of-view distribution of a UAV provided by an embodiment of the present application.
FIG. 3 is a flowchart of a visual tracking method provided by an embodiment of the present application.
FIG. 4 is a schematic diagram of a scenario in which one tracking object loses tracking according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a scenario in which two tracking objects lose tracking according to an embodiment of the present application.
FIG. 6 is a schematic structural diagram of a visual tracking apparatus provided by an embodiment of the present application.
FIG. 7 is a schematic structural diagram of a UAV provided by an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
视觉跟踪是一种可以持续确定跟踪对象在图像中位置的算法。具体而言,在视频帧序列的初始帧指定跟踪对象之后,视觉跟踪算法可以基于初始帧所指定的跟踪对象,在后续帧中自动确定出跟踪对象在图像中的位置。Visual tracking is an algorithm that continuously determines the location of the tracked object in an image. Specifically, after the tracking object is specified in the initial frame of the video frame sequence, the visual tracking algorithm can automatically determine the position of the tracking object in the image in subsequent frames based on the tracking object specified in the initial frame.
可移动平台如无人机、无人车、机器人等通常配备有摄像头。通过摄像头,可移动平台可以拍摄视频帧序列,从而可以在所拍摄的视频帧序列上指定跟踪对象,对跟踪对象进行视觉跟踪。这里,跟踪对象可以是人、动物等生物,也可以是车、飞机等物体。Movable platforms such as drones, unmanned vehicles, robots, etc. are usually equipped with cameras. Through the camera, the movable platform can shoot a video frame sequence, so that a tracking object can be designated on the captured video frame sequence, and the tracking object can be visually tracked. Here, the tracking object can be a creature such as a person or an animal, or an object such as a car or an airplane.
视觉跟踪算法的目的是确定跟踪对象在所拍摄图像中的具***置,而要确定跟踪对象在所拍摄图像中的具***置,跟踪对象至少要存在于所拍摄的图像中。但摄像头的视场范围是有限的,当跟踪对象运动到摄像头的视场范围以外时,所拍摄的图像中将没有跟踪对象,从而导致跟踪失败,跟踪对象丢失。The purpose of the visual tracking algorithm is to determine the specific position of the tracking object in the captured image, and to determine the specific position of the tracking object in the captured image, the tracking object must exist at least in the captured image. However, the field of view of the camera is limited. When the tracking object moves beyond the field of view of the camera, there will be no tracking object in the captured image, resulting in tracking failure and loss of the tracking object.
可以参考图1,图1示出无人机在视觉跟踪时跟踪对象丢失的场景示意图。其中,虚线的矩形框为无人机的摄像头的视场范围,当跟踪对象运动到矩形框外时,则视觉跟踪算法将跟踪丢失。Referring to FIG. 1 , FIG. 1 shows a schematic diagram of a scene in which a tracking object is lost during visual tracking by a UAV. The dotted rectangle is the field of view of the camera of the drone. When the tracking object moves outside the rectangle, the visual tracking algorithm will lose the tracking.
为解决上述问题,本申请实施例提供了一种视觉跟踪方法,该方法可以应用于可移动平台。如前所述,可移动平台可以是无人机、无人车、无人船、机器人等,这里可以以无人机为例。无人机可以配备有至少两个摄像头,不同的摄像头所对应的视场不同。在一种实施方式中,相邻摄像头之间的视场可以具有重叠区域。在一种实施方式中,无人机可以包括覆盖全方位视场的多个摄像头,比如,在一个例子中,可以参考图2,图2所示的无人机可以包括前、后、左、右、上、下六个方向的摄像头(上、下方向的摄像头未示出),相邻摄像头之间可以有视场的重叠。这里,作为示例的,六个方向的摄像头可以是辅助摄像头,辅助摄像头可以是区别于主摄像头的,比如,辅助摄像头的画质可以低于主摄像头。通过辅助摄像头,无人机可以实现环境感知、智能避障等功能。To solve the above problem, the embodiment of the present application provides a visual tracking method, which can be applied to a movable platform. As mentioned above, the movable platform can be a drone, an unmanned vehicle, an unmanned ship, a robot, etc., and a drone can be used here as an example. The drone can be equipped with at least two cameras, and different cameras have different fields of view. In one embodiment, the fields of view between adjacent cameras may have overlapping regions. In one embodiment, the UAV may include multiple cameras covering an omnidirectional field of view. For example, in an example, referring to FIG. 2, the UAV shown in FIG. 2 may include front, rear, left, The cameras in the six directions of right, up, and down (the cameras in the up and down directions are not shown) may have overlapping fields of view between adjacent cameras. Here, as an example, the camera in the six directions may be an auxiliary camera, and the auxiliary camera may be different from the main camera. For example, the image quality of the auxiliary camera may be lower than that of the main camera. Through auxiliary cameras, UAVs can realize environmental perception, intelligent obstacle avoidance and other functions.
可以参考图3,图3是本申请实施例提供的视觉跟踪方法的流程图。该方法可以包括以下步骤:Referring to FIG. 3 , FIG. 3 is a flowchart of a visual tracking method provided by an embodiment of the present application. The method may include the following steps:
S302、获取可移动平台上至少两个摄像头拍摄的第一图像。S302: Acquire a first image captured by at least two cameras on the movable platform.
第一图像可以是可移动平台上的摄像头所拍摄的图像,比如可以是无人机上的摄像头拍摄的图像。可移动平台可以包括多个摄像头,在其中的至少两个摄像头进行拍摄后,可以获取该至少两个摄像头各自拍摄的第一图像。这里,所述至少两个摄像头可以有视场上的重叠,从而所拍摄的第一图像之间也具有重叠区域。The first image may be an image captured by a camera on the movable platform, such as an image captured by a camera on a drone. The movable platform may include a plurality of cameras, and after at least two of the cameras are photographed, the first images respectively photographed by the at least two cameras may be acquired. Here, the at least two cameras may have overlapping fields of view, so that there is also an overlapping area between the captured first images.
S304、对所述至少两个摄像头获取的所述第一图像进行拼接,得到第二图像。S304, stitching the first images obtained by the at least two cameras to obtain a second image.
不同摄像头拍摄的第一图像对应不同的视场,而拼接所得的第二图像可以是各个第一图像所对应视场的总和。因此,在一个例子中,若可移动平台包括全方位的摄像头,则根据全方位的各个摄像头拍摄的第一图像拼接所得的第二图像可以是全景图像。而在拼接时,由于第一图像之间存在重叠区域,因此,在一种实施方式中,可以通过特征点匹配的方式进行第一图像之间的拼接,得到第二图像。在一种实施方式中,还可以将第一图像反投影至指定的曲面模型,从而可以在该曲面模型表面得到第二图像,即第二图像可以是曲面图像。这里,指定的曲面模型可以是各种立体形状,比如可以是圆柱体模型、球体模型等。The first images captured by different cameras correspond to different fields of view, and the second image obtained by splicing may be the sum of the fields of view corresponding to the respective first images. Therefore, in one example, if the movable platform includes omnidirectional cameras, the second image obtained by splicing the first images captured by the omnidirectional cameras may be a panoramic image. During splicing, since there is an overlapping area between the first images, in one embodiment, the splicing between the first images may be performed by means of feature point matching to obtain the second image. In one embodiment, the first image can also be back-projected to a specified curved surface model, so that a second image can be obtained on the surface of the curved model, that is, the second image can be a curved surface image. Here, the specified surface model can be various solid shapes, such as a cylinder model, a sphere model, and the like.
对于第一图像之间的重叠区域,可以进行融合处理。具体的融合方式有多种,在一种实施方式中,重叠区域像素的灰度值可以在重叠的两张图像中选择其中一张,将所选择的图像在相同位置的像素的灰度值作为融合后的灰度值。在一种实施方式中,重叠区域像素的灰度值可以是重叠的两张图像中相同位置像素的灰度值进行加权融合的结果。For the overlapping area between the first images, fusion processing may be performed. There are many specific fusion methods. In one embodiment, the gray value of the pixel in the overlapping area can be selected from one of the two overlapping images, and the gray value of the pixel in the same position of the selected image can be used as The fused gray value. In one embodiment, the gray value of the pixels in the overlapping area may be the result of weighted fusion of the gray values of the pixels at the same position in the two overlapping images.
S306、预测跟踪对象所在的区域,并从所述第二图像中裁剪出所述区域对应的目标图像。S306. Predict the area where the tracking object is located, and crop out the target image corresponding to the area from the second image.
S308、基于所述目标图像对所述跟踪对象进行视觉跟踪。S308. Perform visual tracking on the tracking object based on the target image.
拼接所得的第二图像,其所对应的视场范围可以等于所述至少两个摄像头的视场范围总和,因此,即便跟踪对象从所述至少两个摄像头中的任一摄像头的视场范围中离开,但只要该跟踪对象仍在所述至少两个摄像头的视场范围总和内,就可以在第二图像中找到该跟踪对象,可以通过视觉跟踪算法确定出跟踪对象的位置。For the second image obtained by splicing, the corresponding field of view may be equal to the sum of the fields of view of the at least two cameras. Therefore, even if the tracking object falls from the field of view of any one of the at least two cameras leave, but as long as the tracked object is still within the sum of the fields of view of the at least two cameras, the tracked object can be found in the second image, and the position of the tracked object can be determined by a visual tracking algorithm.
在一种实施方式中,可以在整张第二图像上进行跟踪对象的视觉跟踪,确定出跟踪对象所在的位置。但考虑到第二图像是多张第一图像拼接得到的,其尺寸较大,若基于第二图像进行视觉跟踪,则需要耗费大量的算力,因此,在一种实施方式中,可以根据预测的跟踪对象所在的区域,从第二图像裁剪出该预测所得区域对应的目标图像,从而,可以基于裁剪出的小尺寸的目标图像进行视觉跟踪,节约算力。In one embodiment, visual tracking of the tracking object may be performed on the entire second image to determine the location of the tracking object. However, considering that the second image is obtained by splicing multiple first images, and its size is relatively large, if visual tracking is performed based on the second image, it will consume a lot of computing power. The target image corresponding to the predicted region is cropped from the second image, so that visual tracking can be performed based on the cropped target image of small size, saving computing power.
如前文所述,在一种实施方式中,第二图像可以是曲面图像。在从第二图像中裁剪出预测所的区域对应的目标图像时,可以根据将第二图像中该预测所得区域对应的图像正投影至二维平面,从而得到目标图像,该目标图像可以用于对跟踪对象进行视觉跟踪。As mentioned above, in one embodiment, the second image may be a curved image. When the target image corresponding to the predicted region is cropped from the second image, the target image can be obtained by orthographically projecting the image corresponding to the predicted region in the second image to a two-dimensional plane, and the target image can be used for Visual tracking of tracked objects.
对于跟踪对象在第二图像中所处的区域,可以有多种预测方式。在一种实施方式中,可以根据历史的视觉跟踪结果对跟踪对象当前所在的区域进行预测。对跟踪对象的视觉跟踪结果可以包括跟踪对象在图像中的位置信息,因此,在预测跟踪对象的所在的区域时,可以根据历史的视觉跟踪结果中所包含的跟踪对象的历史位置信息,对跟踪对象的运动轨迹进行预测,基于预测出的运动轨迹,可以预测跟踪对象所在的区域。可以举个例子,在持续对视频帧序列中的跟踪对象进行视觉跟踪时,若根据前3帧的视觉跟踪结果可以确定跟踪对象在前3帧的运动轨迹是向右运动,则可以预测该跟踪对象在当前帧所在的区域是上一帧其所在位置靠右的部分。在一种实施方式中,也可以直接预测该跟踪对象在当前帧所在的区域是上一帧所在位置对应的区域,比如,可以将以上一帧所在位置为中心的矩形区域作为预测的该跟踪对象在当前帧所在的区域。For the region where the tracking object is located in the second image, there can be various prediction methods. In one embodiment, the current region of the tracked object may be predicted based on historical visual tracking results. The visual tracking result of the tracking object can include the position information of the tracking object in the image. Therefore, when predicting the region where the tracking object is located, the tracking object can be tracked according to the historical position information of the tracking object contained in the historical visual tracking results. The motion trajectory of the object is predicted, and based on the predicted motion trajectory, the area where the tracking object is located can be predicted. For example, when the tracking object in the video frame sequence is continuously visually tracked, if it can be determined that the tracking object's motion trajectory in the first 3 frames is to the right according to the visual tracking results of the first 3 frames, then the tracking can be predicted. The area where the object is in the current frame is the part to the right of where it was in the previous frame. In one embodiment, it is also possible to directly predict that the area where the tracking object is located in the current frame is the area corresponding to the location of the previous frame. For example, a rectangular area centered on the location of the previous frame can be used as the predicted tracking object. in the region where the current frame is located.
In one embodiment, the historical position information of the tracked object may include its two-dimensional position in the image, for example two-dimensional coordinates; the trajectory predicted from such information is then a two-dimensional trajectory, reflecting the object's motion in the image plane. In another embodiment, the historical position information may include both two-dimensional position information and depth information, so that a three-dimensional motion trajectory can be predicted. This makes it possible to anticipate the object's movement in three-dimensional space, so that the region predicted from the three-dimensional trajectory is more accurate.
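For illustration, depth allows each 2D observation to be lifted to a 3D point with the standard pinhole model, after which the same constant-velocity idea yields a 3D prediction; the intrinsic values below are placeholders, not calibration data from the application:

```python
import numpy as np

def backproject(u, v, Z, fx, fy, cx, cy):
    """Lift a pixel (u, v) with depth Z to a 3D point in the camera frame,
    using the standard pinhole model."""
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

# Two observations give a 3D velocity, hence a 3D trajectory prediction:
fx = fy = 600.0          # placeholder intrinsics
cx, cy = 640.0, 360.0
p0 = backproject(700, 360, 4.0, fx, fy, cx, cy)
p1 = backproject(720, 360, 4.5, fx, fy, cx, cy)
print(p1 + (p1 - p0))    # constant-velocity prediction of the next 3D point
```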
Since the movable platform is equipped with at least two cameras, the parallax between them can be used to obtain depth information within the field of view. Specifically, after the first images captured by the at least two cameras are acquired, a disparity map may be computed from those first images. The disparity map may be obtained in various ways: in one embodiment, it may be obtained by stereo matching of the first images captured by the at least two cameras; in another embodiment, it may be computed by a machine learning algorithm that takes the first images captured by the at least two cameras as input.
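A sketch of the stereo-matching option using OpenCV's semi-global block matcher; the parameter values and image file names are assumptions of this sketch:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # placeholder files
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

matcher = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,   # must be divisible by 16
    blockSize=7,
)
# SGBM returns fixed-point disparities scaled by 16.
disparity = matcher.compute(left, right).astype("float32") / 16.0
```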
In one embodiment, the acquired disparity map may correspond to the field of view of any one of the at least two cameras. For example, if the movable platform includes six cameras, the first images captured by any two of them whose fields of view overlap can be used to compute a disparity map corresponding to the field of view of either of those two cameras. In another embodiment, the acquired disparity map may correspond to the combined field of view of all the cameras. For example, with six cameras, the first images captured by each of them can be used to compute a disparity map for each camera's field of view, and these disparity maps can be stitched into a wide-field disparity map whose field of view equals the combined field of view of all the cameras; this wide-field disparity map corresponds to the second image in terms of field of view. The depth information corresponding to the tracked object can therefore be obtained from the disparity map corresponding to any first image, from the disparity map corresponding to the second image, or from the disparity map corresponding to the target image.

Since the disparity map contains depth information for every pixel, while the tracked object usually corresponds to only some of those pixels, once the disparity map is obtained, the depth associated with the pixels at the object's two-dimensional position in the map can be taken as the depth information of the tracked object.
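For example, a robust way to read one depth value out of the disparity map at the object's position is to take the median disparity inside its bounding box and convert it with Z = f·B/d; the focal length and baseline below are placeholders for the real stereo calibration:

```python
import numpy as np

def object_depth(disparity, box, fx, baseline):
    """Read the tracked object's depth out of a disparity map, using the
    pixels inside its bounding box (median for robustness)."""
    x0, y0, x1, y1 = [int(v) for v in box]
    patch = disparity[y0:y1, x0:x1]
    d = np.median(patch[patch > 0])     # ignore invalid (zero) disparities
    return fx * baseline / d            # Z = f * B / d

# depth = object_depth(disparity, box=(300, 120, 360, 260),
#                      fx=600.0, baseline=0.12)   # placeholder values
```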
In the visual tracking method provided by the embodiments of the present application, the first images captured by at least two cameras are stitched to obtain a second image with a large field of view. The field of view of the second image equals the combined field of view of the at least two cameras; therefore, even if the tracked object moves out of the field of view of one of the cameras, as long as it remains within the combined field of view it can still be found in the large-field second image and its position determined, reducing the occurrence of tracking loss. Moreover, because the second image is large and performing visual tracking over the whole of it would consume considerable computing power, the second image can be cropped according to the predicted region of the tracked object, so that visual tracking is performed on a smaller target image, saving computing power.

In one embodiment, there may be multiple tracked objects, that is, several objects may be visually tracked at the same time. When the combined field of view of the platform's cameras covers all directions, no tracked object can move beyond the field of view, so the visual tracking algorithm can track each object continuously; even with multiple tracked objects, there is then no confusion between them.

However, even with omnidirectional coverage, a tracked object may still be lost in some cases, for example because it is occluded. When tracking of an object is lost, in one embodiment target detection may be performed on the region where the loss occurred; when a target is detected in that region, the detected target may be identified as the previously lost tracked object. For example, referring to FIG. 4, when tracked object A runs from left to right past a large tree, it is occluded by the tree and tracking of it is lost. Target detection can then be performed on the region where object A was lost, and when a target appears there, that target can be taken as object A and tracking resumed.
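A minimal sketch of this re-detection loop, where the `detect` function stands in for any object detector returning bounding boxes and is a hypothetical parameter, and the IoU threshold is an assumed value:

```python
def reacquire(frames, loss_region, detect, iou_thresh=0.3):
    """Watch the region where tracking was lost and re-attach the track to
    the first detection that overlaps it."""
    for frame in frames:
        for box in detect(frame):
            if iou(box, loss_region) > iou_thresh:
                return box              # resume tracking from this box
    return None

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)
```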
In the above embodiment, target detection is performed on the region where the loss occurred when the tracked object is lost; in another embodiment, the detection region may instead be a region where the object is predicted to reappear, derived from its motion trajectory. Continuing the example of tracked object A: since A was moving from left to right before it was lost at the tree, the region where A is expected to reappear can be predicted to be the area to the right of the tree.

When two or more tracked objects are lost in the same region, target detection can still be performed on that region, but when a target is detected there, it is necessary to determine which of the lost tracked objects it is. In one embodiment, depth information can be used to distinguish the lost tracked objects accurately. For example, referring to FIG. 5, tracked objects A and B both run from left to right past a large tree and are both occluded by it, so tracking of both is lost. When a target appears in the loss region or in the predicted reappearance region, its depth information can be obtained (in the manner described earlier). If the historical depth information of A and B before the loss shows that B was farther from the camera and A was closer, the depth of the detected target can be matched against the historical depths of the lost tracked objects A and B: if the difference between the target's depth and A's historical depth is within a preset range, the target can be identified as the lost tracked object A; if the difference between the target's depth and B's historical depth is within a preset range, the target can be identified as the lost tracked object B.
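For illustration, the depth-matching step might look like the following sketch, where the preset range is an assumed tolerance in metres:

```python
def match_lost_object(target_depth, lost_objects, tolerance=0.5):
    """Decide which lost track a reappearing target belongs to by comparing
    its depth with each track's last known depth.

    lost_objects -- dict mapping track id to last known depth in metres
    """
    best_id, best_gap = None, tolerance
    for track_id, depth in lost_objects.items():
        gap = abs(target_depth - depth)
        if gap <= best_gap:
            best_id, best_gap = track_id, gap
    return best_id   # None if no track's depth is within tolerance

# A at ~4 m, B at ~9 m; a target reappearing at 4.2 m is matched to A:
print(match_lost_object(4.2, {"A": 4.0, "B": 9.0}))
```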
After a visual tracking result corresponding to the tracked object is obtained from the target image, it can be used in various applications. As described above, the visual tracking result may include position information of the tracked object; here, the position information may include the object's two-dimensional position in the image and may also include its depth information. The depth information of the tracked object can be obtained from the disparity map according to its two-dimensional position, in the manner described above.

In one embodiment, the tracked object may be marked according to the two-dimensional position information contained in the visual tracking result. There are various ways to mark it; for example, the tracked object may be outlined with a rectangular box in the picture captured by the camera (which may be displayed, via image transmission, on a mobile terminal or remote controller communicating with the movable platform).
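As a simple illustration (the frame, box coordinates, and colour are placeholders), such a rectangle can be drawn with OpenCV:

```python
import cv2
import numpy as np

frame = np.zeros((720, 1280, 3), np.uint8)   # placeholder camera frame
x0, y0, x1, y1 = 300, 120, 360, 260          # tracked object's 2D box
cv2.rectangle(frame, (x0, y0), (x1, y1), color=(0, 255, 0), thickness=2)
```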
In one embodiment, a movement trajectory of the movable platform and/or an attitude change of at least one camera may also be determined from the position information of the tracked object. The movement trajectory can be used to control the movable platform to move along it, and the attitude change can be used to drive the camera to a target attitude.

In one example, the movable platform may be an unmanned aerial vehicle (UAV). In one UAV application, the UAV follows the tracked object in real space, that is, it performs flight following of the object. The visual tracking result of the tracked object can therefore be used to guide the UAV in this flight following: specifically, the UAV's flight trajectory can be planned according to the position information of the tracked object, and the UAV can then be controlled to fly along that trajectory, so that it follows the tracked object in flight.

In one example, the movable platform may be equipped with a gimbal carrying a main camera. In one application, the main camera performs follow shooting of the tracked object, that is, the gimbal drives the camera to adjust its shooting angle so that the tracked object remains the main subject of the captured picture. The visual tracking result of the tracked object can be used to guide this follow shooting: specifically, an attitude change for the main camera can be determined from the object's position information, and the gimbal can be controlled accordingly, so that the main camera is driven to the target attitude and keeps the tracked object framed.
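One simple proportional scheme for turning the tracked object's pixel offset into gimbal yaw/pitch corrections is sketched below; the field-of-view values and sign conventions are assumptions of this sketch:

```python
def gimbal_correction(obj_xy, frame_wh, hfov_deg, vfov_deg):
    """Turn the tracked object's pixel offset from the image centre into
    yaw/pitch corrections (degrees) that re-centre it in the frame."""
    x, y = obj_xy
    w, h = frame_wh
    yaw = (x - w / 2) / (w / 2) * (hfov_deg / 2)    # positive = pan right
    pitch = (y - h / 2) / (h / 2) * (vfov_deg / 2)  # positive = tilt down
    return yaw, pitch

# Object sitting right of centre -> pan the gimbal right by ~21 degrees:
print(gimbal_correction((960, 360), (1280, 720), hfov_deg=84, vfov_deg=53))
```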
As for the depth information included in the position information of the tracked object, in one embodiment the movement speed of the movable platform can be determined from changes in that depth. For example, when the depth (distance) of the tracked object increases rapidly, the platform's movement speed can be increased accordingly so that it follows the object closely; when the depth (distance) decreases, the platform's movement speed can be reduced accordingly so that a suitable distance is maintained between the platform and the object. In another embodiment, the depth information can also be used by the main camera to adjust its focus according to the tracked object's depth, so that the object is imaged more sharply in the picture.
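A proportional sketch of this speed adjustment, in which all gains, the base speed, and the desired following range are assumed tuning values (metres, seconds, metres per second):

```python
def follow_speed(depth_now, depth_prev, dt, target_range, base_speed=5.0,
                 k_rate=1.0, k_range=0.5):
    """Adjust the platform's follow speed from the tracked object's depth:
    speed up when the range grows, slow down when it shrinks."""
    range_rate = (depth_now - depth_prev) / dt      # +ve: object receding
    range_error = depth_now - target_range          # +ve: too far behind
    speed = base_speed + k_rate * range_rate + k_range * range_error
    return max(0.0, speed)

# Object pulling away (4.0 m -> 4.6 m in 0.2 s) while we want 4 m:
print(follow_speed(4.6, 4.0, dt=0.2, target_range=4.0))
```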
In the visual tracking method provided by the embodiments of the present application, the first images captured by at least two cameras are stitched into a second image with a large field of view, so that as long as the tracked object remains within the combined field of view of the at least two cameras, it can be found in the large-field second image, reducing the risk of tracking loss. Furthermore, because the second image is large, the second image can be cropped according to the predicted region of the tracked object in order to save computing power, so that visual tracking is performed on a smaller target image. In addition, the object's depth information can be combined to predict its three-dimensional motion trajectory, so that the predicted region is more accurate, and depth information can also be used to distinguish multiple lost tracked objects when they are lost, improving the accuracy and robustness of visual tracking.
Reference is now made to FIG. 6, which is a schematic structural diagram of a visual tracking apparatus provided by an embodiment of the present application. The visual tracking apparatus may include a processor 610 and a memory 620 storing computer instructions.

The processor implements the following steps when executing the computer instructions:

acquiring first images captured by at least two cameras on a movable platform;

stitching the first images acquired by the at least two cameras to obtain a second image;

predicting a region where a tracked object is located, and cropping a target image corresponding to the region from the second image;

visually tracking the tracked object based on the target image.
Optionally, when predicting the region where the tracked object is located, the processor is configured to predict a motion trajectory of the tracked object according to historical position information of the tracked object, and to predict the region where the tracked object is located based on the motion trajectory.

Optionally, the historical position information of the tracked object includes two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.

Optionally, when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.

Optionally, when determining the depth information corresponding to the tracked object according to the disparity map, the processor is configured to determine the depth information of the tracked object from the disparity map according to the two-dimensional position information of the tracked object.

Optionally, the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.

Optionally, the disparity map is computed from the first images captured by the at least two cameras according to a machine learning algorithm.

Optionally, the depth information is contained in a disparity map corresponding to any one of the first images, the second image, or the target image.

Optionally, there are a plurality of tracked objects.

Optionally, the processor is further configured to: when at least two tracked objects are lost in a same region, perform target detection on the same region; when a target is detected in the same region, obtain depth information of the target; and match the depth information of the target with historical depth information of the lost tracked objects, determining, according to the matching result, the lost tracked object to which the target corresponds.

Optionally, a visual tracking result corresponding to the tracked object includes position information of the tracked object.

Optionally, the position information of the tracked object includes two-dimensional position information, and the processor is further configured to mark the tracked object according to the two-dimensional position information.

Optionally, the processor is further configured to determine a movement trajectory of the movable platform and/or an attitude change of at least one camera according to the position information of the tracked object.

Optionally, the processor is further configured to control the movable platform and/or the at least one camera to move according to the movement trajectory of the movable platform and/or the attitude change of the at least one camera, so as to visually track the tracked object.

Optionally, the second image is a panoramic image covering an omnidirectional field of view.

Optionally, the second image is a curved-surface image, and, when stitching the acquired first images, the processor is configured to back-project each of the first images onto a specified curved-surface model and to fuse overlapping regions between the first images.

Optionally, the specified curved-surface model includes a cylinder model or a sphere model.

Optionally, when cropping the target image corresponding to the region from the second image, the processor is configured to project an image corresponding to the region in the second image onto a plane to obtain the target image.

Optionally, the fields of view of the at least two cameras have an overlapping region.
The visual tracking apparatus provided by the embodiments of the present application stitches the first images captured by at least two cameras into a second image with a large field of view equal to the combined field of view of the at least two cameras. Therefore, even if the tracked object moves out of the field of view of one of the cameras, as long as it remains within the combined field of view it can still be found in the large-field second image and its position determined, reducing the occurrence of tracking loss. Moreover, because the second image is large and performing visual tracking over the whole of it would consume considerable computing power, it can be cropped according to the predicted region of the tracked object, so that visual tracking is performed on a smaller target image, saving computing power.
An embodiment of the present application further provides a movable platform. Reference is made to FIG. 7, which is a schematic structural diagram of an unmanned aerial vehicle provided by an embodiment of the present application. The movable platform may include:

a body 710;

a drive device 720 connected to the body and configured to power the movable platform;

a plurality of cameras 730 arranged on the body, the plurality of cameras corresponding to different fields of view, with overlapping regions between the fields of view;

a processor 740 and a memory 750 storing computer instructions, wherein the processor implements the following steps when executing the computer instructions:

acquiring first images captured by at least two of the cameras;

stitching the first images acquired by the at least two cameras to obtain a second image;

predicting a region where a tracked object is located, and cropping a target image corresponding to the region from the second image;

visually tracking the tracked object based on the target image.
Optionally, when predicting the region where the tracked object is located, the processor is configured to predict a motion trajectory of the tracked object according to historical position information of the tracked object, and to predict the region where the tracked object is located based on the motion trajectory.

Optionally, the historical position information of the tracked object includes two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.

Optionally, when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.

Optionally, when determining the depth information corresponding to the tracked object according to the disparity map, the processor is configured to determine the depth information of the tracked object from the disparity map according to the two-dimensional position information of the tracked object.

Optionally, the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.

Optionally, the disparity map is computed from the first images captured by the at least two cameras according to a machine learning algorithm.

Optionally, the depth information is contained in a disparity map corresponding to any one of the first images, the second image, or the target image.

Optionally, there are a plurality of tracked objects.

Optionally, the processor is further configured to: when at least two tracked objects are lost in a same region, perform target detection on the same region; when a target is detected in the same region, obtain depth information of the target; and match the depth information of the target with historical depth information of the lost tracked objects, determining, according to the matching result, the lost tracked object to which the target corresponds.

Optionally, a visual tracking result corresponding to the tracked object includes position information of the tracked object.

Optionally, the position information of the tracked object includes two-dimensional position information, and the processor is further configured to mark the tracked object according to the two-dimensional position information.

Optionally, the processor is further configured to determine a movement trajectory of the movable platform and/or an attitude change of at least one camera according to the position information of the tracked object.

Optionally, the processor is further configured to control the movable platform and/or the at least one camera to move according to the movement trajectory of the movable platform and/or the attitude change of the at least one camera, so as to visually track the tracked object.

Optionally, the second image is a panoramic image covering an omnidirectional field of view.

Optionally, the second image is a curved-surface image, and, when stitching the acquired first images, the processor is configured to back-project each of the first images onto a specified curved-surface model and to fuse overlapping regions between the first images.

Optionally, the specified curved-surface model includes a cylinder model or a sphere model.

Optionally, when cropping the target image corresponding to the region from the second image, the processor is configured to project an image corresponding to the region in the second image onto a plane to obtain the target image.
The movable platform provided by the embodiments of the present application stitches the first images captured by at least two cameras into a second image with a large field of view equal to the combined field of view of the at least two cameras. Therefore, even if the tracked object moves out of the field of view of one of the cameras, as long as it remains within the combined field of view it can still be found in the large-field second image and its position determined, reducing the occurrence of tracking loss. Moreover, because the second image is large and performing visual tracking over the whole of it would consume considerable computing power, it can be cropped according to the predicted region of the tracked object, so that visual tracking is performed on a smaller target image, saving computing power.
An embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement any of the visual tracking methods provided by the embodiments of the present application.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relationship or order exists between those entities or operations. The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.

Multiple implementations have been provided above for each protected subject matter. Provided there is no conflict or contradiction, those skilled in the art may freely combine the various implementations according to the actual situation, thereby forming various technical solutions. For reasons of space, this document does not describe all of the technical solutions obtainable by such combination, but it should be understood that those combined technical solutions also fall within the scope disclosed by the embodiments of the present application.

The embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The methods and apparatuses provided by the embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to help in understanding the methods of the present application and their core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and scope of application based on the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (57)

  1. A visual tracking method, comprising:
     acquiring first images captured by at least two cameras on a movable platform;
     stitching the first images acquired by the at least two cameras to obtain a second image;
     predicting a region where a tracked object is located, and cropping a target image corresponding to the region from the second image;
     visually tracking the tracked object based on the target image.
  2. The method according to claim 1, wherein predicting the region where the tracked object is located comprises:
     predicting a motion trajectory of the tracked object according to historical position information of the tracked object;
     predicting the region where the tracked object is located based on the motion trajectory.
  3. The method according to claim 2, wherein the historical position information of the tracked object comprises two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.
  4. The method according to claim 3, wherein the depth information of the tracked object is obtained by:
     obtaining a disparity map from the first images captured by the at least two cameras;
     determining the depth information corresponding to the tracked object according to the disparity map.
  5. The method according to claim 4, wherein determining the depth information corresponding to the tracked object according to the disparity map comprises:
     determining the depth information of the tracked object from the disparity map according to the two-dimensional position information of the tracked object.
  6. The method according to claim 4, wherein the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.
  7. The method according to claim 4, wherein the disparity map is computed from the first images captured by the at least two cameras according to a machine learning algorithm.
  8. The method according to claim 4, wherein the depth information is contained in a disparity map corresponding to any one of the first images, the second image, or the target image.
  9. The method according to claim 3, wherein there are a plurality of tracked objects.
  10. The method according to claim 9, further comprising:
     when at least two tracked objects are lost in a same region, performing target detection on the same region;
     when a target is detected in the same region, obtaining depth information of the target;
     matching the depth information of the target with historical depth information of the lost tracked objects, and determining, according to the matching result, the lost tracked object to which the target corresponds.
  11. The method according to claim 1, wherein a visual tracking result corresponding to the tracked object includes position information of the tracked object.
  12. The method according to claim 11, wherein the position information of the tracked object includes two-dimensional position information, and the method further comprises:
     marking the tracked object according to the two-dimensional position information.
  13. The method according to claim 11, further comprising:
     determining a movement trajectory of the movable platform and/or an attitude change of at least one camera according to the position information of the tracked object.
  14. The method according to claim 13, further comprising:
     controlling the movable platform and/or the at least one camera to move according to the movement trajectory of the movable platform and/or the attitude change of the at least one camera, so as to visually track the tracked object.
  15. The method according to claim 1, wherein the second image is a panoramic image covering an omnidirectional field of view.
  16. The method according to claim 1, wherein the second image is a curved-surface image, and stitching the acquired first images comprises:
     back-projecting each of the first images onto a specified curved-surface model, and fusing overlapping regions between the first images.
  17. The method according to claim 16, wherein the specified curved-surface model includes a cylinder model or a sphere model.
  18. The method according to claim 16, wherein cropping the target image corresponding to the region from the second image comprises:
     projecting an image corresponding to the region in the second image onto a plane to obtain the target image.
  19. The method according to claim 1, wherein the fields of view of the at least two cameras have an overlapping region.
  20. A visual tracking apparatus, comprising a processor and a memory storing computer instructions,
     wherein the processor implements the following steps when executing the computer instructions:
     acquiring first images captured by at least two cameras on a movable platform;
     stitching the first images acquired by the at least two cameras to obtain a second image;
     predicting a region where a tracked object is located, and cropping a target image corresponding to the region from the second image;
     visually tracking the tracked object based on the target image.
  21. The apparatus according to claim 20, wherein, when predicting the region where the tracked object is located, the processor is configured to predict a motion trajectory of the tracked object according to historical position information of the tracked object, and to predict the region where the tracked object is located based on the motion trajectory.
  22. The apparatus according to claim 21, wherein the historical position information of the tracked object comprises two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.
  23. The apparatus according to claim 22, wherein, when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.
  24. The apparatus according to claim 23, wherein, when determining the depth information corresponding to the tracked object according to the disparity map, the processor is configured to determine the depth information of the tracked object from the disparity map according to the two-dimensional position information of the tracked object.
  25. The apparatus according to claim 23, wherein the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.
  26. The apparatus according to claim 23, wherein the disparity map is computed from the first images captured by the at least two cameras according to a machine learning algorithm.
  27. The apparatus according to claim 23, wherein the depth information is contained in a disparity map corresponding to any one of the first images, the second image, or the target image.
  28. The apparatus according to claim 22, wherein there are a plurality of tracked objects.
  29. The apparatus according to claim 28, wherein the processor is further configured to: when at least two tracked objects are lost in a same region, perform target detection on the same region; when a target is detected in the same region, obtain depth information of the target; and match the depth information of the target with historical depth information of the lost tracked objects, determining, according to the matching result, the lost tracked object to which the target corresponds.
  30. The apparatus according to claim 20, wherein a visual tracking result corresponding to the tracked object includes position information of the tracked object.
  31. The apparatus according to claim 30, wherein the position information of the tracked object includes two-dimensional position information, and the processor is further configured to mark the tracked object according to the two-dimensional position information.
  32. The apparatus according to claim 30, wherein the processor is further configured to determine a movement trajectory of the movable platform and/or an attitude change of at least one camera according to the position information of the tracked object.
  33. The apparatus according to claim 32, wherein the processor is further configured to control the movable platform and/or the at least one camera to move according to the movement trajectory of the movable platform and/or the attitude change of the at least one camera, so as to visually track the tracked object.
  34. The apparatus according to claim 20, wherein the second image is a panoramic image covering an omnidirectional field of view.
  35. The apparatus according to claim 20, wherein the second image is a curved-surface image, and, when stitching the acquired first images, the processor is configured to back-project each of the first images onto a specified curved-surface model and to fuse overlapping regions between the first images.
  36. The apparatus according to claim 35, wherein the specified curved-surface model includes a cylinder model or a sphere model.
  37. The apparatus according to claim 35, wherein, when cropping the target image corresponding to the region from the second image, the processor is configured to project an image corresponding to the region in the second image onto a plane to obtain the target image.
  38. The apparatus according to claim 20, wherein the fields of view of the at least two cameras have an overlapping region.
  39. A movable platform, comprising:
     a body;
     a drive device connected to the body and configured to power the movable platform;
     a plurality of cameras arranged on the body, the plurality of cameras corresponding to different fields of view, with overlapping regions between the fields of view;
     a processor and a memory storing computer instructions, wherein the processor implements the following steps when executing the computer instructions:
     acquiring first images captured by at least two of the cameras;
     stitching the first images acquired by the at least two cameras to obtain a second image;
     predicting a region where a tracked object is located, and cropping a target image corresponding to the region from the second image;
     visually tracking the tracked object based on the target image.
  40. The movable platform according to claim 39, wherein, when predicting the region where the tracked object is located, the processor is configured to predict a motion trajectory of the tracked object according to historical position information of the tracked object, and to predict the region where the tracked object is located based on the motion trajectory.
  41. The movable platform according to claim 40, wherein the historical position information of the tracked object comprises two-dimensional position information and depth information, and the predicted motion trajectory is a three-dimensional motion trajectory.
  42. The movable platform according to claim 41, wherein, when determining the depth information of the tracked object, the processor is configured to obtain a disparity map from the first images captured by the at least two cameras, and to determine the depth information corresponding to the tracked object according to the disparity map.
  43. The movable platform according to claim 42, wherein, when determining the depth information corresponding to the tracked object according to the disparity map, the processor is configured to determine the depth information of the tracked object from the disparity map according to the two-dimensional position information of the tracked object.
  44. The movable platform according to claim 42, wherein the disparity map is obtained by performing stereo matching on the first images captured by the at least two cameras.
  45. The movable platform according to claim 42, wherein the disparity map is computed from the first images captured by the at least two cameras according to a machine learning algorithm.
  46. The movable platform according to claim 42, wherein the depth information is contained in a disparity map corresponding to any one of the first images, the second image, or the target image.
  47. The movable platform according to claim 41, wherein there are a plurality of tracked objects.
  48. The movable platform according to claim 47, wherein the processor is further configured to: when at least two tracked objects are lost in a same region, perform target detection on the same region; when a target is detected in the same region, obtain depth information of the target; and match the depth information of the target with historical depth information of the lost tracked objects, determining, according to the matching result, the lost tracked object to which the target corresponds.
  49. The movable platform according to claim 39, wherein a visual tracking result corresponding to the tracked object includes position information of the tracked object.
  50. The movable platform according to claim 49, wherein the position information of the tracked object includes two-dimensional position information, and the processor is further configured to mark the tracked object according to the two-dimensional position information.
  51. The movable platform according to claim 49, wherein the processor is further configured to determine a movement trajectory of the movable platform and/or an attitude change of at least one camera according to the position information of the tracked object.
  52. The movable platform according to claim 51, wherein the processor is further configured to control the movable platform and/or the at least one camera to move according to the movement trajectory of the movable platform and/or the attitude change of the at least one camera, so as to visually track the tracked object.
  53. The movable platform according to claim 39, wherein the second image is a panoramic image covering an omnidirectional field of view.
  54. The movable platform according to claim 39, wherein the second image is a curved-surface image, and, when stitching the acquired first images, the processor is configured to back-project each of the first images onto a specified curved-surface model and to fuse overlapping regions between the first images.
  55. The movable platform according to claim 54, wherein the specified curved-surface model includes a cylinder model or a sphere model.
  56. The movable platform according to claim 54, wherein, when cropping the target image corresponding to the region from the second image, the processor is configured to project an image corresponding to the region in the second image onto a plane to obtain the target image.
  57. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the visual tracking method according to any one of claims 1-19.
PCT/CN2020/125402 2020-10-30 2020-10-30 Visual tracking method and apparatus, movable platform, and computer-readable storage medium WO2022088072A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125402 WO2022088072A1 (en) 2020-10-30 2020-10-30 Visual tracking method and apparatus, movable platform, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/125402 WO2022088072A1 (en) 2020-10-30 2020-10-30 Visual tracking method and apparatus, movable platform, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022088072A1 true WO2022088072A1 (en) 2022-05-05

Family

ID=81381783

Country Status (1)

WO: WO2022088072A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
US20160142644A1 * | 2014-11-17 | 2016-05-19 | Industrial Technology Research Institute | Surveillance systems and image processing methods thereof
CN106161953A * | 2016-08-12 | 2016-11-23 | 零度智控(北京)智能科技有限公司 | Tracking and shooting method and apparatus
CN106447697A * | 2016-10-09 | 2017-02-22 | 湖南穗富眼电子科技有限公司 | Fast tracking method for a specific moving target based on a moving platform
CN106485736A * | 2016-10-27 | 2017-03-08 | 深圳市道通智能航空技术有限公司 | Panoramic visual tracking method for an unmanned aerial vehicle, unmanned aerial vehicle, and control terminal
CN107071389A * | 2017-01-17 | 2017-08-18 | 亿航智能设备(广州)有限公司 | Aerial photography method, apparatus, and unmanned aerial vehicle
CN108334099A * | 2018-01-26 | 2018-07-27 | 上海深视信息科技有限公司 | Efficient human body tracking method for an unmanned aerial vehicle
CN108734655A * | 2017-04-14 | 2018-11-02 | 中国科学院苏州纳米技术与纳米仿生研究所 | Method and system for real-time aerial multi-node reconnaissance
CN111127518A * | 2019-12-24 | 2020-05-08 | 深圳火星探索科技有限公司 | Target tracking method and apparatus based on an unmanned aerial vehicle

Legal Events

Code: 121
Description: EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20959234
Country of ref document: EP
Kind code of ref document: A1

Code: NENP
Description: Non-entry into the national phase
Ref country code: DE

Code: 122
Description: EP: PCT application non-entry in European phase
Ref document number: 20959234
Country of ref document: EP
Kind code of ref document: A1