WO2020168515A1

WO2020168515A1 - Image processing method and apparatus, image capture processing system, and carrier

Info

Publication number: WO2020168515A1
Application number: PCT/CN2019/075707
Authority: WO
Inventors: 薛立君; 克拉夫琴科·费奥多尔
Original assignee: 深圳市大疆创新科技有限公司
Priority date: 2019-02-21
Filing date: 2019-02-21
Publication date: 2020-08-27
Also published as: CN111247790A

Abstract

Disclosed in embodiments of the present invention are an image processing method and apparatus, an image capture and processing system, and a carrier; the method comprises: acquiring an image frame sequence that is captured on a delay; in the image frame sequence, determining a target frame having a target object; cropping from the target frame an image area in which the target object is present; and filling in the image region after the target object is cropped, thus a target object in an image may be effectively removed, and the efficiency of playing back an image frame sequence captured on a delay is increased.

Description

一种图像处理方法、装置、图像拍摄和处理***及载体Image processing method, device, image shooting and processing system and carrier

本专利文件披露的内容包含受版权保护的材料。该版权为版权所有人所有。版权所有人不反对任何人复制专利与商标局的官方记录和档案中所存在的该专利文件或该专利披露。The content disclosed in this patent document contains copyrighted material. The copyright belongs to the copyright owner. The copyright owner does not object to anyone copying the patent document or the patent disclosure in the official records and archives of the Patent and Trademark Office.

技术领域Technical field

本发明实施例涉及图像处理技术领域，尤其涉及一种图像处理方法、装置、图像拍摄和处理***及载体。The embodiments of the present invention relate to the field of image processing technology, and in particular to an image processing method, device, image shooting and processing system and carrier.

背景技术Background technique

延时拍摄是指以一种将时间压缩的拍摄技术，在拍摄得到一组照片或者视频后，可后期通过照片串联或者视频抽帧将较长时间的过程压缩在较短时间内，并以视频的方式播出。Time-lapse shooting refers to a time-compression shooting technology. After a group of photos or videos is captured, the longer process can be compressed in a shorter time through photo series or video frame extraction. Way to broadcast.

随着无人机航拍技术的发展，越来越多的用户使用无人机实施延时拍摄。但是在无人机进行延时拍摄时，鸟类等昆虫很容易被无人机吸引，例如无人机在进行航拍时，常常有鸟类会伴随在无人机四周飞行，导致无人机的拍摄镜头中频繁出现鸟类或者鸟类的部分身体。除此之外，在使用手持云台等进行延时拍摄时也会出现类型情况，例如，在人流密度较大的景区，常常会有游客进入到延迟拍摄的视场范围。基于延时拍摄的特性，上述过程中，鸟类身体、游人的影像会突兀的出现在影像中，严重影响延时拍摄得到的视频的播放效果。With the development of drone aerial photography technology, more and more users use drones to implement time-lapse photography. However, when the drone is performing time-lapse shooting, birds and other insects are easily attracted by the drone. For example, when the drone is performing aerial photography, birds will often fly around the drone, causing the drone to fail. Birds or parts of birds frequently appear in the shots. In addition, there are also types of situations when using hand-held gimbals for time-lapse shooting. For example, in scenic spots with high crowd density, tourists often enter the field of view for delayed shooting. Based on the characteristics of time-lapse shooting, in the above process, images of bird bodies and tourists will suddenly appear in the images, which seriously affects the playback effect of the video obtained by time-lapse shooting.

发明内容Summary of the invention

有鉴于此，本发明实施例提供一种图像处理方法、装置、图像拍摄和处理***及装置，可有效去除图像中的异常物体，提高延时拍摄得到的图像帧序列的播放效果。In view of this, the embodiments of the present invention provide an image processing method, device, image shooting and processing system and device, which can effectively remove abnormal objects in images and improve the playback effect of image frame sequences obtained by time-lapse shooting.

本发明实施例的第一方面是提供的一种图像处理方法，包括：The first aspect of the embodiments of the present invention is to provide an image processing method, including:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在所述图像帧序列中，确定具有目标物体的目标帧；In the sequence of image frames, determining a target frame with a target object;

抠除所述目标帧中存在目标物体的图像区域；Cut out the image area where the target object exists in the target frame;

填充抠除所述目标物体后的图像区域。Fill the image area after removing the target object.

本发明实施例的第二方面是提供的一种图像处理装置，包括存储器、处理器；The second aspect of the embodiments of the present invention is to provide an image processing device, including a memory and a processor;

所述存储器用于存储程序代码；The memory is used to store program codes;

所述处理器，调用所述程序代码，当程序代码被执行时，用于执行以下操作：The processor calls the program code, and when the program code is executed, is used to perform the following operations:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

本发明实施例的第三方面是提供的一种图像拍摄和处理***，其特征在于，包括拍摄装置和一个或多个处理器，其中：A third aspect of the embodiments of the present invention is to provide an image shooting and processing system, which is characterized in that it includes a shooting device and one or more processors, wherein:

所述拍摄装置，用于延时拍摄得到图像帧序列，并将所述图像帧序列发送给所述一个或多个处理器；The photographing device is configured to obtain an image frame sequence by time-lapse photographing, and send the image frame sequence to the one or more processors;

所述一个或多个处理器，被配置用于在所述图像帧序列中，确定具有目标物体的目标帧，抠除所述目标帧中存在目标物体的图像区域，并填充抠除所述目标物体后的图像区域。The one or more processors are configured to determine a target frame with a target object in the sequence of image frames, cut out the image area where the target object exists in the target frame, and fill and cut out the target The image area behind the object.

本发明实施例的第四方面是提供的一种载体，其特征在于，图像拍摄和处理装置，其中，所述图像拍摄和处理装置被配置用于：The fourth aspect of the embodiments of the present invention is to provide a carrier, which is characterized by an image capturing and processing device, wherein the image capturing and processing device is configured to:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在本发明实施例中，控制终端可先获取延时拍摄的图像帧序列，从而可在所述图像帧序列中确定有目标物体的目标帧，通过对所述目标帧中存在目标物体的图像区域进行抠除，并通过对抠除所述目标物体后的图像区域的填充，可有效去除所述目标帧中目标物体对应的图像，从而提升延时拍摄的播放效果。In the embodiment of the present invention, the control terminal may first acquire the time-lapse captured image frame sequence, so that the target frame with the target object can be determined in the image frame sequence, and the image area where the target object exists in the target frame Cut out, and fill in the image area after the target object is cut out, which can effectively remove the image corresponding to the target object in the target frame, thereby improving the playback effect of time-lapse shooting.

附图说明Description of the drawings

为了更清楚地说明本发明实施例技术方案，下面将对实施例描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to explain the technical solutions of the embodiments of the present invention more clearly, the following will briefly introduce the drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present invention. Ordinary technicians can obtain other drawings based on these drawings without creative work.

图1是本发明实施例提供的一种图像处理场景的示意图；FIG. 1 is a schematic diagram of an image processing scene provided by an embodiment of the present invention;

图2是本发明另一实施例提供的一种图像处理场景的示意图；2 is a schematic diagram of an image processing scene provided by another embodiment of the present invention;

图3是本发明实施例提供的一种图像帧序列的示意图；3 is a schematic diagram of an image frame sequence provided by an embodiment of the present invention;

图4a是本发明实施例提供的一种具有目标物体的目标帧的示意图；Figure 4a is a schematic diagram of a target frame with a target object provided by an embodiment of the present invention;

图4b是本发明实施例提供的一种对如图4a所示的目标帧中的目标物体进行扣除后的示意图；FIG. 4b is a schematic diagram after subtracting the target object in the target frame shown in FIG. 4a according to an embodiment of the present invention;

图4c是本发明实施例提供的一种将如图4b所示的图像进行填充后的示意图；FIG. 4c is a schematic diagram after filling the image shown in FIG. 4b according to an embodiment of the present invention;

图5是本发明实施例提供的一种图像处理方法的示意流程图；FIG. 5 is a schematic flowchart of an image processing method provided by an embodiment of the present invention;

图6是本发明另一实施例提供的一种图像处理方法的示意流程图；FIG. 6 is a schematic flowchart of an image processing method according to another embodiment of the present invention;

图7是本发明实施例提供的一种目标物体为局部物体的示意图；FIG. 7 is a schematic diagram of a target object provided by an embodiment of the present invention as a local object;

图8是本发明实施例提供的一种图像拍摄和处理***的示意性框图；FIG. 8 is a schematic block diagram of an image shooting and processing system according to an embodiment of the present invention;

图9a是本发明实施例提供的一种局部掩膜的结构示意图；Fig. 9a is a schematic structural diagram of a partial mask provided by an embodiment of the present invention;

图9b是本发明实施例提供的一种部分卷积的神经网络结构示意图。Figure 9b is a schematic diagram of a partially convolutional neural network structure provided by an embodiment of the present invention.

具体实施方式detailed description

下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

当前为了消除延时拍摄得到的图像帧序列中的目标物体，所述目标物体包括在延时拍摄过程中所述目标帧中包括的异常物体，或者，所述目标帧中用户指定抠除的物体，在得到基于延时拍摄的所述图像帧序列后可对所述目标物体进行手工处理，以消除序列中的目标物体，但是，采用手工处理的方式，可能存在遗漏的风险，而且对所述图像帧序列中的目标物体的去除效率较低，不能有效消除图像中的目标物体。基于此，本申请提出了一种图像处理方法，可自动对延时拍摄的图像帧序列进行识别，并可将该图像帧序列中存在目标物体的图像区域进行扣除，以及对扣除后的图像区域进行填充，可提高对图像帧序列中目标物体的去除效率，也可同时提高对图像帧序列中目标物体对应图像区域的消除效果，以提升延时拍摄得到的图像帧序列的播放效果。Currently, in order to eliminate the target object in the sequence of image frames obtained by time-lapse shooting, the target object includes an abnormal object included in the target frame during the time-lapse shooting process, or an object designated by the user to be removed in the target frame After obtaining the image frame sequence based on time-lapse shooting, the target object can be manually processed to eliminate the target object in the sequence. However, if the manual processing method is adopted, there may be a risk of omission. The removal efficiency of the target object in the image frame sequence is low, and the target object in the image cannot be effectively eliminated. Based on this, this application proposes an image processing method that can automatically recognize the time-lapse captured image frame sequence, and can subtract the image area where the target object exists in the image frame sequence, and the subtracted image area Filling can improve the removal efficiency of the target object in the image frame sequence, and can also improve the removal effect of the image area corresponding to the target object in the image frame sequence to improve the playback effect of the image frame sequence obtained by time-lapse shooting.

在一个实施例中，该图像处理方法可应用于如图1所示的图像处理场景中，其中，所述图像处理方法可具体应用于如图1所示的图像拍摄和处理***，其中，该***包括拍摄装置和一个或多个处理器，在本应用场景中，所述拍摄装置和所述一个或多个处理器集成在同一实体设备中，此时，所述图像拍摄和处理***仅包括该实体设备。如图中所示的无人机中，所述一个或多个处理器被配置在所述无人机内部，所述拍摄装置安装在所述无人机上，并用于延时拍摄得到图像帧序列，将所述图像帧序列发送给所述无人机，其中，具体发送给所述无人机中集成的一个或多个处理器，所述一个或多个处理器则可获取所述拍摄装置发送的延时拍摄的图像帧序列，具体地，可由所述拍摄装置按照预设的时间间隔进行图像采集，并发送给所述一个或多个处理器，以使得所述一个或多个处理器可将采集到的图像按照时间序列进行排序，从而可进一步将排序后的图像压缩成图像帧序列，从而可获取到基于延时拍摄的图像帧序列。In one embodiment, the image processing method may be applied to the image processing scene shown in FIG. 1, wherein the image processing method may be specifically applied to the image shooting and processing system shown in FIG. 1, wherein the The system includes a shooting device and one or more processors. In this application scenario, the shooting device and the one or more processors are integrated in the same physical device. At this time, the image shooting and processing system only includes The physical device. In the drone as shown in the figure, the one or more processors are configured inside the drone, and the photographing device is installed on the drone and used for time-lapse photography to obtain a sequence of image frames , Sending the sequence of image frames to the drone, where specifically sent to one or more processors integrated in the drone, and the one or more processors can acquire the camera The sent image frame sequence of time-lapse shooting can be specifically configured by the shooting device to collect images at preset time intervals and send to the one or more processors, so that the one or more processors The collected images can be sorted in a time sequence, so that the sorted images can be further compressed into a sequence of image frames, so that a sequence of image frames based on time-lapse shooting can be obtained.

在另一实施例中，所述拍摄装置和所述一个或多个处理器还可集成在不同的实体设备中，对应的所述图像拍摄和处理***由多台实体设备构成，所述拍摄装置例如可集成在手机、相机等设备中，所述一个或多个服务器处可集成在上述实体设备中之外，还可集成在地面站或者遥控设备中，其中，集成所述拍摄装置和所述一个或多个处理器的实体设备之间，可基于预先建立的通信连接进行图像传输，以实现对存在目标物体的图像的处理。In another embodiment, the photographing device and the one or more processors may also be integrated in different physical devices, and the corresponding image photographing and processing system is composed of multiple physical devices, and the photographing device For example, it can be integrated in devices such as mobile phones, cameras, etc. The one or more servers can be integrated in the above-mentioned physical equipment, and can also be integrated in a ground station or remote control device, where the camera and the camera are integrated. The physical devices of one or more processors may perform image transmission based on a pre-established communication connection, so as to realize the processing of the image of the target object.

在一个实施例中，所述图像处理方法还可应用于如图2所示的图像处理场景中，其中，所述图像处理方法可具体应用于如图2所示的载体中，所述载体包括图像拍摄和处理装置，所述图像拍摄和处理装置可搭载在所述载体上，该载体包括无人机、无人车或者具有云台的手持装置或者载体装置等。在本应用场景中，可以载体为手持云台为例进行说明，所述手持云台中配置有上述的图像拍摄和处理装置，其中，所述图像拍摄和处理装置可被配置用于执行：获取延时拍摄的图像帧序列，并对所述图像帧序列进行处理，得到扣除所述图像帧序列中异常物体的目标图像。图像拍摄装置可以载体的一部分，也可以固定安装于载体。图像处理装置以有线后者无线的方式与图像拍摄装置通信连接，用于接收来自图像处理装置捕获的图像数据。In one embodiment, the image processing method may also be applied to the image processing scene shown in FIG. 2, wherein the image processing method may be specifically applied to the carrier shown in FIG. 2, and the carrier includes An image capturing and processing device, the image capturing and processing device can be mounted on the carrier, and the carrier includes an unmanned aerial vehicle, an unmanned vehicle, or a handheld device or carrier device with a pan/tilt. In this application scenario, the carrier is a handheld pan/tilt as an example. The handheld pan/tilt is equipped with the above-mentioned image capturing and processing device, wherein the image capturing and processing device can be configured to perform: The sequence of image frames captured at time, and the sequence of image frames are processed to obtain a target image from which abnormal objects in the sequence of image frames are subtracted. The image capturing device may be a part of the carrier, or it may be fixedly installed on the carrier. The image processing device communicates with the image capturing device in a wired or wireless manner for receiving image data captured by the image processing device.

在一个实施例中，所述无人机或者所述手持云台可按照时间顺序对采集到的图像进行排序，下面以图1所示的拍摄场景为例对本方案进行详细阐述，在图2所示的拍摄场景下实施该图像处理方法的具体实现方式可参见本发明实施例。具体地，无人机，即所述无人机中的一个或多个处理器可按照如图3中箭头所指示的时间顺序对采集到的图像进行排序，排序后进一步压缩得到的图像帧序列可如图3所示。进一步地，在得到所述图像帧序列后，可对所述图像帧序列中的各帧图像进行识别，以从所述图像帧序列中确定出包括目标物体的目标帧，其中，所述目标物体是在延时拍摄过程中出现的异常物体，所述异常物体是所述目标帧和与所述目标帧的相邻帧在同一位置对应不同像素值的像素点所组成图像对应的物体，其中，对图3所示的图像帧序列的识别，确定出的包括目标物体对应的图像是图示中标记序号为2的图像。In one embodiment, the drone or the handheld pan/tilt may sort the collected images in chronological order. The following takes the shooting scene shown in FIG. 1 as an example to describe this solution in detail, as shown in FIG. 2 For a specific implementation manner of implementing the image processing method in the shooting scene shown, see the embodiment of the present invention. Specifically, the drone, that is, one or more processors in the drone can sort the collected images according to the time sequence indicated by the arrow in FIG. 3, and further compress the resulting image frame sequence after sorting Can be shown in Figure 3. Further, after the image frame sequence is obtained, each frame image in the image frame sequence may be identified to determine a target frame including a target object from the image frame sequence, wherein the target object Is an abnormal object that appears in the time-lapse shooting process, and the abnormal object is an object corresponding to the image composed of the target frame and the pixel points corresponding to different pixel values at the same position of the adjacent frame of the target frame, wherein, Recognizing the image frame sequence shown in FIG. 3, the determined image including the target object is the image with the number 2 in the figure.

在所述无人机获取到延时拍摄的图像帧序列后，可在所述图像帧序列中确定具有目标物体的目标帧，可假设所述无人机从所述图像帧序列中确定的具有目标物体的目标帧如图4a所示，其中，4a所示的图像即是上述图3中用序号2所标记的图像。其中所述目标物体假设为图中401区域标识的异常物体。其中，所述目标物体可以是所述无人机中预设的作为异常物体的鸟等干扰物，也可以是用户选择的需要消除的物体等。在无人机确定出目标物体时，所述无人机可在检测到所述目标帧中包括预设的干扰物时，从所述目标帧中确定出目标物体；或者所述无人机还可对任一图像帧中包括的物体的种类进行识别，并确定出任一图像帧中包括的物体种类，从而可基于该图像帧中包括的各种类物体的数量，确定出目标物体，例如，可将所述图像帧中包括的各种类物体的数量最少的确定为目标物体。After the drone acquires the time-lapse captured image frame sequence, the target frame with the target object can be determined in the image frame sequence, and it can be assumed that the drone has determined from the image frame sequence with The target frame of the target object is shown in FIG. 4a, where the image shown in 4a is the image marked with the serial number 2 in FIG. 3 above. The target object is assumed to be an abnormal object identified by area 401 in the figure. Wherein, the target object may be an interference object such as a bird as an abnormal object preset in the UAV, or an object selected by the user to be eliminated. When the drone determines the target object, the drone may determine the target object from the target frame when it detects that the target frame includes a preset interference; or the drone also The types of objects included in any image frame can be recognized, and the types of objects included in any image frame can be determined, so that the target object can be determined based on the number of various types of objects included in the image frame, for example, The least number of objects of various types included in the image frame can be determined as the target object.

在所述无人机从所述图像帧序列中确定出具有目标物体的目标帧后，可扣除所述目标帧中存在目标物体的图像区域，即将图4a中401区域标识的图像进行扣除，扣除后的目标帧可如图4b所示。进一步地，在扣除目标帧中存在目标物体的图像区域后，可对该区域进行填充，填充后的图像可如图4c所示，从而可避免目标物体在进行图像帧序列播放时，对播放效果产生的影响，提高用户的观影质量。在如图2所示的拍摄场景中，安置在手持云台上的所述目标物体(即异常物体)为拍摄过程中突然出现的游客，即图2中的人。After the drone determines the target frame with the target object from the image frame sequence, the image area where the target object exists in the target frame can be subtracted, that is, the image identified by area 401 in FIG. The subsequent target frame can be shown in Figure 4b. Further, after subtracting the image area of the target object in the target frame, the area can be filled. The filled image can be shown in Figure 4c, which can prevent the target object from affecting the playback effect when the image frame sequence is played. The resulting impact improves the user’s viewing quality. In the shooting scene shown in FIG. 2, the target object (that is, the abnormal object) placed on the handheld PTZ is a tourist who suddenly appears during the shooting, that is, the person in FIG.

请参见图5，是本发明实施例提出的一种图像处理方法的示意流程图，所述图像处理方法可具体应用于上述的图像拍摄和处理***以及载体中，在本发明实施例中，以执行主体为在图像拍摄和处理***为例，对该图像处理方法进行具体阐述，其中，如图5所示，所述方法包括：Refer to FIG. 5, which is a schematic flowchart of an image processing method proposed by an embodiment of the present invention. The image processing method can be specifically applied to the above-mentioned image capturing and processing system and carrier. In the embodiment of the present invention, The execution subject takes the image shooting and processing system as an example, and the image processing method is described in detail. As shown in FIG. 5, the method includes:

S501，获取延时拍摄的图像帧序列。S501: Acquire a time-lapse shot image frame sequence.

在一个实施例中，所述图像拍摄和处理***中的拍摄装置用于延时拍摄得到图像帧序列，并将该图像帧系序列发送到该***包括的一个或多个处理器中，所述处理器则可被配置获取得到延时拍摄的图像帧序列。In one embodiment, the photographing device in the image photographing and processing system is used for time-lapse photographing to obtain an image frame sequence, and the image frame sequence is sent to one or more processors included in the system. The processor can be configured to obtain a sequence of time-lapse images.

具体地，所述拍摄装置可通过按照预设的时间间隔拍摄得到多帧图像，所述预设时间间隔例如可以是30分钟，2小时等，所述拍摄装置例如可以是摄像头等图像采集设备。在所述拍摄装置拍摄得到多帧图像后，可直接对所述多帧图像基于时间序列进行排序，得到初始图像序列，进一步地，可将所述初始图像序列进行压缩，以生成图像帧序列。Specifically, the photographing device may obtain multiple frames of images by photographing according to a preset time interval, the preset time interval may be, for example, 30 minutes, 2 hours, etc., and the photographing device may be, for example, an image acquisition device such as a camera. After the multiple frames of images are captured by the photographing device, the multiple frames of images can be directly sorted based on a time sequence to obtain an initial image sequence, and further, the initial image sequence can be compressed to generate an image frame sequence.

在另一实施例中，所述图像帧序列还可以是集成有所述一个或多个处理器的设备生成的。具体地，集成有所述一个或多个处理器的设备例如可以是上述的无人机、无人汽车、地面站以及遥控设备等。所述拍摄装置在拍摄到多帧图像后，可将所述多帧图像直接发送给一个或多个处理器，以便于所述一个或多个处理器将所述多帧图像基于时间序列进行排序得到初始图像序列，并对所述初始图像序列进行压缩，从而生成所述图像帧序列。In another embodiment, the sequence of image frames may also be generated by a device integrated with the one or more processors. Specifically, the device integrated with the one or more processors may be, for example, the aforementioned drone, unmanned vehicle, ground station, and remote control device. After capturing multiple frames of images, the photographing device may directly send the multiple frames of images to one or more processors, so that the one or more processors can sort the multiple frames of images based on a time sequence The initial image sequence is obtained, and the initial image sequence is compressed to generate the image frame sequence.

S502，在所述图像帧序列中，确定具有目标物体的目标帧。S502: Determine a target frame with a target object in the sequence of image frames.

在图像拍摄和处理***获取到图像帧序列之后，其中，具体由所述一个或多个处理器获取到该图像帧序列，为了从所述图像帧序列中确定出具有目标物体的目标帧，可先对所述图像帧序列进行预处理，即可先将所述图像帧序列拆分为按照时间序列排序的图像组，从而可从所述图像组中确定出具有目标物体的目标帧。After the image capture and processing system acquires the image frame sequence, where the image frame sequence is specifically acquired by the one or more processors, in order to determine the target frame with the target object from the image frame sequence, The image frame sequence is preprocessed first, and the image frame sequence can be split into image groups sorted according to a time sequence, so that a target frame with a target object can be determined from the image group.

在所述图像拍摄和处理***确定目标物体时，可基于所述图像拍摄和处理***中预先设定的目标物体确定，具体地，可先确定所述图像拍摄和处理***预设的目标物体的种类，从而可对该图像帧序列的任一帧图像进行图像识别，以从所述任一帧图像中确定出包括的物体的类别，从而可将所述包括的物体的类别和所述预设的目标物体的种类进行对比，从而可根据对比结果，确定出包括所述目标物体的种类的目标帧。When the image capturing and processing system determines the target object, it may be determined based on the target object preset in the image capturing and processing system. Specifically, the target object preset by the image capturing and processing system may be determined first. Type, so that image recognition can be performed on any frame of the image frame sequence to determine the category of the included object from the any frame of image, so that the category of the included object can be compared with the preset According to the comparison result, a target frame including the type of the target object can be determined.

在一个实施例中，所述图像拍摄和处理***预设的目标物体的种类，可基于拍摄场景的不同，而预设不同的目标物体的种类，所述拍摄场景例如可以是自然风光、城市生活、生物演变等，例如所述拍摄场景为自然风光时，预设的目标物体的种类可以是鸟类等；当所述拍摄场景为生物演变时，预设的目标物体的种类可以是人类等。其中，所述图像拍摄和处理***预设的目标物体的种类可以是一种，也可以是多种。In one embodiment, the types of target objects preset by the image shooting and processing system may be preset based on different shooting scenes, and different types of target objects may be preset. The shooting scene may be, for example, natural scenery or urban life. , Biological evolution, etc., for example, when the shooting scene is natural scenery, the preset target object type may be birds, etc.; when the shooting scene is biological evolution, the preset target object type may be humans, etc. Wherein, the type of the target object preset by the image capturing and processing system may be one type or multiple types.

S503，抠除所述目标帧中存在目标物体的图像区域。S503: Cut out the image area where the target object exists in the target frame.

图像拍摄和处理***在扣除所述目标帧中存在目标物体的图像区域之前，可先基于预设网络模型确定所述目标物体对应于所述目标帧的图像区域，其中，所述预设网络模型例如可以是区域卷积神经网络(Region CNN，RCNN)网络模型，具体地，所述图像拍摄和处理***可将所述目标帧输入所述RCNN模型，从而可基于所述RCNN模型的输出确定所述目标物体对应于所述目标帧的图像区域。Before deducting the image area of the target object in the target frame, the image capturing and processing system may first determine the image area of the target object corresponding to the target frame based on a preset network model, wherein the preset network model For example, it can be a regional convolutional neural network (Region CNN, RCNN) network model. Specifically, the image capturing and processing system can input the target frame into the RCNN model, so that the target frame can be determined based on the output of the RCNN model. The target object corresponds to the image area of the target frame.

所述RCNN模型在基于输入的目标帧确定目标物体对应的图像区域时，可先对输入的目标帧进行特征提取，从而可基于特征提取结果确定所述目标帧中包括物体的类别，进一步地，可将所述目标帧中包括物体的类别和所述图像拍摄和处理***预设的目标物体的种类进行对比，从而可根据对比结果，确定出所述目标帧中目标物体对应的图像区域。When the RCNN model determines the image area corresponding to the target object based on the input target frame, it can first perform feature extraction on the input target frame, so that the category of the object included in the target frame can be determined based on the feature extraction result. Further, The category of the object included in the target frame can be compared with the category of the target object preset by the image capturing and processing system, so that the image area corresponding to the target object in the target frame can be determined according to the comparison result.

在确定出所述目标帧中目标物体对应的图像区域后，可在所述图像区域生成对应的局部掩膜图形，即掩膜图像用于对所述图像的局部区域进行标识，从而可对所述局部掩膜图像所标识的局部图像区域进行抠除，从而实现对所述目标帧中存在目标物体的图像区域进行抠除。在抠除所述目标帧中存在目标物体的图像区域后，可在图像中用白色区域进行表示，并执行步骤S504。After the image area corresponding to the target object in the target frame is determined, the corresponding partial mask graphic can be generated in the image area, that is, the mask image is used to identify the partial area of the image, so that all The partial image area identified by the partial mask image is cut out, so as to realize the cut out of the image area where the target object exists in the target frame. After the image area where the target object exists in the target frame is cut out, a white area may be used in the image to represent it, and step S504 is executed.

S504，填充抠除所述目标物体后的图像区域。S504: Fill the image area after the target object is cut out.

在一个实施例中，所述图像拍摄和处理***在将目标帧中存在目标物体的图像区域进行抠除后，需对扣除所述目标物体后的图像区域进行填充，以保持目标帧图像的连续性，并可同时保证所述图像帧序列在进行播放时的播放效果。In one embodiment, after the image capturing and processing system cuts out the image area where the target object exists in the target frame, it needs to fill the image area after the target object is subtracted to maintain the continuity of the target frame image It can also guarantee the playback effect of the image frame sequence during playback.

在对扣除所述目标物体后的图像区域进行填充时，可基于所述目标帧的前一帧图像以及后一帧图像对所述图像区域进行填充；或者，可将抠除存在目标物体的图像区域的目标帧和所述目标帧中存在异常图像的图像区域对应的单元图像输入卷积神经网络模型，从而实现对扣除目标物体的图像区域的填充，其中，所述卷积神经网络模型的输出图像即为进行填充后的目标帧，在利用所述卷积神经网络模型进行像素填充时，具体可以是由卷积神经网络结构中部分卷积(Partial Convolutions)层的卷积神经(UNet)结构网格进行填充的；又或者，可将所述目标帧的前一帧图像、后一帧图像、抠除存在目标物体的图像区域的目标帧，以及所述目标帧中存在异常图像的图像区域对应的单元图像输入卷积神经网络模型进行像素填充。When filling the image area after the target object is subtracted, the image area may be filled based on the previous frame image and the next frame image of the target frame; or, the image with the target object may be cut out The target frame of the region and the unit image corresponding to the image region of the abnormal image in the target frame are input into the convolutional neural network model, so as to realize the filling of the image region of the deducted target object, wherein the output of the convolutional neural network model The image is the target frame after filling. When using the convolutional neural network model for pixel filling, it can specifically be a convolutional neural (UNet) structure of a partial convolution (Partial Convolutions) layer in the convolutional neural network structure. The grid is filled; or, the previous frame image, the next frame image of the target frame, the target frame that cuts out the image area of the target object, and the image area of the abnormal image in the target frame The corresponding unit image is input to the convolutional neural network model for pixel filling.

在本发明实施例中，图像拍摄和处理***可先获取延时拍摄的图像帧序列，从而可在所述图像帧序列中确定有目标物体的目标帧，通过对所述目标帧中存在目标物体的图像区域进行抠除，并通过对抠除所述目标物体后的图像区域的填充，可有效去除所述目标帧中目标物体对应的图像，从而提升延时拍摄的播放效果。In the embodiment of the present invention, the image capturing and processing system may first obtain the sequence of time-lapse captured image frames, so that the target frame with the target object can be determined in the sequence of image frames, and the target object may be detected in the target frame. By cutting out the image area of the target object, and by filling the image area after the target object is cut out, the image corresponding to the target object in the target frame can be effectively removed, thereby improving the playback effect of time-lapse shooting.

请参见图6，是本发明另一实施例提出的一种图像处理方法的示意流程图，所述图像处理方法也可具体应用于上述的图像拍摄和处理***以及载体中，在本发明实施例中，同样以执行主体为在图像拍摄和处理***为例，对该图像处理方法进行具体阐述，其中，如图6所示，所述方法包括：Refer to FIG. 6, which is a schematic flowchart of an image processing method according to another embodiment of the present invention. The image processing method can also be specifically applied to the aforementioned image capturing and processing system and carrier. In the embodiment of the present invention, In the example, the execution subject is the image capturing and processing system as an example, and the image processing method is described in detail. As shown in FIG. 6, the method includes:

S601，获取延时拍摄的图像帧序列。S601: Acquire a sequence of time-lapsed image frames.

在一个实施例中，图像拍摄和处理***中的拍摄装置在确定图像帧序列时，可先获取拍摄装置拍摄的至少一帧初始图像，从而可将所述至少一帧图像基于时间序列进行排序，得到初始图像序列，进一步地，可通过将所述初始图像序列进行压缩，从而可得到图像帧序列。其中，在对所述初始图像帧序列进行压缩时，可先删除基于所述初始图像序列中的空白图像，并可通过修改初始图像对应的时间，从而得到所述图像帧序列，得到的所述图像帧序列的每一帧图像都是连续的非空白图像。In one embodiment, when determining the sequence of image frames, the photographing device in the image capturing and processing system may first obtain at least one frame of initial images photographed by the photographing device, so that the at least one frame of images may be sorted based on the time sequence, The initial image sequence is obtained, and further, the image frame sequence can be obtained by compressing the initial image sequence. Wherein, when the initial image frame sequence is compressed, blank images based on the initial image sequence may be deleted first, and the time corresponding to the initial image may be modified to obtain the image frame sequence, and the obtained Each frame of the image frame sequence is a continuous non-blank image.

S602，在所述图像帧序列中，确定具有目标物体的目标帧。S602: Determine a target frame with a target object in the sequence of image frames.

在一个实施例中，所述图像拍摄和处理***在确定具有目标物体的目标帧时，可先确定所述目标帧的相邻帧，并将所述目标帧和所述相邻帧进行对比，从而可从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体即为所述目标物体，进一步地，可从所述图像帧序列中确定出包括所述目标物体的目标帧。In one embodiment, when the image capturing and processing system determines a target frame with a target object, it may first determine adjacent frames of the target frame, and compare the target frame with the adjacent frames, Therefore, a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame can be determined from the target frame, and the object corresponding to the target image is the target object. A target frame including the target object is determined in the sequence of image frames.

所述目标物体一般为移动物体，且可以为移动物体的全部或者局部，当所述移动物体的全部被拍摄到目标帧中时，所述目标物体为所述移动物体的全部，如图4a所示，目标帧中目标物体为鸟的全部；当所述移动物体只有部分被拍摄到目标帧中时，所述目标物体为所述移动物体的局部，即所述移动物体被拍摄到目标帧中的部分，如图7所示，目标帧中目标物体为鸟的局部(脚)，即用701标识的局部图像。The target object is generally a moving object, and may be all or part of a moving object. When all of the moving object is photographed in the target frame, the target object is all of the moving object, as shown in FIG. 4a Shows that the target object in the target frame is all of the bird; when only part of the moving object is captured in the target frame, the target object is a part of the moving object, that is, the moving object is captured in the target frame As shown in Figure 7, the target object in the target frame is the part (foot) of the bird, that is, the part image identified by 701.

在一个实施例中，图像拍摄和处理***可基于上述的卷积神经网络结构对所述目标帧中的图像边缘进行识别，可提高对目标帧中目标物体对应图像的识别速度，具体地，可由卷积神经网络结构中Partial Convolutions层的UNet结构对图像边缘进行识别，并根据识别结果确定出目标帧中目标物体对应图像的边缘。具体地，采用Partial Convolutions层对图像边缘进行识别时，可确定出属于同一语义的像素点集合，从而可将所述属于同一语义的像素点集合所构成的图像区域作为目标物体对应图像的图像区域，进一步地，可确定出目标帧。其中，属于同一语义的像素点是指：用于描述同一物体特征的像素点，例如，图4a中鸟的翅膀对应的像素点和鸟的脚对应的像素点都是用于描述鸟的特征的像素点，因此，该翅膀对应的像素点和脚对应的像素点是属于同一语义的像素点集合。而车门对应的像素点不是用于描述鸟的特征的像素点，因此，车门对应的像素点和翅膀对应的像素点不属于同一语义的像素点集合。In one embodiment, the image capturing and processing system can recognize the edge of the image in the target frame based on the above-mentioned convolutional neural network structure, which can improve the recognition speed of the image corresponding to the target object in the target frame. Specifically, The UNet structure of the Partial Convolutions layer in the convolutional neural network structure recognizes the edge of the image, and determines the edge of the image corresponding to the target object in the target frame according to the recognition result. Specifically, when the Partial Convolutions layer is used to recognize the edge of an image, a set of pixels belonging to the same semantic can be determined, so that the image area formed by the set of pixels belonging to the same semantic can be used as the image area of the image corresponding to the target object Further, the target frame can be determined. Among them, the pixels belonging to the same semantics refer to the pixels used to describe the characteristics of the same object. For example, the pixels corresponding to the wings of the bird and the pixels corresponding to the feet of the bird in Figure 4a are used to describe the characteristics of the bird Pixels, therefore, the pixels corresponding to the wings and the feet are a set of pixels belonging to the same semantics. The pixel points corresponding to the car door are not the pixels used to describe the characteristics of the bird. Therefore, the pixel points corresponding to the car door and the pixels corresponding to the wings do not belong to the same semantic pixel point set.

为了增强部分卷积算法对图像边缘识别的可靠性，在确定出属于同一语义的像素点集合后，还可进一步确定出各属于同一语义的像素点所构成的图像区域对应的物体与图像拍摄和处理***之间的距离，从而可通过距离比较确定出目标物体，并进一步确定出具有目标物体的目标帧。例如，如果在图4a中基于同一语义确定出的鸟与图像拍摄和处理***之间的距离为a，而基于同一语义确定出的车与图像拍摄和处理***之间的距离为b，且a小于b，则说明车和鸟不在同一平面上，因此，确定出的目标物体则为鸟。In order to enhance the reliability of image edge recognition by some convolution algorithms, after determining the set of pixels that belong to the same semantics, it can further determine the objects and image capture and image captures corresponding to the image area composed of pixels that belong to the same semantics. The distance between the systems is processed so that the target object can be determined by distance comparison, and the target frame with the target object can be further determined. For example, if the distance between a bird and the image capturing and processing system determined based on the same semantics in Figure 4a is a, and the distance between a car and the image capturing and processing system determined based on the same semantics is b, and a If it is less than b, it means that the car and the bird are not on the same plane. Therefore, the determined target object is a bird.

所述图像拍摄和处理***在确定具有目标物体的目标帧时，可先基于神经网络模型对所述图像帧序列进行处理，基于对所述图像帧序列的处理，可确定所述图像帧序列中各图像包括的物体的类别。具体地，所述图像拍摄和处理***可先将所述图像帧序列包括的多张图像输入所述神经网络模型；从而可调用所述神经网络模型对所述图像帧序列中的任一图像进行特征提取，得到特征提取结果；基于所述特征提取结果，可确定所述图像帧序列中各图像包括物体的类别。在一个实施例中，所述特征例如可以是颜色特征以及纹理特征等。When the image shooting and processing system determines the target frame with the target object, it may first process the image frame sequence based on the neural network model. Based on the processing of the image frame sequence, it may determine that the image frame sequence is The category of objects included in each image. Specifically, the image capturing and processing system may first input multiple images included in the image frame sequence into the neural network model; thus, the neural network model may be called to perform processing on any image in the image frame sequence. Feature extraction to obtain a feature extraction result; based on the feature extraction result, it can be determined that each image in the image frame sequence includes the category of the object. In one embodiment, the features may be color features, texture features, and the like, for example.

所述图像拍摄和处理***在基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，可先调用神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果，在一个实施例中，所述图像拍摄和处理***可针对不同特征分别进行汇总，得到对应的特征汇总结果，例如分别对纹理特征进行汇总，得到纹理特征汇总结果，以及对颜色特征进行汇总，得到颜色特征汇总结果。在另一实施例中，所述图像拍摄和处理***还可将所有特征提取结果进行汇总，例如可基于所述纹理特征和所述颜色特征进行汇总，得到特征汇总结果。在对特征提取结果进行汇总时，可将所述特征提取结果直接相加，得到所述特征汇总结果；也可将所述特征提取结果进行加权计算，从而得到特征汇总结果。When the image shooting and processing system determines that each image in the image frame sequence includes the category of the object based on the feature extraction result, it may first call the neural network model to perform feature summary on the feature extraction result to obtain the feature summary result In one embodiment, the image shooting and processing system may separately summarize different features to obtain corresponding feature summary results, for example, separately summarize texture features to obtain texture feature summary results, and summarize color features, Get the color feature summary result. In another embodiment, the image shooting and processing system may also summarize all the feature extraction results, for example, it may summarize based on the texture feature and the color feature to obtain a feature summary result. When summarizing the feature extraction results, the feature extraction results can be directly added to obtain the feature summary result; or the feature extraction results can be weighted and calculated to obtain the feature summary result.

在得到所述特征汇总结果后，则图像拍摄和处理***可基于所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别，具体地，所述图像拍摄和处理***可将所述特征汇总结果和各物体种类对应的预设特征值进行匹配，从而可根据匹配结果，确定所述各图像包括物体的类别。After obtaining the feature summary result, the image shooting and processing system can determine that each image in the image frame sequence includes the category of the object based on the feature summary result. Specifically, the image shooting and processing system can The feature summary result is matched with the preset feature value corresponding to each object type, so that the category of the object can be determined in each image according to the matching result.

在所述图像拍摄和处理***基于所述神经网络模型对所述图像帧序列进行处理，确定出所述图像帧序列中各图像包括物体的类别后，进一步地，可基于所述神经网络模型对所述图像帧序列进行处理的输出结果确定目标帧。具体地，所述图像拍摄和处理***可基于所述神经网络模型输出的各图像包括的物体的类别，从所述图像帧序列中确定出目标帧，其中，所述图像拍摄和处理***可将所述各图像包括的物体的类别和预设的目标物体的类别进行匹配，从而可判断所述图像帧序列中各图像是否包含目标物体，并将包含所述目标物体的图像帧确定为目标帧。After the image shooting and processing system processes the image frame sequence based on the neural network model, and determines that each image in the image frame sequence includes the category of the object, further, the neural network model can be used to The output result of processing the image frame sequence determines the target frame. Specifically, the image capturing and processing system may determine a target frame from the sequence of image frames based on the category of objects included in each image output by the neural network model, wherein the image capturing and processing system may The types of objects included in each image are matched with the preset types of target objects, so that it can be determined whether each image in the sequence of image frames contains a target object, and the image frame containing the target object is determined as the target frame .

在另一实施例中，所述图像拍摄和处理***从图像帧序列中确定具有目标物体的目标帧时，还可针对所述图像帧序列包括多帧图像的任一帧图像，并对所述图像进行区域划分，得到多个区域图像；从而可获取各个所述区域图像的特征参数，并基于所述特征参数，确定图像帧序列中具有目标物体的目标帧，其中，在获取各个区域图像的特征参数后，可基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别，从而可根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。In another embodiment, when the image capturing and processing system determines a target frame with a target object from the sequence of image frames, it may also target any one of the multiple frames of images in the sequence of image frames, and compare the The image is divided into regions to obtain multiple regional images; thereby, the characteristic parameters of each of the regional images can be obtained, and based on the characteristic parameters, the target frame with the target object in the image frame sequence can be determined. After the feature parameter, the object category included in any frame image in the image frame sequence can be determined based on the feature parameter, so that the target frame with the target object in the image frame sequence can be determined according to the object category.

S603，确定所述目标帧中存在目标物体的图像区域。S603: Determine an image area where the target object exists in the target frame.

S604，基于所述图像区域，生成和所述图像区域对应的局部掩膜图形。S604: Based on the image area, generate a partial mask pattern corresponding to the image area.

S605，根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。S605, according to the partial mask graphic and the target frame, cut out an image area in the target frame where the target object exists.

步骤S603-步骤S605，是对上述实施例中步骤S503的具体细化。图像拍摄和处理***在抠除所述目标帧中存在目标物体的图像区域时，可先确定所述目标帧中存在目标物体的图像区域，在一个实施例中，图像拍摄和处理***可调用预设网络模型确定所述目标帧中存在目标物体的图像区域，所述预设网络模型例如可以是上述的RCNN模型。Step S603-Step S605 are specific details of step S503 in the foregoing embodiment. When the image shooting and processing system cuts out the image area where the target object exists in the target frame, it may first determine the image area where the target object exists in the target frame. In one embodiment, the image shooting and processing system may call the preset Assuming that the network model determines the image area where the target object exists in the target frame, the preset network model may be the aforementioned RCNN model, for example.

在确定所述目标帧中存在目标物体的图像区域后，可基于所述图像区域，生成和所述图像区域对应的局部掩膜图形，即掩膜图形，所述局部掩膜图形用于标记所述目标帧中存在目标物体的局部图像区域，基于所述掩膜图形和所述目标帧，可对所述目标帧中存在目标物体的图像区域进行抠除，抠除所述存在目标物体的图像区域后在所述目标帧中用白色区域进行表示。After determining the image area of the target object in the target frame, a partial mask pattern corresponding to the image area, that is, a mask pattern, can be generated based on the image area, and the partial mask pattern is used to mark all The partial image area of the target object exists in the target frame, and based on the mask image and the target frame, the image area where the target object exists in the target frame may be cut out, and the image with the target object may be cut out After the area, it is represented by a white area in the target frame.

举例来说，如果目标帧为图4a所示的图像，所述目标帧中存在目标物体的图像区域对应的可以是401所标识的区域，基于所述图像区域可生成对应的局部掩膜图形，生成局部掩膜图形后的所述目标帧可如图4b所示，在生成所述掩膜图形后，可对处于所述掩膜图形内的图像进行抠除，抠除处于所述掩膜内的图像后，所述目标帧中存在目标物体的图像区域可用白色进行表示，如图4c所示。For example, if the target frame is the image shown in FIG. 4a, the image area where the target object exists in the target frame may correspond to the area identified by 401, and the corresponding partial mask pattern may be generated based on the image area, The target frame after the partial mask pattern is generated can be as shown in FIG. 4b. After the mask pattern is generated, the image in the mask pattern can be cut out, and the cut out is in the mask. After the image, the image area where the target object exists in the target frame can be represented in white, as shown in Fig. 4c.

在一个实施例中，可基于局部掩膜图形的周围图像信息对扣除目标物体后的图像区域进行填充，具体地，可先确定所述目标帧中存在目标物体的图像区域的周围图像域，该周围图像域中的像素点和该存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值，进一步地，则可基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。In one embodiment, the image area after the target object is subtracted may be filled based on the surrounding image information of the partial mask graphic. Specifically, the surrounding image domain of the image area where the target object exists in the target frame may be determined first. The distance between the pixel in the surrounding image domain and the pixel in the image area where the target object exists is less than or equal to a preset distance threshold. Further, the image after the target object can be subtracted based on the surrounding image domain The area is filled.

在基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，可先从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；进一步地，所述图像拍摄和处理***可确定所述参考帧的曝光强度，以使得图像拍摄和处理***可采用白平衡算法，并基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。When filling the image area after subtracting the target object based on the surrounding image domain, a reference frame may be determined from the sequence of image frames first, and the reference frame is one of the first M frames of the target frame. Any frame, where M is an integer greater than 1; further, the image capturing and processing system can determine the exposure intensity of the reference frame, so that the image capturing and processing system can use a white balance algorithm based on the The exposure intensity of the reference frame and the surrounding image domain fill the image area after the target object is subtracted.

S606，填充抠除所述目标物体后的图像区域。S606: Fill the image area after the target object is cut out.

在一个实施例中，图像拍摄和处理***在填充抠除所述目标物体后的图像区域时，可先获取所述目标帧包括的第一单元图像，其中所述第一单元图像为所述目标帧中存在目标物体的图像区域，从而可将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像即为进行填充后的目标帧。In one embodiment, when the image capturing and processing system fills in the image area after the target object is cut out, it may first acquire the first unit image included in the target frame, where the first unit image is the target There is an image area of the target object in the frame, so that the target frame after the image area where the target object exists and the first unit image can be input into the convolutional neural network model, and the output image of the convolutional neural network model can be obtained , The output image is the target frame after filling.

在另一实施例中，所述图像拍摄和处理***在填充抠除所述目标物体后的图像区域时，还可先获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，其中，所述第一单元图像为所述目标帧中存在目标物体的图像区域，从而可将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像即为进行填充后的目标帧。其中，所述卷积神经网络模型具体可以是上述的Partial Convolutions层的卷积神经UNet结构。In another embodiment, when the image capturing and processing system fills in the image area after the target object is cut out, it may also first acquire the first unit image included in the target frame and the front of the target frame. One frame of image and the next frame of image, where the first unit image is an image area where the target object exists in the target frame, so that the previous frame image, the next frame image, and the The target frame after removing the image area where the target object exists and the first unit image are input to the convolutional neural network model, and the output image of the convolutional neural network model is obtained, and the output image is the target frame after filling . Wherein, the convolutional neural network model may specifically be the convolutional neural UNet structure of the Partial Convolutions layer described above.

在另一实施例中，所述图像拍摄和处理***在填充抠除所述目标物体后的图像区域时，还可先获取所述目标帧的前一帧图像和后一帧图像，从而可基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像进行填充，得到填充后的目标帧。具体地，所述图像拍摄和处理***可先获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中国存在目标物体的图像区域对应相同位置的图像；并获取所述后一帧图像中的第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像，进一步地，可获取所述第二单元图像中所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值，从而针对任一像素点，可计算所述第一数值和所述第二数值的平均值，从而可基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到填充后的目标帧。In another embodiment, when the image capturing and processing system fills in the image area after the target object is cut out, it may also first obtain the previous frame image and the next frame image of the target frame, which can be based on The previous frame image and the next frame image are filled in the image after the target object is cut out to obtain a filled target frame. Specifically, the image capturing and processing system may first obtain a second unit image in the previous frame of image, where the second unit image is the presence of a target object in the previous frame of image and the target frame. The image area corresponds to the image at the same position; and the third unit image in the next frame image is acquired, where the third unit image corresponds to the image area in the next frame image and the target object in the target frame For images at the same position, further, the first value of each pixel contained in the second unit image can be obtained, and the second value of each pixel contained in the third unit image can be obtained, so as to target any For one pixel, the average value of the first value and the second value can be calculated, so that based on the average value, pixel filling can be performed on the target frame of the image area where the target object is removed, to obtain the filled Target frame.

需要说明的是，由于图像拍摄和处理***在进行像素填充时，参考了所述目标帧的前一帧图像以及所述后一帧图像中和所述目标物体对应相同位置的图像的像素，因此，可保证进行填充后的目标帧的内容和色彩在时序上的连贯性。It should be noted that, when the image capturing and processing system performs pixel filling, it refers to the pixels of the previous frame of the target frame and the image of the next frame of the image corresponding to the target object at the same position, so , Which can ensure the continuity of the content and color of the target frame after filling.

在本发明实施例中，图像拍摄和处理***先获取延时拍摄的图像帧序列，并在所述图像帧序列中，确定具有目标物体的目标帧，进一步地可确定所述目标帧中存在目标物体的图像区域，从而可基于所述图像区域，生成和所述图像区域对应的局部掩膜图像，并基于所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行扣除，并填充扣除所述目标物体后的图像区域，在去除所述目标帧中目标物体对应的图像的同时，可保证进行填充后的目标帧子的内容和色彩在时序上的连贯性，可有效提升对延时拍摄的播放效果。In the embodiment of the present invention, the image capturing and processing system first obtains a sequence of time-lapse captured image frames, and in the sequence of image frames, determines a target frame with a target object, and can further determine that there is a target in the target frame The image area of the object, so that a partial mask image corresponding to the image area can be generated based on the image area, and based on the partial mask pattern and the target frame, it is possible to determine whether the target object exists in the target frame. The image area is subtracted, and the image area after subtracting the target object is filled. While removing the image corresponding to the target object in the target frame, it can ensure that the content and color of the filled target frame are consistent in time sequence It can effectively improve the playback effect of time-lapse shooting.

本发明实施例提供了一种图像拍摄和处理***，图8是本发明实施例提供的图像拍摄和处理***的结构图，如图8所示，图像拍摄和处理***800包括拍摄装置801和一个或多个处理器802，可具体应用在如图1所示的图像处理场景中，其中，The embodiment of the present invention provides an image shooting and processing system. FIG. 8 is a structural diagram of the image shooting and processing system provided by an embodiment of the present invention. As shown in FIG. 8, the image shooting and processing system 800 includes a shooting device 801 and a Or multiple processors 802, which can be specifically applied in the image processing scene as shown in FIG. 1, where:

所述拍摄装置801，用于延时拍摄得到图像帧序列，并将所述图像帧序列发送给所述一个或多个处理器；The photographing device 801 is configured to obtain an image frame sequence by time-lapse photographing, and send the image frame sequence to the one or more processors;

所述一个或多个处理器802，被配置用于获取延时拍摄的图像帧序列，在所述图像帧序列中，确定具有目标物体的目标帧，抠除所述目标帧中存在目标物体的图像区域，并填充抠除所述目标物体后的图像区域。The one or more processors 802 are configured to obtain a sequence of time-lapse photographed image frames, and in the sequence of image frames, determine a target frame with a target object, and cut out the target object in the target frame. The image area is filled with the image area after the target object is cut out.

在一个实施例中，所述一个或多个处理器802在获取延时拍摄的图像帧序列时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to: when acquiring the time-lapse captured image frame sequence:

获取拍摄的至少一帧初始图像；Acquiring at least one initial image taken;

将所述至少一帧初始图像基于时间序列进行排序，得到初始图像序列；Sort the at least one frame of initial image based on a time sequence to obtain an initial image sequence;

将所述初始图像序列进行压缩，得到图像帧序列。The initial image sequence is compressed to obtain an image frame sequence.

在一个实施例中，所述一个或多个处理器802在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：In one embodiment, when the one or more processors 802 determine a target frame with a target object in the image frame sequence, it is specifically configured to:

基于神经网络模型对所述图像帧序列进行处理；Processing the image frame sequence based on a neural network model;

根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧。The target frame is determined according to an output result of processing the image frame sequence by the neural network model.

在一个实施例中，所述一个或多个处理器802在基于神经网络模型对所述图像帧序列进行处理时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to: when processing the image frame sequence based on the neural network model:

将所述图像帧序列输入所述神经网络模型，所述图像帧序列包括多张图像；Inputting the sequence of image frames into the neural network model, where the sequence of image frames includes a plurality of images;

针对任一图像，调用所述神经网络模型对所述图像进行特征提取，得到特提取结果；For any image, call the neural network model to perform feature extraction on the image to obtain a special extraction result;

基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别。Based on the feature extraction result, it is determined that each image in the image frame sequence includes the category of the object.

在一个实施例中，所述一个或多个处理器802在基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to: when determining that each image in the image frame sequence includes the category of the object based on the feature extraction result:

调用所述神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果；Calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;

根据所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别。According to the feature summary result, it is determined that each image in the image frame sequence includes the category of the object.

在一个实施例中，所述一个或多个处理器802在根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to determine the target frame according to the output result of processing the image frame sequence on the neural network model:

根据所述神经网络模型对所述图像帧序列进行处理的输出结果，判断所述图像帧序列中各图像中是否包含目标物体；Judging whether each image in the image frame sequence contains a target object according to the output result of processing the image frame sequence by the neural network model;

将包含所述目标物体的图像帧确定为目标帧。The image frame containing the target object is determined as the target frame.

针对所述图像帧序列包括多帧图像的任一帧图像，对所述图像进行区域划分，得到多个区域图像；For any frame image of the image frame sequence including multiple frames of images, performing region division on the image to obtain multiple region images;

获取各个所述区域图像的特征参数，并基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧。Obtain feature parameters of each of the regional images, and based on the feature parameters, determine a target frame with a target object in the image frame sequence.

在一个实施例中，所述一个或多个处理器802在基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧时，具体用于：In one embodiment, the one or more processors 802 are specifically configured to: when determining a target frame with a target object in the image frame sequence based on the characteristic parameter:

基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别；Determine the object category included in any frame image in the image frame sequence based on the characteristic parameter;

根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。According to the object category, a target frame with a target object in the sequence of image frames is determined.

在一个实施例中，所述一个或多个处理器802在抠除所述目标帧中存在目标物体的图像区域时，具体用于：In an embodiment, when the one or more processors 802 cut out the image area where the target object exists in the target frame, it is specifically configured to:

确定所述目标帧中存在目标物体的图像区域；Determining an image area where a target object exists in the target frame;

基于所述图像区域，生成和所述图像区域对应的局部掩膜图形；Based on the image area, generating a partial mask pattern corresponding to the image area;

根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。According to the partial mask graphic and the target frame, cut out the image area where the target object exists in the target frame.

在一个实施例中，所述一个或多个处理器802在填充抠除所述目标物体后的图像区域时，具体用于：In an embodiment, when the one or more processors 802 fill in the image area after the target object is cut out, it is specifically configured to:

确定所述目标帧中存在目标物体的图像区域的周围图像域，所述周围图像域中的像素点和所述存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值；Determining the surrounding image domain of the image area where the target object exists in the target frame, and the distance between the pixel points in the surrounding image domain and the pixel point of the image area where the target object exists is less than or equal to a preset distance threshold;

基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。Filling the image area after subtracting the target object based on the surrounding image domain.

在一个实施例中，所述一个或多个处理器802在基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to: when filling the image area after the target object is subtracted based on the surrounding image domain:

从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；Determine a reference frame from the image frame sequence, where the reference frame is any one of the first M frames of the target frame, where M is an integer greater than 1;

确定所述参考帧的曝光强度；Determining the exposure intensity of the reference frame;

采用白平衡算法，基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。A white balance algorithm is used to fill the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image domain.

获取所述目标帧包括的第一单元图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, where the first unit image is an image area where a target object exists in the target frame;

将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the target frame after cutting out the image area where the target object exists and the first unit image into the convolutional neural network model, and obtain the output image of the convolutional neural network model, and the output image is the filled target frame.

获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, the first unit image being an image area in the target frame where a target object exists;

将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the previous frame image, the next frame image, the target frame after the image area where the target object exists, and the first unit image into the convolutional neural network model, and obtain the convolutional neural network The output image of the network model, where the output image is the target frame after filling.

获取所述目标帧的前一帧图像和后一帧图像；Acquiring the previous frame image and the next frame image of the target frame;

基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到所述填充后的目标帧。Based on the previous frame of image and the next frame of image, filling the image area after the target object is cut out to obtain the filled target frame.

在一个实施例中，所述一个或多个处理器802在基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧时，具体用于：In one embodiment, the one or more processors 802 fill in the image area after the target object is cut out based on the previous frame of image and the next frame of image, to obtain the filled When the target frame is specifically used for:

获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a second unit image in the previous frame image, where the second unit image is an image corresponding to the same position in the previous frame image and the image area in the target frame where the target object exists;

获取所述后一帧图像中的第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a third unit image in the next frame of image, where the third unit image is an image in the next frame of image that corresponds to the same position as the image area in the target frame where the target object exists;

获取所述第二单元图像所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值；Acquiring a first value of each pixel included in the second unit image, and acquiring a second value of each pixel included in the third unit image;

针对任一像素点，计算所述第一数值和所述第二数值的平均值；For any pixel, calculating an average value of the first value and the second value;

基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到所述填充后的目标帧。Based on the average value, pixel filling is performed on the target frame from which the image area where the target object exists is removed, to obtain the filled target frame.

在一个实施例中，所述目标物体是所述拍摄装置在延时拍摄过程中，所述目标帧包括的异常物体。In an embodiment, the target object is an abnormal object included in the target frame during the time-lapse photographing process of the photographing device.

在一个实施例中，所述一个或多个处理器802在确定具有目标物体的目标帧时，具体用于：In an embodiment, the one or more processors 802 are specifically configured to: when determining a target frame with a target object:

确定所述目标帧的相邻帧；Determining adjacent frames of the target frame;

将所述目标帧和所述相邻帧进行对比，从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体为所述目标物体；The target frame is compared with the adjacent frame, and a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame is determined from the target frame, and the object corresponding to the target image Is the target object;

将所述图像帧序列中包括所述目标物体的帧作为目标帧。The frame including the target object in the sequence of image frames is used as a target frame.

在一个实施例中，所述目标物体为移动物体的全部或者局部。In one embodiment, the target object is all or part of a moving object.

本实施例提供的图像拍摄和处理***能够执行前述实施例提供的如图5和图6所示的图像处理方法，其执行方式和有益效果类似，在这里不再赘述。The image shooting and processing system provided in this embodiment can execute the image processing methods shown in FIG. 5 and FIG. 6 provided in the foregoing embodiment, and the execution mode and beneficial effects are similar, and details are not repeated here.

本发明实施例了提供一种载体，所述载体包括图像拍摄和处理装置，所述载体可具体应用在如图2所示的图像处理场景中，其中，所述图像拍摄和处理装置被配置用于：The embodiment of the present invention provides a carrier, the carrier includes an image shooting and processing device, the carrier can be specifically applied to the image processing scene shown in FIG. 2, wherein the image shooting and processing device is configured to in:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在一个实施例中，所述图像拍摄和处理装置在获取延时拍摄的图像帧序列时，具体用于：In one embodiment, the image capturing and processing device is specifically used to:

在一个实施例中，所述图像拍摄和处理装置在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：In one embodiment, when the image capturing and processing device determines a target frame with a target object in the image frame sequence, it is specifically used to:

在一个实施例中，所述图像拍摄和处理装置在基于神经网络模型对所述图像帧序列进行处理时，具体用于：In an embodiment, when the image shooting and processing device processes the image frame sequence based on the neural network model, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，具体用于：In one embodiment, when the image capturing and processing device determines that each image in the image frame sequence includes the category of the object based on the feature extraction result, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧时，具体用于：In an embodiment, when the image capturing and processing device determines the target frame according to the output result of processing the image frame sequence on the neural network model, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧时，具体用于：In one embodiment, when the image capturing and processing device determines the target frame with the target object in the image frame sequence based on the characteristic parameter, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在抠除所述目标帧中存在目标物体的图像区域时，具体用于：In an embodiment, when the image capturing and processing device cuts out the image area where the target object exists in the target frame, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在填充抠除所述目标物体后的图像区域时，具体用于：In an embodiment, when the image capturing and processing device fills the image area after the target object is cut out, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，具体用于：In one embodiment, when the image capturing and processing device fills the image area after subtracting the target object based on the surrounding image domain, it is specifically configured to:

在一个实施例中，所述图像拍摄和处理装置在基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧时，具体用于：In one embodiment, the image capturing and processing device fills the image area after the target object is cut out based on the previous frame image and the next frame image, to obtain the filled target Frame, specifically used for:

在一个实施例中，所述图像拍摄和处理装置在确定具有目标物体的目标帧时，具体用于：In an embodiment, the image capturing and processing device is specifically configured to:

本实施例提供的载体能够执行前述实施例提供的如图5和图6所示的图像处理方法，其执行方式和有益效果类似，在这里不再赘述。The carrier provided in this embodiment can execute the image processing methods as shown in FIG. 5 and FIG. 6 provided in the foregoing embodiment, and the execution mode and beneficial effects are similar, and details are not repeated here.

在一个实施例中，将对本说明书中所述的局部掩膜(Mask)和部分卷积(Partial Convolution)进行说明。在图像处理过程中，常常使用全卷积神经网络(Fully Convolutional Network)。然而，全卷积神经网络需要对整个输入图像进行遍历性卷积，资源耗费大，且在一定程度上降低处理的速度。而局部掩膜则只对兴趣区域进行卷积，对兴趣区中逐个像素的语义进行识别，同时对局部掩膜的边界框进行回归处理，以获得局部掩膜边界框周围的像素特征。In one embodiment, the partial mask (Mask) and partial convolution (Partial Convolution) described in this specification will be described. In the image processing process, a Fully Convolutional Network (Fully Convolutional Network) is often used. However, a fully convolutional neural network needs to perform ergodic convolution on the entire input image, which consumes a lot of resources and reduces the processing speed to a certain extent. The local mask only convolves the region of interest to identify the semantics of each pixel in the region of interest, and at the same time, performs regression processing on the bounding box of the local mask to obtain the pixel features around the bounding box of the local mask.

如图9a所示，输入图像的框选区域为兴趣区(RoI，Region of interest)，在CNN的卷积层，只对RoI区域进行部分卷积，通过分类器(Class Box)通过对每个像素的语义分析输出RoI区域的语义分类。对于损失函数，对于RoI区域中的每一个采样，可以定义一个多任务损失函数：As shown in Figure 9a, the framed area of the input image is the region of interest (RoI, Region of interest). In the convolutional layer of CNN, only the RoI region is partially convolved, and the classifier (ClassBox) passes each The semantic analysis of pixels outputs the semantic classification of the RoI area. For the loss function, for each sample in the RoI area, a multi-task loss function can be defined:

L＝Lcls+Lbox+LmaskL=Lcls+Lbox+Lmask

其中Lcls和Lbox可以通过一般快速R-CNN的损失函数进行定义；对于每一个RoI掩膜都具有一个一个Km ²维度的分支，K为二值化编码，m*m为分辨率，对于每一个K都会具有一个分类。对于此，我们对每个像素提供sigmoid()函数，定义Lmask为平均二值化交叉熵损失。 Among them, Lcls and Lbox can be defined by the loss function of general fast R-CNN; for each RoI mask, there is a branch of Km ² dimensions, K is the binary coding, m*m is the resolution, for each K will have a category. For this, we provide a sigmoid() function for each pixel, and define Lmask as the average binary cross-entropy loss.

如图9b所示，示例性地给出了部分卷积(Partial Convolution)所使用的U-net神经网络模型。网络中包含对输入图像进行多次下卷积和上卷积的过程。As shown in Fig. 9b, the U-net neural network model used in Partial Convolution is exemplarily given. The network includes the process of performing multiple down-convolution and up-convolution on the input image.

最后应说明的是：以上各实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述各实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分或者全部技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand: The technical solutions recorded in the foregoing embodiments can still be modified, or some or all of the technical features can be equivalently replaced; these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present invention. range.

Claims

一种图像处理方法，其特征在于，包括：An image processing method, characterized by comprising:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在所述图像帧序列中，确定具有目标物体的目标帧；In the sequence of image frames, determining a target frame with a target object;

抠除所述目标帧中存在目标物体的图像区域；Cut out the image area where the target object exists in the target frame;

填充抠除所述目标物体后的图像区域。Fill the image area after removing the target object.
根据权利要求1所述的方法，其特征在于，所述获取延时拍摄的图像帧序列，包括：The method according to claim 1, wherein said acquiring a sequence of time-lapse photographed image frames comprises:

获取拍摄的至少一帧初始图像；Acquiring at least one initial image taken;

将所述至少一帧初始图像基于时间序列进行排序，得到初始图像序列；Sort the at least one frame of initial image based on a time sequence to obtain an initial image sequence;

将所述初始图像序列进行压缩，得到图像帧序列。The initial image sequence is compressed to obtain an image frame sequence.
根据权利要求1所述的方法，其特征在于，所述在所述图像帧序列中，确定具有目标物体的目标帧，包括：The method according to claim 1, wherein, in the sequence of image frames, determining a target frame with a target object comprises:

基于神经网络模型对所述图像帧序列进行处理；Processing the image frame sequence based on a neural network model;

根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧。The target frame is determined according to an output result of processing the image frame sequence by the neural network model.
根据权利要求3所述的方法，其特征在于，所述基于神经网络模型对所述图像帧序列进行处理，包括：The method according to claim 3, wherein the processing the sequence of image frames based on a neural network model comprises:

将所述图像帧序列输入所述神经网络模型，所述图像帧序列包括多张图像；Inputting the sequence of image frames into the neural network model, where the sequence of image frames includes a plurality of images;

针对任一图像，调用所述神经网络模型对所述图像进行特征提取，得到特提取结果；For any image, call the neural network model to perform feature extraction on the image to obtain a special extraction result;

基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别。Based on the feature extraction result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求4所述的方法，其特征在于，所述基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别，包括：The method according to claim 4, wherein the determining that each image in the image frame sequence includes a category of an object based on the feature extraction result comprises:

调用所述神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果；Calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;

根据所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别。According to the feature summary result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求3-5任一项所述的方法，其特征在于，所述根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧，包括：The method according to any one of claims 3-5, wherein the determining the target frame according to an output result of processing the image frame sequence according to the neural network model comprises:

根据所述神经网络模型对所述图像帧序列进行处理的输出结果，判断所述图像帧序列中各图像中是否包含目标物体；Judging whether each image in the image frame sequence contains a target object according to the output result of processing the image frame sequence by the neural network model;

将包含所述目标物体的图像帧确定为目标帧。The image frame containing the target object is determined as the target frame.
根据权利要求1所述的方法，其特征在于，所述在所述图像帧序列中，确定具有目标物体的目标帧，包括：The method according to claim 1, wherein, in the sequence of image frames, determining a target frame with a target object comprises:

针对所述图像帧序列包括多帧图像的任一帧图像，对所述图像进行区域划分，得到多个区域图像；For any frame image of the image frame sequence including multiple frames of images, performing region division on the image to obtain multiple region images;

获取各个所述区域图像的特征参数，并基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧。Obtain feature parameters of each of the regional images, and based on the feature parameters, determine a target frame with a target object in the image frame sequence.
根据权利要求7所述的方法，其特征在于，所述基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧，包括：The method according to claim 7, wherein the determining a target frame with a target object in the image frame sequence based on the characteristic parameter comprises:

基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别；Determine the object category included in any frame image in the image frame sequence based on the characteristic parameter;

根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。According to the object category, a target frame with a target object in the sequence of image frames is determined.
根据权利要求1所述的方法，其特征在于，所述抠除所述目标帧中存在目标物体的图像区域，包括：The method according to claim 1, wherein the cutting out an image area where a target object exists in the target frame comprises:

确定所述目标帧中存在目标物体的图像区域；Determining an image area where a target object exists in the target frame;

基于所述图像区域，生成和所述图像区域对应的局部掩膜图形；Based on the image area, generating a partial mask pattern corresponding to the image area;

根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。According to the partial mask graphic and the target frame, cut out the image area where the target object exists in the target frame.
根据权利要求9所述的方法，其特征在于，所述填充抠除所述目标物体后的图像区域，包括：The method according to claim 9, wherein the filling the image area after the target object is cut out comprises:

确定所述目标帧中存在目标物体的图像区域的周围图像域，所述周围图像域中的像素点和所述存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值；Determining the surrounding image domain of the image area where the target object exists in the target frame, and the distance between the pixel points in the surrounding image domain and the pixel point of the image area where the target object exists is less than or equal to a preset distance threshold;

基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。Filling the image area after subtracting the target object based on the surrounding image domain.
根据权利要求10所述的方法，其特征在于，所述基于所述周围图像域对扣除所述目标物体后的图像区域进行填充，包括：The method according to claim 10, wherein the filling the image area after subtracting the target object based on the surrounding image domain comprises:

从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；Determine a reference frame from the image frame sequence, where the reference frame is any one of the first M frames of the target frame, where M is an integer greater than 1;

确定所述参考帧的曝光强度；Determining the exposure intensity of the reference frame;

采用白平衡算法，基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。A white balance algorithm is used to fill the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image domain.
根据权利要求1所述的方法，其特征在于，所述填充抠除所述目标物体后的图像区域，包括：The method according to claim 1, wherein said filling the image area after removing the target object comprises:

获取所述目标帧包括的第一单元图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, where the first unit image is an image area where a target object exists in the target frame;

将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the target frame after cutting out the image area where the target object exists and the first unit image into the convolutional neural network model, and obtain the output image of the convolutional neural network model, and the output image is the filled target frame.
根据权利要求1所述的方法，其特征在于，所述填充抠除所述目标物体后的图像区域，包括：The method according to claim 1, wherein said filling the image area after removing the target object comprises:

获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, the first unit image being an image area in the target frame where a target object exists;

将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the previous frame image, the next frame image, the target frame after the image area where the target object exists, and the first unit image into the convolutional neural network model, and obtain the convolutional neural network The output image of the network model, where the output image is the target frame after filling.
根据权利要求1所述的方法，其特征在于，所述填充抠除所述目标物体后的图像区域，包括：The method according to claim 1, wherein the filling and cutting out the image area of the target object comprises:

获取所述目标帧的前一帧图像和后一帧图像；Acquiring the previous frame image and the next frame image of the target frame;

基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到所述填充后的目标帧。Based on the previous frame of image and the next frame of image, filling the image area after the target object is cut out to obtain the filled target frame.
根据权利要求14所述的方法，其特征在于，所述基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧，包括：The method according to claim 14, wherein the image area after the cutout of the target object is filled based on the previous frame image and the next frame image, to obtain a filled Target frame, including:

获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a second unit image in the previous frame image, where the second unit image is an image corresponding to the same position in the previous frame image and the image area in the target frame where the target object exists;

获取所述后一帧图像中的第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a third unit image in the next frame of image, where the third unit image is an image in the next frame of image that corresponds to the same position as the image area in the target frame where the target object exists;

获取所述第二单元图像所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值；Acquiring a first value of each pixel included in the second unit image, and acquiring a second value of each pixel included in the third unit image;

针对任一像素点，计算所述第一数值和所述第二数值的平均值；For any pixel, calculating an average value of the first value and the second value;

基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到所述填充后的目标帧。Based on the average value, pixel filling is performed on the target frame from which the image area where the target object exists is removed, to obtain the filled target frame.
根据权利要求1所述的方法，其特征在于，所述目标物体是在延时拍摄过程中，所述目标帧包括的异常物体。The method according to claim 1, wherein the target object is an abnormal object included in the target frame during a time-lapse shooting process.
根据权利要求16所述的方法，其特征在于，所述确定具有目标物体的目标帧，包括：The method according to claim 16, wherein the determining a target frame with a target object comprises:

确定所述目标帧的相邻帧；Determining adjacent frames of the target frame;

将所述目标帧和所述相邻帧进行对比，从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体为所述目标物体；The target frame is compared with the adjacent frame, and a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame is determined from the target frame, and the object corresponding to the target image Is the target object;

将所述图像帧序列中包括所述目标物体的帧作为目标帧。The frame including the target object in the sequence of image frames is used as a target frame.
根据权利要求16所述的方法，其特征在于，所述目标物体为移动物体的全部或者局部。The method according to claim 16, wherein the target object is all or part of a moving object.
一种图像处理装置，其特征在于，包括存储器、处理器；An image processing device, characterized by comprising a memory and a processor;

所述存储器用于存储程序代码；The memory is used to store program codes;

所述处理器，调用所述程序代码，当程序代码被执行时，用于执行以下操作：The processor calls the program code, and when the program code is executed, is used to perform the following operations:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在所述图像帧序列中，确定具有目标物体的目标帧；In the sequence of image frames, determining a target frame with a target object;

抠除所述目标帧中存在目标物体的图像区域；Cut out the image area where the target object exists in the target frame;

填充抠除所述目标物体后的图像区域。Fill the image area after removing the target object.
根据权利要求19所述的装置，其特征在于，所述处理器获取延时拍摄的图像帧序列时，执行如下操作：The device according to claim 19, wherein the processor performs the following operations when acquiring a sequence of time-lapsed image frames:

获取拍摄的至少一帧初始图像；Acquiring at least one initial image taken;

将所述至少一帧初始图像基于时间序列进行排序，得到初始图像序列；Sort the at least one frame of initial image based on a time sequence to obtain an initial image sequence;

将所述初始图像序列进行压缩，得到图像帧序列。The initial image sequence is compressed to obtain an image frame sequence.
根据权利要求19所述的装置，其特征在于，所述处理器在所述图像帧序列中，确定具有目标物体的目标帧时，执行如下操作：The apparatus according to claim 19, wherein the processor performs the following operations when determining a target frame with a target object in the image frame sequence:

基于神经网络模型对所述图像帧序列进行处理；Processing the image frame sequence based on a neural network model;

根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧。The target frame is determined according to an output result of processing the image frame sequence by the neural network model.
根据权利要求21所述的装置，其特征在于，所述处理器基于神经网络模型对所述图像帧序列进行处理时，执行如下操作：The device according to claim 21, wherein the processor performs the following operations when processing the sequence of image frames based on a neural network model:

将所述图像帧序列输入所述神经网络模型，所述图像帧序列包括多张图像；Inputting the sequence of image frames into the neural network model, where the sequence of image frames includes a plurality of images;

针对任一图像，调用所述神经网络模型对所述图像进行特征提取，得到特提取结果；For any image, call the neural network model to perform feature extraction on the image to obtain a special extraction result;

基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别。Based on the feature extraction result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求22所述的装置，其特征在于，所述处理器基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，执行如下操作：The apparatus according to claim 22, wherein the processor performs the following operations when determining that each image in the sequence of image frames includes the category of the object based on the feature extraction result:

调用所述神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果；Calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;

根据所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别。According to the feature summary result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求21-23任一项所述的装置，其特征在于，所述处理器根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧时，执行如下操作：The device according to any one of claims 21-23, wherein when the processor determines the target frame according to an output result of processing the image frame sequence by the neural network model, the following operations are performed:

根据所述神经网络模型对所述图像帧序列进行处理的输出结果，判断所述图像帧序列中各图像中是否包含目标物体；Judging whether each image in the image frame sequence contains a target object according to the output result of processing the image frame sequence by the neural network model;

将包含所述目标物体的图像帧确定为目标帧。The image frame containing the target object is determined as the target frame.
根据权利要求19所述的装置，其特征在于，所述处理器在所述图像帧序列中，确定具有目标物体的目标帧时，执行如下操作：The apparatus according to claim 19, wherein the processor performs the following operations when determining a target frame with a target object in the image frame sequence:

针对所述图像帧序列包括多帧图像的任一帧图像，对所述图像进行区域划分，得到多个区域图像；For any frame image of the image frame sequence including multiple frames of images, performing region division on the image to obtain multiple region images;

获取各个所述区域图像的特征参数，并基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧。Obtain feature parameters of each of the regional images, and based on the feature parameters, determine a target frame with a target object in the image frame sequence.
根据权利要求25所述的装置，其特征在于，所述处理器基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧时，执行如下操作：The apparatus according to claim 25, wherein the processor performs the following operations when determining a target frame with a target object in the image frame sequence based on the characteristic parameter:

基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别；Determine the object category included in any frame image in the image frame sequence based on the characteristic parameter;

根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。According to the object category, a target frame with a target object in the sequence of image frames is determined.
根据权利要求19所述的装置，其特征在于，所述处理器抠除所述目标帧中存在目标物体的图像区域时，执行如下操作：The device according to claim 19, wherein the processor performs the following operations when cutting out the image area where the target object exists in the target frame:

确定所述目标帧中存在目标物体的图像区域；Determining an image area where a target object exists in the target frame;

基于所述图像区域，生成和所述图像区域对应的局部掩膜图形；Based on the image area, generating a partial mask pattern corresponding to the image area;

根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。According to the partial mask graphic and the target frame, cut out the image area where the target object exists in the target frame.
根据权利要求27所述的装置，其特征在于，所述处理器填充抠除所述目标物体后的图像区域时，执行如下操作：28. The device of claim 27, wherein the processor performs the following operations when filling the image area after the target object is removed:

确定所述目标帧中存在目标物体的图像区域的周围图像域，所述周围图像域中的像素点和所述存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值；Determining the surrounding image domain of the image area where the target object exists in the target frame, and the distance between the pixel points in the surrounding image domain and the pixel point of the image area where the target object exists is less than or equal to a preset distance threshold;

基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。Filling the image area after subtracting the target object based on the surrounding image domain.
根据权利要求28所述的装置，其特征在于，所述处理器基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，执行如下操作：The device according to claim 28, wherein the processor performs the following operations when filling the image area after the target object is subtracted based on the surrounding image domain:

从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；Determine a reference frame from the image frame sequence, where the reference frame is any one of the first M frames of the target frame, where M is an integer greater than 1;

确定所述参考帧的曝光强度；Determining the exposure intensity of the reference frame;

采用白平衡算法，基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。A white balance algorithm is used to fill the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image domain.
根据权利要求19所述的装置，其特征在于，所述处理器填充抠除所述目标物体后的图像区域时，执行如下操作：21. The device of claim 19, wherein the processor performs the following operations when filling the image area after the target object is removed:

获取所述目标帧包括的第一单元图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, where the first unit image is an image area where a target object exists in the target frame;

将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the target frame after cutting out the image area where the target object exists and the first unit image into the convolutional neural network model, and obtain the output image of the convolutional neural network model, and the output image is the filled target frame.
根据权利要求19所述的装置，其特征在于，所述处理器填充抠除所述目标物体后的图像区域时，执行如下操作：21. The device of claim 19, wherein the processor performs the following operations when filling the image area after the target object is removed:

获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, the first unit image being an image area in the target frame where a target object exists;

将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the previous frame image, the next frame image, the target frame after the image area where the target object exists, and the first unit image into the convolutional neural network model, and obtain the convolutional neural network The output image of the network model, where the output image is the target frame after filling.
根据权利要求19所述的装置，其特征在于，所述处理器填充抠除所述目标物体后的图像区域时，执行如下操作：21. The device of claim 19, wherein the processor performs the following operations when filling the image area after the target object is removed:

获取所述目标帧的前一帧图像和后一帧图像；Acquiring the previous frame image and the next frame image of the target frame;

基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到所述填充后的目标帧。Based on the previous frame of image and the next frame of image, filling the image area after the target object is cut out to obtain the filled target frame.
根据权利要求32所述的装置，其特征在于，所述处理器基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧时，执行如下操作：The device according to claim 32, wherein the processor fills the image area after the target object is cut out based on the previous frame image and the next frame image to obtain a filling After the target frame, perform the following operations:

获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a second unit image in the previous frame image, where the second unit image is an image corresponding to the same position in the previous frame image and the image area in the target frame where the target object exists;

获取所述后一帧图像中与第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a third unit image in the next frame of image, where the third unit image is an image corresponding to the same position in the next frame of image and an image area in the target frame where the target object exists;

获取所述第二单元图像所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值；Acquiring a first value of each pixel included in the second unit image, and acquiring a second value of each pixel included in the third unit image;

针对任一像素点，计算所述第一数值和所述第二数值的平均值；For any pixel, calculating an average value of the first value and the second value;

基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到所述填充后的目标帧。Based on the average value, pixel filling is performed on the target frame from which the image area where the target object exists is removed, to obtain the filled target frame.
根据权利要求1所述的装置，其特征在于，所述目标物体是在延时拍摄过程中，所述目标帧包括的异常物体。The apparatus according to claim 1, wherein the target object is an abnormal object included in the target frame during a time-lapse shooting process.
根据权利要求34所述的装置，其特征在于，所述处理器确定具有目标物体的目标帧时，执行如下操作：The apparatus according to claim 34, wherein the processor performs the following operations when determining the target frame with the target object:

确定所述目标帧的相邻帧；Determining adjacent frames of the target frame;

将所述目标帧和所述相邻帧进行对比，从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体为所述目标物体；The target frame is compared with the adjacent frame, and a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame is determined from the target frame, and the object corresponding to the target image Is the target object;

将所述图像帧序列中包括所述目标物体的帧作为目标帧。The frame including the target object in the sequence of image frames is used as a target frame.
根据权利要求35所述的装置，其特征在于，所述目标物体为移动物体的全部或者局部。The device according to claim 35, wherein the target object is all or part of a moving object.
一种图像拍摄和处理***，其特征在于，包括拍摄装置和一个或多个处理器，其中：An image shooting and processing system, which is characterized by comprising a shooting device and one or more processors, wherein:

所述拍摄装置，用于延时拍摄得到图像帧序列，并将所述图像帧序列发送给所述一个或多个处理器；The photographing device is configured to obtain an image frame sequence by time-lapse photographing, and send the image frame sequence to the one or more processors;

所述一个或多个处理器，被配置用于获取延时拍摄的图像帧序列，在所述图像帧序列中，确定具有目标物体的目标帧，抠除所述目标帧中存在目标物体的图像区域，并填充抠除所述目标物体后的图像区域。The one or more processors are configured to obtain a sequence of time-lapse captured image frames, in the sequence of image frames, a target frame with a target object is determined, and an image with a target object in the target frame is cut out Area, and fill the image area after cutting out the target object.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述拍摄装置在获取延时拍摄的图像帧序列时，具体用于：The image shooting and processing system according to claim 37, wherein the shooting device is specifically configured to:

获取拍摄的至少一帧初始图像；Acquiring at least one initial image taken;

将所述至少一帧初始图像基于时间序列进行排序，得到初始图像序列；Sort the at least one frame of initial image based on a time sequence to obtain an initial image sequence;

将所述初始图像序列进行压缩，得到图像帧序列。The initial image sequence is compressed to obtain an image frame sequence.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when determining a target frame with a target object in the image frame sequence:

基于神经网络模型对所述图像帧序列进行处理；Processing the image frame sequence based on a neural network model;

根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧。The target frame is determined according to an output result of processing the image frame sequence by the neural network model.
根据权利要求39所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在基于神经网络模型对所述图像帧序列进行处理时，具体用于：The image capturing and processing system according to claim 39, wherein the one or more processors are specifically configured to: when processing the sequence of image frames based on a neural network model:

将所述图像帧序列输入所述神经网络模型，所述图像帧序列包括多张图像；Inputting the sequence of image frames into the neural network model, where the sequence of image frames includes a plurality of images;

针对任一图像，调用所述神经网络模型对所述图像进行特征提取，得到特提取结果；For any image, call the neural network model to perform feature extraction on the image to obtain a special extraction result;

基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别。Based on the feature extraction result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求40所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，具体用于：The image shooting and processing system according to claim 40, wherein the one or more processors are specifically used when determining that each image in the image frame sequence includes the category of the object based on the feature extraction result. in:

调用所述神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果；Calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;

根据所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别。According to the feature summary result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求39-41任一项所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧时，具体用于：The image capturing and processing system according to any one of claims 39-41, wherein the one or more processors determine the output result of processing the sequence of image frames according to the neural network model. When describing the target frame, it is specifically used for:

根据所述神经网络模型对所述图像帧序列进行处理的输出结果，判断所述图像帧序列中各图像中是否包含目标物体；Judging whether each image in the image frame sequence contains a target object according to the output result of processing the image frame sequence by the neural network model;

将包含所述目标物体的图像帧确定为目标帧。The image frame containing the target object is determined as the target frame.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when determining a target frame with a target object in the image frame sequence:

针对所述图像帧序列包括多帧图像的任一帧图像，对所述图像进行区域划分，得到多个区域图像；For any frame image of the image frame sequence including multiple frames of images, performing region division on the image to obtain multiple region images;

获取各个所述区域图像的特征参数，并基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧。Obtain feature parameters of each of the regional images, and based on the feature parameters, determine a target frame with a target object in the image frame sequence.
根据权利要求43所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧时，具体用于：The image shooting and processing system according to claim 43, wherein the one or more processors are specifically configured to determine a target frame with a target object in the image frame sequence based on the characteristic parameter :

基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别；Determine the object category included in any frame image in the image frame sequence based on the characteristic parameter;

根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。According to the object category, a target frame with a target object in the sequence of image frames is determined.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在抠除所述目标帧中存在目标物体的图像区域时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when cutting out an image area in the target frame where the target object exists in the target frame:

确定所述目标帧中存在目标物体的图像区域；Determining an image area where a target object exists in the target frame;

基于所述图像区域，生成和所述图像区域对应的局部掩膜图形；Based on the image area, generating a partial mask pattern corresponding to the image area;

根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。According to the partial mask graphic and the target frame, cut out the image area where the target object exists in the target frame.
根据权利要求45所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在填充抠除所述目标物体后的图像区域时，具体用于：The image shooting and processing system according to claim 45, wherein the one or more processors are specifically configured to: when filling the image area after the target object is cut out:

确定所述目标帧中存在目标物体的图像区域的周围图像域，所述周围图像域中的像素点和所述存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值；Determining the surrounding image domain of the image area where the target object exists in the target frame, and the distance between the pixel points in the surrounding image domain and the pixel point of the image area where the target object exists is less than or equal to a preset distance threshold;

基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。Filling the image area after subtracting the target object based on the surrounding image domain.
根据权利要求46所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，具体用于：The image shooting and processing system according to claim 46, wherein the one or more processors are specifically configured to:

从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；Determine a reference frame from the image frame sequence, where the reference frame is any one of the first M frames of the target frame, where M is an integer greater than 1;

确定所述参考帧的曝光强度；Determining the exposure intensity of the reference frame;

采用白平衡算法，基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。A white balance algorithm is used to fill the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image domain.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在填充抠除所述目标物体后的图像区域时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when filling the image area after the target object is cut out:

获取所述目标帧包括的第一单元图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, where the first unit image is an image area where a target object exists in the target frame;

将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the target frame after cutting out the image area where the target object exists and the first unit image into the convolutional neural network model, and obtain the output image of the convolutional neural network model, and the output image is the filled target frame.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在填充抠除所述目标物体后的图像区域时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when filling the image area after the target object is cut out:

获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, the first unit image being an image area in the target frame where a target object exists;

将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the previous frame image, the next frame image, the target frame after the image area where the target object exists, and the first unit image into the convolutional neural network model, and obtain the convolutional neural network The output image of the network model, where the output image is the target frame after filling.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在填充抠除所述目标物体后的图像区域时，具体用于：The image capturing and processing system according to claim 37, wherein the one or more processors are specifically configured to: when filling the image area after the target object is cut out:

获取所述目标帧的前一帧图像和后一帧图像；Acquiring the previous frame image and the next frame image of the target frame;

基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到所述填充后的目标帧。Based on the previous frame of image and the next frame of image, filling the image area after the target object is cut out to obtain the filled target frame.
根据权利要求50所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧时，具体用于：The image capturing and processing system according to claim 50, wherein the one or more processors perform the removal of the target object based on the previous frame image and the next frame image. After filling the image area, when the filled target frame is obtained, it is specifically used for:

获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a second unit image in the previous frame image, where the second unit image is an image corresponding to the same position in the previous frame image and the image area in the target frame where the target object exists;

获取所述后一帧图像中的第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a third unit image in the next frame of image, where the third unit image is an image in the next frame of image that corresponds to the same position as the image area in the target frame where the target object exists;

获取所述第二单元图像所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值；Acquiring a first value of each pixel included in the second unit image, and acquiring a second value of each pixel included in the third unit image;

针对任一像素点，计算所述第一数值和所述第二数值的平均值；For any pixel, calculating an average value of the first value and the second value;

基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到所述填充后的目标帧。Based on the average value, pixel filling is performed on the target frame from which the image area where the target object exists is removed, to obtain the filled target frame.
根据权利要求37所述的图像拍摄和处理***，其特征在于，所述目标物体是所述拍摄装置在延时拍摄过程中，所述目标帧包括的异常物体。The image shooting and processing system according to claim 37, wherein the target object is an abnormal object included in the target frame during the time-lapse shooting process of the shooting device.
根据权利要求52所述的图像拍摄和处理***，其特征在于，所述一个或多个处理器在确定具有目标物体的目标帧时，具体用于：The image capturing and processing system according to claim 52, wherein the one or more processors are specifically configured to: when determining a target frame with a target object:

确定所述目标帧的相邻帧；Determining adjacent frames of the target frame;

将所述目标帧和所述相邻帧进行对比，从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体为所述目标物体；The target frame is compared with the adjacent frame, and a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame is determined from the target frame, and the object corresponding to the target image Is the target object;

将所述图像帧序列中包括所述目标物体的帧作为目标帧。The frame including the target object in the sequence of image frames is used as a target frame.
根据权利要求52所述的图像拍摄和处理***，其特征在于，所述目标物体为移动物体的全部或者局部。The image capturing and processing system according to claim 52, wherein the target object is all or part of a moving object.
一种载体，用于搭载图像拍摄装置，所述图像拍摄装置与图像处理装置可通信耦合连接，其特征在于，所述图像处理装置被配置用于：A carrier for carrying an image capturing device, the image capturing device and the image processing device can be communicatively coupled and connected, wherein the image processing device is configured to:

获取延时拍摄的图像帧序列；Obtain a sequence of time-lapse images;

在所述图像帧序列中，确定具有目标物体的目标帧；In the sequence of image frames, determining a target frame with a target object;

抠除所述目标帧中存在目标物体的图像区域；Cut out the image area where the target object exists in the target frame;

填充抠除所述目标物体后的图像区域。Fill the image area after removing the target object.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在获取延时拍摄的图像帧序列时，具体用于：The carrier according to claim 55, wherein the image shooting and processing device is specifically configured to:

获取拍摄的至少一帧初始图像；Acquiring at least one initial image taken;

将所述至少一帧初始图像基于时间序列进行排序，得到初始图像序列；Sort the at least one frame of initial image based on a time sequence to obtain an initial image sequence;

将所述初始图像序列进行压缩，得到图像帧序列。The initial image sequence is compressed to obtain an image frame sequence.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：The carrier according to claim 55, wherein the image capturing and processing device is specifically configured to: when determining a target frame with a target object in the image frame sequence:

基于神经网络模型对所述图像帧序列进行处理；Processing the image frame sequence based on a neural network model;

根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧。The target frame is determined according to an output result of processing the image frame sequence by the neural network model.
根据权利要求57所述的载体，其特征在于，所述图像拍摄和处理装置在基于神经网络模型对所述图像帧序列进行处理时，具体用于：The carrier according to claim 57, wherein the image capturing and processing device is specifically configured to: when processing the sequence of image frames based on a neural network model:

将所述图像帧序列输入所述神经网络模型，所述图像帧序列包括多张图像；Inputting the sequence of image frames into the neural network model, where the sequence of image frames includes a plurality of images;

针对任一图像，调用所述神经网络模型对所述图像进行特征提取，得到特提取结果；For any image, call the neural network model to perform feature extraction on the image to obtain a special extraction result;

基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别。Based on the feature extraction result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求58所述的载体，其特征在于，所述图像拍摄和处理装置在基于所述特征提取结果，确定所述图像帧序列中各图像包括物体的类别时，具体用于：The carrier according to claim 58, wherein the image capturing and processing device is specifically configured to: when determining that each image in the image frame sequence includes the category of the object based on the feature extraction result:

调用所述神经网络模型对所述特征提取结果进行特征汇总，得到特征汇总结果；Calling the neural network model to perform feature summarization on the feature extraction result to obtain a feature summarization result;

根据所述特征汇总结果，确定所述图像帧序列中各图像包括物体的类别。According to the feature summary result, it is determined that each image in the image frame sequence includes the category of the object.
根据权利要求57-59任一项所述的载体，其特征在于，所述图像拍摄和处理装置在根据所述神经网络模型对所述图像帧序列进行处理的输出结果确定所述目标帧时，具体用于：The carrier according to any one of claims 57-59, wherein when the image capturing and processing device determines the target frame according to the output result of processing the image frame sequence on the neural network model, Specifically used for:

根据所述神经网络模型对所述图像帧序列进行处理的输出结果，判断所述图像帧序列中各图像中是否包含目标物体；Judging whether each image in the image frame sequence contains a target object according to the output result of processing the image frame sequence by the neural network model;

将包含所述目标物体的图像帧确定为目标帧。The image frame containing the target object is determined as the target frame.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在所述图像帧序列中，确定具有目标物体的目标帧时，具体用于：The carrier according to claim 55, wherein the image capturing and processing device is specifically configured to: when determining a target frame with a target object in the image frame sequence:

针对所述图像帧序列包括多帧图像的任一帧图像，对所述图像进行区域划分，得到多个区域图像；For any frame image of the image frame sequence including multiple frames of images, performing region division on the image to obtain multiple region images;

获取各个所述区域图像的特征参数，并基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧。The characteristic parameters of each of the regional images are acquired, and based on the characteristic parameters, a target frame with a target object in the sequence of image frames is determined.
根据权利要求61所述的载体，其特征在于，所述图像拍摄和处理装置在基于所述特征参数，确定所述图像帧序列中具有目标物体的目标帧时，具体用于：The carrier according to claim 61, wherein the image capturing and processing device is specifically configured to: when determining a target frame with a target object in the image frame sequence based on the characteristic parameter:

基于所述特征参数，确定所述图像帧序列中的任一帧图像包括的物体类别；Determine the object category included in any frame image in the image frame sequence based on the characteristic parameter;

根据所述物体类别，确定所述图像帧序列中具有目标物体的目标帧。According to the object category, a target frame with a target object in the sequence of image frames is determined.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在抠除所述目标帧中存在目标物体的图像区域时，具体用于：The carrier according to claim 55, wherein when the image capturing and processing device cuts out the image area where the target object exists in the target frame, it is specifically configured to:

确定所述目标帧中存在目标物体的图像区域；Determining an image area where a target object exists in the target frame;

基于所述图像区域，生成和所述图像区域对应的局部掩膜图形；Based on the image area, generating a partial mask pattern corresponding to the image area;

根据所述局部掩膜图形和所述目标帧，对所述目标帧中存在目标物体的图像区域进行抠除。According to the partial mask graphic and the target frame, cut out the image area where the target object exists in the target frame.
根据权利要求63所述的载体，其特征在于，所述图像拍摄和处理装置在填充抠除所述目标物体后的图像区域时，具体用于：The carrier according to claim 63, wherein the image shooting and processing device is specifically configured to: when filling the image area after the target object is cut out:

确定所述目标帧中存在目标物体的图像区域的周围图像域，所述周围图像域中的像素点和所述存在目标物体的图像区域的像素点之间的距离小于或等于预设距离阈值；Determining the surrounding image domain of the image area where the target object exists in the target frame, and the distance between the pixel points in the surrounding image domain and the pixel point of the image area where the target object exists is less than or equal to a preset distance threshold;

基于所述周围图像域对扣除所述目标物体后的图像区域进行填充。Filling the image area after subtracting the target object based on the surrounding image domain.
根据权利要求64所述的载体，其特征在于，所述图像拍摄和处理装置在基于所述周围图像域对扣除所述目标物体后的图像区域进行填充时，具体用于：The carrier according to claim 64, wherein the image capturing and processing device is specifically configured to: when filling the image area after subtracting the target object based on the surrounding image domain:

从所述图像帧序列中确定出参考帧，所述参考帧为所述目标帧的前M帧中的任一帧，其中，M为大于1的整数；Determine a reference frame from the image frame sequence, where the reference frame is any one of the first M frames of the target frame, where M is an integer greater than 1;

确定所述参考帧的曝光强度；Determining the exposure intensity of the reference frame;

采用白平衡算法，基于所述参考帧的曝光强度以及所述周围图像域对扣除所述目标物体后的图像区域进行填充。A white balance algorithm is used to fill the image area after deducting the target object based on the exposure intensity of the reference frame and the surrounding image domain.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在填充抠除所述目标物体后的图像区域时，具体用于：The carrier according to claim 55, wherein the image capturing and processing device is specifically used for filling the image area after the target object is cut out:

获取所述目标帧包括的第一单元图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, where the first unit image is an image area where a target object exists in the target frame;

将抠除存在目标物体的图像区域后的目标帧和所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the target frame after cutting out the image area where the target object exists and the first unit image into the convolutional neural network model, and obtain the output image of the convolutional neural network model, and the output image is the filled target frame.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在填充抠除所述目标物体后的图像区域时，具体用于：The carrier according to claim 55, wherein the image capturing and processing device is specifically used for filling the image area after the target object is cut out:

获取所述目标帧包括的第一单元图像，以及所述目标帧的前一帧图像和后一帧图像，所述第一单元图像为所述目标帧中存在目标物体的图像区域；Acquiring a first unit image included in the target frame, and a previous frame image and a next frame image of the target frame, the first unit image being an image area in the target frame where a target object exists;

将所述前一帧图像，所述后一帧图像，所述抠除存在目标物体的图像区域后的目标帧以及所述第一单元图像输入卷积神经网络模型，并获取所述卷积神经网络模型的输出图像，所述输出图像为进行填充后的目标帧。Input the previous frame image, the next frame image, the target frame after the image area where the target object exists, and the first unit image into the convolutional neural network model, and obtain the convolutional neural network The output image of the network model, where the output image is the target frame after filling.
根据权利要求55所述的载体，其特征在于，所述图像拍摄和处理装置在填充抠除所述目标物体后的图像区域时，具体用于：The carrier according to claim 55, wherein the image capturing and processing device is specifically used for filling the image area after the target object is cut out:

获取所述目标帧的前一帧图像和后一帧图像；Acquiring the previous frame image and the next frame image of the target frame;

基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到所述填充后的目标帧。Based on the previous frame of image and the next frame of image, filling the image area after the target object is cut out to obtain the filled target frame.
根据权利要求58所述的载体，其特征在于，所述图像拍摄和处理装置在基于所述前一帧图像和所述后一帧图像，对所述抠除所述目标物体后的图像区域进行填充，得到填充后的目标帧时，具体用于：The carrier according to claim 58, wherein the image capturing and processing device performs processing on the image area after removing the target object based on the previous frame image and the next frame image. Filling, when the filled target frame is obtained, it is specifically used for:

获取所述前一帧图像中的第二单元图像，所述第二单元图像为所述前一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a second unit image in the previous frame image, where the second unit image is an image corresponding to the same position in the previous frame image and the image area in the target frame where the target object exists;

获取所述后一帧图像中的第三单元图像，所述第三单元图像为所述后一帧图像中和所述目标帧中存在目标物体的图像区域对应相同位置的图像；Acquiring a third unit image in the next frame of image, where the third unit image is an image in the next frame of image that corresponds to the same position as the image area in the target frame where the target object exists;

获取所述第二单元图像所包含的各个像素点的第一数值，并获取所述第三单元图像所包含的各个像素点的第二数值；Acquiring a first value of each pixel included in the second unit image, and acquiring a second value of each pixel included in the third unit image;

针对任一像素点，计算所述第一数值和所述第二数值的平均值；For any pixel, calculating an average value of the first value and the second value;

基于所述平均值对所述抠除存在目标物体的图像区域的目标帧进行像素填充，得到所述填充后的目标帧。Based on the average value, pixel filling is performed on the target frame from which the image area where the target object exists is removed, to obtain the filled target frame.
根据权利要求55所述的载体，其特征在于，所述目标物体是所述拍摄装置在延时拍摄过程中，所述目标帧包括的异常物体。The carrier according to claim 55, wherein the target object is an abnormal object included in the target frame during the time-lapse shooting process of the shooting device.
根据权利要求70所述的载体，其特征在于，所述图像拍摄和处理装置在确定具有目标物体的目标帧时，具体用于：The carrier according to claim 70, wherein the image capturing and processing device is specifically configured to: when determining a target frame with a target object:

确定所述目标帧的相邻帧；Determining adjacent frames of the target frame;

将所述目标帧和所述相邻帧进行对比，从所述目标帧中确定出与所述相邻帧在同一位置对应不同像素值的像素点组成的目标图像，所述目标图像对应的物体为所述目标物体；The target frame is compared with the adjacent frame, and a target image composed of pixels corresponding to different pixel values at the same position of the adjacent frame is determined from the target frame, and the object corresponding to the target image Is the target object;

将所述图像帧序列中包括所述目标物体的帧作为目标帧。The frame including the target object in the sequence of image frames is used as a target frame.
根据权利要求70所述的载体，其特征在于，所述目标物体为移动物体的全部或者局部。The carrier according to claim 70, wherein the target object is all or part of a moving object.
一种计算机存储介质，其特征在于，所述计算机存储介质中存储有计算机程序指令，所述计算机程序指令被处理器执行时，用于执行如权利要求1-18任一项所述的图像处理方法。A computer storage medium, wherein the computer storage medium stores computer program instructions, and when the computer program instructions are executed by a processor, they are used to perform the image processing according to any one of claims 1-18 method.