WO2019047983A1 - Image processing method and device, electronic device and computer readable storage medium - Google Patents

Image processing method and device, electronic device and computer readable storage medium Download PDF

Info

Publication number
WO2019047983A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
predetermined
merged
depth
background image
Prior art date
Application number
PCT/CN2018/105102
Other languages
French (fr)
Chinese (zh)
Inventor
张学勇
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201710813594.1A external-priority patent/CN107590795A/en
Priority claimed from CN201710814395.2A external-priority patent/CN107704808A/en
Application filed by Oppo广东移动通信有限公司 filed Critical Oppo广东移动通信有限公司
Publication of WO2019047983A1 publication Critical patent/WO2019047983A1/en

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/50 - Depth or shape recovery
    • G06T 7/521 - Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/222 - Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262 - Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N 5/265 - Mixing

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer readable storage medium.
  • Existing image fusion typically combines a real person with a background, but this kind of fusion offers limited interest.
  • Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a computer readable storage medium.
  • the image processing method of the embodiment of the present invention is for processing a merged image, which is formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene.
  • the image processing method includes: identifying a specific object in the merged image; fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
  • An image processing apparatus is configured to process a merged image, the merged image being formed by fusing the predetermined background image with a person region image in a scene image of a current user in a real scene.
  • the image processing apparatus includes a processor for: identifying a specific object in the merged image; fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
  • An electronic device of an embodiment of the invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and are configured to be executed by the one or more processors, the programs including instructions for performing the image processing method described above.
  • a computer readable storage medium in accordance with an embodiment of the present invention includes a computer program for use in conjunction with an electronic device capable of imaging, the computer program being executable by a processor to perform the image processing method described above.
  • When the image processing method, the image processing apparatus, the electronic device, and the computer readable storage medium of the embodiments of the present invention fuse the person region image with the predetermined background image to form a merged image, a specific object is identified in the predetermined background image of the merged image, and a predetermined sound model matching the identified specific object is determined, so that the predetermined sound model is fused with the merged image to output a sound image. The user can thus hear sound while viewing the merged image, which enhances the interest of the image fusion.
  • FIG. 1 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 2 is a schematic diagram of an image processing apparatus in accordance with some embodiments of the present invention.
  • FIG. 3 is a schematic structural view of an electronic device according to some embodiments of the present invention.
  • FIG. 4 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 5 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 6 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIGS. 7(a) through 7(e) are schematic diagrams of scenes of structured light measurement in accordance with one embodiment of the present invention.
  • FIGS. 8(a) and 8(b) are schematic diagrams of scenes of structured light measurement in accordance with one embodiment of the present invention.
  • FIG. 9 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 10 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 11 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 12 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 13 is a flow chart of an image processing method according to some embodiments of the present invention.
  • FIG. 14 is a schematic diagram of an image processing apparatus according to some embodiments of the present invention.
  • FIG. 15 is a schematic diagram of an electronic device in accordance with some embodiments of the present invention.
  • an image processing method is used to process a merged image.
  • the merged image is formed by fusing a predetermined background image and a person region image
  • The person region image is the image of the region in which the current user is located within a scene image captured of the current user in the real scene.
  • The image processing method includes: identifying a specific object in the merged image; and fusing the predetermined sound model matched with the specific object with the merged image to output the sound image.
  • an image processing method may be implemented by the image processing apparatus 100 of the embodiment of the present invention.
  • the image processing apparatus 100 is for processing a merged image.
  • the merged image is formed by fusing a predetermined background image and a human region image.
  • The person region image is the image of the region in which the current user is located within a scene image captured of the current user in the real scene.
  • the image processing apparatus 100 includes a processor 20. Both step 03 and step 04 can be implemented by processor 20. That is, the processor 20 can be used to identify a particular object in the merged image, and fuse the predetermined sound model that matches the particular object with the merged image to output the sound image.
  • an image processing apparatus 100 according to an embodiment of the present invention may be applied to an electronic apparatus 1000 according to an embodiment of the present invention. That is, the electronic device 1000 of the embodiment of the present invention includes the image processing device 100 of the embodiment of the present invention.
  • The electronic device 1000 may be, for example, a mobile phone, a tablet computer, a notebook computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, or the like.
  • the predetermined background image may be a predetermined two-dimensional background image or a predetermined three-dimensional background image.
  • the predetermined background image may be randomly assigned by the processor 20 or selected by the current user.
  • When the image processing method, the image processing apparatus 100, and the electronic device 1000 of the embodiments of the present invention fuse the person region image with the predetermined background image (a predetermined two-dimensional background image or a predetermined three-dimensional background image) to form a merged image, a specific object is identified in the predetermined background image of the merged image, and a predetermined sound model matching the identified specific object is determined, so that the predetermined sound model is fused with the merged image to output a sound image. The user can therefore hear sound while viewing the merged image, which enhances the interest of the image fusion, gives the user an immersive feel, and improves the user experience.
  • specific objects include animals, plants, running water, raindrops, musical instruments, fire, sky, roads, automobiles, and the like.
  • When the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) in the merged image includes trees, the processor 20 may identify the trees in the merged image as follows: the processor 20 first performs color feature extraction on the merged image or the predetermined background image based on a color histogram in RGB space, then performs texture feature extraction based on a Gabor filter, and finally determines from the combination of the color features and the texture features whether trees are present in the merged image.
  • After the trees are recognized, the processor 20 may select a predetermined sound model of wind blowing through the trees and leaves rustling, and fuse it with the merged image.
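  • By way of illustration only, the sketch below (Python with OpenCV and NumPy) shows one way the color-histogram-plus-Gabor-texture step described above could be implemented. The function names, the filter-bank parameters, and the final classifier are assumptions for illustration; the patent does not specify them.

```python
import cv2
import numpy as np

def color_histogram_rgb(image_bgr, bins=8):
    # 3D color histogram over the three color channels, normalized.
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def gabor_texture_features(gray, scales=(7, 15), orientations=4):
    # Mean response of a small Gabor filter bank; foliage gives a characteristic texture.
    feats = []
    for ksize in scales:
        for i in range(orientations):
            theta = np.pi * i / orientations
            kernel = cv2.getGaborKernel((ksize, ksize), 3.0, theta, 8.0, 0.5)
            feats.append(cv2.filter2D(gray, cv2.CV_32F, kernel).mean())
    return np.array(feats)

def color_texture_descriptor(merged_bgr):
    gray = cv2.cvtColor(merged_bgr, cv2.COLOR_BGR2GRAY)
    return np.concatenate([color_histogram_rgb(merged_bgr),
                           gabor_texture_features(gray)])

# A classifier trained on labelled background images (e.g. SVM or k-NN) would
# consume this descriptor to decide whether trees are present in the merged image.
```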
  • To recognize animals, the processor 20 may convert the merged image from RGB space into HSV space, compute a color histogram of the HSV merged image, and take the low-order statistical moments of the histogram as the feature descriptor. A k-nearest-neighbor method is then used to judge the category of the merged image, that is, whether an animal is present in the merged image and, if so, which animal it is; finally, the predetermined sound model matching the recognized animal is selected.
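  • A hedged sketch of the HSV color-moment plus k-nearest-neighbor recognition described above is given below. The exact moments used (mean, standard deviation, skewness per channel), the value of k, and the training data are illustrative assumptions.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hsv_color_moments(bgr_image):
    # Low-order statistical moments of each HSV channel as the feature descriptor.
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float64)
    feats = []
    for c in range(3):
        channel = hsv[:, :, c].ravel()
        mean, std = channel.mean(), channel.std()
        skew = np.cbrt(((channel - mean) ** 3).mean())
        feats.extend([mean, std, skew])
    return np.array(feats)

def build_animal_classifier(train_images, train_labels, k=3):
    # train_images / train_labels: a labelled set of background images
    # (labels such as "dog", "bird", "no_animal"); placeholders here.
    features = np.stack([hsv_color_moments(img) for img in train_images])
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(features, train_labels)
    return knn

def recognize_animal(knn, merged_bgr):
    return knn.predict([hsv_color_moments(merged_bgr)])[0]
```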
  • the image processing method of the embodiment of the present invention further includes:
  • the image processing apparatus 100 further includes a visible light camera 11 and a depth image acquisition component 12.
  • Step 021 can be implemented by visible light camera 11, and step 022 can be implemented by depth image acquisition component 12.
  • Step 023 and step 024 can be implemented by processor 20.
  • the visible light camera 11 can be used to acquire a scene image of the current user.
  • the depth image acquisition component 12 can be used to acquire a depth image of the current user.
  • The processor 20 is operable to process the scene image and the depth image to extract the current user's person region from the scene image to obtain the person region image, and to fuse the person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a merged image.
  • the scene image may be a grayscale image or a color image
  • the depth image represents depth information of each person or object in the real scene in which the current user is located.
  • the scene range of the scene image is consistent with the scene range of the depth image, and each pixel in the scene image can find the depth information corresponding to the pixel in the depth image.
  • the existing methods of segmenting characters and background mainly divide the characters and backgrounds according to the similarity and discontinuity of adjacent pixels in pixel values, but this segmentation method is susceptible to environmental factors such as external illumination.
  • the image processing method of the embodiment of the present invention extracts a character region in the scene image by acquiring a depth image of the current user. Since the acquisition of the depth image is not easily affected by factors such as illumination and color distribution in the scene, the character region extracted by the depth image is more accurate, and in particular, the boundary of the person region can be accurately calibrated. Further, the effect of the merged image in which the more accurate character region image is merged with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) is better.
  • obtaining the depth image of the current user in step 022 includes:
  • depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122.
  • Step 0221 can be implemented by structured light projector 121
  • step 0222 and step 0223 can be implemented by structured light camera 122.
  • the structured light projector 121 can project structured light to the current user
  • The structured light camera 122 can be used to capture the structured light image modulated by the current user, and to demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.
  • a structured light image modulated by the current user is formed on the surface of the current user's face and the body.
  • the structured light camera 122 captures the modulated structured light image and demodulates the structured light image to obtain a depth image.
  • the pattern of the structured light may be a laser stripe, a Gray code, a sine stripe, a non-uniform speckle, or the like.
  • Step 0223 of demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image includes:
  • 02231: demodulating the phase information corresponding to each pixel in the structured light image;
  • 02232: converting the phase information into depth information; and
  • 02233: generating a depth image based on the depth information.
  • step 02231, step 02232, and step 02233 can all be implemented by the structured light camera 122.
  • the structured light camera 122 can also be used to demodulate phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate a depth image based on the depth information.
  • The phase information of the modulated structured light is changed relative to that of the unmodulated structured light, so the structured light presented in the structured light image is distorted, and the change in phase information represents the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates the depth information from the phase information, thereby obtaining the final depth image.
  • The following uses the widely applied grating projection technique (fringe projection technique) as an example.
  • the grating projection technique belongs to the surface structure light in a broad sense.
  • The depth image acquisition component 12 needs to be calibrated before structured light is used for depth information acquisition. The calibration includes calibration of geometric parameters (for example, the relative positional relationship between the structured light camera 122 and the structured light projector 121), calibration of the internal parameters of the structured light camera 122, calibration of the internal parameters of the structured light projector 121, and the like.
  • The structured light projector 121 projects the four generated fringe patterns onto the measured object sequentially, in a time-division manner.
  • the structured light camera 122 collects the image on the left side of Fig. 7(b) while reading the stripe of the reference plane as shown on the right side of Fig. 7(b).
  • the second step is to perform phase recovery.
  • The structured light camera 122 calculates the modulated phase from the four acquired modulated fringe patterns (i.e., the structured light images); the phase map obtained at this point is a truncated phase map. Because the result of the four-step phase-shifting algorithm is calculated with an arctangent function, the phase after structured light modulation is confined to [-π, π]: whenever the modulated phase exceeds this range, it wraps around and starts again. The resulting principal phase values are shown in Fig. 7(c).
  • Therefore, phase unwrapping is required, that is, the truncated (wrapped) phase is restored to a continuous phase.
  • the left side is the modulated continuous phase map and the right side is the reference continuous phase map.
  • The modulated continuous phase and the reference continuous phase are subtracted to obtain the phase difference (i.e., the phase information). This phase difference represents the depth information of the measured object relative to the reference plane. The phase difference is then substituted into the phase-to-depth conversion formula (the parameters involved in the formula are obtained by calibration) to obtain the depth information, and the three-dimensional model of the measured object shown in Fig. 7(e) can be obtained.
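  • As an illustration of the four-step phase-shifting computation described above, the sketch below recovers the wrapped phase from four fringe images, unwraps it, and converts the phase difference against the reference plane into a relative depth. The simple row-wise unwrapping and the single phase-to-depth constant are assumptions; a practical system uses calibrated conversion parameters and a robust two-dimensional unwrapping algorithm.

```python
import numpy as np

def wrapped_phase(i1, i2, i3, i4):
    # Four fringe patterns shifted by pi/2 give phi = atan2(I4 - I2, I1 - I3),
    # truncated (wrapped) to [-pi, pi].
    return np.arctan2(i4 - i2, i1 - i3)

def unwrap_rows(phase):
    # Minimal row-wise unwrapping of the truncated phase into a continuous phase.
    return np.unwrap(phase, axis=1)

def relative_depth(fringes_object, fringes_reference, phase_to_depth=1.0):
    # fringes_*: lists of the four images captured under the shifted fringe patterns.
    phi_obj = unwrap_rows(wrapped_phase(*fringes_object))
    phi_ref = unwrap_rows(wrapped_phase(*fringes_reference))
    # The phase difference encodes the height of the object above the reference
    # plane; phase_to_depth stands in for the calibrated conversion formula.
    return (phi_obj - phi_ref) * phase_to_depth
```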
  • the structured light used in the embodiments of the present invention may be any other pattern in addition to the above-mentioned grating, depending on the specific application scenario.
  • the present invention can also use the speckle structure light to perform the acquisition of the depth information of the current user.
  • the method of obtaining depth information by the speckle structure light is to use a substantially flat plate diffraction element having an embossed diffraction structure of a specific phase distribution, the cross section being a step relief structure having two or more concavities and convexities.
  • the thickness of the substrate in the diffractive element is approximately 1 micrometer, and the height of each step is not uniform, and the height may range from 0.7 micrometer to 0.9 micrometer.
  • the structure shown in Fig. 8(a) is a partial diffraction structure of the collimating beam splitting element of the present embodiment.
  • Fig. 8(b) is a cross-sectional side view taken along section A-A, and the units of the abscissa and the ordinate are both micrometers.
  • The speckle pattern generated by the speckle structured light is highly random, and the pattern changes with distance. Therefore, before depth information is acquired using the speckle structured light, the speckle patterns in space must first be calibrated. For example, within a range of 0 to 4 meters from the structured light camera 122, a reference plane is taken every 1 cm, so that 400 speckle images are saved after calibration. The smaller the calibration interval, the higher the accuracy of the acquired depth information. Subsequently, the structured light projector 121 projects the speckle structured light onto the object to be measured (i.e., the current user), and the height differences of the object's surface change the speckle pattern of the speckle structured light projected onto it.
  • The structured light camera 122 captures the speckle pattern (i.e., the structured light image) projected onto the object to be measured, and performs a cross-correlation operation between this speckle pattern and each of the 400 speckle images saved during calibration, thereby obtaining 400 correlation images. The position of the measured object in space produces peaks in the correlation images, and these peaks are superimposed and interpolated to obtain the depth information of the measured object.
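  • The fragment below sketches, under stated assumptions, how the captured speckle image could be matched block by block against the pre-calibrated reference speckle images to recover depth. The window size and the normalized correlation measure are illustrative; the 1 cm plane spacing follows the example above, and a real implementation would interpolate around the correlation peak rather than take the best plane directly.

```python
import numpy as np

def normalized_correlation(a, b):
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def depth_from_speckle(captured, reference_stack, plane_spacing_cm=1.0, win=32):
    # reference_stack: speckle images calibrated on planes 1 cm apart (e.g. 400 planes).
    rows, cols = captured.shape[0] // win, captured.shape[1] // win
    depth = np.zeros((rows, cols))
    for by in range(rows):
        for bx in range(cols):
            patch = captured[by*win:(by+1)*win, bx*win:(bx+1)*win]
            scores = [normalized_correlation(patch,
                                             ref[by*win:(by+1)*win, bx*win:(bx+1)*win])
                      for ref in reference_stack]
            # The best-matching reference plane gives this block's depth.
            depth[by, bx] = (np.argmax(scores) + 1) * plane_spacing_cm
    return depth
```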
  • In this embodiment, a collimating beam-splitting element is used. It not only collimates the non-collimated beam but also splits the light: the non-collimated light reflected by the mirror exits the collimating beam-splitting element as a plurality of collimated beams at different angles. The cross-sectional areas of the collimated beams are approximately equal and their energy fluxes are approximately equal, so the diffraction spots produced by the beams are more uniform. Because the laser energy is spread over the individual beams, the risk of harming the human eye is further reduced, and, compared with other uniformly arranged structured light, the speckle structured light achieves the same acquisition effect while consuming less power.
  • the step 023 processes the scene image and the depth image to extract the current user's character area in the scene image to obtain the character area image, including:
  • 0231: identifying a face region in the scene image;
  • 0232: acquiring depth information corresponding to the face region from the depth image;
  • 0233: determining a depth range of the person region according to the depth information of the face region; and
  • 0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
  • Step 0231, step 0232, step 0233, and step 0234 can all be implemented by the processor 20. That is to say, the processor 20 can be used to identify a face region in the scene image, obtain depth information corresponding to the face region from the depth image, determine a depth range of the person region according to the depth information of the face region, and determine, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
  • Specifically, a trained deep learning model may first be used to identify the face region in the scene image, and the depth information of the face region can then be determined from the correspondence between the scene image and the depth image. Because the face region includes features such as the nose, eyes, ears, and lips, the depth data corresponding to each feature of the face region differ in the depth image; for example, when the face is oriented toward the depth image acquisition component 12, the depth data corresponding to the nose may be small while the depth data corresponding to the ears may be large in the captured depth image. Therefore, the depth information of the face region may be a single value or a numerical range. When it is a single value, that value may be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
  • After determining the depth information of the face region, the processor 20 may set the depth range of the person region according to the depth information of the face region, and extract, according to that depth range, the person region that falls within the depth range and is connected to the face region, thereby obtaining the person region image.
  • In this way, the person region image can be extracted from the scene image according to the depth information. Because the acquisition of the depth information is not affected by factors such as illumination and color temperature in the environment, the extracted person region image is more accurate.
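  • A minimal sketch of this depth-gated extraction (steps 0231 through 0234) is shown below. The Haar face detector, the fixed margin around the face depth, and the connected-component step are assumptions used only for illustration; the description above leaves the face detector and the exact depth range unspecified.

```python
import cv2
import numpy as np

def extract_person_region(scene_bgr, depth_mm, margin_mm=400):
    # 0231: identify a face region in the scene image (Haar cascade as a stand-in
    # for the trained detection model mentioned above).
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]

    # 0232: depth information of the face region (median of its depth pixels).
    face_depth = np.median(depth_mm[y:y + h, x:x + w])

    # 0233: depth range of the person region around the face depth.
    in_range = ((depth_mm >= face_depth - margin_mm) &
                (depth_mm <= face_depth + margin_mm)).astype(np.uint8)

    # 0234: keep only the connected component that contains the face region.
    _, labels = cv2.connectedComponents(in_range)
    face_label = labels[y + h // 2, x + w // 2]
    mask = (labels == face_label).astype(np.uint8) * 255
    return cv2.bitwise_and(scene_bgr, scene_bgr, mask=mask)
```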
  • the step 023 processing the scene image and the depth image to extract the current user's character area in the scene image to obtain the character area image further includes:
  • both step 0235 and step 0236 can be implemented by processor 20. That is, the processor 20 can also be used to process the scene image to obtain a full field edge image of the scene image, and to correct the person region image from the full field edge image.
  • the processor 20 first performs edge extraction on the scene image to obtain a full field edge image, wherein the edge line in the full field edge image includes the current user and the edge line of the background object in the scene where the current user is located.
  • edge extraction of the scene image can be performed by the Canny operator.
  • The core of the Canny edge extraction algorithm mainly includes the following steps: first, the scene image is convolved with a 2D Gaussian filter template to eliminate noise; then, a differential operator is used to obtain the gradient magnitude of the gray level of each pixel, the gradient direction of each pixel is calculated from the gradient values, and the adjacent pixels of each pixel along the gradient direction are found; next, each pixel is traversed, and if the gray value of a pixel is not the largest compared with the gray values of the two adjacent pixels along its gradient direction, that pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, thereby obtaining the full-field edge image after edge extraction.
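  • For illustration, a brief sketch of obtaining the full-field edge image with the Canny operator is given below; OpenCV's cv2.Canny internally performs the Gaussian smoothing, gradient computation, and non-maximum suppression described above, and the threshold values here are assumptions.

```python
import cv2

def full_field_edge_image(scene_bgr, low_thresh=50, high_thresh=150):
    gray = cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY)
    # Returns a binary edge map covering the whole scene, including the current
    # user and the background objects, which can then be used to trim the
    # extracted person region image.
    return cv2.Canny(gray, low_thresh, high_thresh)
```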
  • After acquiring the full-field edge image, the processor 20 corrects the person region image according to the full-field edge image. It can be understood that the person region image is obtained by merging all the pixels in the scene image that are connected to the face region and fall within the set depth range. In some scenes, there may be other objects that are also connected to the face region and fall within that depth range. Therefore, to make the extracted person region image more accurate, the person region image can be corrected using the full-field edge map.
  • the processor 20 may perform secondary correction on the corrected person region image.
  • the corrected person region image may be expanded to enlarge the person region image to preserve the edge details of the person region image.
  • step 024 is to merge the character region image with the predetermined background image to obtain a merged image, including:
  • 02411: acquiring a predetermined fusion region in the predetermined three-dimensional background image;
  • 02412: determining a pixel region to be replaced of the predetermined fusion region according to the person region image; and
  • 02413: replacing the pixel region to be replaced of the predetermined fusion region with the person region image to obtain the merged image.
  • Step 02411, step 02412, and step 02413 can all be implemented by the processor 20. That is to say, the processor 20 can be configured to acquire a predetermined fusion region in the predetermined three-dimensional background image, determine a pixel region to be replaced of the predetermined fusion region according to the person region image, and replace the pixel region to be replaced of the predetermined fusion region with the person region image to obtain a merged image.
  • When the predetermined three-dimensional background image is obtained through modeling, the depth data corresponding to each pixel in the predetermined three-dimensional background image can be acquired directly during the modeling process; when the predetermined three-dimensional background image is obtained through animation, the depth data corresponding to each pixel in the background image may be set by its producer. In addition, every object present in the predetermined three-dimensional background image is known. Therefore, before the predetermined three-dimensional background image is used for image fusion, the fusion position of the person region image, that is, the predetermined fusion region, can be calibrated based on the depth data and the objects present in the predetermined three-dimensional background image.
  • Because the size of the person region image is affected by the acquisition distance of the visible light camera 11, the processor 20 needs to determine the pixel region to be replaced within the predetermined fusion region according to the size of the person region image actually acquired by the visible light camera 11.
  • the merged image is obtained by replacing the pixel area to be replaced in the predetermined fusion area with the person area image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved.
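  • A hedged sketch of replacing the to-be-replaced pixel region of the predetermined fusion region with the person region image is shown below. The rectangular fusion region, the resizing policy, and the binary person mask are assumptions; the description above only requires that the person region image replace the corresponding pixels of the fusion region.

```python
import cv2
import numpy as np

def fuse_person_into_background(background_bgr, person_bgr, person_mask, fusion_rect):
    # fusion_rect = (x, y, w, h): the predetermined fusion region calibrated in the
    # predetermined three-dimensional background image.
    # person_mask: uint8 mask, nonzero where the person region pixels are valid.
    x, y, w, h = fusion_rect
    merged = background_bgr.copy()

    # Scale the captured person region image to the to-be-replaced pixel region;
    # in practice its size depends on the acquisition distance of the camera.
    person = cv2.resize(person_bgr, (w, h))
    mask = cv2.resize(person_mask, (w, h), interpolation=cv2.INTER_NEAREST) > 0

    roi = merged[y:y + h, x:x + w]
    roi[mask] = person[mask]  # replace only the person pixels, keep the background
    return merged
```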
  • step 024 is to merge the character region image with the predetermined background image to obtain a merged image, including:
  • 02421: processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;
  • 02422: acquiring depth data of the predetermined three-dimensional background image;
  • 02423: determining a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;
  • 02424: determining a pixel region to be replaced of the calculated fusion region according to the person region image; and
  • 02425: replacing the pixel region to be replaced of the calculated fusion region with the person region image to obtain the merged image.
  • Step 02421, step 02422, step 02423, step 02424, and step 02425 can all be implemented by the processor 20. That is, the processor 20 can be configured to process the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image, acquire depth data of the predetermined three-dimensional background image, determine a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data, determine a pixel region to be replaced of the calculated fusion region according to the person region image, and replace the pixel region to be replaced of the calculated fusion region with the person region image to obtain a merged image.
  • the processor 20 first needs to determine the fusion position of the person region image in the predetermined three-dimensional background image. Specifically, the processor 20 performs edge extraction on the predetermined three-dimensional background image to obtain a full-field edge image, and acquires depth data of the predetermined three-dimensional background image, wherein the depth data is acquired in a predetermined three-dimensional background image modeling or animation process. Subsequently, the processor 20 determines a calculated fusion region in the predetermined three-dimensional background image based on the full-field edge image and the depth data of the predetermined three-dimensional background image.
  • Because the size of the person region image is affected by the acquisition distance of the visible light camera 11, the pixel region to be replaced within the calculated fusion region is determined according to the size of the person region image. Finally, the pixel region to be replaced in the calculated fusion region is replaced with the person region image, thereby obtaining a merged image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved.
  • the person region image may be a two-dimensional person region image or a three-dimensional person region image.
  • The processor 20 can extract a two-dimensional person region image from the scene image by combining the depth information in the depth image, or it can establish a three-dimensional model of the person region according to the depth information in the depth image and then color-fill the three-dimensional person region with the color information in the scene image to obtain a three-dimensional, colored person region image.
  • the predetermined fusion region or computational fusion region in the predetermined three-dimensional background image may be one or more.
  • When there is only one predetermined fusion region, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image is that single predetermined fusion region; when there is only one calculated fusion region, the fusion position of the two-dimensional person region image or the three-dimensional person region image in the predetermined three-dimensional background image is that single calculated fusion region. When there are multiple predetermined fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image may be any one of the predetermined fusion regions; further, since the three-dimensional person region image carries depth information, the predetermined fusion region whose depth matches the depth information of the three-dimensional person region image may be sought among the multiple predetermined fusion regions and used as the fusion position to obtain a better fusion effect. Likewise, when there are multiple calculated fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image may be any one of the calculated fusion regions, and since the three-dimensional person region image carries depth information, the calculated fusion region whose depth matches the depth information of the three-dimensional person region image may be used as the fusion position to obtain a better fusion effect.
  • the image of the person region can be merged with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a combined image.
  • the merged image is then processed to identify a particular object to match the predetermined sound model.
  • the predetermined sound model is merged with the merged image to output an audio image.
  • The sound image may consist of a single-frame merged image together with a predetermined sound model, or of a multi-frame merged image together with a predetermined sound model; in the latter case the sound image is a sound video.
  • The image processing method of the embodiment of the present invention further includes: 011: determining whether the predetermined background image in the merged image has an associated stored predetermined sound model; and 012: fusing the merged image with the associated predetermined sound model to output the sound image when such a model exists. When no associated predetermined sound model exists, the process proceeds to step 03 to identify a specific object in the merged image.
  • Both step 011 and step 012 can be implemented by the processor 20. That is to say, the processor 20 can also be used to determine whether the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) in the merged image has an associated stored predetermined sound model; when the predetermined background image has an associated stored predetermined sound model, the merged image is fused with that predetermined sound model to output the sound image; and when the predetermined background image has no associated stored predetermined sound model, the process proceeds to step 03 to identify a specific object in the merged image.
  • The predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) in the merged image is constructed or selected in advance, so the specific objects present in each predetermined background image are known, and a predetermined sound model matching each specific object can be stored directly in association with the corresponding predetermined background image. In this case, the processor 20 may directly fuse the predetermined sound model stored in association with the predetermined background image with the merged image to output the sound image.
  • When the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) has no associated stored predetermined sound model, the processor 20 needs to identify a specific object in the predetermined background image, then select, according to the identified specific object, a predetermined sound model matching that object from the plurality of pre-stored predetermined sound models, and finally fuse the selected predetermined sound model with the merged image to output the sound image.
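  • As a sketch of this two-path logic (use the associated stored sound model when one exists, otherwise recognize a specific object and look up a matching model), consider the fragment below. The dictionary-based stores, the background identifier, and the recognize_specific_object helper are hypothetical placeholders.

```python
# Hypothetical stores: sound models keyed by background id and by object class.
associated_sound = {"beach_3d": "waves.wav"}                 # steps 011/012
object_sound = {"tree": "leaves_rustling.wav",
                "dog": "dog_barking.wav",
                "stream": "running_water.wav"}

def choose_sound_model(background_id, merged_image, recognize_specific_object):
    # Step 011: does the predetermined background image have an associated model?
    if background_id in associated_sound:
        return associated_sound[background_id]               # step 012: fuse directly
    # Step 03: otherwise identify a specific object in the merged image, then
    # pick the matching predetermined sound model (step 04).
    obj = recognize_specific_object(merged_image)            # e.g. the k-NN recognizer above
    return object_sound.get(obj)
```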
  • the image processing method of the embodiment of the present invention further includes:
  • the image processing apparatus 100 further includes an electroacoustic component 70 and a display 50.
  • Step 05 can be implemented by electroacoustic element 70 and display 50. Among them, the image is displayed by the display 50, and the sound is played by the electroacoustic element 70.
  • the electroacoustic component 70 can be a speaker, an earphone, a microphone, a cartridge, and the like.
  • When the sound image is played, the default may be to play only the image without playing the sound.
  • the current user can choose to trigger a play request to play the image and sound simultaneously.
  • If the current user does not trigger a play request, only the image is played and no sound is played.
  • When the merged image in the sound image has multiple frames, for example when the current user is in a video chat with a friend, the image of the current user seen by the friend is the merged image; at this time, the current user or the friend may trigger a play request so that the image and the sound are played simultaneously. This adds fun to the user's video chat. Further, if the play request is triggered again while the image and sound of the sound image are being played simultaneously, the display 50 continues to display the merged image while the electroacoustic element 70 stops playing the sound.
  • When the sound image is played, it is also possible to play the image and the sound simultaneously by default; in this case, the current user can choose to trigger a play request to stop playback of the sound.
  • the predetermined sound model that matches a particular object includes one or more songs.
  • the predetermined sound model includes a song
  • The predetermined sound model is played one or more times during playback of the sound image. That is to say, when the played sound image includes a single-frame merged image and one predetermined sound model, the display 50 continuously displays that frame while the electroacoustic component 70 plays the predetermined sound model once or plays it repeatedly in a loop.
  • When the played sound image includes a multi-frame merged image and one predetermined sound model, the display 50 displays the multi-frame merged image at a certain frame rate, during which the electroacoustic component 70 plays the predetermined sound model once or plays it repeatedly in a loop.
  • When the predetermined sound model includes a plurality of pieces, the predetermined sound models are stored sequentially in a list, and during playback of the sound image the plurality of predetermined sound models may be played in any one of the following modes: sequential play, random play, single-piece loop, or list loop. When the played sound image includes a single-frame merged image and a plurality of predetermined sound models, the display 50 continuously displays that frame, and the electroacoustic component 70 may play the predetermined sound models sequentially in the stored list order, repeat them in list order, play them from the list in random order, or select one of them and play it in a loop. When the played sound image includes a multi-frame merged image, the display 50 displays the multi-frame merged image at a certain frame rate, during which the electroacoustic component 70 may likewise play the plurality of predetermined sound models sequentially in the stored list order, repeat them in list order, play them in random order, or select one of them and play it in a loop.
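  • A minimal sketch of these playback modes over the stored list of predetermined sound models is shown below; the mode names are illustrative assumptions.

```python
import itertools
import random

def playback_order(sound_models, mode):
    # sound_models: the predetermined sound models stored sequentially in a list.
    if mode == "sequential":      # play the list once, in stored order
        return iter(sound_models)
    if mode == "list_loop":       # repeat the whole list in stored order
        return itertools.cycle(sound_models)
    if mode == "random":          # play the list once, in a shuffled order
        shuffled = list(sound_models)
        random.shuffle(shuffled)
        return iter(shuffled)
    if mode == "single_loop":     # loop one selected model
        return itertools.repeat(sound_models[0])
    raise ValueError("unknown playback mode")
```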
  • an embodiment of the present invention further provides an electronic device 1000.
  • the electronic device 1000 includes an image processing device 100.
  • the image processing apparatus 100 can be implemented using hardware and/or software.
  • the image processing apparatus 100 includes an imaging device 10 and a processor 20.
  • the imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.
  • the visible light camera 11 includes an image sensor 111 and a lens 112, and the visible light camera 11 can be used to capture color information of the current user to obtain a scene image, wherein the image sensor 111 includes a color filter array (such as a Bayer filter array), and the lens 112 The number can be one or more.
  • The image sensor 111 senses light intensity and wavelength information from the captured scene to generate a set of raw image data; the image sensor 111 sends this set of raw image data to the processor 20, and the processor 20 obtains a color scene image by performing operations such as denoising and interpolation on the raw image data.
  • The processor 20 can process each image pixel in the raw image data one by one in a plurality of formats; for example, each image pixel can have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 can process each image pixel at the same or a different bit depth.
  • the depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122 that can be used to capture depth information of the current user to obtain a depth image.
  • the structured light projector 121 is for projecting structured light to a current user, wherein the structured light pattern may be a laser stripe, a Gray code, a sinusoidal stripe, or a randomly arranged speckle pattern or the like.
  • the structured light camera 122 includes an image sensor 1221 and a lens 1222, and the number of the lenses 1222 may be one or more.
  • Image sensor 1221 is used to capture a structured light image that structured light projector 121 projects onto the current user.
  • The structured light image may be transmitted by the depth image acquisition component 12 to the processor 20 for processing such as demodulation, phase recovery, and phase information calculation to obtain the depth information of the current user.
  • The functions of the visible light camera 11 and the structured light camera 122 can also be implemented by a single camera; that is, the imaging device 10 includes only one camera and one structured light projector 121, and this camera can capture both scene images and structured light images.
  • In addition to using structured light, the depth image of the current user can also be acquired by a binocular vision method or by a depth image acquisition method based on Time of Flight (TOF).
  • the processor 20 is further configured to fuse the person region image extracted from the scene image and the depth image with a predetermined background image (predetermined two-dimensional background image or a predetermined three-dimensional background image) to obtain a merged image, and process the merged image to determine a predetermined sound model, Finally, the merged image is fused with the predetermined sound model to output the sound image.
  • The processor 20 may extract the two-dimensional person region image from the scene image by combining the depth information in the depth image, or it may create a three-dimensional model of the person region according to the depth information in the depth image and then color-fill the three-dimensional person region with the color information in the scene image to obtain a three-dimensional, colored person region image.
  • The fusion processing of the person region image and the predetermined background image may be performed by fusing the two-dimensional person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a merged image, or by fusing the three-dimensional colored person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a merged image.
  • the image processing apparatus 100 further includes a memory 30.
  • The memory 30 can be embedded in the electronic device 1000 or can be a memory independent of and external to the electronic device 1000, and can include direct memory access (DMA) features.
  • the raw image data acquired by the visible light camera 11 or the structured light image related data collected by the depth image acquisition component 12 may be transferred to the memory 30 for storage or buffering.
  • the predetermined sound model can also be stored in the memory 30.
  • The processor 20 can read the raw image data from the memory 30 for processing to obtain the scene image, can read the structured light image related data from the memory 30 for processing to obtain the depth image, and can also read a predetermined sound model from the memory 30 for further processing of the merged image.
  • the scene image and the depth image may also be stored in the memory 30 for the processor 20 to call the processing at any time.
  • The processor 20 calls the scene image and the depth image to perform person region extraction, fuses the extracted person region image with the predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) to obtain a merged image, processes the merged image to identify a specific object, searches for a predetermined sound model matching the specific object, and finally fuses the merged image with the predetermined sound model to output the sound image.
  • The predetermined background image (predetermined two-dimensional background image or predetermined three-dimensional background image) and the merged image may also be stored in the memory 30.
  • the image processing apparatus 100 may also include a display 50.
  • the display 50 can acquire a merged image of the sound image directly from the processor 20, and can also acquire a merged image of the sound image from the memory 30.
  • Display 50 displays the merged images in the sound image for viewing by the user or for further processing by a graphics engine or a Graphics Processing Unit (GPU).
  • In some embodiments, the image processing apparatus 100 further includes an encoder/decoder 60 that can encode image data such as the scene image, the depth image, and the merged image; the encoded image data can be saved in the memory 30 and decompressed by the decoder before being displayed on the display 50.
  • Encoder/decoder 60 may be implemented by a Central Processing Unit (CPU), GPU, or coprocessor. In other words, the encoder/decoder 60 may be any one or more of a central processing unit (CPU), a GPU, and a coprocessor.
  • the image processing apparatus 100 also includes a control logic 40.
  • The processor 20 analyzes the data acquired by the imaging device 10 to determine image statistics used to set one or more control parameters (e.g., exposure time) of the imaging device 10. The processor 20 sends the image statistics to the control logic 40, and the control logic 40 controls the imaging device 10 to image with the determined control parameters.
  • Control logic 40 may include a processor and/or a microcontroller that executes one or more routines, such as firmware.
  • One or more routines may determine control parameters of imaging device 10 based on the received image statistics.
  • the image processing apparatus 100 may further include an electroacoustic element 70 for playing a predetermined sound model in the sound image.
  • the electroacoustic element 70 is usually composed of a diaphragm, a voice coil, a permanent magnet, a bracket, and the like.
  • When an audio current passes through the voice coil, the voice coil generates an alternating magnetic field, while the permanent magnet generates a constant magnetic field of fixed magnitude and direction. Because the magnitude and direction of the magnetic field generated by the voice coil change continuously with the audio current, the interaction of the two magnetic fields causes the voice coil to move perpendicular to the direction of the current flowing in it, driving the attached diaphragm to vibrate and produce sound.
  • the electroacoustic component 70 can acquire a predetermined sound model in the sound image from the processor 20 for playback, or can acquire a predetermined sound model in the sound image from the memory 30 for playback.
  • an electronic device 1000 of an embodiment of the present invention includes one or more processors 20, a memory 30, and one or more programs 31.
  • One or more of the programs 31 are stored in the memory 30 and are configured to be executed by one or more processors 20.
  • the program 31 includes instructions for executing the image processing method of any of the above embodiments.
  • the program 31 includes instructions for performing the image processing method described in the following steps:
  • identifying a specific object in the merged image; and fusing the predetermined sound model matched with the specific object with the merged image to output the sound image.
  • program 31 further includes instructions for performing the image processing method described in the following steps:
  • 0231: identifying a face region in the scene image;
  • 0232: acquiring depth information corresponding to the face region from the depth image;
  • 0233: determining a depth range of the person region according to the depth information of the face region; and
  • 0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
  • a computer readable storage medium in accordance with an embodiment of the present invention includes a computer program for use in conjunction with an electronic device 1000 capable of imaging.
  • the computer program can be executed by the processor 20 to perform the image processing method of any of the above embodiments.
  • a computer program can be executed by processor 20 to perform the image processing methods described in the following steps:
  • identifying a specific object in the merged image; and fusing the predetermined sound model matched with the specific object with the merged image to output the sound image.
  • the computer program can also be executed by the processor 20 to complete the image processing method described in the following steps:
  • 0231: identifying a face region in the scene image;
  • 0232: acquiring depth information corresponding to the face region from the depth image;
  • 0233: determining a depth range of the person region according to the depth information of the face region; and
  • 0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • A feature defined with "first" or "second" may explicitly or implicitly include at least one such feature.
  • the meaning of "a plurality” is at least two, such as two, three, etc., unless specifically defined otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Optics & Photonics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An image processing method, an image processing apparatus (100), an electronic device (1000) and a computer readable storage medium. The image processing method is used for processing a merged image, the merged image being formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene. The image processing method comprises: (03) identifying a specific object in the merged image; and (04) fusing a predetermined sound model matching the specific object with the merged image so as to output a sound image.

Description

Image processing method and device, electronic device and computer readable storage medium
Priority information
This application claims priority to and the benefit of Chinese patent applications No. 201710814395.2 and No. 201710813594.1, filed with the State Intellectual Property Office of China on September 11, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a computer readable storage medium.
Background
现有的图像融合通常是将真实人物与背景进行融合,但此种融合方式的趣味性较低。The existing image fusion usually combines real people with the background, but the fusion method is less interesting.
发明内容Summary of the invention
本发明的实施例提供一种图像处理方法、图像处理装置、电子装置和计算机可读存储介质。Embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a computer readable storage medium.
The image processing method of the embodiments of the present invention is used to process a merged image, the merged image being formed by fusing a predetermined background image with a person region image in a scene image of the current user in a real scene. The image processing method includes: identifying a specific object in the merged image; and fusing a predetermined sound model matching the specific object with the merged image to output a sound image.
The image processing apparatus of the embodiments of the present invention is configured to process a merged image, the merged image being formed by fusing the predetermined background image with a person region image in a scene image of the current user in a real scene. The image processing apparatus includes a processor configured to: identify a specific object in the merged image; and fuse a predetermined sound model matching the specific object with the merged image to output a sound image.
The electronic device of the embodiments of the present invention includes one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs include instructions for performing the image processing method described above.
The computer readable storage medium of the embodiments of the present invention includes a computer program used in combination with an electronic device capable of capturing images, the computer program being executable by a processor to perform the image processing method described above.
With the image processing method, the image processing apparatus, the electronic device, and the computer readable storage medium of the embodiments of the present invention, when the person region image is fused with the predetermined background image to form a merged image, a specific object is recognized in the predetermined background image of the merged image, and a predetermined sound model matching the recognized specific object is determined, so that the predetermined sound model is fused with the merged image to output a sound image. The user can thus hear sound while viewing the merged image, which makes the image fusion more engaging, gives the user an immersive feeling, and improves the user experience.
Additional aspects and advantages of the embodiments of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned by practice of the embodiments of the present invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and more readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 2 is a schematic diagram of an image processing apparatus according to some embodiments of the present invention.
FIG. 3 is a schematic structural diagram of an electronic device according to some embodiments of the present invention.
FIG. 4 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 5 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 6 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 7(a) to FIG. 7(e) are schematic views of a scene of structured light measurement according to an embodiment of the present invention.
FIG. 8(a) and FIG. 8(b) are schematic views of a scene of structured light measurement according to an embodiment of the present invention.
FIG. 9 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 10 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 11 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 12 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 13 is a schematic flow chart of an image processing method according to some embodiments of the present invention.
FIG. 14 is a schematic diagram of an image processing apparatus according to some embodiments of the present invention.
FIG. 15 is a schematic diagram of an electronic device according to some embodiments of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to explain the present invention, and shall not be construed as limiting the present invention.
Referring to FIG. 1, an image processing method according to an embodiment of the present invention is used to process a merged image. The merged image is formed by fusing a predetermined background image with a person region image, where the person region image is the image of the region occupied by the current user in a scene image captured of the current user in a real scene. The image processing method includes:
03: identifying a specific object in the merged image;
04: fusing a predetermined sound model matching the specific object with the merged image to output a sound image.
Referring to FIG. 2, the image processing method of the embodiments of the present invention may be implemented by the image processing apparatus 100 of the embodiments of the present invention. The image processing apparatus 100 is configured to process a merged image. The merged image is formed by fusing a predetermined background image with a person region image. The person region image is the image of the region occupied by the current user in a scene image captured of the current user in a real scene. The image processing apparatus 100 includes a processor 20. Steps 03 and 04 may both be implemented by the processor 20. In other words, the processor 20 may be configured to identify a specific object in the merged image and to fuse a predetermined sound model matching the specific object with the merged image to output a sound image.
Referring to FIG. 3, the image processing apparatus 100 of the embodiments of the present invention may be applied to the electronic device 1000 of the embodiments of the present invention. In other words, the electronic device 1000 of the embodiments of the present invention includes the image processing apparatus 100 of the embodiments of the present invention.
In some embodiments, the electronic device 1000 includes a mobile phone, a tablet computer, a notebook computer, a smart bracelet, a smart watch, a smart helmet, smart glasses, and the like.
In some embodiments, the predetermined background image may be a predetermined two-dimensional background image or a predetermined three-dimensional background image. The predetermined background image may be randomly assigned by the processor 20 or selected by the current user.
With the image processing method, the image processing apparatus 100, and the electronic device 1000 of the embodiments of the present invention, when the person region image is fused with the predetermined background image (a predetermined two-dimensional background image or a predetermined three-dimensional background image) to form a merged image, a specific object is recognized in the predetermined background image (two-dimensional or three-dimensional) of the merged image, and a predetermined sound model matching the recognized specific object is determined, so that the predetermined sound model is fused with the merged image to output a sound image. The user can thus hear sound while viewing the merged image, which makes the image fusion more engaging, gives the user an immersive feeling, and improves the user experience.
In some embodiments, the specific object includes an animal, a plant, running water, raindrops, a musical instrument, a flame, the sky, a road, a car, and the like. For example, when the predetermined background image (two-dimensional or three-dimensional) in the merged image includes trees, the processor 20 may recognize the trees as follows: the processor 20 first extracts color features from the merged image or from the predetermined background image based on a color histogram in RGB space, then extracts texture features based on a Gabor filter, and finally determines, from the combination of the color features and the texture features, that trees are present in the merged image. The processor 20 may then select a predetermined sound model of leaves rustling in the wind and fuse it with the merged image. As another example, when the merged image contains an animal, the processor 20 may convert the merged image from RGB space to HSV space, compute a color histogram of the HSV image, derive low-order statistical moments from the histogram as feature descriptors, and finally classify the merged image with a K-nearest-neighbor method, that is, determine whether an animal is present in the merged image and, if so, which kind of animal it is, and then select the predetermined sound model matching the recognized animal.
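As an illustrative, non-limiting sketch of the animal-recognition branch described above, the following Python code derives low-order statistical moments of the HSV channels as feature descriptors and classifies the merged image with a simple K-nearest-neighbor vote; the reference feature set, the labels, and the mapping from recognized objects to sound files are hypothetical placeholders, not part of the embodiments.

```python
import cv2
import numpy as np

def hsv_moment_features(image_bgr):
    """Describe an image by low-order statistical moments (mean, standard
    deviation, cube root of the third moment) of each HSV channel."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    feats = []
    for channel in cv2.split(hsv):
        mean = channel.mean()
        std = channel.std()
        skew = np.cbrt(((channel - mean) ** 3).mean())
        feats.extend([mean, std, skew])
    return np.array(feats, dtype=np.float32)

def knn_label(query_feat, ref_feats, ref_labels, k=3):
    """Label the query by majority vote among its k nearest reference features."""
    dists = np.linalg.norm(ref_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [ref_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical usage: ref_feats / ref_labels would come from labelled sample
# images, and object_sounds maps a recognized object to a stored sound model.
# label = knn_label(hsv_moment_features(merged_image), ref_feats, ref_labels)
# object_sounds = {"dog": "bark.wav", "tree": "rustling_leaves.wav"}
# sound_model = object_sounds.get(label)
```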
Referring to FIG. 4, in some embodiments, the image processing method of the embodiments of the present invention further includes:
021: acquiring a scene image of the current user;
022: acquiring a depth image of the current user;
023: processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain a person region image; and
024: fusing the person region image with the predetermined background image (a predetermined two-dimensional background image or a predetermined three-dimensional background image) to obtain the merged image.
Referring again to FIG. 3, in some embodiments, the image processing apparatus 100 further includes a visible light camera 11 and a depth image acquisition component 12. Step 021 may be implemented by the visible light camera 11, step 022 may be implemented by the depth image acquisition component 12, and steps 023 and 024 may be implemented by the processor 20.
In other words, the visible light camera 11 may be configured to acquire a scene image of the current user, the depth image acquisition component 12 may be configured to acquire a depth image of the current user, and the processor 20 may be configured to process the scene image and the depth image to extract the person region of the current user in the scene image so as to obtain a person region image, and to fuse the person region image with the predetermined background image (two-dimensional or three-dimensional) to obtain the merged image.
The scene image may be a grayscale image or a color image, and the depth image represents depth information of each person or object in the real scene in which the current user is located. The scene range of the scene image coincides with that of the depth image, and each pixel in the scene image has corresponding depth information in the depth image.
Existing methods of separating a person from the background mainly segment the person and the background according to the similarity and discontinuity of adjacent pixels in pixel value, but such segmentation is susceptible to environmental factors such as ambient illumination. The image processing method of the embodiments of the present invention extracts the person region in the scene image by acquiring a depth image of the current user. Since the acquisition of the depth image is not easily affected by factors such as illumination or the color distribution of the scene, the person region extracted from the depth image is more accurate, and in particular the boundary of the person region can be accurately delimited. Furthermore, a more accurate person region image produces a better merged image after fusion with the predetermined background image (two-dimensional or three-dimensional).
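As a minimal sketch of the overall flow of steps 021 through 024, assuming the depth-based segmentation yields a boolean person mask and that the scene image and a predetermined two-dimensional background image have the same size, the compositing step could look like the following Python code; acquire_scene_image, acquire_depth_image, and extract_person_mask are hypothetical placeholders for the camera and segmentation stages described above.

```python
import numpy as np

def composite(scene_rgb, person_mask, background_rgb):
    """Paste the person region (a boolean mask produced by the depth-based
    segmentation of step 023) over the predetermined background image,
    assuming both images have the same height and width."""
    merged = background_rgb.copy()
    merged[person_mask] = scene_rgb[person_mask]
    return merged

# Hypothetical orchestration of steps 021-024:
# scene = acquire_scene_image()               # step 021, visible light camera 11
# depth = acquire_depth_image()               # step 022, depth acquisition component 12
# mask = extract_person_mask(scene, depth)    # step 023, depth-based extraction
# merged_image = composite(scene, mask, predetermined_background)  # step 024
```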
Referring to FIG. 5, in some embodiments, step 022 of acquiring a depth image of the current user includes:
0221: projecting structured light to the current user;
0222: capturing a structured light image modulated by the current user; and
0223: demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.
Referring again to FIG. 2, in some embodiments, the depth image acquisition component 12 includes a structured light projector 121 and a structured light camera 122. Step 0221 may be implemented by the structured light projector 121, and steps 0222 and 0223 may be implemented by the structured light camera 122.
In other words, the structured light projector 121 may project structured light to the current user, and the structured light camera 122 may be configured to capture the structured light image modulated by the current user and to demodulate the phase information corresponding to each pixel of the structured light image to obtain the depth image.
Specifically, after the structured light projector 121 projects structured light of a certain pattern onto the face and body of the current user, a structured light image modulated by the current user is formed on the surface of the current user's face and body. The structured light camera 122 captures the modulated structured light image and demodulates the structured light image to obtain the depth image. The pattern of the structured light may be laser stripes, Gray codes, sinusoidal stripes, non-uniform speckle, or the like.
Referring to FIG. 6, in some embodiments, step 0223 of demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image includes:
02231: demodulating the phase information corresponding to each pixel in the structured light image;
02232: converting the phase information into depth information; and
02233: generating the depth image based on the depth information.
Referring again to FIG. 2, in some embodiments, steps 02231, 02232, and 02233 may all be implemented by the structured light camera 122.
In other words, the structured light camera 122 may further be configured to demodulate the phase information corresponding to each pixel in the structured light image, convert the phase information into depth information, and generate the depth image based on the depth information.
Specifically, compared with unmodulated structured light, the phase information of the modulated structured light has changed, and the structured light presented in the structured light image is distorted structured light; the changed phase information characterizes the depth information of the object. Therefore, the structured light camera 122 first demodulates the phase information corresponding to each pixel in the structured light image, and then calculates the depth information from the phase information, thereby obtaining the final depth image.
In order to make the process of acquiring the depth image of the face and body of the current user from structured light clearer to those skilled in the art, a widely used grating projection technique (fringe projection technique) is taken below as an example to illustrate its specific principle. The grating projection technique belongs to area structured light in a broad sense.
As shown in FIG. 7(a), when area structured light projection is used, sinusoidal stripes are first generated by computer programming and projected onto the measured object through the structured light projector 121; the structured light camera 122 then captures the degree to which the stripes are bent after being modulated by the object, the bent stripes are demodulated to obtain the phase, and the phase is converted into depth information to obtain the depth image. To avoid problems of error or error coupling, the depth image acquisition component 12 needs to be calibrated before structured light is used for depth information acquisition. The calibration includes calibration of geometric parameters (for example, the relative position between the structured light camera 122 and the structured light projector 121), of the internal parameters of the structured light camera 122, of the internal parameters of the structured light projector 121, and the like.
Specifically, in the first step, sinusoidal stripes are generated by computer programming. Since the phase subsequently needs to be obtained from the distorted stripes, for example with a four-step phase-shifting method, four stripe patterns with a phase difference of π/2 are generated, and the structured light projector 121 projects the four stripe patterns onto the measured object (the mask shown in FIG. 7(b)) in a time-division manner. The structured light camera 122 captures the image shown on the left of FIG. 7(b) and also reads the stripes of the reference plane shown on the right of FIG. 7(b).
In the second step, phase recovery is performed. The structured light camera 122 calculates the modulated phase from the four captured modulated stripe patterns (that is, the structured light images); the phase map obtained at this point is a truncated (wrapped) phase map. Since the result of the four-step phase-shifting algorithm is calculated by an arctangent function, the phase of the modulated structured light is limited to [-π, π]; that is, whenever the modulated phase exceeds [-π, π], it wraps around and starts again. The resulting principal phase value is shown in FIG. 7(c).
In the phase recovery process, de-wrapping is required, that is, the truncated phase is restored to a continuous phase. As shown in FIG. 7(d), the left side is the modulated continuous phase map and the right side is the reference continuous phase map.
In the third step, the reference continuous phase is subtracted from the modulated continuous phase to obtain the phase difference (that is, the phase information). The phase difference characterizes the depth information of the measured object relative to the reference plane. The phase difference is then substituted into the phase-to-depth conversion formula (the parameters involved in the formula are calibrated), and the three-dimensional model of the object to be measured as shown in FIG. 7(e) can be obtained.
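The following Python sketch illustrates the four-step phase-shifting computation and the phase-difference-to-depth conversion described above, under simplifying assumptions: the four fringe images are shifted by exactly π/2, row-and-column np.unwrap stands in for a proper phase-unwrapping step, and scale is a stand-in for the calibrated phase-to-depth conversion factor; it is not the calibrated pipeline of the embodiments.

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four fringe images shifted by pi/2:
    phi = atan2(I4 - I2, I1 - I3), limited to [-pi, pi]."""
    return np.arctan2(i4.astype(np.float64) - i2, i1.astype(np.float64) - i3)

def depth_from_phase(phase_object, phase_reference, scale):
    """Relative depth from the difference between the unwrapped object phase
    and the unwrapped reference-plane phase; `scale` stands in for the
    calibrated phase-to-depth conversion factor."""
    unwrapped_obj = np.unwrap(np.unwrap(phase_object, axis=0), axis=1)
    unwrapped_ref = np.unwrap(np.unwrap(phase_reference, axis=0), axis=1)
    return scale * (unwrapped_obj - unwrapped_ref)
```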
It should be understood that, in practical applications, depending on the specific application scenario, the structured light used in the embodiments of the present invention may be any other pattern in addition to the above grating.
As a possible implementation, the present invention may also use speckle structured light to acquire the depth information of the current user.
Specifically, the speckle structured light obtains depth information using a substantially flat diffractive element having an embossed diffractive structure with a specific phase distribution, whose cross section is a stepped relief structure with two or more levels. The thickness of the substrate in the diffractive element is approximately 1 micrometer, the heights of the steps are non-uniform and may range from 0.7 micrometers to 0.9 micrometers. The structure shown in FIG. 8(a) is a partial diffractive structure of the collimating beam-splitting element of this embodiment. FIG. 8(b) is a cross-sectional side view along section A-A, with both the abscissa and the ordinate in micrometers. The speckle pattern generated by the speckle structured light is highly random and changes with distance. Therefore, before depth information is acquired using speckle structured light, the speckle patterns in space must first be calibrated; for example, within a range of 0 to 4 meters from the structured light camera 122, a reference plane is taken every 1 centimeter, so that 400 speckle images are saved after calibration, and the smaller the calibration spacing, the higher the accuracy of the acquired depth information. Subsequently, the structured light projector 121 projects the speckle structured light onto the measured object (that is, the current user), and the height differences on the surface of the measured object change the speckle pattern of the projected speckle structured light. After the structured light camera 122 captures the speckle pattern (that is, the structured light image) projected onto the measured object, the speckle pattern is cross-correlated one by one with the 400 speckle images saved in the earlier calibration, yielding 400 correlation images. The position of the measured object in space shows a peak in the correlation images; superimposing these peaks and performing interpolation yields the depth information of the measured object.
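As a rough illustration of the per-plane cross-correlation idea (not the calibrated peak-superposition and interpolation scheme described above), the following Python sketch assigns each pixel the depth of the calibrated reference plane whose local speckle window correlates best with the captured speckle image; the window size and the use of scipy.ndimage.uniform_filter are assumptions made for brevity.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def correlation_depth(captured, reference_stack, plane_depths, win=11):
    """Assign each pixel the depth of the calibrated reference plane whose
    local speckle window correlates best with the captured speckle image."""
    cap = captured.astype(np.float64)
    cap -= uniform_filter(cap, win)              # remove the local mean
    best = np.full(cap.shape, -np.inf)
    depth = np.zeros(cap.shape, dtype=np.float64)
    for ref, d in zip(reference_stack, plane_depths):
        r = ref.astype(np.float64)
        r -= uniform_filter(r, win)
        corr = uniform_filter(cap * r, win)      # windowed correlation score
        better = corr > best
        depth[better] = d
        best[better] = corr[better]
    return depth
```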
Since an ordinary diffractive element diffracts a beam into multiple diffracted beams whose intensities differ greatly, the risk of harm to human eyes is large, and even if the diffracted light is diffracted a second time, the uniformity of the resulting beams is low; therefore, beams diffracted by an ordinary diffractive element project poorly onto the measured object. In this embodiment, a collimating beam-splitting element is used. This element not only collimates the non-collimated beam but also splits the light; that is, the non-collimated light reflected by the mirror exits the collimating beam-splitting element as multiple collimated beams at different angles, and the cross-sectional areas of the emitted collimated beams are approximately equal and their energy fluxes are approximately equal, so that projection with the speckle light diffracted from these beams works better. At the same time, the laser output is dispersed into each beam, which further reduces the risk of harming human eyes; and compared with other uniformly arranged structured light, the speckle structured light consumes less power while achieving the same acquisition effect.
Referring to FIG. 9, in some embodiments, step 023 of processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain the person region image includes:
0231: identifying a face region in the scene image;
0232: acquiring depth information corresponding to the face region from the depth image;
0233: determining a depth range of the person region according to the depth information of the face region; and
0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range, to obtain the person region image.
Referring again to FIG. 2, in some embodiments, steps 0231, 0232, 0233, and 0234 may all be implemented by the processor 20. In other words, the processor 20 may be configured to identify a face region in the scene image, acquire depth information corresponding to the face region from the depth image, determine a depth range of the person region according to the depth information of the face region, and determine, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range, to obtain the person region image.
Specifically, a trained deep learning model may first be used to identify the face region in the scene image, and the depth information of the face region can then be determined according to the correspondence between the scene image and the depth image. Since the face region includes features such as the nose, eyes, ears, and lips, the depth data corresponding to each feature in the face region differs in the depth image; for example, when the face is oriented toward the depth image acquisition component 12, the depth data corresponding to the nose may be small while the depth data corresponding to the ears may be large in the captured depth image. Therefore, the depth information of the face region may be a single value or a numerical range. When the depth information of the face region is a single value, the value may be obtained by averaging the depth data of the face region, or by taking the median of the depth data of the face region.
Since the person region includes the face region, that is, the person region and the face region lie within a certain depth range together, after the processor 20 determines the depth information of the face region, it may set the depth range of the person region according to the depth information of the face region, and then extract, according to this depth range, the person region that falls within the depth range and is connected to the face region, so as to obtain the person region image.
In this way, the person region image can be extracted from the scene image according to the depth information. Since the acquisition of the depth information is not affected by factors such as illumination and color temperature in the environment, the extracted person region image is more accurate.
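A minimal Python sketch of steps 0231 through 0234, assuming a Haar-cascade face detector stands in for the trained deep learning model mentioned above, the face depth is summarized by its median, and the depth range is set by a fixed relative margin (an assumption, since the embodiments do not fix a specific margin):

```python
import cv2
import numpy as np

def person_mask(scene_gray, depth, face_cascade, margin=0.3):
    """Mask of pixels that are connected to the detected face and lie within
    a depth range set around the face depth (the relative margin is an
    illustrative assumption)."""
    faces = face_cascade.detectMultiScale(scene_gray)
    if len(faces) == 0:
        return np.zeros(depth.shape, dtype=bool)
    x, y, w, h = faces[0]
    face_depth = np.median(depth[y:y + h, x:x + w])  # single-value face depth
    lo, hi = face_depth * (1 - margin), face_depth * (1 + margin)
    in_range = ((depth >= lo) & (depth <= hi)).astype(np.uint8)
    # keep only the connected component that contains the face centre
    _, labels = cv2.connectedComponents(in_range)
    face_label = labels[y + h // 2, x + w // 2]
    return (labels == face_label) & (face_label != 0)

# face_cascade could be, for example,
# cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
```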
Referring again to FIG. 9, in some embodiments, step 023 of processing the scene image and the depth image to extract the person region of the current user in the scene image to obtain the person region image further includes:
0235: processing the scene image to obtain a full-field edge image of the scene image; and
0236: correcting the person region image according to the full-field edge image.
Referring again to FIG. 2, in some embodiments, steps 0235 and 0236 may both be implemented by the processor 20. In other words, the processor 20 may further be configured to process the scene image to obtain a full-field edge image of the scene image, and to correct the person region image according to the full-field edge image.
The processor 20 first performs edge extraction on the scene image to obtain the full-field edge image, where the edge lines in the full-field edge image include the edge lines of the current user and of the background objects in the scene where the current user is located. Specifically, edge extraction may be performed on the scene image with the Canny operator. The core of the Canny edge-extraction algorithm mainly includes the following steps: first, the scene image is convolved with a 2D Gaussian filter template to remove noise; then, the gradient value of the gray level of each pixel is obtained with a differential operator, the gradient direction of the gray level of each pixel is calculated from the gradient value, and the neighboring pixels of each pixel along its gradient direction can be found from the gradient direction; then each pixel is traversed, and if the gray value of a pixel is not the largest compared with the gray values of the two neighboring pixels before and after it along its gradient direction, the pixel is considered not to be an edge point. In this way, the pixels at edge positions in the scene image can be determined, and the full-field edge image after edge extraction is obtained.
After acquiring the full-field edge image, the processor 20 corrects the person region image according to the full-field edge image. It can be understood that the person region image is obtained by merging all pixels in the scene image that are connected to the face region and fall within the set depth range; in some scenes, there may be objects that are connected to the face region and also fall within the depth range. Therefore, to make the extracted person region image more accurate, the full-field edge image may be used to correct the person region image.
Further, the processor 20 may also perform a secondary correction on the corrected person region image; for example, the corrected person region image may be dilated to enlarge the person region image so as to preserve the edge details of the person region image.
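Since the embodiments do not spell out the exact correction rule, the following Python sketch shows one plausible reading of steps 0235 and 0236: compute the full-field edge image with the Canny operator, cut the depth-based person mask along edge lines, keep the piece that remains largest, and then dilate slightly to preserve edge detail; the Canny thresholds and the dilation radius are illustrative assumptions.

```python
import cv2
import numpy as np

def refine_person_mask(scene_gray, person_mask, dilate_px=3):
    """Correct the depth-based person mask with a Canny full-field edge image,
    then dilate slightly to preserve edge detail."""
    edges = cv2.Canny(scene_gray, 50, 150)        # full-field edge image (step 0235)
    cut = person_mask.astype(np.uint8)
    cut[edges > 0] = 0                            # cut the mask along edge lines
    num, labels = cv2.connectedComponents(cut)
    if num > 1:
        sizes = np.bincount(labels.ravel())
        sizes[0] = 0                              # ignore the background label
        cut = (labels == sizes.argmax()).astype(np.uint8)
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    return cv2.dilate(cut, kernel) > 0            # secondary correction (dilation)
```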
Referring to FIG. 10, in some embodiments, when the predetermined background image is a predetermined three-dimensional background image, step 024 of fusing the person region image with the predetermined background image to obtain the merged image includes:
02411: acquiring a predetermined fusion region in the predetermined three-dimensional background image;
02412: determining a to-be-replaced pixel region of the predetermined fusion region according to the person region image; and
02413: replacing the to-be-replaced pixel region of the predetermined fusion region with the person region image to obtain the merged image.
Referring again to FIG. 2, in some embodiments, steps 02411, 02412, and 02413 may all be implemented by the processor 20. In other words, the processor 20 may be configured to acquire a predetermined fusion region in the predetermined three-dimensional background image, determine a to-be-replaced pixel region of the predetermined fusion region according to the person region image, and replace the to-be-replaced pixel region of the predetermined fusion region with the person region image to obtain the merged image.
It can be understood that when the predetermined three-dimensional background image is obtained by modeling an actual scene, the depth data corresponding to each pixel in the predetermined three-dimensional background image can be acquired directly during modeling; when the predetermined three-dimensional background image is produced by animation, the depth data corresponding to each pixel may be set by the producer. In addition, the objects present in the predetermined three-dimensional background image are also known. Therefore, before the predetermined three-dimensional background image is used for image fusion, the fusion position of the person region image, that is, the predetermined fusion region, may be calibrated in advance according to the depth data and the objects present in the predetermined three-dimensional background image. Since the size of the person region image captured by the visible light camera 11 is affected by the capture distance, the person region image is larger when the capture distance is short and smaller when the capture distance is long; the processor 20 therefore needs to determine the to-be-replaced pixel region in the predetermined fusion region according to the size of the person region image actually captured by the visible light camera 11. The to-be-replaced pixel region in the predetermined fusion region is then replaced with the person region image to obtain the fused merged image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved.
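The following Python sketch illustrates the pixel-replacement step, assuming the calibrated fusion region is represented as a simple rectangle (x, y, w, h) in the rendered background and the person image is scaled to the captured person size before replacement; the rectangle representation and the bottom-centre anchoring are assumptions made for illustration.

```python
import cv2
import numpy as np

def fuse_into_region(background_rgb, region_box, person_rgb, person_mask):
    """Replace the to-be-replaced pixels inside a calibrated fusion region
    with the person region image, scaled to fit the region."""
    x, y, w, h = region_box
    scale = min(w / person_rgb.shape[1], h / person_rgb.shape[0])
    new_w = int(person_rgb.shape[1] * scale)
    new_h = int(person_rgb.shape[0] * scale)
    person = cv2.resize(person_rgb, (new_w, new_h))
    mask = cv2.resize(person_mask.astype(np.uint8), (new_w, new_h)) > 0
    merged = background_rgb.copy()
    x0 = x + (w - new_w) // 2                    # anchor at the bottom centre
    y0 = y + h - new_h
    roi = merged[y0:y0 + new_h, x0:x0 + new_w]
    roi[mask] = person[mask]
    return merged
```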
Referring to FIG. 11, in some embodiments, when the predetermined background image is a predetermined three-dimensional background image, step 024 of fusing the person region image with the predetermined background image to obtain the merged image includes:
02421: processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;
02422: acquiring depth data of the predetermined three-dimensional background image;
02423: determining a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data of the predetermined three-dimensional background image;
02424: determining a to-be-replaced pixel region of the calculated fusion region according to the person region image; and
02425: replacing the to-be-replaced pixel region of the calculated fusion region with the person region image to obtain the merged image.
Referring again to FIG. 2, in some embodiments, steps 02421, 02422, 02423, 02424, and 02425 may all be implemented by the processor 20. In other words, the processor 20 may be configured to process the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image, acquire depth data of the predetermined three-dimensional background image, determine a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image and the depth data, determine a to-be-replaced pixel region of the calculated fusion region according to the person region image, and replace the to-be-replaced pixel region of the calculated fusion region with the person region image to obtain the merged image.
It can be understood that, if the fusion position of the person region image has not been calibrated in advance when the predetermined three-dimensional background image is fused with the person region image, the processor 20 first needs to determine the fusion position of the person region image in the predetermined three-dimensional background image. Specifically, the processor 20 first performs edge extraction on the predetermined three-dimensional background image to obtain the full-field edge image, and acquires the depth data of the predetermined three-dimensional background image, where the depth data is acquired during the modeling or animation of the predetermined three-dimensional background image. Subsequently, the processor 20 determines the calculated fusion region in the predetermined three-dimensional background image according to the full-field edge image and the depth data. Since the size of the person region image is affected by the capture distance of the visible light camera 11, the size of the person region image needs to be calculated, and the to-be-replaced pixel region in the calculated fusion region is determined according to the size of the person region image. Finally, the to-be-replaced pixel region in the calculated fusion region is replaced with the person region image to obtain the merged image. In this way, the fusion of the person region image with the predetermined three-dimensional background image is achieved.
In some embodiments, the person region image may be a two-dimensional person region image or a three-dimensional person region image. The processor 20 may extract a two-dimensional person region image from the scene image in combination with the depth information in the depth image, or may build a three-dimensional image of the person region according to the depth information in the depth image and then color the three-dimensional person region with the color information in the scene image to obtain a three-dimensional, colored person region image.
In some embodiments, there may be one or more predetermined fusion regions or calculated fusion regions in the predetermined three-dimensional background image. When there is one predetermined fusion region, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image is that single predetermined fusion region; when there is one calculated fusion region, the fusion position is that single calculated fusion region. When there are multiple predetermined fusion regions, the fusion position of the two-dimensional or three-dimensional person region image in the predetermined three-dimensional background image may be any one of the multiple predetermined fusion regions; further, since the three-dimensional person region image has depth information, a predetermined fusion region whose depth matches the depth information of the three-dimensional person region image may be selected from the multiple predetermined fusion regions as the fusion position, so as to obtain a better fusion effect. When there are multiple calculated fusion regions, the fusion position may likewise be any one of the calculated fusion regions, and, further, a calculated fusion region whose depth matches the depth information of the three-dimensional person region image may be selected as the fusion position to obtain a better fusion effect.
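Where several candidate fusion regions exist, the depth-matching choice described above might look like the following Python sketch; the representation of each region as a dictionary with a representative depth value is an assumption.

```python
import numpy as np

def pick_fusion_region(person_depth, regions):
    """Among several candidate fusion regions, pick the one whose depth is
    closest to the depth of the three-dimensional person region image."""
    person_d = float(np.median(person_depth))
    return min(regions, key=lambda r: abs(r["depth"] - person_d))

# Hypothetical usage:
# regions = [{"box": (120, 300, 200, 400), "depth": 2.5},
#            {"box": (600, 280, 220, 420), "depth": 4.0}]
# chosen = pick_fusion_region(person_depth_map, regions)
```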
After the processor 20 obtains the person region image, the person region image can be fused with the predetermined background image (two-dimensional or three-dimensional) to obtain the merged image. The merged image is then processed to identify a specific object and thereby match a predetermined sound model. Finally, the predetermined sound model is fused with the merged image to output the sound image.
The sound image may consist of one frame of the merged image together with the predetermined sound model, or of multiple frames of merged images together with the predetermined sound model, in which case the sound image is a sound video.
Referring to FIG. 12, in some embodiments, the image processing method of the embodiments of the present invention further includes:
011: determining whether the predetermined background image (two-dimensional or three-dimensional) in the merged image has an associatively stored predetermined sound model;
012: when the predetermined background image (two-dimensional or three-dimensional) has an associatively stored predetermined sound model, fusing the merged image with the predetermined sound model to output the sound image; and when the predetermined background image (two-dimensional or three-dimensional) does not have an associatively stored predetermined sound model, proceeding to step 03 of identifying the specific object in the merged image.
Referring again to FIG. 2, in some embodiments, steps 011 and 012 may both be implemented by the processor 20. In other words, the processor 20 may further be configured to determine whether the predetermined background image (two-dimensional or three-dimensional) in the merged image has an associatively stored predetermined sound model; when it does, to fuse the merged image with the predetermined sound model to output the sound image; and when it does not, to proceed to step 03 of identifying the specific object in the merged image.
Specifically, when the predetermined background images (two-dimensional or three-dimensional) in the merged image are constructed or selected in advance, the specific objects appearing in each predetermined background image are known, and the predetermined sound model matching a specific object can be stored directly in association with the predetermined background image. In this way, when the processor 20 uses a given predetermined background image, the associatively stored predetermined sound model can be fused directly with the merged image to output the sound image. Of course, if a predetermined background image has not been stored in association with a predetermined sound model, then when the processor 20 uses that predetermined background image, it first needs to identify the specific object in the predetermined background image, then select, from the multiple pre-stored predetermined sound models, the predetermined sound model matching the identified specific object, and finally fuse the selected predetermined sound model with the merged image to output the sound image.
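A minimal Python sketch of this lookup-with-fallback logic; all of the arguments (the stored background-to-sound associations, the object recognizer, and the object-to-sound table) are hypothetical stand-ins for the components described above.

```python
def sound_for_merged_image(background_id, associated_sounds,
                           merged_image, recognize, object_sounds):
    """Pick the predetermined sound model for a merged image: use the model
    stored in association with the background if one exists, otherwise
    recognize a specific object in the merged image and look its sound up."""
    sound = associated_sounds.get(background_id)
    if sound is not None:
        return sound                              # associatively stored model
    obj = recognize(merged_image)                 # e.g. "tree", "dog", "stream"
    return object_sounds.get(obj)                 # model matched to the object
```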
Referring to FIG. 13, in some embodiments, the image processing method of the embodiments of the present invention further includes:
05: playing the sound image, where the sound image is played automatically or in response to a triggered play request.
Referring to FIG. 14, in some embodiments, the image processing apparatus 100 further includes an electroacoustic element 70 and a display 50. Step 05 may be implemented jointly by the electroacoustic element 70 and the display 50: the image is displayed by the display 50, and the sound is played by the electroacoustic element 70. The electroacoustic element 70 may be a loudspeaker, an earphone, a microphone, a pickup cartridge, or the like.
Specifically, when the sound image is played, the default may be to play the image without playing the sound, and the current user may choose to trigger a play request to play the image and the sound simultaneously; when the current user does not trigger a play request, only the image is played. In addition, when the merged images in the sound image comprise multiple frames, for example when the current user is video chatting with a friend, the image of the current user seen by the friend is the merged image; in this case, both the current user and the friend may trigger a play request to play the image and the sound simultaneously, which adds interest to the video chat. Moreover, if the current user or the friend triggers the play request again while the image and the sound of the sound image are playing simultaneously, the display 50 continues to display the merged image while the electroacoustic element 70 stops playing the sound.
Alternatively, when the sound image is played, the default may be to play the image and the sound simultaneously; in this case, the current user may choose to trigger a play request to stop the playback of the sound.
In some embodiments, the predetermined sound model matching the specific object includes one or more pieces.
When the predetermined sound model includes one piece, the predetermined sound model is played one or more times during the playback of the sound image. That is, when the played sound image includes one frame of merged image and one predetermined sound piece, the display 50 continuously displays the single merged frame while the electroacoustic element 70 plays the predetermined sound piece once or loops it multiple times. When the played sound image includes multiple frames of merged images and one predetermined sound piece, the display 50 displays the frames at a certain frame rate while the electroacoustic element 70 plays the predetermined sound piece once or loops it multiple times.
When the predetermined sound model includes multiple pieces, the pieces are stored in order as a list, and during the playback of the sound image the pieces are played in any one of the following modes: sequential play, shuffle play, single-piece repeat, or list repeat. That is, when the played sound image includes one frame of merged image and multiple predetermined sound pieces, the pieces are stored in order as a list and the display 50 continuously displays the single merged frame, while the electroacoustic element 70 may play the pieces once in the stored order, repeat the list in order multiple times, play the pieces in the list at random, or select one of the pieces and repeat it. When the played sound image includes multiple frames of merged images and multiple predetermined sound pieces, the display 50 displays the frames at a certain frame rate, while the electroacoustic element 70 may likewise play the pieces once in the stored order, repeat the list in order multiple times, play the pieces at random, or select one piece and repeat it.
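A minimal Python sketch of the four playback modes named above, concerned only with the order in which the stored pieces are handed to the playback element (the repeat modes yield endlessly); the file names in the usage comment are hypothetical.

```python
import itertools
import random

def play_order(pieces, mode):
    """Yield the stored predetermined sound pieces in one of the playback
    modes named above; the repeat modes yield endlessly."""
    if mode == "sequential":
        yield from pieces
    elif mode == "shuffle":
        shuffled = list(pieces)
        random.shuffle(shuffled)
        yield from shuffled
    elif mode == "single_repeat":
        yield from itertools.repeat(pieces[0])
    elif mode == "list_repeat":
        yield from itertools.cycle(pieces)

# for piece in play_order(["wind.wav", "birds.wav"], "list_repeat"):
#     ...  # hand each piece to the electroacoustic element 70 for playback
```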
请一并参阅图3和图14,本发明实施方式还提出一种电子装置1000。电子装置1000包括图像处理装置100。图像处理装置100可以利用硬件和/或软件实现。图像处理装置100包括成像设备10和处理器20。Referring to FIG. 3 and FIG. 14 together, an embodiment of the present invention further provides an electronic device 1000. The electronic device 1000 includes an image processing device 100. The image processing apparatus 100 can be implemented using hardware and/or software. The image processing apparatus 100 includes an imaging device 10 and a processor 20.
The imaging device 10 includes a visible light camera 11 and a depth image acquisition assembly 12.
Specifically, the visible light camera 11 includes an image sensor 111 and a lens 112, and may be used to capture color information of the current user to obtain the scene image. The image sensor 111 includes a color filter array (such as a Bayer filter array), and the number of lenses 112 may be one or more. While the visible light camera 11 acquires the scene image, each imaging pixel of the image sensor 111 senses light intensity and wavelength information from the captured scene to generate a set of raw image data. The image sensor 111 sends the raw image data to the processor 20, and the processor 20 performs operations such as denoising and interpolation on the raw image data to obtain the color scene image. The processor 20 may process each image pixel of the raw image data one by one in multiple formats; for example, each image pixel may have a bit depth of 8, 10, 12, or 14 bits, and the processor 20 may process each image pixel at the same or a different bit depth.
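As a rough illustration of the denoising and interpolation operations just mentioned, the following Python sketch converts Bayer raw data into an 8-bit color scene image; the use of OpenCV, the assumed Bayer layout, and the filter parameters are choices made for the sketch rather than details of the embodiments.

import numpy as np
import cv2  # used only to illustrate; the embodiments do not prescribe a library

def raw_to_scene_image(raw, bit_depth=10):
    """Turn single-channel Bayer raw data into an 8-bit color scene image.

    raw       -- 2-D uint16 array of Bayer-mosaiced sensor readings
    bit_depth -- bit depth of each image pixel (e.g. 8, 10, 12 or 14)
    """
    # Normalize whatever bit depth the sensor delivered down to 8 bits.
    raw8 = (raw.astype(np.float32) * (255.0 / (2 ** bit_depth - 1))).astype(np.uint8)
    # Interpolation step: demosaic the Bayer pattern into a BGR color image.
    color = cv2.cvtColor(raw8, cv2.COLOR_BayerBG2BGR)
    # Denoising step on the interpolated color image.
    return cv2.fastNlMeansDenoisingColored(color, None, 3, 3, 7, 21)

# Example with synthetic 10-bit raw data.
scene = raw_to_scene_image(np.random.randint(0, 1024, (480, 640), dtype=np.uint16))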
The depth image acquisition assembly 12 includes a structured light projector 121 and a structured light camera 122, and may be used to capture depth information of the current user to obtain the depth image. The structured light projector 121 is configured to project structured light onto the current user, where the structured light pattern may be laser stripes, a Gray code, sinusoidal stripes, a randomly arranged speckle pattern, or the like. The structured light camera 122 includes an image sensor 1221 and a lens 1222, and the number of lenses 1222 may be one or more. The image sensor 1221 is configured to capture the structured light image projected by the structured light projector 121 onto the current user. The structured light image may be sent by the depth image acquisition assembly 12 to the processor 20 for processing such as demodulation, phase recovery, and phase information calculation to obtain the depth information of the current user.
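The demodulation and phase-to-depth steps mentioned above can be pictured as a four-step phase-shifting sketch like the one below; the shift angles, the reference-plane approach, and the calibration scale are assumptions of the sketch, and a real system would additionally perform full two-dimensional phase unwrapping.

import numpy as np

def phase_to_depth(frames, reference_phase, scale=1.0):
    """Recover per-pixel depth from four phase-shifted sinusoidal fringe images.

    frames          -- four images captured with 0, 90, 180 and 270 degree shifts
    reference_phase -- phase map of a flat reference plane, captured in advance
    scale           -- assumed calibration factor mapping phase difference to depth
    """
    i0, i1, i2, i3 = [f.astype(np.float64) for f in frames]
    # Demodulation: wrapped phase of the deformed fringe pattern at each pixel.
    wrapped = np.arctan2(i3 - i1, i0 - i2)
    # The phase difference against the reference plane carries the depth information.
    delta = wrapped - reference_phase
    # Keep the difference in (-pi, pi]; full 2-D unwrapping is omitted in this sketch.
    delta = np.mod(delta + np.pi, 2 * np.pi) - np.pi
    return scale * delta  # depth map, in the units implied by the calibration scale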
In some embodiments, the functions of the visible light camera 11 and the structured light camera 122 may be implemented by a single camera; that is, the imaging device 10 includes only one camera and one structured light projector 121, and that camera can capture both the scene image and the structured light image.
In addition to using structured light, the depth image of the current user may also be acquired by other depth image acquisition methods, such as a binocular vision method or a time-of-flight (TOF) based method.
The processor 20 is further configured to fuse the person region image extracted from the scene image and the depth image with the predetermined background image (a predetermined two-dimensional background image or a predetermined three-dimensional background image) to obtain the merged image, to process the merged image to determine the predetermined sound model, and finally to fuse the merged image with the predetermined sound model to output the sound image. When extracting the person region image, the processor 20 may extract a two-dimensional person region image from the scene image in combination with the depth information in the depth image, or may build a three-dimensional model of the person region from the depth information in the depth image and then fill the three-dimensional person region with the color information in the scene image to obtain a three-dimensional, colored person region image. Accordingly, fusing the person region image with the predetermined background image (the predetermined two-dimensional background image or the predetermined three-dimensional background image) may mean fusing the two-dimensional person region image with the predetermined background image, or fusing the three-dimensional, colored person region image with the predetermined background image, either of which yields the merged image.
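For the two-dimensional case, the fusion of the person region image with a predetermined background image can be pictured as a simple mask-based composite; the following Python sketch assumes a binary person mask of the same size as both images, which is a simplification made for the sketch.

import numpy as np

def merge_person_with_background(person, mask, background):
    """Composite the extracted person region onto a predetermined 2-D background.

    person     -- H x W x 3 color scene image of the current user
    mask       -- H x W array, 1 where a pixel belongs to the person region, 0 elsewhere
    background -- H x W x 3 predetermined background image of the same size
    """
    mask3 = np.repeat(mask[..., None].astype(np.float32), 3, axis=2)
    # Person pixels replace the background pixels; all other pixels keep the background.
    merged = person.astype(np.float32) * mask3 + background.astype(np.float32) * (1.0 - mask3)
    return merged.astype(person.dtype)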
In addition, the image processing apparatus 100 further includes a memory 30. The memory 30 may be embedded in the electronic device 1000 or may be a memory independent of the electronic device 1000, and may include a direct memory access (DMA) feature. The raw image data acquired by the visible light camera 11 and the structured-light-image data collected by the depth image acquisition assembly 12 may be transferred to the memory 30 for storage or buffering. The predetermined sound models may also be stored in the memory 30. The processor 20 may read the raw image data from the memory 30 for processing to obtain the scene image, read the structured-light-image data from the memory 30 for processing to obtain the depth image, and read a predetermined sound model from the memory 30 for further processing of the merged image. The scene image and the depth image may also be stored in the memory 30 for the processor 20 to call at any time. For example, the processor 20 calls the scene image and the depth image to perform person region extraction, fuses the extracted person region image with the predetermined background image (the predetermined two-dimensional background image or the predetermined three-dimensional background image) to obtain the merged image, then processes the merged image to identify the specific object, searches for the predetermined sound model that matches the specific object, and finally fuses the merged image with the predetermined sound model to output the sound image. The predetermined background image (the predetermined two-dimensional background image or the predetermined three-dimensional background image), the merged image, and the sound image may also be stored in the memory 30.
The image processing apparatus 100 may further include a display 50. The display 50 may acquire the merged image of the sound image directly from the processor 20 or from the memory 30. The display 50 displays the merged image in the sound image for the user to view, or passes it to a graphics engine or a graphics processing unit (GPU) for further processing. The image processing apparatus 100 further includes an encoder/decoder 60, which can encode and decode image data of the scene image, the depth image, the merged image, and the like. The encoded image data may be saved in the memory 30 and decompressed by the decoder before the image is displayed on the display 50. The encoder/decoder 60 may be implemented by a central processing unit (CPU), a GPU, or a coprocessor; in other words, the encoder/decoder 60 may be any one or more of a CPU, a GPU, and a coprocessor.
The image processing apparatus 100 further includes a control logic 40. When the imaging device 10 is imaging, the processor 20 analyzes the data acquired by the imaging device to determine image statistics for one or more control parameters (for example, exposure time) of the imaging device 10. The processor 20 sends the image statistics to the control logic 40, and the control logic 40 controls the imaging device 10 to image with the determined control parameters. The control logic 40 may include a processor and/or a microcontroller that executes one or more routines (such as firmware). The one or more routines may determine the control parameters of the imaging device 10 according to the received image statistics.
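A minimal sketch of how image statistics could drive one such control parameter (exposure time) is given below; the target mean brightness and the exposure limits are values assumed for the sketch only.

import numpy as np

def next_exposure(frame, current_exposure_ms, target_mean=118.0, min_ms=0.1, max_ms=66.0):
    """Derive an updated exposure time from simple image statistics.

    frame               -- 8-bit grayscale frame from the imaging device
    current_exposure_ms -- exposure time used to capture that frame
    target_mean         -- assumed mid-gray set point for the mean brightness
    """
    mean = float(np.mean(frame))
    if mean <= 0:
        return max_ms
    # Scale the exposure so the next frame's mean brightness moves toward the target.
    proposed = current_exposure_ms * (target_mean / mean)
    return float(np.clip(proposed, min_ms, max_ms))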
The image processing apparatus 100 may further include an electro-acoustic element 70 configured to play the predetermined sound model in the sound image. The electro-acoustic element 70 usually consists of a diaphragm, a voice coil, a permanent magnet, a frame, and the like. When an audio current is passed through the voice coil of the electro-acoustic element 70, the current produces an alternating magnetic field, while the permanent magnet produces a constant magnetic field of fixed magnitude and direction. Because the magnitude and direction of the magnetic field produced by the voice coil change continuously with the audio current, the interaction of the two fields drives the voice coil to move perpendicular to the direction of the current in the coil; since the voice coil is attached to the diaphragm, the diaphragm vibrates and sets the air in motion to produce sound. The electro-acoustic element 70 may obtain the predetermined sound model in the sound image from the processor 20 for playback, or may obtain it from the memory 30 for playback.
Referring to FIG. 15, the electronic device 1000 of an embodiment of the present invention includes one or more processors 20, a memory 30, and one or more programs 31. The one or more programs 31 are stored in the memory 30 and are configured to be executed by the one or more processors 20. The programs 31 include instructions for executing the image processing method of any of the above embodiments.
For example, in conjunction with FIG. 1, the programs 31 include instructions for executing the image processing method of the following steps (an illustrative sketch follows the two steps):
03: identifying a specific object in the merged image; and
04: fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
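A minimal sketch of steps 03 and 04 in Python, assuming that the specific objects have already been detected as text labels and that the matching sound models are looked up in a simple dictionary (both assumptions of the sketch), might read:

def make_sound_image(merged_frames, detected_labels, sound_library):
    """Pair merged image frames with the sound model matched to a detected object.

    merged_frames   -- list of merged images (one or more frames)
    detected_labels -- object labels found in the merged image, e.g. by a classifier
    sound_library   -- dict mapping an object label to its predetermined sound model
    """
    for label in detected_labels:
        if label in sound_library:
            # The first detected object with an associated sound model wins in this sketch.
            return {'frames': merged_frames, 'sound': sound_library[label]}
    return {'frames': merged_frames, 'sound': None}  # no matching sound model found

# Example usage with a hypothetical sound library.
sound_image = make_sound_image(['merged_frame_0.png'], ['sea', 'person'], {'sea': 'waves.mp3'})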
For another example, in conjunction with FIG. 9, the programs 31 further include instructions for executing the image processing method of the following steps (a sketch of these steps is given after the list):
0231: identifying a face region in the scene image;
0232: acquiring depth information corresponding to the face region from the depth image;
0233: determining a depth range of the person region according to the depth information of the face region; and
0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
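Steps 0231 to 0234 amount to thresholding the depth image around the face depth and keeping the region connected to the face. A rough Python sketch, in which the depth margin and the use of SciPy's connected-component labelling are assumptions, follows:

import numpy as np
from scipy import ndimage  # used only for the connectivity check in this sketch

def person_region_mask(depth, face_box, margin=0.5):
    """Estimate the person region from a depth image and a detected face box.

    depth    -- H x W depth map in meters
    face_box -- (top, bottom, left, right) of the detected face region
    margin   -- assumed tolerance (m) around the face depth defining the person's depth range
    """
    t, b, l, r = face_box
    face_depth = np.median(depth[t:b, l:r])
    # Depth range of the person region, derived from the face depth.
    in_range = (depth > face_depth - margin) & (depth < face_depth + margin)
    # Keep only the connected component that contains the face region.
    labels, _ = ndimage.label(in_range)
    face_label = labels[(t + b) // 2, (l + r) // 2]
    return (labels == face_label) & (face_label > 0)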
The computer readable storage medium of an embodiment of the present invention includes a computer program used in conjunction with the electronic device 1000 capable of imaging. The computer program can be executed by the processor 20 to complete the image processing method of any of the above embodiments.
For example, in conjunction with FIG. 1, the computer program can be executed by the processor 20 to complete the image processing method of the following steps:
03: identifying a specific object in the merged image; and
04: fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
For another example, in conjunction with FIG. 9, the computer program can also be executed by the processor 20 to complete the image processing method of the following steps:
0231: identifying a face region in the scene image;
0232: acquiring depth information corresponding to the face region from the depth image;
0233: determining a depth range of the person region according to the depth information of the face region; and
0234: determining, according to the depth range of the person region, a person region that is connected to the face region and falls within the depth range to obtain the person region image.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples", and the like means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of the different embodiments or examples, described in this specification, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, such as two or three, unless otherwise specifically defined.
Any process or method description in the flowcharts or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present invention belong.
Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention, and those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (22)

  1. An image processing method for processing a merged image, the merged image being formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene, wherein the image processing method comprises:
    identifying a specific object in the merged image; and
    fusing a predetermined sound model that matches the specific object with the merged image to output a sound image.
  2. The image processing method according to claim 1, wherein the predetermined background image comprises a predetermined two-dimensional background image or a predetermined three-dimensional background image.
  3. The image processing method according to claim 2, wherein the image processing method further comprises:
    determining whether the predetermined background image in the merged image has an associated stored predetermined sound model;
    when the predetermined background image has the associated stored predetermined sound model, fusing the merged image with the predetermined sound model to output a sound image; and
    when the predetermined background image does not have the associated stored predetermined sound model, entering the step of identifying the specific object in the merged image.
  4. The image processing method according to claim 2, wherein the image processing method further comprises:
    acquiring a scene image of the current user;
    acquiring a depth image of the current user;
    processing the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and
    fusing the person region image with the predetermined background image to obtain the merged image.
  5. The image processing method according to claim 4, wherein the step of acquiring the depth image of the current user comprises:
    projecting structured light onto the current user;
    capturing a structured light image modulated by the current user; and
    demodulating phase information corresponding to each pixel of the structured light image to obtain the depth image.
  6. The image processing method according to claim 5, wherein the step of demodulating the phase information corresponding to each pixel of the structured light image to obtain the depth image comprises:
    demodulating the phase information corresponding to each pixel in the structured light image;
    converting the phase information into depth information; and
    generating the depth image according to the depth information.
  7. The image processing method according to claim 4, wherein when the predetermined background image is the predetermined three-dimensional background image, the step of fusing the person region image with the predetermined background image to obtain the merged image comprises:
    acquiring a predetermined fusion region in the predetermined three-dimensional background image;
    determining a pixel region to be replaced of the predetermined fusion region according to the person region image; and
    replacing the pixel region to be replaced of the predetermined fusion region with the person region image to obtain the merged image.
  8. The image processing method according to claim 4, wherein when the predetermined background image is the predetermined three-dimensional background image, the step of fusing the person region image with the predetermined background image to obtain the merged image comprises:
    processing the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;
    acquiring depth data of the predetermined three-dimensional background image;
    determining a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image of the predetermined three-dimensional background image and the depth data;
    determining a pixel region to be replaced of the calculated fusion region according to the person region image; and
    replacing the pixel region to be replaced of the calculated fusion region with the person region image to obtain the merged image.
  9. The image processing method according to claim 1, wherein the predetermined sound model that matches the specific object comprises one or more pieces of music;
    when the predetermined sound model that matches the specific object comprises one piece, the predetermined sound model is played one or more times during playback of the sound image; and
    when the predetermined sound model that matches the specific object comprises multiple pieces, the predetermined sound models are stored in order as a list, and during playback of the sound image the multiple predetermined sound models are played in any one of sequential play, shuffle play, single-piece repeat, and list repeat.
  10. The image processing method according to claim 1, wherein the image processing method further comprises:
    playing the sound image, wherein the sound image is played automatically or according to a triggered play request.
  11. An image processing apparatus for processing a merged image, the merged image being formed by fusing a predetermined background image with a person region image in a scene image of a current user in a real scene, wherein the image processing apparatus comprises a processor configured to:
    identify a specific object in the merged image; and
    fuse a predetermined sound model that matches the specific object with the merged image to output a sound image.
  12. The image processing apparatus according to claim 11, wherein the predetermined background image comprises a predetermined two-dimensional background image or a predetermined three-dimensional background image.
  13. The image processing apparatus according to claim 12, wherein the processor is further configured to:
    determine whether the predetermined background image in the merged image has an associated stored predetermined sound model;
    when the predetermined background image has the associated stored predetermined sound model, fuse the merged image with the predetermined sound model to output a sound image; and
    when the predetermined background image does not have the associated stored predetermined sound model, enter the step of identifying the specific object in the merged image.
  14. The image processing apparatus according to claim 12, wherein the image processing apparatus further comprises:
    a visible light camera configured to acquire a scene image of the current user; and
    a depth image acquisition assembly configured to acquire a depth image of the current user;
    wherein the processor is further configured to:
    process the scene image and the depth image to extract a person region of the current user in the scene image to obtain a person region image; and
    fuse the person region image with the predetermined background image to obtain the merged image.
  15. The image processing apparatus according to claim 14, wherein the depth image acquisition assembly comprises a structured light projector and a structured light camera, the structured light projector being configured to project structured light onto the current user; and
    the structured light camera being configured to:
    capture a structured light image modulated by the current user; and
    demodulate phase information corresponding to each pixel of the structured light image to obtain the depth image.
  16. The image processing apparatus according to claim 15, wherein the structured light camera is further configured to:
    demodulate the phase information corresponding to each pixel in the structured light image;
    convert the phase information into depth information; and
    generate the depth image according to the depth information.
  17. The image processing apparatus according to claim 14, wherein when the predetermined background image is the predetermined three-dimensional background image, the processor is further configured to:
    acquire a predetermined fusion region in the predetermined three-dimensional background image;
    determine a pixel region to be replaced of the predetermined fusion region according to the person region image; and
    replace the pixel region to be replaced of the predetermined fusion region with the person region image to obtain the merged image.
  18. The image processing apparatus according to claim 14, wherein when the predetermined background image is the predetermined three-dimensional background image, the processor is further configured to:
    process the predetermined three-dimensional background image to obtain a full-field edge image of the predetermined three-dimensional background image;
    acquire depth data of the predetermined three-dimensional background image;
    determine a calculated fusion region of the predetermined three-dimensional background image according to the full-field edge image of the predetermined three-dimensional background image and the depth data;
    determine a pixel region to be replaced of the calculated fusion region according to the person region image; and
    replace the pixel region to be replaced of the calculated fusion region with the person region image to obtain the merged image.
  19. The image processing apparatus according to claim 11, wherein the predetermined sound model that matches the specific object comprises one or more pieces of music;
    when the predetermined sound model that matches the specific object comprises one piece, the predetermined sound model is played one or more times during playback of the sound image; and
    when the predetermined sound model that matches the specific object comprises multiple pieces, the predetermined sound models are stored in order as a list, and during playback of the sound image the multiple predetermined sound models are played in any one of sequential play, shuffle play, single-piece repeat, and list repeat.
  20. The image processing apparatus according to claim 11, wherein the image processing apparatus further comprises:
    an electro-acoustic element and a display configured to play the sound image, wherein the sound image is played automatically or according to a triggered play request.
  21. An electronic device, comprising:
    one or more processors;
    a memory; and
    one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and the programs include instructions for executing the image processing method according to any one of claims 1 to 10.
  22. A computer readable storage medium, comprising a computer program used in conjunction with an electronic device capable of imaging, wherein the computer program can be executed by a processor to complete the image processing method according to any one of claims 1 to 10.
PCT/CN2018/105102 2017-09-11 2018-09-11 Image processing method and device, electronic device and computer readable storage medium WO2019047983A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710813594.1 2017-09-11
CN201710814395.2 2017-09-11
CN201710813594.1A CN107590795A (en) 2017-09-11 2017-09-11 Image processing method and device, electronic installation and computer-readable recording medium
CN201710814395.2A CN107704808A (en) 2017-09-11 2017-09-11 Image processing method and device, electronic installation and computer-readable recording medium

Publications (1)

Publication Number Publication Date
WO2019047983A1 true WO2019047983A1 (en) 2019-03-14

Family

ID=65633548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/105102 WO2019047983A1 (en) 2017-09-11 2018-09-11 Image processing method and device, electronic device and computer readable storage medium

Country Status (1)

Country Link
WO (1) WO2019047983A1 (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050201565A1 (en) * 2004-03-15 2005-09-15 Samsung Electronics Co., Ltd. Apparatus for providing sound effects according to an image and method thereof
CN101309389A (en) * 2008-06-19 2008-11-19 深圳华为通信技术有限公司 Method, apparatus and terminal synthesizing visual images
CN104349175A (en) * 2014-08-18 2015-02-11 周敏燕 Video producing system and video producing method based on mobile phone terminal
CN105869198A (en) * 2015-12-14 2016-08-17 乐视移动智能信息技术(北京)有限公司 Multimedia photograph generating method, apparatus and device, and mobile phone
CN106488017A (en) * 2016-10-09 2017-03-08 上海斐讯数据通信技术有限公司 A kind of mobile terminal and its method that the image shooting is dubbed in background music
CN106937059A (en) * 2017-02-09 2017-07-07 北京理工大学 Image synthesis method and system based on Kinect
CN107590795A (en) * 2017-09-11 2018-01-16 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium
CN107704808A (en) * 2017-09-11 2018-02-16 广东欧珀移动通信有限公司 Image processing method and device, electronic installation and computer-readable recording medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232353A (en) * 2019-06-12 2019-09-13 成都世纪光合作用科技有限公司 A kind of method and apparatus obtaining scene personnel depth location
CN112198494A (en) * 2019-06-20 2021-01-08 北京小米移动软件有限公司 Time-of-flight module calibration method, device and system and terminal equipment
CN112198494B (en) * 2019-06-20 2023-11-21 北京小米移动软件有限公司 Method, device, system and terminal equipment for calibrating flight time module


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18853620

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18853620

Country of ref document: EP

Kind code of ref document: A1