TWI817594B - Method for identifying depth image, computer device and storage medium - Google Patents

Method for identifying depth image, computer device and storage medium

Info

Publication number
TWI817594B
Authority
TW
Taiwan
Prior art keywords
image
initial
matrix
pose
depth
Prior art date
Application number
TW111124990A
Other languages
Chinese (zh)
Other versions
TW202403666A (en)
Inventor
李潔
郭錦斌
Original Assignee
鴻海精密工業股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 鴻海精密工業股份有限公司 filed Critical 鴻海精密工業股份有限公司
Priority to TW111124990A priority Critical patent/TWI817594B/en
Application granted granted Critical
Publication of TWI817594B publication Critical patent/TWI817594B/en
Publication of TW202403666A publication Critical patent/TW202403666A/en

Landscapes

  • Image Analysis (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Holo Graphy (AREA)
  • Image Generation (AREA)

Abstract

The present application relates to image analysis technology and provides a method for identifying a depth image, a computer device and a storage medium. The method includes: obtaining an image to be identified, a first initial image and a second initial image; obtaining an initial depth image by identifying the first initial image using a depth identification network; generating a pose absolute value matrix based on the first initial image, the second initial image and a pose network; generating a projected image based on the first initial image, the second initial image, the pose network and the initial depth image; identifying a target image and a target projected image according to the pose absolute value matrix and a preset threshold matrix; and generating a depth identification model by adjusting the depth identification network according to the errors between the target image, the target projected image and the initial depth image. Finally, depth information of the image to be identified is generated by inputting the image to be identified into the depth identification model.

Description

Image depth recognition method, computer device and storage medium

The present application relates to the field of image processing, and in particular to an image depth recognition method, a computer device and a storage medium.

In current schemes for depth recognition of vehicle-mounted images, training images can be used to train a depth network. However, because the training images usually contain both static objects and dynamic objects, the dynamic objects degrade the training accuracy of the depth network. As a result, the trained depth recognition model cannot accurately identify the depth information of vehicle-mounted images, making it difficult to determine the true distance between the vehicle and the various objects or obstacles in the surrounding environment, which compromises driving safety.

In view of the above, it is necessary to provide an image depth recognition method, a computer device and a storage medium that solve the technical problem of inaccurate recognition of depth information in vehicle-mounted images.

An image depth recognition method, comprising: acquiring an image to be recognized, a first initial image and a second initial image, and acquiring a depth recognition network and a pose network; performing depth recognition on the first initial image based on the depth recognition network to obtain an initial depth image; preprocessing the first initial image to obtain a first static image and a first dynamic image corresponding to the first initial image, and preprocessing the second initial image to obtain a second static image and a second dynamic image corresponding to the second initial image; generating a pose absolute value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image and the pose network; inputting the first initial image and the second initial image into the pose network to obtain a target pose matrix; generating an initial projection image of the first initial image based on the first initial image, the initial depth image and the target pose matrix; identifying a target image of the first initial image and a target projection image of the initial projection image according to the pose absolute value matrix and a preset threshold matrix; adjusting the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image, to obtain a depth recognition model; and inputting the image to be recognized into the depth recognition model to obtain a target depth image of the image to be recognized and the depth information of the image to be recognized.

According to an optional embodiment of the present application, preprocessing the first initial image to obtain the first static image and the first dynamic image corresponding to the first initial image comprises: calculating a single score value for each pixel in the first initial image based on the pixel value of each pixel; calculating, based on the single score values and a plurality of preset objects, the class probability of each pixel for each preset object; determining the preset object with the largest class probability as the pixel object corresponding to that pixel; determining the pixel region formed by pixels of the same pixel object in the first initial image as an initial object; classifying the initial objects according to preset rules to obtain the dynamic objects corresponding to the dynamic category and the static objects corresponding to the static category in the first initial image; masking the dynamic objects in the first initial image to obtain the first static image; and masking the static objects in the first initial image to obtain the first dynamic image.
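The preprocessing step above can be sketched as follows. This is a minimal illustration rather than the patent's actual implementation: the class scores, the image values, and the rule that classes 0 and 1 are dynamic are all assumed for demonstration.

```python
import numpy as np

# Hypothetical per-pixel class scores for a 4x4 image and k = 3 preset
# objects (e.g. 0: pedestrian, 1: car, 2: tree); values are illustrative.
rng = np.random.default_rng(0)
class_scores = rng.normal(size=(4, 4, 3))

# Softmax over the class axis gives the class probability of each pixel
# for each preset object.
exp_scores = np.exp(class_scores)
class_probs = exp_scores / exp_scores.sum(axis=-1, keepdims=True)

# The preset object with the largest class probability becomes the
# pixel object of that pixel.
pixel_objects = class_probs.argmax(axis=-1)

# Assumed preset rule: classes 0 and 1 are movable, hence dynamic.
dynamic_mask = np.isin(pixel_objects, [0, 1])

# Masking the dynamic objects yields the "static image"; masking the
# static objects yields the "dynamic image".
image = rng.uniform(size=(4, 4))
static_image = np.where(dynamic_mask, 0.0, image)   # dynamic pixels zeroed
dynamic_image = np.where(dynamic_mask, image, 0.0)  # static pixels zeroed
```

The two masked images are complementary: each pixel of the original survives in exactly one of them.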

According to an optional embodiment of the present application, generating the pose absolute value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image and the pose network comprises: inputting the first static image and the second static image into the pose network to obtain a static pose matrix; inputting the first dynamic image and the second dynamic image into the pose network to obtain a dynamic pose matrix; subtracting the corresponding element of the dynamic pose matrix from each matrix element of the static pose matrix to obtain pose differences; taking the absolute value of each pose difference to obtain the pose absolute value of each matrix element of the static pose matrix; and arranging the pose absolute values according to the element positions in the static pose matrix to obtain the pose absolute value matrix.
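The element-wise subtraction and absolute-value steps can be sketched as follows, with illustrative pose matrices standing in for the pose network's outputs:

```python
import numpy as np

# Illustrative 4x4 pose matrices; in the method these would come from
# feeding the static and dynamic image pairs through the pose network.
static_pose = np.array([[1.0, 0.0, 0.0, 0.10],
                        [0.0, 1.0, 0.0, 0.02],
                        [0.0, 0.0, 1.0, 0.50],
                        [0.0, 0.0, 0.0, 1.00]])
dynamic_pose = np.array([[1.0, 0.0, 0.0, 0.40],
                         [0.0, 1.0, 0.0, 0.02],
                         [0.0, 0.0, 1.0, 0.55],
                         [0.0, 0.0, 0.0, 1.00]])

# Element-wise subtraction followed by the absolute value; element
# positions are preserved, which yields the pose absolute value matrix.
pose_abs = np.abs(static_pose - dynamic_pose)
```

A large entry in `pose_abs` indicates that the pose estimated from the dynamic images diverges from the pose estimated from the static images, i.e. the dynamic object has likely moved.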

According to an optional embodiment of the present application, the first initial image and the second initial image are captured by the same capture device, and inputting the first initial image and the second initial image into the pose network to obtain the target pose matrix comprises: determining the pixels corresponding to the dynamic object in the first initial image as first pixels; obtaining the first homogeneous coordinate matrix of a first pixel, and obtaining the second homogeneous coordinate matrix of the second pixel corresponding to the first pixel in the second initial image; obtaining the inverse of the intrinsic matrix of the capture device; calculating the first camera coordinates of the first pixel from the first homogeneous coordinate matrix and the inverse of the intrinsic matrix, and calculating the second camera coordinates of the second pixel from the second homogeneous coordinate matrix and the inverse of the intrinsic matrix; computing a rotation matrix and a translation matrix from the first camera coordinates and the second camera coordinates based on a preset epipolar constraint; and concatenating the rotation matrix and the translation matrix to obtain the target pose matrix.
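A minimal sketch of the back-projection and concatenation steps, assuming illustrative intrinsic parameters and placeholder values for the rotation and translation recovered from the epipolar constraint (the epipolar solver itself is omitted):

```python
import numpy as np

# Assumed intrinsic matrix of a monocular capture device.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)

# Homogeneous coordinates [u, v, 1] of a first pixel and of its
# corresponding second pixel in the second initial image.
p1 = np.array([400.0, 260.0, 1.0])
p2 = np.array([405.0, 262.0, 1.0])

# Camera coordinates (normalized image coordinates) via the inverse
# of the intrinsic matrix, as in the claim.
x1 = K_inv @ p1
x2 = K_inv @ p2

# With R and t recovered from the epipolar constraint
# x2^T [t]x R x1 = 0 (solver omitted), the target pose matrix is the
# concatenation [R | t]. R and t below are placeholders.
R = np.eye(3)
t = np.array([[0.1], [0.0], [0.02]])
target_pose = np.hstack([R, t])  # 3x4 target pose matrix
```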

According to an optional embodiment of the present application, identifying the target image of the first initial image and the target projection image of the initial projection image according to the pose absolute value matrix and the preset threshold matrix comprises: comparing each pose absolute value in the pose absolute value matrix with the corresponding threshold in the preset threshold matrix; if at least one pose absolute value in the pose absolute value matrix is greater than its corresponding threshold, determining the first static image as the target image, identifying the dynamic position of the dynamic object in the first initial image, determining the region of the initial projection image corresponding to the dynamic position as the projection object, and masking the projection object to obtain the target projection image; or, if every pose absolute value in the pose absolute value matrix is less than or equal to its corresponding threshold, determining the first initial image as the target image and the initial projection image as the target projection image.
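The threshold comparison reduces to a single element-wise test; a minimal sketch with assumed values:

```python
import numpy as np

def object_moved(pose_abs, threshold):
    """True if at least one pose absolute value exceeds its threshold,
    i.e. the dynamic object is considered to have moved and the masked
    (static) image should be used; otherwise the original images are kept."""
    return bool(np.any(pose_abs > threshold))

pose_abs = np.array([[0.01, 0.30],
                     [0.02, 0.05]])
threshold = np.array([[0.10, 0.10],
                      [0.10, 0.10]])
```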

According to an optional embodiment of the present application, generating the initial projection image of the first initial image based on the first initial image, the initial depth image and the target pose matrix comprises: obtaining the target homogeneous coordinate matrix of each pixel in the first initial image, and obtaining the depth value of each pixel in the first initial image from the initial depth image; calculating the projection coordinates of each pixel in the first initial image based on the target pose matrix, the target homogeneous coordinate matrix of each pixel and the depth value of each pixel; and arranging the pixels according to their projection coordinates to obtain the initial projection image.
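The projection-coordinate computation can be sketched as follows, assuming illustrative intrinsics and target pose; `project_pixel` is a hypothetical helper name, not from the patent:

```python
import numpy as np

K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)

# Assumed target pose [R | t]: small forward translation, no rotation.
R = np.eye(3)
t = np.array([0.0, 0.0, 0.1])

def project_pixel(u, v, depth):
    """Map one pixel of the first image to its projection coordinates
    using its depth value and the target pose."""
    p = np.array([u, v, 1.0])   # target homogeneous coordinates
    cam = depth * (K_inv @ p)   # back-project into camera space using depth
    cam2 = R @ cam + t          # apply the target pose
    uvw = K @ cam2              # re-project with the intrinsics
    return uvw[:2] / uvw[2]     # projection coordinates (u', v')

proj = project_pixel(400.0, 240.0, 5.0)
```

Repeating this for every pixel and re-arranging the pixel values at the resulting coordinates produces the initial projection image.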

According to an optional embodiment of the present application, adjusting the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image to obtain the depth recognition model comprises: calculating a depth loss value of the depth recognition network based on the gradient error and the photometric error; and adjusting the depth recognition network based on the depth loss value until the depth loss value reaches its minimum, to obtain the depth recognition model.

According to an optional embodiment of the present application, the photometric error is calculated as:

$$L_t = \alpha\,\frac{1 - SSIM(x, y)}{2} + (1 - \alpha)\sum_i \|x_i - y_i\|$$

where $L_t$ denotes the photometric error, $\alpha$ is a preset balance parameter, $SSIM(x, y)$ denotes the structural similarity index between the target projection image and the target image, $\|x_i - y_i\|$ denotes the grayscale difference between the target projection image and the target image, $x_i$ denotes the pixel value of the $i$-th pixel in the target projection image, and $y_i$ denotes the pixel value of the pixel in the target image corresponding to the $i$-th pixel.
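A hedged sketch of this photometric error, assuming a single-window SSIM computed over the whole image and a mean rather than a raw sum for the grayscale term (the patent does not fix these details):

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-window SSIM over whole images in [0, 1]; the
    window scheme is an assumption, not taken from the patent."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def photometric_error(x, y, alpha=0.85):
    """L_t = alpha * (1 - SSIM(x, y)) / 2 + (1 - alpha) * mean|x_i - y_i|.
    The mean (rather than a sum) keeps the two terms on comparable scales."""
    return alpha * (1.0 - ssim_global(x, y)) / 2.0 + \
           (1.0 - alpha) * np.abs(x - y).mean()

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
```

For identical images the error is zero; any mismatch between the target projection image and the target image increases it.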

The present application provides a computer device, comprising: a storage storing at least one instruction; and a processor executing the at least one instruction to implement the image depth recognition method.

The present application provides a computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in a computer device to implement the image depth recognition method.

In summary, the present application preprocesses the first initial image so that the dynamic objects and static objects in the first initial image can be accurately determined. When at least one pose absolute value is greater than its corresponding threshold, the dynamic object is determined to have moved; when all pose absolute values in the pose absolute value matrix are less than or equal to their corresponding thresholds, the dynamic object is determined not to have moved. Masking of dynamic objects that have not moved in the initial image can therefore be avoided. When the dynamic object in the first initial image has moved, the dynamic object in the first initial image is masked to obtain the target image, and the dynamic object in the initial projection image is masked to obtain the target projection image. The depth recognition network is then adjusted based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image, yielding the depth recognition model. Since adjusting the depth neural network based on the gradient error and the photometric error avoids masking dynamic objects that have not moved, the accuracy of the depth recognition model, and in turn the accuracy of image depth recognition, can be improved.

1: Computer device

2: Capture device

12: Storage

13: Processor

101-109: Steps

Ouv: Pixel point

OXY: Light spot

Figure 1 is an application environment diagram of a preferred embodiment of the image depth recognition method of the present application.

Figure 2 is a flow chart of a preferred embodiment of the image depth recognition method of the present application.

Figure 3 is a schematic diagram of the pixel coordinate system and the camera coordinate system of the image depth recognition method of the present application.

Figure 4 is a schematic structural diagram of a computer device implementing a preferred embodiment of the image depth recognition method of the present application.

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application is described in detail below with reference to the accompanying drawings and specific embodiments.

Figure 1 is an application environment diagram of a preferred embodiment of an image depth recognition method of the present application. The image depth recognition method can be applied to one or more computer devices 1. The computer device 1 communicates with a capture device 2, which may be a monocular camera or any other device capable of capturing images.

The computer device 1 is a device capable of automatically performing parameter calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to: a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, etc.

The computer device 1 may be any computer product capable of human-computer interaction with a user, such as a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), or a wearable smart device. The computer device 1 may also include network equipment and/or user equipment. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.

The network in which the computer device 1 is located includes, but is not limited to: the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), etc.

Figure 2 is a flow chart of a preferred embodiment of an image depth recognition method of the present application. According to different needs, the order of the steps in the flow chart can be adjusted according to actual detection requirements, and some steps can be omitted. The method is executed by a computer device, such as the computer device 1 shown in Figure 1.

Step 101: Acquire the image to be recognized, the first initial image and the second initial image, and acquire the depth recognition network and the pose network.

In at least one embodiment of the present application, the image to be recognized refers to an image whose depth information needs to be recognized. In at least one embodiment, the first initial image and the second initial image are RGB (Red Green Blue) images of adjacent frames, and the generation time of the second initial image is later than that of the first initial image. The first initial image and the second initial image may contain initial objects such as vehicles, the ground, pedestrians, the sky and trees, and both images contain the same initial objects.

In at least one embodiment of the present application, the computer device acquiring the image to be recognized comprises: the computer device controlling the capture device to photograph a target scene to obtain the image to be recognized. The target scene may include target objects such as vehicles, the ground and pedestrians.

In at least one embodiment of the present application, the computer device obtains the first initial image and the second initial image from a preset database, which may be the KITTI database, the Cityscapes database, the vKITTI database, etc. In at least one embodiment, the depth recognition network may be a deep neural network, and the pose network refers to a convolutional neural network that recognizes poses; both networks can be obtained from databases on the Internet.

Step 102: Perform depth recognition on the first initial image based on the depth recognition network to obtain the initial depth image.

In at least one embodiment of the present application, the initial depth image refers to an image containing depth information, where the depth information refers to the distance between the initial object corresponding to each pixel in the first initial image and the capture device that captured the first initial image; the capture device may be a monocular camera. In at least one embodiment, the depth recognition network includes a convolutional layer and a deconvolutional layer.

In at least one embodiment of the present application, the computer device performing depth recognition on the first initial image based on the depth recognition network to obtain the initial depth image comprises: the computer device inputs the first initial image into the convolutional layer for a convolution operation to obtain an initial feature map corresponding to the first initial image, and inputs the initial feature map into the deconvolutional layer for a deconvolution operation to obtain a high-dimensional feature map; further, the computer device maps the distance between each pixel and the capture device to the depth value of each pixel in the high-dimensional feature map; finally, the computer device generates the initial depth image based on each pixel and its pixel value.
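As a loose, assumption-laden stand-in for the convolution and deconvolution layers (a real depth network uses learned multi-channel kernels and transposed convolutions), the shape changes can be illustrated as:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Minimal single-channel 2D valid convolution -- a stand-in for the
    convolutional layer; kernels here are fixed, not learned."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def upsample_nn(feat, factor=2):
    """Nearest-neighbour upsampling as a simple stand-in for the
    deconvolution (transposed convolution) layer."""
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

image = np.arange(36, dtype=float).reshape(6, 6)
feat = conv2d_valid(image, np.ones((3, 3)) / 9.0)  # 6x6 -> 4x4 feature map
up = upsample_nn(feat, 2)                          # 4x4 -> 8x8 upsampled map
```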

Through the above implementation, since the depth value of each pixel reflects the true distance between each pixel of the first initial image and the capture device, the projection coordinates can be accurately calculated from the initial depth image.

Step 103: Preprocess the first initial image to obtain the first static image and the first dynamic image corresponding to the first initial image, and preprocess the second initial image to obtain the second static image and the second dynamic image corresponding to the second initial image.

In at least one embodiment of the present application, the first initial image contains a plurality of initial objects. The first static image refers to the image generated after masking the dynamic objects in the first initial image, and the second static image refers to the image generated after masking the dynamic objects in the second initial image. A dynamic object is an object capable of moving, for example a pedestrian or a vehicle. The first dynamic image refers to the image generated after masking the static objects in the first initial image, and the second dynamic image refers to the image generated after masking the static objects in the second initial image. A static object is an object that cannot move, for example a tree or the ground.

In at least one embodiment of the present application, the computer device preprocessing the first initial image to obtain the first static image and the first dynamic image corresponding to the first initial image comprises: the computer device calculates a single score value for each pixel in the first initial image based on the pixel value of each pixel; further, based on the single score values and a plurality of preset objects, the computer device calculates the class probability of each pixel for each preset object; further, the computer device determines the preset object with the largest class probability as the pixel object corresponding to that pixel, and determines the pixel region formed by pixels of the same pixel object in the first initial image as an initial object; further, the computer device classifies the initial objects according to preset rules to obtain the dynamic objects corresponding to the dynamic category and the static objects corresponding to the static category in the first initial image; finally, the computer device masks the dynamic objects in the first initial image to obtain the first static image, and masks the static objects in the first initial image to obtain the first dynamic image.

In at least one embodiment of the present application, the preset rules determine initial objects such as vehicles, people or animals to be movable initial objects and assign the movable initial objects to the dynamic category, and determine initial objects such as plants and fixed objects to be immovable initial objects and assign the category corresponding to the immovable initial objects to the static category. For example, movable initial objects such as pedestrians, kittens, puppies, bicycles and cars are assigned to the dynamic category, while immovable initial objects such as trees, street lights and buildings are assigned to the static category.

Specifically, the class probability is calculated as:

$$S_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \quad i = 1, 2, \ldots, k$$

where $S_i$ denotes the class probability that a pixel belongs to the $i$-th preset object, $e^{z_i}$ denotes the single score value of the pixel for the $i$-th preset object, $z_i$ denotes the corresponding score computed from the pixel value, the denominator is the total score value over all preset objects, and $k$ denotes the number of preset objects.

In at least one embodiment of the present application, the generation process of the second static image is substantially the same as that of the first static image, and the generation process of the second dynamic image is substantially the same as that of the first dynamic image, so they are not described again here.

Through the above implementation, the pixel region formed by pixels of the same pixel object in the first initial image is determined as an initial object, and the initial objects are preliminarily classified according to the preset rules, so that the positions of the dynamic objects and static objects in the first initial image can be preliminarily determined; through these positions, the dynamic objects and static objects can be accurately masked.

Step 104: generate a pose absolute value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image, and the pose network.

In at least one embodiment of the present application, the pose absolute value matrix refers to a matrix generated from multiple pose absolute values, where each pose absolute value is the absolute value of the difference between an element of the static pose matrix and the corresponding element of the dynamic pose matrix; the static pose matrix refers to a matrix generated from the first static image and the second static image, and the dynamic pose matrix refers to a matrix generated from the first dynamic image and the second dynamic image. In at least one embodiment of the present application, the computer device generating the pose absolute value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image, and the pose network includes: the computer device inputs the first static image and the second static image into the pose network to obtain the static pose matrix, and inputs the first dynamic image and the second dynamic image into the pose network to obtain the dynamic pose matrix; further, the computer device subtracts each corresponding matrix element of the dynamic pose matrix from each matrix element of the static pose matrix to obtain pose differences; further, the computer device takes the absolute value of each pose difference to obtain the pose absolute value of each matrix element of the static pose matrix; further, the computer device arranges the pose absolute values according to the element positions of the matrix elements in the static pose matrix to obtain the pose absolute value matrix. In this embodiment, the static pose matrix and the dynamic pose matrix are generated in substantially the same way as the target pose matrix described below, so the details are not repeated here.
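The element-wise subtraction and absolute value described above can be sketched as follows (plain nested lists stand in for the pose matrices):

```python
def pose_abs_matrix(static_pose, dynamic_pose):
    """Element-wise |static - dynamic| over two equally sized pose matrices.

    Each result element is a pose absolute value, arranged at the same
    element position as in the static pose matrix.
    """
    return [[abs(s - d) for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(static_pose, dynamic_pose)]

abs_matrix = pose_abs_matrix([[1.0, 2.0], [3.0, 4.0]],
                             [[0.5, 2.5], [3.0, 1.0]])
```

When the dynamic object has not moved, the two pose matrices nearly coincide and every entry of the result stays small.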

Through the above implementation, since the static pose matrix contains the positions and poses of the static objects and the dynamic pose matrix contains the positions and poses of the dynamic objects, the static pose matrix can accurately reflect the state of the static objects and the dynamic pose matrix can accurately reflect the state of the dynamic objects. When the dynamic objects have not moved, the dynamic pose matrix is substantially the same as the static pose matrix; determining whether a dynamic object has moved from the comparison of each pose absolute value against its corresponding threshold therefore avoids the influence of reasonable errors.

Step 105: input the first initial image and the second initial image into the pose network to obtain a target pose matrix.

In at least one embodiment of the present application, the target pose matrix refers to the transformation relationship from the camera coordinates of each pixel in the first initial image to world coordinates, where the camera coordinates of a pixel are its coordinates in the camera coordinate system. FIG. 3 is a schematic diagram of the pixel coordinate system and the camera coordinate system of the depth recognition method of the present application. The computer device constructs the pixel coordinate system by taking the pixel O_uv in the first row and first column of the first initial image as the origin, the line through the first row of pixels as the u-axis, and the line through the first column of pixels as the v-axis. In addition, the computer device constructs the camera coordinate system by taking the optical center O_XY of the monocular camera as the origin, the optical axis of the monocular camera as the Z-axis, the line parallel to the u-axis of the pixel coordinate system as the X-axis, and the line parallel to the v-axis of the pixel coordinate system as the Y-axis.

In at least one embodiment of the present application, the computer device inputting the first initial image and the second initial image into the pose network to obtain the target pose matrix includes: the computer device determines the pixels corresponding to the dynamic objects in the first initial image as first pixels; further, the computer device obtains the first homogeneous coordinate matrix of each first pixel, and obtains the second homogeneous coordinate matrix of the second pixel corresponding to the first pixel in the second initial image; further, the computer device obtains the inverse matrix of the intrinsic matrix of the photographing device; further, the computer device calculates the first camera coordinates of the first pixel from the first homogeneous coordinate matrix and the inverse of the intrinsic matrix, and calculates the second camera coordinates of the second pixel from the second homogeneous coordinate matrix and the inverse of the intrinsic matrix; further, the computer device calculates a rotation matrix and a translation matrix from the first camera coordinates and the second camera coordinates based on a preset epipolar constraint relation, and concatenates the rotation matrix and the translation matrix to obtain the target pose matrix.

The first homogeneous coordinate matrix of a first pixel refers to a matrix with one more dimension than the pixel coordinate matrix, where the element of the extra dimension is 1; the pixel coordinate matrix refers to the matrix generated from the first pixel coordinates of the first pixel, and the first pixel coordinates refer to the coordinates of the first pixel in the pixel coordinate system. For example, if the first pixel coordinates of a first pixel in the pixel coordinate system are (u, v), its pixel coordinate matrix is [u, v]^T and its homogeneous coordinate matrix is [u, v, 1]^T. Multiplying the first homogeneous coordinate matrix by the inverse of the intrinsic matrix yields the first camera coordinates of the first pixel, and multiplying the second homogeneous coordinate matrix by the inverse of the intrinsic matrix yields the second camera coordinates of the second pixel.
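A minimal sketch of forming the homogeneous coordinate matrix and multiplying it by the inverse intrinsic matrix (the focal lengths and principal point below are hypothetical values, not from the patent):

```python
def homogeneous(u, v):
    """Append a 1 to the pixel coordinates: [u, v]^T -> [u, v, 1]^T."""
    return [u, v, 1.0]

def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

# Hypothetical pinhole intrinsics: focal lengths fx, fy, principal point (cx, cy).
fx, fy, cx, cy = 100.0, 100.0, 64.0, 48.0
K_inv = [[1 / fx, 0.0, -cx / fx],
         [0.0, 1 / fy, -cy / fy],
         [0.0, 0.0, 1.0]]

# Camera coordinates of a pixel: K^{-1} times its homogeneous coordinates.
camera_coord = mat_vec(K_inv, homogeneous(64.0, 48.0))
```

A pixel at the principal point maps to the camera-axis direction, as expected for a pinhole model.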

The second homogeneous coordinate matrix is generated in substantially the same way as the first homogeneous coordinate matrix, so the details are not repeated here.

The target pose matrix can be expressed as:

pose = [ R  t
         0  1 ];

where pose is the target pose matrix, a 4x4 matrix; R is the rotation matrix, a 3x3 matrix; and t is the translation matrix, a 3x1 matrix.
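The concatenation of the 3x3 rotation R and 3x1 translation t into the 4x4 pose matrix can be sketched as:

```python
def assemble_pose(R, t):
    """Concatenate a 3x3 rotation R (list of rows) and a 3-vector t into a
    4x4 pose matrix [R t; 0 1]."""
    pose = [R[i] + [t[i]] for i in range(3)]  # top 3x4 block: [R | t]
    pose.append([0.0, 0.0, 0.0, 1.0])         # bottom row: [0 0 0 1]
    return pose

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose = assemble_pose(I3, [0.5, 0.0, -0.2])
```

With the identity rotation, the resulting pose encodes a pure translation.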

The translation matrix and the rotation matrix are calculated from: K^{-1}p_1(t×R)(K^{-1}p_2)^T = 0; where K^{-1}p_1 is the first camera coordinates, K^{-1}p_2 is the second camera coordinates, p_1 is the first homogeneous coordinate matrix, p_2 is the second homogeneous coordinate matrix, and K^{-1} is the inverse of the intrinsic matrix.

Through the above implementation, the two-dimensional pixel coordinates of each pixel in the first initial image and the second initial image are converted into three-dimensional camera coordinates in the camera coordinate system according to the camera intrinsic matrix; the rotation matrix and the translation matrix can be accurately calculated from the camera coordinates, so that the target pose matrix is accurately generated from the rotation matrix and the translation matrix.

Step 106: generate an initial projection image of the first initial image based on the first initial image, the initial depth image, and the target pose matrix.

In at least one embodiment of the present application, the initial projection image represents an image of a transformation process, where the transformation process refers to the transformation between the pixel coordinates of the pixels in the first initial image and the corresponding pixel coordinates in the second initial image. In at least one embodiment of the present application, the computer device generating the initial projection image of the first initial image based on the first initial image, the initial depth image, and the target pose matrix includes: the computer device obtains the target homogeneous coordinate matrix of each pixel in the first initial image, and obtains the depth value of each pixel of the first initial image from the initial depth image; further, the computer device calculates the projection coordinates of each pixel in the first initial image based on the target pose matrix, the target homogeneous coordinate matrix of the pixel, and the depth value of the pixel; further, the computer device arranges the pixels according to their projection coordinates to obtain the initial projection image.

The depth value refers to the pixel value of each pixel in the initial depth image.

Specifically, the projection coordinates of each pixel in the initial projection image are calculated as: P = K · pose · Z · K^{-1} · H; where P represents the projection coordinates of the pixel, K represents the intrinsic matrix of the photographing device, pose represents the target pose matrix, K^{-1} represents the inverse matrix of K, H represents the target homogeneous coordinate matrix of the pixel in the first initial image, and Z represents the depth value of the corresponding pixel in the initial depth image.
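One common reading of the formula P = K · pose · Z · K^{-1} · H is: back-project the pixel H = [u, v, 1]^T with depth Z into camera space, transform by the 4x4 pose, and re-project with K. The sketch below follows that reading; the intrinsic values and the identity pose are illustrative assumptions, not values from the patent:

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector of matching length."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

def project_pixel(K, K_inv, pose, Z, u, v):
    """Warp pixel (u, v) with depth Z through the pose, returning new (u', v')."""
    ray = mat_vec(K_inv, [u, v, 1.0])              # K^{-1} * H
    point = [Z * c for c in ray] + [1.0]           # scale by depth, make homogeneous
    moved = mat_vec(pose, point)[:3]               # apply 4x4 pose, drop the 1
    proj = mat_vec(K, moved)                       # re-project with K
    return [proj[0] / proj[2], proj[1] / proj[2]]  # normalize to pixel coordinates

K = [[100.0, 0.0, 64.0], [0.0, 100.0, 48.0], [0.0, 0.0, 1.0]]
K_inv = [[0.01, 0.0, -0.64], [0.0, 0.01, -0.48], [0.0, 0.0, 1.0]]
identity_pose = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
p = project_pixel(K, K_inv, identity_pose, 2.0, 10.0, 20.0)
```

With an identity pose the pixel projects back onto itself, which is a useful sanity check for the warping step.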

Step 107: identify a target image of the first initial image and a target projection image of the initial projection image according to the pose absolute value matrix and a preset threshold matrix.

In at least one embodiment of the present application, the preset threshold matrix refers to a preset matrix with the same dimensions as the pose absolute value matrix, containing multiple thresholds. In at least one embodiment of the present application, the computer device identifying the target image of the first initial image and the target projection image of the initial projection image according to the pose absolute value matrix and the preset threshold matrix includes: the computer device compares each pose absolute value in the pose absolute value matrix with the corresponding threshold in the preset threshold matrix; if at least one pose absolute value in the pose absolute value matrix is greater than its corresponding threshold, the computer device determines the first static image as the target image, identifies the dynamic position of the dynamic object in the first initial image, determines the region in the initial projection image corresponding to the dynamic position as a projection object, and masks the projection object to obtain the target projection image; otherwise, if every pose absolute value in the pose absolute value matrix is less than or equal to its corresponding threshold, the computer device determines the first initial image as the target image and the initial projection image as the target projection image.

Through the above implementation, when at least one pose absolute value is greater than its corresponding threshold, it is determined that the dynamic object has moved; the first static image is determined as the target image, and the dynamic object corresponding to the dynamic category in the initial projection image is masked. Since the position of the dynamic object corresponding to the dynamic category has changed, the depth values of its pixels have changed; excluding those depth values from the loss calculation avoids the influence of moving dynamic objects on the loss value. When every pose absolute value in the pose absolute value matrix is less than or equal to its corresponding threshold, it is determined that the dynamic object has not moved; the first initial image is determined as the target image and the initial projection image as the target projection image, so that the loss value can be calculated accurately.
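The decision described above reduces to an element-wise "any entry exceeds its threshold" test over the two matrices, which can be sketched as:

```python
def dynamic_object_moved(abs_pose, thresholds):
    """True if any element of the pose absolute value matrix exceeds its
    corresponding threshold, i.e. the dynamic object is judged to have moved."""
    return any(a > t
               for a_row, t_row in zip(abs_pose, thresholds)
               for a, t in zip(a_row, t_row))
```

When this returns True, the first static image becomes the target image and the dynamic region of the projection is masked; otherwise the unmasked first initial image and initial projection image are used.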

Step 108: adjust the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image, to obtain a depth recognition model.

In at least one embodiment of the present application, the depth recognition model refers to the model generated after adjusting the depth recognition network. In at least one embodiment of the present application, the computer device adjusting the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image to obtain the depth recognition model includes: the computer device calculates a depth loss value of the depth recognition network based on the gradient error and the photometric error; further, the computer device adjusts the depth recognition network based on the depth loss value until the depth loss value is minimized, obtaining the depth recognition model.

Specifically, the depth loss value is calculated as: Lc = Lt + Ls; where Lc represents the depth loss value, Lt represents the photometric error, and Ls represents the gradient error.

The photometric error is calculated as:

Lt = α · (1 − SSIM(x, y)) / 2 + (1 − α) · Σ_i ∥x_i − y_i∥;

where Lt represents the photometric error, α is a preset balance parameter, typically 0.85, SSIM(x, y) represents the structural similarity index between the target projection image and the target image, ∥x_i − y_i∥ represents the grayscale difference between the target projection image and the target image, x_i represents the pixel value of the i-th pixel of the target projection image, and y_i represents the pixel value of the pixel corresponding to the i-th pixel in the target image.

The structural similarity index is calculated as:

SSIM(x, y) = ((2μ_x μ_y + c_1)(2σ_xy + c_2)) / ((μ_x² + μ_y² + c_1)(σ_x² + σ_y² + c_2)); c_1 = (K_1 L)²; c_2 = (K_2 L)²;

where SSIM(x, y) is the structural similarity index, x is the target projection image, y is the target image, μ_x is the grayscale mean of the target projection image, μ_y is the grayscale mean of the target image, σ_x is the grayscale standard deviation of the target projection image, σ_y is the grayscale standard deviation of the target image, σ_xy is the grayscale covariance between the target projection image and the target image, c_1 and c_2 are preset parameters, L is the maximum pixel value in the target image, and K_1 and K_2 are preset constants with K_1 << 1 and K_2 << 1.
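Assuming global (whole-image) statistics and images given as flat grayscale lists, the SSIM and photometric error above can be sketched as follows (the mean over the L1 term is one common normalization choice, an assumption rather than a detail from the patent):

```python
def ssim(x, y, L=255.0, K1=0.01, K2=0.03):
    """Global structural similarity index between two equally sized images."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n   # sigma_x^2
    var_y = sum((b - mu_y) ** 2 for b in y) / n   # sigma_y^2
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n  # sigma_xy
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def photometric_error(x, y, alpha=0.85):
    """Lt = alpha * (1 - SSIM(x, y)) / 2 + (1 - alpha) * mean |x_i - y_i|."""
    l1 = sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return alpha * (1.0 - ssim(x, y)) / 2.0 + (1.0 - alpha) * l1
```

Identical images yield SSIM of 1 and a photometric error of 0, so the loss only penalizes discrepancies between the target projection image and the target image.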

The gradient error is calculated as:

Ls = Σ_i ( |∂_x D(u, v)| · e^(−|∂_x I(u, v)|) + |∂_y D(u, v)| · e^(−|∂_y I(u, v)|) );

where Ls represents the gradient error, x represents the initial depth image, y represents the target image, D(u, v) represents the value of the initial depth image at the pixel coordinates (u, v) of the i-th pixel, and I(u, v) represents the value of the target image at the pixel coordinates (u, v) of the i-th pixel.
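Reading Ls as the usual edge-aware smoothness term (depth gradients penalized, downweighted where the target image itself has strong gradients — an interpretation of the formula, not a verbatim implementation of the patent), a sketch over 2D lists with finite differences:

```python
import math

def gradient_error(depth, image):
    """Edge-aware smoothness: sum of |d(depth)| * exp(-|d(image)|) over
    horizontal and vertical neighbor differences."""
    h, w = len(depth), len(depth[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:  # horizontal finite difference
                total += abs(depth[r][c + 1] - depth[r][c]) * \
                         math.exp(-abs(image[r][c + 1] - image[r][c]))
            if r + 1 < h:  # vertical finite difference
                total += abs(depth[r + 1][c] - depth[r][c]) * \
                         math.exp(-abs(image[r + 1][c] - image[r][c]))
    return total
```

A constant depth map incurs zero gradient error regardless of the target image, and depth variation is penalized less across strong image edges.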

Through the above implementation, since the influence of moving dynamic objects on the calculation of the loss value of the depth recognition network is avoided, the accuracy of the depth recognition model can be improved.

Step 109: input the image to be recognized into the depth recognition model to obtain a target depth image of the image to be recognized and depth information of the image to be recognized.

In at least one embodiment of the present application, the target depth image refers to an image containing the depth information of each pixel in the image to be recognized, where the depth information of a pixel in the image to be recognized refers to the distance between the object to be recognized corresponding to that pixel and the photographing device. In at least one embodiment of the present application, the target depth image is generated in substantially the same way as the initial depth image, so the details are not repeated here.

In at least one embodiment of the present application, the computer device obtains the pixel value of each pixel in the target depth image as the depth information of the corresponding pixel in the image to be recognized.

Through the above implementation, since the accuracy of the depth recognition model is improved, the accuracy of the depth recognition of the image to be recognized can be improved.

In summary, the present application preprocesses the first initial image so that the dynamic objects and static objects in the first initial image can be accurately determined. When at least one pose absolute value is greater than its corresponding threshold, it is determined that the dynamic object has moved; when every pose absolute value in the pose absolute value matrix is less than or equal to its corresponding threshold, it is determined that the dynamic object has not moved, so masking of dynamic objects that have not moved in the initial image can be avoided. When the dynamic object in the first initial image has moved, the dynamic object in the first initial image is masked to obtain the target image, and the dynamic object in the initial projection image is masked to obtain the target projection image. The depth recognition network is then adjusted based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image, to obtain the depth recognition model. Since adjusting the depth recognition network based on the gradient error and the photometric error avoids masking dynamic objects that have not moved in the initial image, the accuracy of the depth recognition model can be improved, and in turn the accuracy of image depth recognition can be improved.

FIG. 4 is a schematic structural diagram of a computer device according to a preferred embodiment of the present application for implementing the image depth recognition method.

In one embodiment of the present application, the computer device 1 includes, but is not limited to, a storage 12, a processor 13, and a computer program stored in the storage 12 and executable on the processor 13, such as a depth recognition program. Those skilled in the art will understand that the schematic diagram is merely an example of the computer device 1 and does not constitute a limitation on the computer device 1; the computer device 1 may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may further include input/output devices, network access devices, buses, and the like.

The processor 13 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 13 is the computing core and control center of the computer device 1, connecting all parts of the computer device 1 through various interfaces and lines, and runs the operating system of the computer device 1 as well as the various installed applications, program code, and the like. For example, the processor 13 may obtain the image to be recognized captured by the photographing device 2 through an interface. The processor 13 obtains the operating system of the computer device 1 and the various installed applications, and executes the applications to implement the steps in each of the above embodiments of the image depth recognition method, such as the steps shown in FIG. 2.

Exemplarily, the computer program may be divided into one or more modules/units, which are stored in the storage 12 and executed by the processor 13 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution process of the computer program in the computer device 1.

The storage 12 may be configured to store the computer programs and/or modules; the processor 13 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules stored in the storage 12 and by invoking the data stored in the storage 12. The storage 12 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function (such as a sound playback function and an image playback function), and the data storage area may store data created according to the use of the computer device. In addition, the storage 12 may include non-volatile storage such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. The storage 12 may be an external storage and/or an internal storage of the computer device 1. Further, the storage 12 may be a storage in physical form, such as a memory stick or a TF card (Trans-flash Card).

If the modules/units integrated in the computer device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments.

The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).

With reference to Figure 2, the storage 12 of the computer device 1 stores a plurality of instructions implementing an image depth recognition method, and the processor 13 executes the instructions to: acquire an image to be recognized, a first initial image, and a second initial image, and acquire a depth recognition network and a pose network; perform depth recognition on the first initial image using the depth recognition network to obtain an initial depth image; preprocess the first initial image to obtain a first static image and a first dynamic image corresponding to the first initial image, and preprocess the second initial image to obtain a second static image and a second dynamic image corresponding to the second initial image; generate a pose absolute-value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image, and the pose network; input the first initial image and the second initial image into the pose network to obtain a target pose matrix; generate an initial projection image of the first initial image based on the first initial image, the initial depth image, and the target pose matrix; identify a target image of the first initial image and a target projection image of the initial projection image according to the pose absolute-value matrix and a preset threshold matrix; adjust the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image, to obtain a depth recognition model; and input the image to be recognized into the depth recognition model to obtain a target depth image of the image to be recognized and depth information of the image to be recognized.
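The selection step described above — compare each entry of the pose absolute-value matrix against the preset threshold matrix, then choose the target image and mask the projection accordingly — can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the pose network is not reproduced (`static_pose` and `dynamic_pose` stand in for its outputs on the static and dynamic image pairs), and `select_targets`, `dyn_mask`, and zero-filling the masked region are hypothetical names and choices.

```python
import numpy as np

def select_targets(static_pose, dynamic_pose, threshold,
                   first_image, static_image, projection, dyn_mask):
    """Return (target_image, target_projection).

    If any |static - dynamic| pose entry exceeds its threshold, the scene
    is treated as containing a moving object: use the static image as the
    target and mask the dynamic region out of the projection. Otherwise
    keep the first initial image and the initial projection unchanged.
    """
    # Element-wise pose absolute-value matrix
    pose_abs = np.abs(static_pose - dynamic_pose)
    if np.any(pose_abs > threshold):
        masked_projection = projection.copy()
        masked_projection[dyn_mask] = 0  # mask the projected dynamic object
        return static_image, masked_projection
    return first_image, projection
```

In this sketch the decision is all-or-nothing per image pair, mirroring the "at least one entry exceeds its threshold" rule in the text; a per-region variant would apply the comparison locally instead.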

Specifically, for how the processor 13 implements the above instructions, reference may be made to the description of the corresponding steps in the embodiment of Figure 2, which is not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in actual implementations.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Therefore, the embodiments should in all respects be regarded as exemplary and non-restrictive. The scope of this application is defined by the appended claims rather than by the foregoing description, and all changes falling within the meaning and scope of the equivalents of the claims are intended to be embraced by this application. No reference sign in a claim should be construed as limiting the claim concerned.

Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in this application may also be implemented by a single unit or device through software or hardware. The words "first", "second", and the like denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently substituted without departing from their spirit and scope.

101-109: Steps

Claims (10)

1. An image depth recognition method, executed on a computer device, the method comprising: acquiring an image to be recognized, a first initial image, and a second initial image, and acquiring a depth recognition network and a pose network; performing depth recognition on the first initial image using the depth recognition network to obtain an initial depth image; preprocessing the first initial image to obtain a first static image and a first dynamic image corresponding to the first initial image, and preprocessing the second initial image to obtain a second static image and a second dynamic image corresponding to the second initial image; generating a pose absolute-value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image, and the pose network; inputting the first initial image and the second initial image into the pose network to obtain a target pose matrix; generating an initial projection image of the first initial image based on the first initial image, the initial depth image, and the target pose matrix; identifying a target image of the first initial image and a target projection image of the initial projection image according to the pose absolute-value matrix and a preset threshold matrix; adjusting the depth recognition network based on a gradient error between the initial depth image and the target image and a photometric error between the target projection image and the target image, to obtain a depth recognition model; and inputting the image to be recognized into the depth recognition model to obtain a target depth image of the image to be recognized and depth information of the image to be recognized.

2. The image depth recognition method of claim 1, wherein preprocessing the first initial image to obtain the first static image and the first dynamic image comprises: computing a single score value for each pixel of the first initial image based on the pixel value of that pixel; computing, based on the single score value and a plurality of preset objects, a category probability of each pixel for each preset object; determining the preset object corresponding to the largest category probability as the pixel object of that pixel; determining a pixel region formed by pixels of the same pixel object in the first initial image as an initial object; classifying the initial objects according to preset rules to obtain dynamic objects of a dynamic category and static objects of a static category in the first initial image; masking the dynamic objects in the first initial image to obtain the first static image; and masking the static objects in the first initial image to obtain the first dynamic image.

3. The image depth recognition method of claim 1, wherein generating the pose absolute-value matrix based on the first static image, the first dynamic image, the second static image, the second dynamic image, and the pose network comprises: inputting the first static image and the second static image into the pose network to obtain a static pose matrix; inputting the first dynamic image and the second dynamic image into the pose network to obtain a dynamic pose matrix; subtracting each matrix element of the dynamic pose matrix from the corresponding element of the static pose matrix to obtain pose differences; taking the absolute value of each pose difference to obtain a pose absolute value for each element of the static pose matrix; and arranging the pose absolute values according to the element positions of the static pose matrix to obtain the pose absolute-value matrix.

4. The image depth recognition method of claim 2, wherein the first initial image and the second initial image are captured by the same camera, and inputting the first initial image and the second initial image into the pose network to obtain the target pose matrix comprises: determining pixels corresponding to the dynamic objects in the first initial image as first pixels; obtaining a first homogeneous coordinate matrix of each first pixel, and obtaining a second homogeneous coordinate matrix of the second pixel corresponding to that first pixel in the second initial image; obtaining the inverse of the camera's intrinsic matrix; computing first camera coordinates of the first pixel from the first homogeneous coordinate matrix and the inverse intrinsic matrix, and computing second camera coordinates of the second pixel from the second homogeneous coordinate matrix and the inverse intrinsic matrix; computing a rotation matrix and a translation matrix from the first camera coordinates and the second camera coordinates based on a preset epipolar constraint; and concatenating the rotation matrix and the translation matrix to obtain the target pose matrix.

5. The image depth recognition method of claim 2, wherein identifying the target image of the first initial image and the target projection image of the initial projection image according to the pose absolute-value matrix and the preset threshold matrix comprises: comparing each pose absolute value in the pose absolute-value matrix with the corresponding threshold in the preset threshold matrix; if at least one pose absolute value in the pose absolute-value matrix exceeds its corresponding threshold, determining the first static image as the target image, identifying a dynamic position of the dynamic object in the first initial image, determining the region of the initial projection image corresponding to the dynamic position as a projection object, and masking the projection object to obtain the target projection image; or, if every pose absolute value in the pose absolute-value matrix is less than or equal to its corresponding threshold, determining the first initial image as the target image and the initial projection image as the target projection image.

6. The image depth recognition method of claim 1, wherein generating the initial projection image of the first initial image based on the first initial image, the initial depth image, and the target pose matrix comprises: obtaining a target homogeneous coordinate matrix of each pixel in the first initial image, and obtaining a depth value of each pixel in the first initial image from the initial depth image; computing projection coordinates of each pixel in the first initial image based on the target pose matrix, the target homogeneous coordinate matrix of that pixel, and the depth value of that pixel; and arranging the pixels according to their projection coordinates to obtain the initial projection image.

7. The image depth recognition method of claim 1, wherein adjusting the depth recognition network based on the gradient error between the initial depth image and the target image and the photometric error between the target projection image and the target image to obtain the depth recognition model comprises: computing a depth loss value of the depth recognition network based on the gradient error and the photometric error; and adjusting the depth recognition network based on the depth loss value until the depth loss value reaches its minimum, to obtain the depth recognition model.

8. The image depth recognition method of claim 7, wherein the photometric error is computed as:

Lt = α·(1 − SSIM(x, y))/2 + (1 − α)·∥x_i − y_i∥

where Lt denotes the photometric error, α is a preset balance parameter, SSIM(x, y) denotes the structural similarity index between the target projection image and the target image, ∥x_i − y_i∥ denotes the grayscale difference between the target projection image and the target image, x_i denotes the pixel value of the i-th pixel of the target projection image, and y_i denotes the pixel value of the pixel of the target image corresponding to the i-th pixel.
9. A computer device, comprising: a storage storing at least one instruction; and a processor that retrieves the instruction stored in the storage to implement the image depth recognition method of any one of claims 1 to 8.

10. A computer-readable storage medium, wherein the computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor of a computer device to implement the image depth recognition method of any one of claims 1 to 8.
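The photometric error of claim 8 combines a structural-similarity term with a per-pixel grayscale difference. The following is a minimal numpy sketch assuming the common SSIM + L1 weighting, a global (un-windowed) SSIM, and the usual stabilizing constants c1 and c2 — details the text does not fix, so treat them as assumptions rather than the patented formula's exact parameters:

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Whole-image SSIM; practical pipelines use a windowed version."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def photometric_error(x, y, alpha=0.85):
    """Lt = alpha * (1 - SSIM(x, y)) / 2 + (1 - alpha) * mean |x_i - y_i|.

    x: target projection image, y: target image, both arrays in [0, 1].
    alpha is the balance parameter weighting SSIM against the L1 term.
    """
    l1 = np.abs(x - y).mean()  # mean grayscale difference over pixels i
    return alpha * (1.0 - ssim_global(x, y)) / 2.0 + (1.0 - alpha) * l1
```

For identical images the error is zero, and it grows as the projection drifts from the target — the property the training loop relies on when minimizing the depth loss.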
TW111124990A 2022-07-04 2022-07-04 Method for identifying depth image, computer device and storage medium TWI817594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW111124990A TWI817594B (en) 2022-07-04 2022-07-04 Method for identifying depth image, computer device and storage medium

Publications (2)

Publication Number Publication Date
TWI817594B true TWI817594B (en) 2023-10-01
TW202403666A TW202403666A (en) 2024-01-16

Family

ID=89857821

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111124990A TWI817594B (en) 2022-07-04 2022-07-04 Method for identifying depth image, computer device and storage medium

Country Status (1)

Country Link
TW (1) TWI817594B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312912A (en) * 2017-02-28 2019-10-08 三菱电机株式会社 Vehicle automatic parking system and method
CN107534764B (en) * 2015-04-30 2020-03-17 深圳市大疆创新科技有限公司 System and method for enhancing image resolution
TW202103106A (en) * 2019-07-10 2021-01-16 大陸商浙江商湯科技開發有限公司 Method and electronic device for image depth estimation and storage medium thereof
US10928838B2 (en) * 2015-09-15 2021-02-23 SZ DJI Technology Co., Ltd. Method and device of determining position of target, tracking device and tracking system
TW202143174A (en) * 2020-04-09 2021-11-16 富智捷股份有限公司 Image depth expanding method, image depth expanding device and electronic device

Also Published As

Publication number Publication date
TW202403666A (en) 2024-01-16
