WO2023011013A1 - Splicing seam search method and apparatus for video image, and video image splicing method and apparatus


Info

Publication number
WO2023011013A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
image
video image
frame
patchwork
Application number
PCT/CN2022/098992
Other languages
French (fr)
Chinese (zh)
Inventor
Liu Weizhou (刘伟舟)
Hu Chen (胡晨)
Zhou Shuchang (周舒畅)
Original Assignee
Beijing Megvii Technology Co., Ltd. (北京旷视科技有限公司)
Beijing Maigewei Technology Co., Ltd. (北京迈格威科技有限公司)
Application filed by Beijing Megvii Technology Co., Ltd. and Beijing Maigewei Technology Co., Ltd.
Publication of WO2023011013A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G06T 3/047 - Fisheye or wide-angle transformations
    • G06T 3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 2200/32 - Indexing scheme for image data processing or generation involving image mosaicing
    • G06T 2207/10016 - Image acquisition modality: video; image sequence

Definitions

  • The present disclosure relates to the technical field of video processing, and in particular to a seam search method for video images and a video image stitching method and apparatus.
  • Panoramic video stitching refers to stitching multiple videos with overlapping fields of view. Specifically, the frame images in the multiple videos correspond one-to-one, and the mutually corresponding frame images are stitched to obtain a 360° panoramic video.
  • During panoramic video stitching, it is usually necessary to search for the seam of each frame image in the multiple videos, and then stitch each frame image based on the found seam.
  • The image stitching algorithms in the related art are mainly applied to stitching single images. When such an algorithm is used to stitch multiple videos, the seam areas of consecutive frame images tend to differ considerably, so the stitched video shakes during playback, which degrades the panoramic stitching effect.
  • The present disclosure therefore provides a seam search method for video images and a video image stitching method and apparatus, so as to alleviate shaking of the stitched video during playback and improve the panoramic video stitching effect.
  • The present disclosure provides a seam search method for video images.
  • The method may include: acquiring an energy map of each frame of video image in a first video, where the energy map indicates the location area and edges of a specified object in the video image; for the first frame of video image in the first video, determining the seam search result of the first frame based on its energy map, where the seam search result includes the seam area between the video image and a target image, and the target image is the video image corresponding to that video image in a second video; and, for each frame of video image in the first video other than the first frame, determining the seam search area range of the current video image based on the seam search result of the previous frame of video image, and, within that range, determining the seam search result of the current video image based on its energy map.
  • The step of obtaining the energy map of each frame of video image in the first video may include: obtaining a salient-target energy map, a moving-target energy map, and an edge energy map of each frame of video image in the first video; and, for each frame of video image, fusing the salient-target energy map, moving-target energy map, and edge energy map corresponding to that frame to obtain the energy map of that frame.
  • The step of obtaining the salient-target energy map, moving-target energy map, and edge energy map of each frame of video image in the first video may include: for each frame of video image in the first video, inputting the video image into a preset neural network model, so as to output the salient-target energy map of that frame through the preset neural network model; determining the moving-target energy map of that frame based on the moving target in that frame; and performing edge detection on each object in that frame to obtain the edge energy map of that frame.
  • The step of determining the seam search result of the first frame of video image may include: for the first frame of video image in the first video, using a dynamic programming algorithm to calculate the seam search result of the first frame based on its energy map.
  • The step of determining the seam search result of the current video image based on its energy map may include: for each frame of video image in the first video other than the first frame, adding preset constraints on the basis of the seam search result of the previous frame of video image to determine the seam search area range of the current video image; and, within that range, using a dynamic programming algorithm to determine the seam search result of the current video image based on the energy map of the current video image.
  • Each frame of video image has an overlapping area with its corresponding target image; the area corresponding to this overlap in the video image is the first overlapping area, and the corresponding area in the target image is the second overlapping area.
  • The method may also include: for each frame of video image, inputting the image corresponding to the first overlapping area in that frame and the image corresponding to the second overlapping area in the corresponding target image into a pre-trained neural network model to obtain a seam prediction result of that frame, where the seam prediction result includes the seam prediction area of the video image and the corresponding target image.
  • The pre-trained neural network model can be determined in the following manner: obtain training samples containing multiple consecutive groups of image pairs to be stitched, together with the seam search result of each group; for each group of image pairs to be stitched, input the group and the seam prediction result of the adjacent previous group into the initial neural network model, so as to output the seam prediction result of the group through the initial neural network model; calculate the loss value of the group's seam prediction result based on the group's seam search result and a preset loss function; update the weight parameters of the initial neural network model based on the loss value; and continue performing the step of obtaining training samples containing multiple consecutive groups of image pairs to be stitched until the initial neural network model converges, thereby obtaining the neural network model.
  • The method may further include: obtaining a preset seam template, where the preset seam template includes a preset seam area; and, for the first group of image pairs to be stitched, inputting the first group and the preset seam template into the initial neural network model, so as to output the seam prediction result of the first group through the initial neural network model.
  • A video image stitching method provided by the present disclosure may include: acquiring a first fisheye video and a second fisheye video, where the fisheye video images in the first and second fisheye videos have overlapping areas; extracting the first target area of each frame of fisheye video image in the first fisheye video and the second target area of each frame of fisheye video image in the second fisheye video; for each two mutually corresponding frames of fisheye video images, determining, based on the first and second target areas corresponding to the two frames and the pre-acquired updated expansion parameter values, the first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and the second equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video; determining the seam search result based on the mutually corresponding first and second equidistant projection pictures; and determining the video stitching result of the video images based on the seam search results.
  • The step of determining the seam search result may include: aligning the mutually corresponding first and second equidistant projection pictures; extracting a third overlapping area based on the aligned first and second equidistant projection pictures; performing illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel values in the illumination-compensated second equidistant projection picture match the pixel values in the corresponding first equidistant projection picture; and determining the seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
  • The expansion parameter values may include: a field-of-view parameter value, an optical-center parameter value in the x-axis direction, an optical-center parameter value in the y-axis direction, and a fisheye rotation-angle parameter value. The updated expansion parameter values are determined in advance in the following way: obtain the initial value and preset offset range of each expansion parameter; sample each expansion parameter based on its initial value and preset offset range to obtain a sampling value of each expansion parameter; based on the sampling values, determine the third equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and the fourth equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video; extract the fourth overlapping area of the third and fourth equidistant projection pictures; perform a cross-correlation calculation on the fourth overlapping area to obtain a first cross-correlation calculation result; and determine the updated expansion parameter values based on the calculation results.
  • The step of determining the updated expansion parameter values may include: repeatedly performing, for a preset number of iterations, the step of sampling each expansion parameter based on its initial value and preset offset range, to obtain multiple first cross-correlation calculation results; selecting the first cross-correlation calculation result with the largest value from the multiple results; and determining the sampling values of the expansion parameters corresponding to that largest result as the updated expansion parameter values.
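  • As an illustration only, the following Python sketch shows such a random-sampling search over the expansion parameters, scored by normalized cross-correlation of the overlap; the callables unwarp(img, params) and overlap_of(eqr1, eqr2), and the parameter names, are hypothetical stand-ins for the expansion and overlap-extraction steps described above:

      import numpy as np

      def ncc(a, b):
          # Normalized cross-correlation of two equally sized grayscale arrays.
          a = a.astype(np.float64) - a.mean()
          b = b.astype(np.float64) - b.mean()
          denom = np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12
          return float((a * b).sum() / denom)

      def search_expansion_params(fish1, fish2, unwarp, overlap_of,
                                  init, offsets, iters=200, seed=0):
          # Sample each parameter within its preset offset range around the
          # initial value; keep the sample whose overlaps correlate best.
          rng = np.random.default_rng(seed)
          best_score, best = -np.inf, dict(init)
          for _ in range(iters):
              sample = {k: init[k] + rng.uniform(-offsets[k], offsets[k])
                        for k in init}
              o1, o2 = overlap_of(unwarp(fish1, sample), unwarp(fish2, sample))
              score = ncc(o1, o2)
              if score > best_score:
                  best_score, best = score, sample
          return best, best_score

  • For example, init could be {'fov': 190.0, 'cx': 960.0, 'cy': 960.0, 'rot': 0.0}, with offsets of a few degrees or pixels per parameter; these example values are assumptions, not values from the disclosure.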
  • The step of aligning the mutually corresponding first and second equidistant projection pictures may include: extracting first feature points from the first equidistant projection picture and second feature points from the second equidistant projection picture; determining matching feature-point pairs based on the first and second feature points; and aligning the two pictures based on the matching feature-point pairs.
  • Alternatively, the step of aligning the mutually corresponding first and second equidistant projection pictures may include: moving the second equidistant projection picture in a preset direction; during the moving process, extracting multiple fifth overlapping areas of the first and second equidistant projection pictures; performing cross-correlation calculations on the multiple fifth overlapping areas to obtain multiple second cross-correlation calculation results; and aligning the first and second equidistant projection pictures based on the multiple second cross-correlation calculation results.
  • The step of aligning the pictures based on these results may include: selecting the second cross-correlation calculation result with the largest value from the multiple results; obtaining the position coordinates, in the first equidistant projection picture, of the first boundary pixel of the fifth overlapping area corresponding to that largest result, together with the position coordinates of the corresponding boundary pixel in the second equidistant projection picture; and aligning the two pictures based on these coordinates.
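  • A minimal sketch of this sliding cross-correlation alignment, assuming a purely horizontal shift between the two equirectangular pictures (the preset direction); the candidate strip widths tried here are an assumption of this sketch:

      import cv2
      import numpy as np

      def best_horizontal_shift(eqr1, eqr2, max_shift=64):
          # Try every candidate overlap width, score each with normalized
          # cross-correlation, and return the width with the largest score.
          g1 = cv2.cvtColor(eqr1, cv2.COLOR_BGR2GRAY).astype(np.float32)
          g2 = cv2.cvtColor(eqr2, cv2.COLOR_BGR2GRAY).astype(np.float32)
          scores = []
          for s in range(1, max_shift + 1):
              a, b = g1[:, -s:], g2[:, :s]          # candidate overlap strips
              a0, b0 = a - a.mean(), b - b.mean()
              denom = np.sqrt((a0 ** 2).sum() * (b0 ** 2).sum()) + 1e-12
              scores.append(float((a0 * b0).sum() / denom))
          return int(np.argmax(scores)) + 1         # best-correlated shift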
  • The step of determining the video stitching result of the video images may include: for each group of two mutually corresponding fisheye video images, determining, based on the seam search result corresponding to the two frames in the group, the fused overlapping area corresponding to the two frames; replacing the third overlapping area corresponding to the two frames with the fused overlapping area to obtain the image stitching result of the two frames; and determining the video stitching result of the video images based on the image stitching results of each group.
  • The present disclosure also provides an apparatus for seam search of video images.
  • The apparatus may include: a first acquisition module configured to acquire an energy map of each frame of video image in the first video, where the energy map indicates the location area and edges of a specified object in the video image; a first determination module configured to determine, for the first frame of video image in the first video, the seam search result of the first frame based on its energy map, where the seam search result includes the seam area between the video image and the target image, and the target image is the video image corresponding to that video image in the second video; and a second determination module configured to determine, for each frame of video image in the first video other than the first frame, the seam search area range of the current video image based on the seam search result of the previous frame, and to determine, within that range, the seam search result of the current video image based on its energy map.
  • The present disclosure also provides a video image stitching apparatus. The apparatus may include: a second acquisition module configured to acquire a first fisheye video and a second fisheye video, where the fisheye video images in the first and second fisheye videos have overlapping areas; an extraction module configured to extract the first target area of each frame of fisheye video image in the first fisheye video and the second target area of each frame of fisheye video image in the second fisheye video; a third determination module configured to determine, for two mutually corresponding frames of fisheye video images, based on the first and second target areas corresponding to the two frames and the pre-acquired updated expansion parameter values, the first equidistant projection picture obtained by expanding the frame in the first fisheye video and the second equidistant projection picture obtained by expanding the frame in the second fisheye video; and a fourth determination module configured to determine the seam search result based on the mutually corresponding first and second equidistant projection pictures.
  • An electronic system provided by the present disclosure may include: an image acquisition device, a processing device, and a storage device. The image acquisition device is used to acquire preview video frames or image data; a computer program is stored on the storage device, and when the computer program is run by the processing device, it executes the above seam search method for video images or the above video image stitching method.
  • The present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of the above seam search method for video images, or of the above video image stitching method, are executed.
  • In summary, the present disclosure provides a seam search method for video images and a video image stitching method and apparatus. First, the energy map of each frame of video image in the first video is obtained; then, for the first frame of video image in the first video, the seam search result of the first frame is determined based on its energy map; for each frame of video image in the first video other than the first frame, the seam search area range of the current video image is determined based on the seam search result of the previous frame, and, within that range, the seam search result of the current video image is determined based on its energy map.
  • This method determines the seam search result based on the energy map of the video image and, for video images other than the first frame, first determines the seam search area range based on the seam search result of the previous frame and then determines the seam search result within that range.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates jitter of the stitched video during playback, and improves the panoramic video stitching effect.
  • FIG. 1 is a schematic structural diagram of an electronic system provided by an embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a seam search method for video images provided by an embodiment of the present disclosure.
  • FIG. 3 is a flow chart of another seam search method for video images provided by an embodiment of the present disclosure.
  • FIG. 4 is a flow chart of yet another seam search method for video images provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a neural network model training process provided by an embodiment of the present disclosure.
  • FIG. 6 is a flow chart of a video image stitching method provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of an equidistant projection picture expansion method provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a cross-correlation calculation provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of picture alignment provided by an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a seam search apparatus for video images provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a video image stitching apparatus provided by an embodiment of the present disclosure.
  • Artificial intelligence is an emerging science and technology that studies and develops theories, methods, technologies, and application systems for simulating and extending human intelligence.
  • Artificial intelligence is a comprehensive discipline that involves many technologies such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks.
  • Computer vision, specifically, aims to enable machines to recognize the world.
  • Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, and target detection.
  • Panoramic video stitching refers to stitching multiple videos with overlapping fields of view to obtain a 360° panoramic video; panoramic video stitching technology can be applied in scenarios such as sports cameras, remote conferencing, or security monitoring.
  • Panoramic video stitching can be realized by combining multiple wide-angle cameras, whose combined coverage field of view needs to exceed 360°.
  • Fisheye cameras can be used to reduce the number of cameras required; for example, stitching two fisheye cameras whose fields of view each exceed 180° can generate a 360° panoramic video.
  • However, the image stitching algorithm in the related art is mainly applied to single images: the quality of the stitched images it produces is unstable, i.e., its adaptability to different fisheye cameras is poor, and when it is used to stitch multiple videos, the seam areas of consecutive frame images tend to differ considerably, so the stitched video shakes during playback, which affects the panoramic stitching effect.
  • In view of this, embodiments of the present disclosure provide a seam search method for video images and a video image stitching method and apparatus. This technology can be applied to multi-video stitching applications and can be implemented with corresponding software and hardware. The embodiments of the present disclosure are described in detail below.
  • First, an example electronic system 100 for implementing the seam search method for video images and the video image stitching method and apparatus according to embodiments of the present disclosure is described with reference to FIG. 1.
  • As shown in FIG. 1, the electronic system 100 may include one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108, and one or more image acquisition devices 110, which are interconnected via a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in FIG. 1 are only exemplary rather than limiting, and the electronic system may have other components and structures as required.
  • The processing device 102 may be a gateway, an intelligent terminal, or a device including a central processing unit (CPU) or another form of processing unit with data processing capabilities and/or instruction execution capabilities. It can process data from the other components in the electronic system 100 and can also control the other components in the electronic system 100 to perform desired functions.
  • Storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • The volatile memory may include, for example, random access memory (RAM) and/or cache memory.
  • The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory.
  • One or more computer program instructions can be stored on the computer-readable storage medium, and the processing device 102 can execute the program instructions to implement the client functions (implemented by the processing device) in the following embodiments of the present disclosure and/or other desired functions.
  • Various application programs and various data, such as data used and/or generated by the application programs, can also be stored in the computer-readable storage medium.
  • The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
  • The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
  • The image acquisition device 110 can capture preview video frames or image data, and store the captured preview video frames or image data in the storage device 104 for use by other components.
  • The devices in the example electronic system for realizing the seam search method for video images and the video image stitching method and apparatus according to embodiments of the present disclosure may be integrated or distributed; for example, the processing device 102, storage device 104, input device 106, and output device 108 may be integrated into one body, with the image acquisition device 110 arranged at a designated position where the target image can be captured.
  • The electronic system can be realized as an intelligent terminal such as a camera, a smart phone, a tablet computer, a computer, or a vehicle-mounted terminal.
  • This embodiment provides a seam search method for video images. As shown in Figure 2, the method may include the following steps:
  • Step S202: acquire an energy map of each frame of video image in the first video; the energy map is used to indicate the location area and edges of a specified object in the video image.
  • The first video can be a video collected by a device such as a camera, for example a wide-angle camera or a fisheye camera.
  • The energy map can be represented as a grayscale image. In such a representation, the higher the gray value of a pixel, the greater the energy of that pixel and the higher its energy value; the lower the gray value, the lower the energy and the lower the energy value. The energy values in the energy map can also be expressed in a normalized manner, i.e., as values between 0 and 1 that represent the energy distribution.
  • The specified object can be any object in the video image; for example, it can be a person or an animal in the video image.
  • The location area can be understood as the area occupied by the specified object in the video image, and the edge can be understood as the outer edge contour corresponding to the specified object in the video image.
  • When the seams of the video images need to be searched, it is usually necessary to obtain the energy map corresponding to each frame of video image in the first video; the energy map of each frame indicates the location area and edge contour of the specified object in that frame.
  • Step S204: for the first frame of video image in the first video, determine the seam search result of the first frame based on its energy map; the seam search result includes the seam area between the video image and the target image, and the target image is the video image corresponding to that video image in the second video.
  • The second video can likewise be a video collected by a device such as a camera, for example a wide-angle camera or a fisheye camera. The seam area can be understood as the area corresponding to the seam line. The video images in the first video and the video images in the second video may correspond one-to-one; for example, two fisheye cameras with different fields of view shoot two videos at the same time in the same scene, namely the first video and the second video. In the second video, the video image corresponding to the first frame of the first video is the first frame of the second video, and that first frame of the second video is the target image. In practical implementation, for the first frame of video image in the first video, the target image in the second video corresponding to it can first be determined, and the seam search result of the first frame can then be determined based on the energy map of the first frame of video image.
  • Step S206: for each frame of video image in the first video other than the first frame, determine the seam search area range of the current video image based on the seam search result of the previous frame of video image; within that range, determine the seam search result of the current video image based on the energy map of the current video image.
  • The seam search area range can be understood as the restricted range within which the seam search result is searched. In practical implementation, for each frame of video image in the first video other than the first frame, the seam search result of the previous frame can be used to constrain the seam search area range for the current frame, and the seam search result of the current frame is then determined within that range based on the energy map of the current frame. For example, a preset constraint condition can be added on the basis of the seam search result of the previous frame to constrain the seam search range of the current frame; within the determined range, the seam area between the current frame and its corresponding target image can then be searched, based on the energy map of the current frame, from relatively static areas with low energy.
  • The above seam search method for video images first obtains the energy map of each frame of video image in the first video; then, for the first frame of video image in the first video, it determines the seam search result of the first frame based on its energy map; for each frame other than the first, it determines the seam search area range of the current video image based on the seam search result of the previous frame and, within that range, determines the seam search result of the current video image based on its energy map.
  • The method thus determines the seam search result from the energy map of the video image and, for video images other than the first frame, first constrains the seam search area range using the previous frame's seam search result before determining the seam search result within it.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates jitter of the stitched video during playback, and improves the panoramic video stitching effect.
  • This embodiment provides another seam search method for video images, implemented on the basis of the method of the above embodiment. As shown in Figure 3, the method may include the following steps:
  • Step S302: acquire the salient-target energy map, moving-target energy map, and edge energy map of each frame of video image in the first video.
  • The salient-target energy map indicates the location area of a specified first object in the video image, i.e., the area the first object occupies. The first object is usually the most eye-catching object in the video image and can also be understood as the salient target, the focused object, or the subject of the video image; for example, in a video image containing a person, the person is usually the first object. In the salient-target energy map, the energy values of the first object's location area are usually relatively high.
  • The moving-target energy map indicates the location area of a specified second object, i.e., the area the second object occupies. The second object is usually a moving object in the video image, for example a moving vehicle contained in the video image. In the moving-target energy map, the energy values of the second object's location area are usually relatively high.
  • The edge energy map indicates the edges of specified third objects, i.e., the outer edge contours of those objects. The third objects generally include the first object and the second object, and may also include all other objects in the image. In the edge energy map, the energy values of the objects' edges are usually relatively high.
  • Step S302 can be implemented through the following steps 1 to 3:
  • Step 1: for each frame of video image in the first video, input the video image into a preset neural network model, so as to output the salient-target energy map of that frame through the preset neural network model.
  • The preset neural network model can be realized by various convolutional neural networks, such as a residual network or a VGG network, and can be a convolutional neural network model of any size, for example resnet34_05x.
  • In practical implementation, each frame of video image can be input into the preset neural network, which detects the specified first object in each frame and outputs the salient-target energy map of each frame to indicate the location area of the specified first object.
  • Other methods may also be used to determine the salient-target energy map of each frame of video image; for details, reference may be made to implementations in the related art, which are not repeated here.
  • Step 2: based on the moving target in the frame of video image, determine the moving-target energy map of that frame.
  • In an embodiment, the moving-target energy map of each frame can be determined by optical flow calculation. For example, for each frame in the first video other than the first frame, the moving target can be determined from the current frame and the previous frame, and the moving-target energy map of the current frame can then be determined from the detected moving target; here, moving-target detection can be understood as detecting the image regions that change between adjacent frames.
  • Step 3: perform edge detection on each object in the frame of video image to obtain the edge energy map of that frame.
  • Edge detection finds the set of pixels in the video image whose brightness changes sharply; these pixels often form contours. In practical implementation, optical flow calculation and other methods can be used to perform edge detection on each object contained in each frame, determine the contour of each object, and obtain the edge energy map of each frame.
  • Other methods may also be used to determine the edge energy map of each frame of video image; for details, reference may be made to implementations in the related art, which are not repeated here.
  • Step S304: for each frame of video image, fuse the salient-target energy map, moving-target energy map, and edge energy map corresponding to that frame to obtain the energy map of that frame.
  • The fused energy map contains the location area and edges of the salient target in that frame, the location area and edges of the moving target, and the edges of the other objects, as sketched below.
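  • A minimal Python sketch of steps 1 to 3 and the fusion of step S304, using OpenCV; the saliency model is passed in as a callable (the disclosure uses a preset CNN for it), and the equal fusion weights are an assumption of this sketch:

      import cv2
      import numpy as np

      def frame_energy_map(frame, prev_frame, saliency_model=None):
          gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
          prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

          # Salient-target energy: output of a preset neural network model.
          sal = (saliency_model(frame) if saliency_model is not None
                 else np.zeros(gray.shape, np.float32))

          # Moving-target energy: magnitude of dense optical flow between
          # the previous frame and the current frame.
          flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                              0.5, 3, 15, 3, 5, 1.2, 0)
          motion = np.linalg.norm(flow, axis=2).astype(np.float32)

          # Edge energy: pixels with sharp brightness changes (contours).
          edges = cv2.Canny(gray, 100, 200).astype(np.float32)

          def norm(e):                 # normalize each map into [0, 1]
              return e / (e.max() + 1e-12)

          return (norm(sal) + norm(motion) + norm(edges)) / 3.0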
  • Step S306: for the first frame of video image in the first video, use a dynamic programming algorithm to calculate the seam search result of the first frame based on its energy map; the seam search result includes the seam area between the video image and the target image, and the target image is the video image corresponding to that video image in the second video.
  • A dynamic programming algorithm solves a problem recursively by splitting it into sub-problems and defining the relationship between problem states: the problem to be solved is decomposed into several sub-problems that are solved in order, and the solution of one sub-problem provides useful information for the solution of the next. When solving any sub-problem, the various possible local solutions are listed, those likely to be optimal are kept through decision-making, and the others are discarded; the sub-problems are solved in turn, and the solution of the last sub-problem is the solution of the initial problem.
  • In this embodiment, for the first frame of video image in the first video, multiple candidate seam search results of the first frame are searched through the dynamic programming algorithm based on the energy map of the first frame, and the optimal seam search result is selected from them. The optimal seam usually runs through a relatively static, low-energy background, and the seam searched between the first frame of video image and the corresponding target image may be irregular in shape.
  • Step S308: for each frame of video image in the first video other than the first frame, add preset constraints on the basis of the seam search result of the previous frame to determine the seam search area range of the current video image.
  • The preset constraints can be set according to actual needs. For example, for each frame other than the first, the seam search result of the current frame can be constrained, relative to the previous frame's search result, so that the gap from the upper and lower edges of the previous frame's seam is less than 50 pixels; the seam search range of the current frame is then determined based on this constraint.
  • Step S310: within the seam search area range, use a dynamic programming algorithm to determine the seam search result of the current video image based on its energy map.
  • In practical implementation, a dynamic programming algorithm can be used to calculate the seam search result of the current frame. The searched seam usually avoids the location area and edges of the salient target contained in the current frame, the location area and edges of the moving target, and the edges of other objects; that is, the seam between the current frame and its corresponding target image is usually searched from a relatively static, low-energy background. A minimal sketch of the constrained dynamic programming search follows.
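  • A minimal sketch of a constrained dynamic programming seam search over the overlap's energy map: the seam picks one column per row, adjacent rows differ by at most one column, and, when the previous frame's seam is supplied, the search is restricted to a band around it (the 50-pixel band here stands in for the preset constraint described above):

      import numpy as np

      def dp_seam(energy, prev_seam=None, band=50):
          h, w = energy.shape
          cols = np.arange(w)
          allowed = (np.ones((h, w), bool) if prev_seam is None else
                     np.abs(cols[None, :] - prev_seam[:, None]) <= band)
          cost = np.full((h, w), np.inf)
          cost[0, allowed[0]] = energy[0, allowed[0]]
          back = np.zeros((h, w), np.int32)
          for i in range(1, h):
              for j in range(w):
                  if not allowed[i, j]:
                      continue
                  lo, hi = max(0, j - 1), min(w, j + 2)
                  k = lo + int(np.argmin(cost[i - 1, lo:hi]))
                  cost[i, j] = energy[i, j] + cost[i - 1, k]  # cheapest path
                  back[i, j] = k
          seam = np.zeros(h, np.int32)
          seam[-1] = int(np.argmin(cost[-1]))
          for i in range(h - 1, 0, -1):          # backtrack the optimal seam
              seam[i - 1] = back[i, seam[i]]
          return seam

  • The first frame would be handled by calling dp_seam(energy) with no previous seam (step S306); subsequent frames pass the previous frame's seam to constrain the search (steps S308 and S310).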
  • In some cases, the single-frame search method of step S306 can also be used to determine the seam search results of each frame of such video images.
  • The above seam search method first obtains the salient-target energy map, moving-target energy map, and edge energy map of each frame of video image in the first video, and fuses them to obtain the energy map of each frame. Then, for the first frame of video image in the first video, it uses a dynamic programming algorithm to calculate the seam search result of the first frame based on its energy map; for each frame other than the first, it adds preset constraints on the basis of the previous frame's seam search result to determine the seam search area range of the current video image.
  • Within that range, a dynamic programming algorithm is used to determine the seam search result of the current video image.
  • The method thus determines the seam search result from the energy map of the video image and, for video images other than the first frame, first constrains the seam search area range using the previous frame's seam search result before determining the seam search result within it.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates jitter of the stitched video during playback, and improves the panoramic video stitching effect.
  • This embodiment provides yet another seam search method for video images, implemented on the basis of the method of the above embodiments.
  • In the first video, each frame of video image has an overlapping area with its corresponding target image: the area corresponding to the overlap in the video image is the first overlapping area, and the corresponding area in the target image is the second overlapping area. In practical implementation, each frame of video image in the first video overlaps with its corresponding target image in the second video; the first overlapping area can be understood as the area, in each frame of the first video, corresponding to part of the overlapping picture, and the second overlapping area can be understood as the corresponding area in the target image.
  • The method may include the following steps:
  • Step S402: acquire an energy map of each frame of video image in the first video; the energy map is used to indicate the location area and edges of a specified object in the video image.
  • Step S404: for the first frame of video image in the first video, determine the seam search result of the first frame based on its energy map; the seam search result includes the seam area between the video image and the target image, and the target image is the video image corresponding to that video image in the second video.
  • Step S406: for each frame of video image in the first video other than the first frame, determine the seam search area range of the current video image based on the seam search result of the previous frame; within that range, determine the seam search result of the current video image based on the energy map of the current video image.
  • Step S408: for each frame of video image, input the image corresponding to the first overlapping area in that frame and the image corresponding to the second overlapping area in the corresponding target image into a pre-trained neural network model to obtain the seam prediction result of that frame; the seam prediction result includes the seam prediction area of the video image and the corresponding target image.
  • The pre-trained neural network model can be realized by various convolutional neural networks, such as a UNet, a residual network, or a VGG network, and can be a convolutional neural network model of any size, for example resnet34_05x. In practical implementation, based on the execution process above, a dynamic programming algorithm could be used to determine the seam search result of each frame in the first video, but dynamic programming is time-consuming and inefficient to execute. Therefore, the above seam search method can be distilled into a neural network, which allows hardware acceleration, faster processing, and higher efficiency.
  • Specifically, for each frame of video image, the image corresponding to the first overlapping area in that frame and the image corresponding to the second overlapping area in the corresponding target image can be input into the pre-trained neural network model, which outputs the seam prediction result of that frame and its corresponding target image, i.e., their seam prediction area.
  • In an embodiment, the pre-trained neural network model is determined through the following steps 4 to 9:
  • Step 4: obtain training samples containing multiple consecutive groups of image pairs to be stitched, together with the seam search result of each group.
  • Each image pair to be stitched usually comprises two images to be stitched, and the training samples can also be called a multi-frame data set. Specifically, the training samples can be constructed as follows: first obtain a single image, denoted overlap_left, and apply a geometric transform to it to obtain the paired image to be stitched, denoted overlap_right; the two images form one group of image pairs to be stitched. A sequence of pictures is then generated by further geometric transforms, i.e., multiple groups of image pairs are generated to simulate a video scene; the overlap_right and overlap_left images in the multiple groups can be distinguished by number.
  • The geometric transform usually includes random translation or rotation of the image, so each image to be stitched in the generated groups may have black borders.
  • The seam search result of each group of image pairs to be stitched can be determined by the method in the steps above, and is used as GT (ground truth) to train the initial neural network so that it learns to predict the seam area of each frame of video image.
  • Step 5: obtain a preset seam template, where the preset seam template includes a preset seam area.
  • The preset seam area in the preset seam template can be set as a seam mask whose left half is 1 and whose right half is 0, or as an all-ones seam mask, where mask value 0 means black and mask value 1 means white. In practical implementation, the first group of image pairs to be stitched can be understood as the first frames overlap_right_1 and overlap_left_1 of the simulated video; for this first group, it is usually necessary to provide a preset seam template that includes a preset seam area, as sketched below.
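  • A minimal sketch of such a preset seam template, assuming a mask of the same size as the overlap images:

      import numpy as np

      def preset_seam_template(h, w, left_half=True):
          # Left half 1 (white) / right half 0 (black), or all ones.
          mask = np.ones((h, w), np.float32)
          if left_half:
              mask[:, w // 2:] = 0.0
          return mask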
  • Step 6: for the first group of image pairs to be stitched, input the first group and the preset seam template into the initial neural network model, so as to output the seam prediction result of the first group through the initial neural network model.
  • The initial neural network model can be realized by various convolutional neural networks, such as a residual network or a VGG network. In practical implementation, referring to the schematic diagram of the neural network model training process shown in Figure 5, the first group of image pairs to be stitched (overlap_left_1 and overlap_right_1) and the preset seam template are input into the NN (neural network), the seam search result of the first group is used as GT, and seam_mask_left_1, the seam prediction result of the first group, is output.
  • The seam prediction result includes the seam prediction areas of overlap_right_1 and overlap_left_1 in the first group of image pairs to be stitched.
  • Step 7: for each group of image pairs to be stitched other than the first group, input the group of image pairs and the seam prediction result of the adjacent previous group into the initial neural network model, and output the seam prediction result of the group through the initial neural network model.
  • Each such group can be understood as comprising every frame overlap_right except the first frame overlap_right_1, and every frame overlap_left except the first frame overlap_left_1, in the simulated video. In practical implementation, for each group other than the first, the overlap_right and overlap_left of the current group, together with the seam prediction result of the adjacent previous group, are input into the initial neural network model (NN); the seam search result of the current group is used as GT, and the seam prediction result of the current group is output.
  • The seam prediction result includes the seam prediction areas of overlap_right and overlap_left in the current group of image pairs to be stitched.
  • Step 8: calculate the loss value of the seam prediction result of the group of image pairs to be stitched, based on the group's seam search result and a preset loss function.
  • The loss function is used to evaluate the degree of inconsistency between the seam prediction results output by the initial neural network model and the corresponding real seam search results; this degree of inconsistency is represented by the loss value, and the loss function is denoted Loss.
  • The loss function can be designed, for example, as:
  • Loss = loss_gt + scale * loss_continue
  • loss_continue = Max(L1(mask_cur - mask_prev), margin)
  • where: loss_continue indicates the continuity loss between the output of the current group of image pairs to be stitched and the output of the previous group; scale indicates the weight of the continuity loss, generally set to 0.1; loss_gt indicates the loss between the seam prediction result of the current group and its GT (e.g., an L1 distance between mask_cur and mask_gt); mask_cur represents the seam mask predicted for the current group, i.e., the seam prediction result of the current group; mask_prev represents the seam mask predicted for the previous group; margin is the allowable inconsistency threshold, generally set to 0.2; and mask_gt is the true seam of the current group, i.e., the real seam search result of the current group of image pairs to be stitched.
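  • A minimal sketch of this loss, assuming L1 is the mean absolute difference and that loss_gt is the L1 distance between the predicted and true seam masks (both assumptions of this sketch):

      import numpy as np

      def seam_loss(mask_cur, mask_gt, mask_prev, scale=0.1, margin=0.2):
          l1 = lambda x: float(np.mean(np.abs(x)))
          loss_gt = l1(mask_cur - mask_gt)                       # distance to GT
          loss_continue = max(l1(mask_cur - mask_prev), margin)  # continuity
          return loss_gt + scale * loss_continue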
  • Step 9: update the weight parameters of the initial neural network model based on the loss value, and continue performing the step of obtaining training samples containing multiple consecutive groups of image pairs to be stitched until the initial neural network model converges, thereby obtaining the neural network model.
  • The weight parameters may include all parameters in the initial neural network model, such as convolution kernel parameters.
  • In the above seam search method, for each frame of video image, the image corresponding to the first overlapping area in that frame and the image corresponding to the second overlapping area in the corresponding target image are input into the pre-trained neural network model, and the seam prediction result of that frame is obtained. The training process of the neural network model is also disclosed. Predicting the seam area through the neural network speeds up seam-area prediction and improves processing efficiency.
  • This embodiment provides a method for splicing video images; as shown in Figure 6, the method includes the following steps:
  • Step S602 acquiring the first fisheye video and the second fisheye video; wherein, the fisheye video images in the first fisheye video and the second fisheye video have overlapping areas.
  • A fisheye lens is a special camera lens with a short focal length and a large field of view; its field of view can approach or exceed 180 degrees.
  • The first fisheye video is shot through a first fisheye lens and the second fisheye video through a second fisheye lens. The fields of view of the two lenses usually differ, but the first and second fisheye videos have overlapping fields of view. Since the first and second fisheye lenses shoot at the same time, the corresponding fisheye video images in the two videos usually have an overlapping area: for example, the first frame of fisheye video image in the first fisheye video and the first frame in the second fisheye video have an overlapping area, the second frame in the first fisheye video and the second frame in the second fisheye video have an overlapping area, and so on.
  • Step S604 extracting the first target area of each frame of fisheye video image in the first fisheye video, and the second target area of each frame of fisheye video image in the second fisheye video.
  • The above target area can be understood as the effective area in the fisheye video image, that is, the area containing the object being photographed. In actual implementation, the original fisheye video image collected by a fisheye lens is generally square or rectangular and contains a circular area, which corresponds to the effective area of the fisheye video image; the area outside the circle is usually a black background.
  • Therefore, it is necessary to extract the first target area corresponding to the circular area of each frame of fisheye video image in the first fisheye video, and the second target area corresponding to the circular area of each frame in the second fisheye video. Specifically, a preset pixel value can be used as a threshold to filter out the black background of the fisheye video image, for example a pixel value of 20, so as to extract the circular effective area.
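  • A minimal sketch of this extraction step, assuming OpenCV, a BGR input frame, and the example threshold of 20 mentioned above; cropping to the bounding box of the thresholded pixels is an illustrative choice, not mandated by the text:

        import cv2
        import numpy as np

        def extract_effective_area(fisheye_img, thresh=20):
            # Threshold away the (near-black) background around the circle.
            gray = cv2.cvtColor(fisheye_img, cv2.COLOR_BGR2GRAY)
            _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
            # The bounding box of the remaining pixels approximates the circle.
            ys, xs = np.nonzero(mask)
            return fisheye_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]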
  • Step S606: for two corresponding frames of fisheye video images, based on the first target area and the second target area corresponding to the two frames, and the pre-acquired updated expansion parameter values, determine the first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video, and the second equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video.
  • The above equidistant projection picture can also be called an equidistant cylindrical (equirectangular) projection picture.
  • The above updated expansion parameters usually include fisheye lens parameters related to expanding the fisheye video image, for example the field-of-view parameter value, the optical-center parameter values, or the fisheye rotation angle parameter value. In actual implementation, the fisheye video image is expanded into an equidistant projection picture roughly as follows: first, the image with the circular effective area extracted (the target area) is converted from two-dimensional fisheye coordinates to three-dimensional coordinates; then the spherical coordinates of the extracted image are unfolded according to a latitude-longitude mapping, yielding the expanded equidistant projection picture corresponding to the fisheye video image.
  • In the normalized fisheye coordinates corresponding to the target area of the fisheye video image, take the pixel at coordinate position (x, y) as an example: the distance from this pixel to the origin of the coordinate system is r, and the angle between the pixel and the x-axis direction is the polar angle used in the conversion.
  • In actual implementation, the fisheye video image after target-area extraction can be input into a corresponding equidistant projection expansion module, which outputs the equidistant projection picture.
  • The above conversion uses the fisheye lens field-of-view (aperture) parameter; that is, once the field of view of the fisheye lens is known, the image can be expanded according to the conversion formula. The value of r is determined from the coordinate values (x, y); for details, refer to the determination methods in the related art, which are not repeated here.
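  • Since the disclosure only outlines the conversion, the following is a hedged sketch of one standard fisheye-to-equirectangular unwrap, assuming the equidistant model in which the normalized radius r is the angle from the optical axis divided by half the aperture; the exact formula used by the disclosure may differ:

        import cv2
        import numpy as np

        def fisheye_to_equirect(fisheye, aperture_deg=190.0, out_w=1024, out_h=512):
            # Unfold a fisheye image whose circular area fills the frame into an
            # equidistant (latitude-longitude) projection. Pixels beyond the
            # aperture map outside the source and come back black.
            ap = np.deg2rad(aperture_deg)
            jj, ii = np.meshgrid(np.arange(out_w), np.arange(out_h))
            lon = (jj / out_w - 0.5) * ap          # longitude of each output pixel
            lat = (0.5 - ii / out_h) * np.pi       # latitude of each output pixel
            # 3D direction on the unit sphere (z = optical axis).
            x = np.cos(lat) * np.sin(lon)
            y = np.sin(lat)
            z = np.cos(lat) * np.cos(lon)
            theta = np.arccos(np.clip(z, -1.0, 1.0))   # angle from the optical axis
            r = theta / (ap / 2.0)                     # normalized fisheye radius
            phi = np.arctan2(y, x)
            h, w = fisheye.shape[:2]
            map_x = ((1.0 + r * np.cos(phi)) * w / 2.0).astype(np.float32)
            map_y = ((1.0 - r * np.sin(phi)) * h / 2.0).astype(np.float32)
            return cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR)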
  • the above expansion method can calculate the remap parameter of the image transformation, that is, the mapping relationship of the coordinates expressed by the above conversion formula.
  • Since the remap parameters are processed serially, pixel by pixel, the processing is time-consuming and the processing efficiency is relatively low.
  • Processing with block warp can improve the efficiency of the hardware implementation: block warp divides the target area of the fisheye video image into blocks, and the resulting multiple small block images can be processed in parallel, which improves processing efficiency.
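  • A hedged sketch of the block-warp idea, assuming the full-image remap grids (float32) are available and that each tile can be warped independently; a real hardware implementation would dispatch the tiles to parallel units rather than a Python loop:

        import cv2
        import numpy as np

        def block_warp(src, map_x, map_y, block=128):
            # Apply a remap in independent tiles; each tile only needs its own
            # slice of the mapping grids, so tiles can be processed in parallel.
            h, w = map_x.shape
            out = np.zeros((h, w) + src.shape[2:], dtype=src.dtype)
            for y0 in range(0, h, block):
                for x0 in range(0, w, block):
                    y1, x1 = min(y0 + block, h), min(x0 + block, w)
                    out[y0:y1, x0:x1] = cv2.remap(
                        src, map_x[y0:y1, x0:x1], map_y[y0:y1, x0:x1],
                        cv2.INTER_LINEAR)
            return out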
  • In actual implementation, the corresponding expansion parameters, such as the field-of-view parameter value, the optical-center parameter values, and the fisheye rotation angle parameter value, are updated to obtain the updated expansion parameters. Then, for two corresponding frames of fisheye video images, based on the first and second target areas of the two frames and the updated expansion parameters, the first target area is expanded into the first equidistant projection picture and the second target area into the second equidistant projection picture, following the expansion method described above.
  • Step S608: determine the seam search result based on the corresponding first equidistant projection picture and second equidistant projection picture; the seam search result is determined by the method in the foregoing embodiments.
  • In actual implementation, the corresponding seam search result can be determined based on each pair of mutually corresponding first and second equidistant projection pictures. For example, based on the first equidistant projection picture obtained by expanding the first frame of fisheye video image in the first fisheye video, and the second equidistant projection picture obtained by expanding the first frame in the second fisheye video, the seam search result of the two pictures can be determined according to the schemes for determining seam search results in the foregoing embodiments.
  • Step S610: determine the video stitching result of the video images based on the seam search results corresponding to each group of two corresponding fisheye video images.
  • In actual implementation, the seam search result corresponding to each group of two corresponding fisheye video images can be determined according to the above steps, and the video stitching result of the video images is determined based on the multiple seam search results obtained.
  • The above video image splicing method first obtains the first fisheye video and the second fisheye video, and extracts the first target area of each frame of fisheye video image in the first fisheye video and the second target area of each frame of fisheye video image in the second fisheye video. Then, for two corresponding frames of fisheye video images, based on the first and second target areas corresponding to the two frames and the pre-acquired updated expansion parameter values, it determines the first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and the second equidistant projection picture obtained by expanding the frame in the second fisheye video.
  • The seam search result is determined based on the energy map of the video image and, for video images other than the first frame, the seam search area range is first determined based on the seam search result of the previous frame, after which the seam search result is determined within that area.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates the jitter of the stitched video during playback, and improves the stitching effect of the panoramic video.
  • This embodiment provides another method for splicing video images, which is implemented on the basis of the method in the above-mentioned embodiment, and the method includes the following steps:
  • Step 802 acquire the first fisheye video and the second fisheye video; wherein, the fisheye video images in the first fisheye video and the second fisheye video have overlapping areas.
  • Step 804 extract the first target area of each frame of fisheye video image in the first fisheye video, and the second target area of each frame of fisheye video image in the second fisheye video.
  • Step 806: for two corresponding frames of fisheye video images, based on the first target area and the second target area corresponding to the two frames, and the pre-acquired updated expansion parameter values, determine the first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video, and the second equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video.
  • The above updated expansion parameter values include: the field-of-view parameter value, the optical-center parameter value in the x-axis direction, the optical-center parameter value in the y-axis direction, and the fisheye rotation angle parameter value.
  • The field-of-view parameter value can be the combination of the known standard field of view of the fisheye lens and a field-of-view offset, where the offset can be denoted aperture_shift. For example, if the known standard field of view of the fisheye lens is 190 degrees, manufacturing and installation errors of the camera may shift the actual field of view; if the estimated offset is +5 degrees, the field-of-view parameter value is 190 + 5 = 195 degrees.
  • The optical-center parameter values in the x-axis and y-axis directions can be understood as the coordinates of the optical center in the x-axis and y-axis directions after taking the optical-center offset into account.
  • Step 11: obtain the initial expansion parameter value and the preset offset range of each expansion parameter.
  • The initial expansion parameter values usually include the initial field-of-view parameter value, the initial optical-center parameter values in the x-axis and y-axis directions, and the initial fisheye rotation angle parameter value; they can usually be obtained from the camera parameters given by the fisheye camera.
  • The preset offset range includes the offset range corresponding to each expansion parameter; it is usually estimated by a technician, based on empirical values, around each parameter's initial value, and should not be too large.
  • For example, if the initial field-of-view parameter value is 190 degrees, the corresponding preset offset range may be ±10 degrees.
  • Step 12: sample each expansion parameter based on its initial value and preset offset range to obtain a sampling value for each expansion parameter; based on the sampling values, determine the third equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video, and the fourth equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video.
  • In actual implementation, each expansion parameter takes one sampling value, and together these form a set of sampling values. For example, if the offset of each expansion parameter has two optional values, 0 and 1, then the sampling values of the four expansion parameters have 2^4 = 16 combinations, each combination corresponding to one set of sampling values. Based on each set of sampling values, the target-area-extracted fisheye video image in the first fisheye video is expanded to obtain the third equidistant projection picture, and the target-area-extracted fisheye video image in the second fisheye video is expanded to obtain the fourth equidistant projection picture.
  • Step 13: extract a fourth overlapping area of the third equidistant projection picture and the fourth equidistant projection picture.
  • In actual implementation, the fourth overlapping area is extracted from the corresponding expanded third and fourth equidistant projection pictures; it generally includes the partial picture area corresponding to the third equidistant projection picture and the corresponding partial picture area in the fourth equidistant projection picture.
  • Step 14: perform a cross-correlation calculation on the fourth overlapping area to obtain the first cross-correlation calculation result.
  • In actual implementation, a cross-correlation calculation can be performed between the partial picture area of the third equidistant projection picture contained in the fourth overlapping area and the corresponding partial picture area in the fourth equidistant projection picture; the first cross-correlation calculation result reflects a measure of the similarity between the two partial picture areas.
  • For example, Figure 8 shows two signals f and g, where f*g represents the cross-correlation result of the two signals: the larger the value of the cross-correlation calculation result, the more similar the two partial picture areas usually are; conversely, the smaller the value, the less similar they usually are.
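  • As a hedged illustration of this similarity measure (the disclosure does not specify the exact formula), a zero-normalized cross-correlation between two equally sized overlap crops could be computed as follows; reducing the result to a single scalar score is an assumption:

        import numpy as np

        def ncc(a, b):
            # Zero-normalized cross-correlation of two equally sized image
            # regions; returns a scalar in [-1, 1], larger = more similar.
            a = a.astype(np.float64).ravel() - a.mean()
            b = b.astype(np.float64).ravel() - b.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            return float(a @ b / denom) if denom > 0 else 0.0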
  • Step 15: determine the updated expansion parameter values based on the first cross-correlation calculation results and a preset number of iterations.
  • The preset number of iterations can be set according to actual needs; for example, it can be set to 10,000. In actual implementation, the updated expansion parameters are determined based on the first cross-correlation calculation results and the preset number of iterations.
  • In actual implementation, step 15 can be achieved through the following steps A to C:
  • Step A: repeat the step of sampling each expansion parameter, based on its initial value and preset offset range, according to the preset number of iterations, to obtain multiple first cross-correlation calculation results.
  • For example, with 10,000 iterations, the sampling step is repeated 10,000 times, yielding 10,000 first cross-correlation calculation results.
  • Step B: select the first cross-correlation calculation result with the largest value from the multiple first cross-correlation calculation results.
  • Step C: determine the sampling values of the expansion parameters corresponding to the largest first cross-correlation calculation result as the updated expansion parameter values.
  • The largest first cross-correlation calculation result indicates the highest similarity between the two partial picture areas; therefore, the corresponding sampling values of the expansion parameters are the most appropriate and can be determined as the updated expansion parameter values.
  • The above offline adaptive expansion-parameter optimization process enables the algorithm to adaptively obtain the optimal expansion parameters of the binocular fisheye camera.
  • This offline calculation process only needs to be performed once.
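  • The offline search above (steps 11 to 15) amounts to a random search over the expansion parameters. The following is a compact, hedged sketch of that loop, assuming uniform sampling within each offset range, where unfold(...) and overlap(...) are hypothetical stand-ins for the equidistant expansion and overlap extraction described earlier:

        import numpy as np

        def optimize_expansion_params(img1, img2, init, ranges,
                                      unfold, overlap, iters=10_000, seed=0):
            # init:    dict of initial parameter values, e.g. {"fov": 190.0}
            # ranges:  dict of maximum absolute offsets per parameter
            # unfold:  callable(img, params) -> equidistant projection (assumed)
            # overlap: callable(p1, p2) -> two equally sized overlap crops (assumed)
            rng = np.random.default_rng(seed)
            best_score, best_params = -np.inf, dict(init)
            for _ in range(iters):
                # Sample each expansion parameter within its preset offset range.
                params = {k: init[k] + rng.uniform(-ranges[k], ranges[k])
                          for k in init}
                a, b = overlap(unfold(img1, params), unfold(img2, params))
                # Cross-correlation similarity of the overlap crops.
                c = np.corrcoef(a.ravel().astype(np.float64),
                                b.ravel().astype(np.float64))[0, 1]
                if not np.isnan(c) and c > best_score:
                    best_score, best_params = c, params  # keep the best so far
            return best_params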
  • Step 808 align the corresponding first equidistant projection picture and the second equidistant projection picture.
  • Step 20: extract first feature points from the first equidistant projection picture, and extract second feature points from the second equidistant projection picture.
  • The first feature points can be corner points, edge points, bright spots in dark areas, dark spots in bright areas, and so on; likewise, the second feature points extracted from the second equidistant projection picture can be corner points, edge points, bright spots in dark areas or dark spots in bright areas.
  • In actual implementation, feature points can be extracted from the equidistant projection pictures with methods such as SIFT (Scale-Invariant Feature Transform); SIFT is a local feature descriptor used in the field of image processing that is scale-invariant and can detect key points in an image.
  • Step 21: determine matching feature point pairs based on the first feature points and the second feature points.
  • In actual implementation, the first and second feature points can be processed in blocks, where img1 corresponds to the above first equidistant projection picture and img2 corresponds to the above second equidistant projection picture.
  • A specific blocking scheme is: the left half of img1 is matched against the right half of img2, and the right half of img1 is matched against the left half of img2, obtaining the corresponding matching feature point pairs.
  • The alignment criterion can be to minimize the gap between the matched feature points of img1 and img2 in the w (width) dimension, so that the overlapping areas are aligned as well as possible after stitching. For example, matching can be realized by calculating the Euclidean distance between the 128-dimensional descriptors of two sets of key points: the smaller the Euclidean distance, the higher the similarity, and when the Euclidean distance is smaller than a set threshold the match is determined to be successful. A sketch of this matching step is given after step 22 below.
  • Step 22: align the first equidistant projection picture and the second equidistant projection picture based on the matching feature point pairs.
  • In actual implementation, the first and second equidistant projection pictures are aligned based on the matching feature point pairs; each corresponding pair of first and second equidistant projection pictures is dynamically aligned based on feature point matching.
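  • A hedged sketch of this feature matching with OpenCV SIFT, assuming 8-bit grayscale inputs and the left/right-half blocking described above; the fixed distance threshold is illustrative, not a value taken from the text:

        import cv2

        def match_halves(img1_gray, img2_gray, dist_thresh=200.0):
            # Match the left half of img1 against the right half of img2
            # (call again with the other halves for the symmetric case).
            left = img1_gray[:, : img1_gray.shape[1] // 2]
            right = img2_gray[:, img2_gray.shape[1] // 2 :]
            sift = cv2.SIFT_create()
            kp1, des1 = sift.detectAndCompute(left, None)
            kp2, des2 = sift.detectAndCompute(right, None)
            # Brute-force matching on the 128-D descriptors (Euclidean distance);
            # a match is accepted when the distance is below the set threshold.
            matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)
            return kp1, kp2, [m for m in matches if m.distance < dist_thresh]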
  • Step 30: move the second equidistant projection picture in a preset direction.
  • The preset direction may be a lateral movement of the second equidistant projection picture; for example, referring to the picture alignment schematic shown in the figure, img2 can first be moved to the left and then to the right. During the movement, the degree of matching between img2 and img1 usually changes.
  • Step 31: during the movement, extract multiple fifth overlapping areas of the first equidistant projection picture and the second equidistant projection picture.
  • In actual implementation, the overlapping area is extracted while moving img2 as shown in Figure 9. The corresponding overlapping area can be estimated from the aperture field-of-view parameter: for a binocular fisheye camera with two fisheye lenses, if the aperture angle of view of each lens is 180 degrees, the overlapping area is 0; if the aperture angle of view of each lens is greater than 180 degrees, the corresponding overlapping area can be estimated from the overlap of the two lenses' aperture angles of view.
  • Multiple fifth overlapping areas of the first and second equidistant projection pictures can thus be extracted during the process of moving the second equidistant projection picture.
  • Step 32: perform cross-correlation calculations on the multiple fifth overlapping areas respectively to obtain multiple second cross-correlation calculation results.
  • Step 33: align the first equidistant projection picture and the second equidistant projection picture based on the multiple second cross-correlation calculation results.
  • In actual implementation, step 33 can be realized through the following steps H to K:
  • Step H: select the second cross-correlation calculation result with the largest value from the multiple second cross-correlation calculation results; that is, from all the computed second cross-correlation results, the one with the largest value is chosen.
  • Step I: obtain the position coordinates, in the first equidistant projection picture, of the first boundary pixel points of the fifth overlapping area corresponding to the largest second cross-correlation calculation result, and the position coordinates of the corresponding second boundary pixel points in the second equidistant projection picture.
  • Step J: calculate an affine transformation matrix based on the position coordinates of the first boundary pixel points and of the second boundary pixel points.
  • Step K: align the first equidistant projection picture and the second equidistant projection picture based on the affine transformation matrix.
  • These boundary position coordinates can also be called image boundary points or alignment parameters; the image boundary points come from the fifth overlapping area corresponding to the largest second cross-correlation calculation result. The pixel position coordinates are used to calculate the affine parameters of the second equidistant projection picture, where the affine parameters can also be called the affine transformation matrix; through this transformation the second equidistant projection picture is warped so that it is well aligned with the first equidistant projection picture.
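  • A minimal sketch of steps J and K with OpenCV, assuming three pairs of corresponding boundary points (the minimum needed to determine an affine matrix); how the boundary points are chosen is left open by the text:

        import cv2
        import numpy as np

        def align_by_affine(img2, pts1, pts2):
            # pts1/pts2: float32 arrays of shape (3, 2); pts2 are boundary
            # points in the second picture, pts1 their targets in the first.
            M = cv2.getAffineTransform(np.float32(pts2), np.float32(pts1))
            h, w = img2.shape[:2]
            # Warp the second picture so its boundary points land on pts1.
            return cv2.warpAffine(img2, M, (w, h))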
  • Step 810: extract a third overlapping area based on the aligned first equidistant projection picture and second equidistant projection picture.
  • In actual implementation, the third overlapping area is extracted from the aligned first and second equidistant projection pictures. Specifically, it can be extracted based on feature point matching: after the matching feature point pairs are determined, the first and second equidistant projection pictures can be combined according to those pairs and the third overlapping area extracted. The third overlapping area can also be determined from aperture prior information: for a binocular fisheye camera with two fisheye lenses, if the aperture angle of view of each lens is 180 degrees, the overlapping area is 0; if the aperture angle of view of each lens is greater than 180 degrees, the corresponding third overlapping area can be estimated from the overlap of the two lenses' aperture angles of view.
  • Step 812: perform illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel values of the pixels in the illumination-compensated second equidistant projection picture match the pixel values of the corresponding pixels in the first equidistant projection picture.
  • In actual implementation, illumination compensation is performed on the second equidistant projection picture. For example, a histogram matching method can be used to map the pixel value distribution of the second equidistant projection picture so that it resembles the pixel value distribution of the first equidistant projection picture; of course, other illumination compensation methods can also be used, for which reference may be made to the related art, not repeated here.
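  • A hedged sketch of the histogram matching mentioned above, assuming 8-bit single-channel inputs for brevity (a color implementation would match each channel separately); using the two overlap crops as source and reference is an assumption consistent with the text:

        import numpy as np

        def histogram_match(src, ref):
            # Map src's gray-level distribution onto ref's via CDF matching.
            s_hist = np.bincount(src.ravel(), minlength=256).astype(np.float64)
            r_hist = np.bincount(ref.ravel(), minlength=256).astype(np.float64)
            s_cdf = np.cumsum(s_hist) / s_hist.sum()
            r_cdf = np.cumsum(r_hist) / r_hist.sum()
            # For each source level, pick the reference level whose CDF is closest.
            lut = np.searchsorted(r_cdf, s_cdf).clip(0, 255).astype(np.uint8)
            return lut[src]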
  • Step 814: determine the seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
  • In actual implementation, the seam search result corresponding to the first equidistant projection picture and the illumination-compensated second equidistant projection picture can be determined according to the schemes for determining seam search results in the foregoing embodiments.
  • Step 816: for each group of two corresponding fisheye video images, determine the fused overlapping area corresponding to the two frames based on the seam search result corresponding to that group.
  • In actual implementation, the fusion result of the overlapping areas corresponding to the two frames of fisheye video images in each group is obtained through a fusion operation.
  • Step 818: replace the third overlapping area corresponding to the two frames of fisheye video images in the group with the fused overlapping area, obtaining the image stitching result of the two frames in the group.
  • In actual implementation, the corresponding fused overlapping area is used to replace the corresponding aligned third overlapping area, yielding the image stitching result of the two frames of fisheye video images in the group; a sketch of one possible fusion is given below.
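  • One plausible reading of this fusion-and-replacement step is a mask-based composite along the searched seam. The sketch below assumes the seam search yields a mask over the overlap in [0, 1] (1 = take the first picture's pixels), which the text does not state explicitly:

        import numpy as np

        def fuse_overlap(overlap1, overlap2, seam_mask):
            # seam_mask: float array in [0, 1]; 1 selects overlap1, 0 selects overlap2.
            m = seam_mask[..., None] if overlap1.ndim == 3 else seam_mask
            fused = (m * overlap1.astype(np.float64)
                     + (1.0 - m) * overlap2.astype(np.float64))
            return fused.astype(overlap1.dtype)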
  • In actual implementation, the image stitching result of the two frames of fisheye video images in each group can also be obtained based on optical flow. Specifically, the optical flow information of the overlapping area between the first equidistant projection picture and the illumination-compensated second equidistant projection picture is computed; a remap (remapping) transformation parameter for the illumination-compensated second equidistant projection picture is calculated from this optical flow information, and the picture is remapped accordingly, so that the optical flow information of the overlapping area of the second picture is fused with that of the first picture, realizing the fusion of the overlapping areas of the two pictures.
  • The remap transformation parameters can be merged into the remap parameters used when the target-area-extracted fisheye video image is expanded into an equidistant projection picture, reducing the amount of computation. For images fused in the optical flow manner, the fused image, that is, the image stitching result, is obtained directly.
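  • A hedged sketch of the optical-flow variant using OpenCV's Farneback dense flow; warping the second overlap fully into the first overlap's geometry and averaging is one common choice, not necessarily the exact scheme of the disclosure:

        import cv2
        import numpy as np

        def flow_fuse(overlap1_gray, overlap2_gray):
            # Flow maps each pixel of the first overlap to its location in the second.
            flow = cv2.calcOpticalFlowFarneback(
                overlap1_gray, overlap2_gray, None, pyr_scale=0.5, levels=3,
                winsize=15, iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
            h, w = overlap1_gray.shape
            xs, ys = np.meshgrid(np.arange(w), np.arange(h))
            # Remap parameters derived from the flow field (cf. the remap above).
            map_x = (xs + flow[..., 0]).astype(np.float32)
            map_y = (ys + flow[..., 1]).astype(np.float32)
            # Resample the second overlap into the first overlap's geometry.
            warped = cv2.remap(overlap2_gray, map_x, map_y, cv2.INTER_LINEAR)
            return ((overlap1_gray.astype(np.float32) + warped) / 2).astype(np.uint8)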
  • Step 820: determine the video stitching result of the video images based on the image stitching results of the two frames of fisheye video images in each group.
  • In actual implementation, the image stitching results of the two frames of fisheye video images in each group are combined to form the video stitching result of the video images.
  • The above video image splicing method first aligns the corresponding first and second equidistant projection pictures and extracts a third overlapping area from the aligned pictures. It then performs illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel values of the illumination-compensated second picture match those of the corresponding first picture, and determines the seam search result based on the first picture and the illumination-compensated second picture. Finally, for each group of two corresponding fisheye video images, the fused overlapping area of the two frames is determined based on the group's seam search result, and the video stitching result of the video images is determined.
  • This method determines the seam search result based on the energy map of the video image and, for video images other than the first frame, first determines the seam search area range based on the seam search result of the previous frame, then determines the seam search result within that area.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates the jitter of the stitched video during playback, and improves the stitching effect of the panoramic video.
  • In addition, the above video image splicing method improves the splicing and alignment of video images through the offline adaptive expansion-parameter optimization process, improving adaptability to different binocular fisheye modules; the dynamic fine-alignment algorithm improves the stitching of scenes with different depths of field; and seam prediction and fusion over video images improve the stability of video image stitching and the stitching effect of panoramic videos. Moreover, the method uses neural network distillation and block warp in place of the remap operation to realize hardware acceleration.
  • This embodiment also provides a seam search apparatus for video images. The apparatus includes: a first acquisition module 100, configured to acquire the energy map of each frame of video image in the first video, where the energy map is used to indicate the location area and edges of a specified object in the video image; a first determination module 101, configured to determine, for the first frame of video image in the first video, the seam search result of the first frame based on its energy map, where the seam search result includes the seam area of the video image and the target image, the target image being the video image in the second video corresponding to the video image; and a second determination module 102, configured to determine, for each frame of video image except the first in the first video, the seam search area range of the current frame based on the seam search result of the previous frame, and to determine, within that range, the seam search result of the current frame based on its energy map.
  • The above seam search apparatus first obtains the energy map of each frame of video image in the first video; then, for the first frame of video image in the first video, determines its seam search result based on its energy map; and, for each frame except the first, determines the seam search area range of the current frame based on the seam search result of the previous frame, and determines the seam search result of the current frame within that range based on its energy map.
  • The apparatus determines the seam search result based on the energy map of the video image and, for video images other than the first frame, first determines the seam search area range based on the previous frame's seam search result, then determines the seam search result within that area.
  • Further, the first acquisition module 100 may be configured to: obtain the saliency target energy map, the moving target energy map and the edge energy map of each frame of video image in the first video; and, for each frame, fuse the saliency target energy map, the moving target energy map and the edge energy map corresponding to that frame to obtain the energy map of the frame.
  • Further, the first acquisition module 100 may be configured to: for each frame of video image in the first video, input the video image into a preset neural network model so as to output the saliency target energy map of that frame through the preset neural network model; determine the moving target energy map of the frame based on the moving target in the frame; and perform edge detection on each object in the frame to obtain the edge energy map of the frame.
  • Further, the first determination module 101 may be configured to: for the first frame of video image in the first video, calculate the seam search result of the first frame using a dynamic programming algorithm, based on the energy map of the first frame.
  • Further, the second determination module 102 may be configured to: for each frame of video image in the first video except the first frame, add preset constraint conditions on the basis of the seam search result of the previous frame to determine the seam search area range of the current frame; and, within that range, determine the seam search result of the current frame using a dynamic programming algorithm, based on the energy map of the current frame.
  • Further, in the first video, each frame of video image has an overlapping area with its corresponding target image; the area corresponding to the overlap in the video image is the first overlapping area, and the corresponding area in the target image is the second overlapping area. The apparatus is also configured to: for each frame of video image, input the image corresponding to the first overlapping area in that frame, together with the image corresponding to the second overlapping area in the corresponding target image, into the pre-trained neural network model to obtain the seam prediction result of that frame, where the seam prediction result includes the predicted seam area of the video image and the target image.
  • Further, the apparatus includes a neural network model determination module, which may be configured to: acquire training samples comprising multiple consecutive groups of image pairs to be stitched, together with the seam search result of each group; for each group except the first, input the group of image pairs and the seam prediction result of the adjacent previous group into the initial neural network model, and output the group's seam prediction result through the initial neural network model; calculate the loss value of the group's seam prediction result based on the group's seam search result and a preset loss function; update the weight parameters of the initial neural network model based on the loss value; and continue the step of acquiring training samples containing multiple consecutive groups of image pairs to be stitched until the initial neural network model converges, obtaining the neural network model.
  • Further, the neural network model determination module may also be configured to: obtain a preset seam template containing a preset seam area; and, for the first group of image pairs to be stitched, input the first group and the preset seam template into the initial neural network model so as to output the first group's seam prediction result through the initial neural network model.
  • This embodiment also provides a video image splicing apparatus. The apparatus includes: a second acquisition module 110, configured to acquire the first fisheye video and the second fisheye video, where the fisheye video images in the first and second fisheye videos have overlapping areas; an extraction module 111, configured to extract the first target area of each frame of fisheye video image in the first fisheye video and the second target area of each frame of fisheye video image in the second fisheye video; a third determination module 112, configured to determine, for two corresponding frames of fisheye video images, based on the first and second target areas corresponding to the two frames and the pre-acquired updated expansion parameter values, the first equidistant projection picture obtained by expanding the frame in the first fisheye video and the second equidistant projection picture obtained by expanding the frame in the second fisheye video; and a fourth determination module 113, configured to determine the seam search result based on the corresponding first and second equidistant projection pictures, and to determine the video stitching result of the video images based on the seam search results corresponding to each group of two corresponding fisheye video images.
  • The above video image splicing apparatus first acquires the first and second fisheye videos, and extracts the first target area of each frame of fisheye video image in the first fisheye video and the second target area of each frame in the second fisheye video. Then, for two corresponding frames of fisheye video images, based on their first and second target areas and the pre-acquired updated expansion parameter values, it determines the first equidistant projection picture obtained by expanding the frame in the first fisheye video and the second equidistant projection picture obtained by expanding the frame in the second fisheye video.
  • The apparatus determines the seam search result based on the energy map of the video image and, for video images other than the first frame, first determines the seam search area range based on the previous frame's seam search result, then determines the seam search result within that area. Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates the jitter of the stitched video during playback, and improves the stitching effect of the panoramic video.
  • Further, the fourth determination module 113 may also be configured to: align the corresponding first and second equidistant projection pictures; extract the third overlapping area based on the aligned first and second equidistant projection pictures; perform illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel values of the pixels in the illumination-compensated second picture match those of the corresponding pixels in the first picture; and determine the seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
  • Further, the apparatus includes a parameter value determination module, and the updated expansion parameter values include: the field-of-view parameter value, the optical-center parameter value in the x-axis direction, the optical-center parameter value in the y-axis direction, and the fisheye rotation angle parameter value.
  • The parameter value determination module may be configured to: obtain the initial expansion parameter value and preset offset range of each expansion parameter; sample each expansion parameter based on its initial value and preset offset range to obtain its sampling value; based on the sampling values, determine the third equidistant projection picture obtained by expanding the frame in the first fisheye video and the fourth equidistant projection picture obtained by expanding the frame in the second fisheye video; extract the fourth overlapping area of the third and fourth equidistant projection pictures; perform a cross-correlation calculation on the fourth overlapping area to obtain the first cross-correlation calculation result; and determine the updated expansion parameter values based on the first cross-correlation calculation result and a preset number of iterations.
  • Further, the parameter value determination module may also be configured to: repeat, according to the preset number of iterations, the step of sampling each expansion parameter based on its initial value and preset offset range, obtaining multiple first cross-correlation calculation results; select the first cross-correlation calculation result with the largest value from among them; and determine the sampling values of the expansion parameters corresponding to that largest result as the updated expansion parameter values.
  • Further, the fourth determination module 113 may also be configured to: extract first feature points from the first equidistant projection picture and second feature points from the second equidistant projection picture; determine matching feature point pairs based on the first and second feature points; and align the first and second equidistant projection pictures based on the matching feature point pairs.
  • Further, the fourth determination module 113 may also be configured to: move the second equidistant projection picture in a preset direction; extract, during the movement, multiple fifth overlapping areas of the first and second equidistant projection pictures; perform cross-correlation calculations on the multiple fifth overlapping areas to obtain multiple second cross-correlation calculation results; and align the first and second equidistant projection pictures based on those results.
  • Further, the fourth determination module 113 may also be configured to: select the second cross-correlation calculation result with the largest value from the multiple second cross-correlation calculation results; obtain the position coordinates, in the first equidistant projection picture, of the first boundary pixel points of the fifth overlapping area corresponding to that result, and the position coordinates of the corresponding second boundary pixel points in the second equidistant projection picture; calculate an affine transformation matrix based on these position coordinates; and align the first and second equidistant projection pictures based on the affine transformation matrix.
  • Further, the fourth determination module 113 may also be configured to: for each group of two corresponding fisheye video images, determine the fused overlapping area corresponding to the two frames based on the group's seam search results; replace the third overlapping area corresponding to the two frames with the fused overlapping area to obtain the image stitching result of the two frames in the group; and determine the video stitching result of the video images based on the image stitching results of the two frames in each group.
  • The video image splicing apparatus provided by the embodiments of the present disclosure has the same realization principle and technical effects as the foregoing video image splicing method embodiments; for parts of the apparatus embodiment not mentioned here, reference may be made to the corresponding content in the foregoing method embodiments.
  • This embodiment provides an electronic system, including an image acquisition device, a processing device and a storage device; the image acquisition device is used to acquire preview video frames or image data; a computer program is stored on the storage device, and when the computer program is run by the processing device, it executes the above seam search method for video images or the above video image splicing method.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of the above seam search method for video images, or the steps of the above video image splicing method, are executed.
  • The computer program product of the video image seam search method, video image splicing method and apparatus provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the foregoing method embodiments, and for specific implementation reference may be made to those embodiments, not repeated here.
  • The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • The present disclosure provides a video image seam search method, a video image splicing method and corresponding apparatuses, which acquire the energy map of each frame of video image in the first video; for the first frame of video image, determine its seam search result based on its energy map; for each remaining frame, determine the seam search area range based on the previous frame's seam search result; and, within that range, determine the current frame's seam search result based on its energy map.
  • This method determines the seam search result based on the energy map of the video image and, for video images other than the first frame, first determines the seam search area range based on the previous frame's seam search result, then determines the seam search result within that area.
  • Constraining the seam search area in this way reduces the difference between the seam areas of consecutive frames, alleviates the jitter of the stitched video during playback, and improves the stitching effect of the panoramic video.
  • The video image seam search method, video image splicing method and apparatus of the present application are reproducible and can be used in various industrial applications; in particular, they can be used in the technical field of video processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

Provided in the present disclosure are a splicing seam search method and apparatus for a video image, and a video image splicing method and apparatus. The splicing seam search method comprises: acquiring an energy map of each frame of video image in a first video; for a first frame of video image, determining a splicing seam search result thereof on the basis of the energy map thereof; for each of the remaining frames of video images, determining the range of a splicing seam search area on the basis of a splicing seam search result of the previous frame of video image; and on the basis of an energy map of the current video image, determining a splicing seam search result thereof within the range. In this way, a splicing seam search result is determined on the basis of an energy map of a video image; moreover, for a video image other than a first frame, the range of a splicing seam search area is first determined on the basis of a splicing seam search result of the previous frame of video image, and a splicing seam search result is then determined within the range of the splicing seam search area. By means of constraining the range of the splicing seam search area, the difference between splicing seam areas of two consecutive frames of video images can be reduced, such that the problem of jittering of a spliced video during a playing process is alleviated, thereby improving the splicing effect of a panoramic video.

Description

Video image seam search method, video image splicing method and apparatus
Cross-Reference to Related Applications
This application claims the priority of the Chinese patent application with application number 202110893253.6, titled "Seam Search Method for Video Images, Method and Device for Stitching Video Images", submitted to the State Intellectual Property Office of China on August 4, 2021, the entire contents of which are incorporated by reference in this application.
Technical Field
The present disclosure relates to the technical field of video processing, and in particular to a seam search method for video images, and a splicing method and apparatus for video images.
Background Art
Panoramic video stitching refers to splicing multiple videos with overlapping fields of view. Specifically, the frame images in the multiple videos correspond one to one, and the mutually corresponding frame images are spliced to obtain a 360° panoramic field-of-view video. In the panoramic video stitching process, it is usually necessary to search for the seam of each frame of images in the multiple videos, and then realize the stitching of each frame based on the searched seam. Image stitching algorithms in the related art are mainly applied to the stitching of single images; when such an algorithm is used to stitch multiple videos, the seam areas of consecutive frames tend to differ greatly, and the stitched video shakes during playback, which affects the stitching effect of the panoramic video.
Summary of the Invention
The present disclosure provides a seam search method for video images, and a video image splicing method and apparatus, so as to alleviate the shaking of the spliced video during playback and improve the stitching effect of panoramic videos.
The present disclosure provides a seam search method for video images. The method may include: acquiring the energy map of each frame of video image in a first video, where the energy map is used to indicate the location area and edges of a specified object in the video image; for the first frame of video image in the first video, determining the seam search result of the first frame based on its energy map, where the seam search result includes the seam area of the video image and a target image, the target image being the video image in a second video corresponding to the video image; for each frame of video image in the first video except the first frame, determining the seam search area range of the current video image based on the seam search result of the previous frame; and, within the seam search area range, determining the seam search result of the current video image based on its energy map.
可选地,获取第一视频中每帧视频图像的能量图的步骤可以包括:获取第一视频中每帧视频图像的显著性目标能量图、运动目标能量图和边缘目标能量图;针对每帧视频图像,融合该帧视频图像所对应的显著性目标能量图、运动目标能量图和边缘能量图,得到该帧视频图像的能量图。Optionally, the step of obtaining the energy map of each frame of video image in the first video may include: obtaining a saliency target energy map, a moving target energy map and an edge target energy map of each frame of video image in the first video; for each frame The video image is fused with the salient object energy map, the moving object energy map and the edge energy map corresponding to the video image of the frame to obtain the energy map of the video image of the frame.
可选地,获取第一视频中每帧视频图像的显著性目标能量图、运动目标能量图和边缘能量图的步骤可以包括:针对第一视频中的每帧视频图像,将该视频图像输入至预设神经网络模型中,以通过预设神经网络模型输出该帧视频图像的显著性目标能量图;基于该帧视频图像中的运动目标,确定该帧视频图像的运动目标能量图;对该帧视频图像中每个对象进行边缘检测,得到该帧视频图像的边缘能量图。Optionally, the step of obtaining the saliency target energy map, the moving target energy map and the edge energy map of each frame of video image in the first video may include: for each frame of video image in the first video, input the video image to In the preset neural network model, to output the saliency target energy map of the frame video image through the preset neural network model; based on the moving target in the frame video image, determine the moving target energy map of the frame video image; Edge detection is performed on each object in the video image, and the edge energy map of the frame video image is obtained.
可选地,针对第一视频中第一帧视频图像,基于第一帧视频图像的能量图,确定第一帧视频图像的拼缝搜索结果的步骤可以包括:针对第一视频中第一帧视频图像,基于第一帧视频图像的能量图,采用动态规划算法,计算第一帧视频图像的拼缝搜索结果。Optionally, for the first frame of video image in the first video, based on the energy map of the first frame of video image, the step of determining the patchwork search result of the first frame of video image may include: for the first frame of video in the first video Image, based on the energy map of the first frame of video image, the dynamic programming algorithm is used to calculate the patchwork search result of the first frame of video image.
Optionally, for each frame of video image in the first video other than the first frame, the steps of determining the seam search region range of the current video image based on the seam search result of the preceding frame, and determining the seam search result of the current video image within that range based on the energy map of the current video image, may include: for each frame other than the first frame, adding preset constraint conditions on top of the seam search result of the preceding frame to determine the seam search region range of the current video image; and, within that range, determining the seam search result of the current video image with a dynamic programming algorithm based on the energy map of the current video image.
Optionally, in the first video, each frame of video image and its corresponding target image have an overlapping region; the part of the overlapping region in the video image is a first overlapping region, and the part in the target image is a second overlapping region. The method may further include: for each frame of video image, inputting the image of the first overlapping region of that frame and the image of the second overlapping region of the corresponding target image into a pre-trained neural network model to obtain a seam prediction result of that frame, where the seam prediction result includes a predicted seam region between the video image and the corresponding target image.
Optionally, the pre-trained neural network model may be determined as follows: acquiring training samples containing consecutive groups of image pairs to be stitched, together with the seam search result of each group; for each group of image pairs other than the first group, inputting the group of image pairs and the seam prediction result of the adjacent preceding group into an initial neural network model, so that the initial neural network model outputs the seam prediction result of the group; computing a loss value of the seam prediction result of the group based on the seam search result of the group and a preset loss function; updating the weight parameters of the initial neural network model based on the loss value; and repeating the step of acquiring training samples containing consecutive groups of image pairs to be stitched until the initial neural network model converges, thereby obtaining the neural network model.
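Purely as an illustrative sketch of the training loop just described (the PyTorch framing, the tensor layout and the helper names train_seam_predictor and seam_loss are assumptions, not part of the disclosure):

import torch

def train_seam_predictor(initial_model, optimizer, sample_groups, seam_loss, preset_template):
    # sample_groups: consecutive groups of (image_pair, gt_seam); image_pair is a
    # (1, C, H, W) tensor holding the two overlap images stacked on the channel
    # axis, gt_seam is the seam search result used as supervision.
    prev_prediction = preset_template  # preset seam template stands in for "group 0"
    for image_pair, gt_seam in sample_groups:
        # condition the prediction on the previous group's predicted seam
        prediction = initial_model(torch.cat([image_pair, prev_prediction], dim=1))
        loss = seam_loss(prediction, gt_seam)  # preset loss function
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        prev_prediction = prediction.detach()  # feed forward to the next group
    return initial_model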
Optionally, after the step of acquiring the training samples containing consecutive groups of image pairs to be stitched and the seam search result of each group, the method may further include: acquiring a preset seam template, where the preset seam template includes a preset seam region; and, for the first group of image pairs to be stitched, inputting the first group of image pairs and the preset seam template into the initial neural network model, so that the initial neural network model outputs the seam prediction result of the first group.
The present disclosure further provides a video image stitching method. The method may include: acquiring a first fisheye video and a second fisheye video, where the fisheye video images in the first fisheye video and the second fisheye video have overlapping regions; extracting a first target region from each frame of fisheye video image in the first fisheye video, and a second target region from each frame of fisheye video image in the second fisheye video; for two mutually corresponding frames of fisheye video images, determining, based on the first target region and the second target region of the two frames and on pre-acquired updated expansion parameter values, a first equirectangular projection image obtained by unwrapping the frame in the first fisheye video and a second equirectangular projection image obtained by unwrapping the frame in the second fisheye video; determining a seam search result based on the mutually corresponding first and second equirectangular projection images, where the seam search result is determined with any one of the seam search methods for video images described above; and determining the video stitching result of the video images based on the seam search results of each group of mutually corresponding frames of fisheye video images.
Optionally, the step of determining the seam search result based on the mutually corresponding first and second equirectangular projection images may include: aligning the mutually corresponding first and second equirectangular projection images; extracting a third overlapping region based on the aligned first and second equirectangular projection images; performing illumination compensation on the second equirectangular projection image based on the third overlapping region, so that the pixel values of the pixels in the compensated second equirectangular projection image match the pixel values of the pixels in the corresponding first equirectangular projection image; and determining the seam search result based on the first equirectangular projection image and the illumination-compensated second equirectangular projection image.
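As a non-limiting sketch, one simple realization of the illumination compensation is a per-channel gain that matches the mean pixel values inside the third overlapping region; the disclosure does not fix a particular compensation formula:

import numpy as np

def illumination_compensate(erp1, erp2, overlap_mask):
    # Match the mean intensity of the second equirectangular image to the first
    # inside the shared (third) overlapping region, channel by channel.
    # overlap_mask: boolean (H, W) mask of the third overlapping region.
    out = erp2.astype(np.float32)
    for c in range(erp1.shape[2]):
        m1 = erp1[..., c][overlap_mask].mean()
        m2 = erp2[..., c][overlap_mask].mean()
        gain = m1 / max(m2, 1e-6)
        out[..., c] *= gain
    return np.clip(out, 0, 255).astype(np.uint8)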
Optionally, the updated expansion parameter values may include a field-of-view angle parameter value, a parameter value of the optical center along the x axis, a parameter value of the optical center along the y axis, and a fisheye rotation angle parameter value. The updated expansion parameter values are determined in advance as follows: acquiring an initial expansion parameter value and a preset offset range of each expansion parameter; sampling each expansion parameter based on its initial expansion parameter value and preset offset range to obtain a sampled value of each expansion parameter; determining, based on the sampled values of the expansion parameters, a third equirectangular projection image obtained by unwrapping the frame of fisheye video image in the first fisheye video and a fourth equirectangular projection image obtained by unwrapping the frame of fisheye video image in the second fisheye video; extracting a fourth overlapping region of the third and fourth equirectangular projection images; performing a cross-correlation computation on the fourth overlapping region to obtain a first cross-correlation result; and determining the updated expansion parameter values based on the first cross-correlation result and a preset number of iterations.
Optionally, the step of determining the updated expansion parameter values based on the first cross-correlation result and the preset number of iterations may include: repeating, for the preset number of iterations, the step of sampling each expansion parameter based on its initial expansion parameter value and preset offset range, to obtain multiple first cross-correlation results; selecting, from the multiple first cross-correlation results, the first cross-correlation result with the largest value; and determining the sampled expansion parameter values corresponding to that largest first cross-correlation result as the updated expansion parameter values.
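A minimal sketch of this sampling-and-scoring loop follows; unwrap_to_erp and cross_correlation are hypothetical stand-ins for the fisheye unwrapping and the overlap cross-correlation, which the disclosure does not pin to a specific implementation:

import numpy as np

def refine_expansion_params(fish1, fish2, init_params, offset_ranges, iterations,
                            rng=np.random.default_rng(0)):
    # init_params / offset_ranges: dicts over {"fov", "cx", "cy", "rot"},
    # i.e. field-of-view angle, optical center x/y, and fisheye rotation angle.
    best_score, best_params = -np.inf, dict(init_params)
    for _ in range(iterations):  # preset number of iterations
        sample = {k: init_params[k] + rng.uniform(-offset_ranges[k], offset_ranges[k])
                  for k in init_params}
        erp3 = unwrap_to_erp(fish1, sample)    # hypothetical fisheye unwrapping helper
        erp4 = unwrap_to_erp(fish2, sample)
        score = cross_correlation(erp3, erp4)  # hypothetical correlation of the overlap
        if score > best_score:                 # keep the sample with the largest result
            best_score, best_params = score, sample
    return best_params  # the updated expansion parameter values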
Optionally, the step of aligning the mutually corresponding first and second equirectangular projection images may include: extracting first feature points from the first equirectangular projection image and second feature points from the second equirectangular projection image; determining matched feature point pairs based on the first feature points and the second feature points; and aligning the first and second equirectangular projection images based on the matched feature point pairs.
Optionally, the step of aligning the mutually corresponding first and second equirectangular projection images may alternatively include: moving the second equirectangular projection image along a preset direction; extracting, during the movement, multiple fifth overlapping regions of the first and second equirectangular projection images; performing cross-correlation computations on the fifth overlapping regions respectively to obtain multiple second cross-correlation results; and aligning the first and second equirectangular projection images based on the multiple second cross-correlation results.
Optionally, the step of aligning the first and second equirectangular projection images based on the multiple second cross-correlation results may include: selecting, from the multiple second cross-correlation results, the second cross-correlation result with the largest value; acquiring the position coordinates of the first boundary pixels in the first equirectangular projection image and of the second boundary pixels in the second equirectangular projection image that correspond to the fifth overlapping region of that largest second cross-correlation result; computing an affine transformation matrix based on the position coordinates of the first boundary pixels and the second boundary pixels; and aligning the first and second equirectangular projection images based on the affine transformation matrix.
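As an illustration only, with OpenCV the affine transformation matrix can be estimated from three matched boundary pixel coordinates and applied as follows (the point layout and output size handling are assumptions):

import numpy as np
import cv2

def align_by_boundary_points(erp2, pts_in_erp1, pts_in_erp2, out_size):
    # pts_in_erp1 / pts_in_erp2: three matched boundary pixel coordinates taken
    # from the fifth overlapping region with the largest cross-correlation value.
    # out_size: (width, height) of the aligned output.
    src = np.float32(pts_in_erp2)
    dst = np.float32(pts_in_erp1)
    affine = cv2.getAffineTransform(src, dst)      # 2x3 affine transformation matrix
    return cv2.warpAffine(erp2, affine, out_size)  # warp the second image onto the first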
Optionally, the step of determining the video stitching result of the video images based on the seam search results of each group of mutually corresponding frames of fisheye video images includes: for each group of mutually corresponding frames of fisheye video images, determining a fused overlapping region of the two frames in the group based on their seam search result; replacing the third overlapping region of the two frames in the group with the fused overlapping region to obtain the image stitching result of the two frames in the group; and determining the video stitching result of the video images based on the image stitching results of the two frames in every group, as sketched below.
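A sketch of one possible way to build the fused overlapping region from a seam search result; the hard-cut rule is an assumption, since the disclosure leaves the fusion rule open:

import numpy as np

def fuse_overlap(overlap1, overlap2, seam_cols):
    # One possible fusion of the overlapping region: take the first image to the
    # left of the searched seam and the second image to the right of it (a hard
    # cut; feathering across the seam would work equally well).
    # seam_cols: column index of the seam in every row.
    fused = overlap2.copy()
    for y in range(overlap1.shape[0]):
        fused[y, :seam_cols[y]] = overlap1[y, :seam_cols[y]]
    return fused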
The present disclosure provides a seam search apparatus for video images. The apparatus may include: a first acquisition module, configured to acquire an energy map of each frame of video image in a first video, where the energy map indicates the location regions and edges of specified objects in the video image; a first determination module, configured to determine, for the first frame of video image in the first video, the seam search result of the first frame based on the energy map of the first frame, where the seam search result includes the seam region between the video image and a target image, and the target image is the video image in a second video that corresponds to the video image; and a second determination module, configured to determine, for each frame of video image in the first video other than the first frame, the seam search region range of the current video image based on the seam search result of the preceding frame, and to determine, within that range, the seam search result of the current video image based on the energy map of the current video image.
The present disclosure provides a video image stitching apparatus. The apparatus may include: a second acquisition module, configured to acquire a first fisheye video and a second fisheye video, where the fisheye video images in the first and second fisheye videos have overlapping regions; an extraction module, configured to extract a first target region from each frame of fisheye video image in the first fisheye video and a second target region from each frame of fisheye video image in the second fisheye video; a third determination module, configured to determine, for two mutually corresponding frames of fisheye video images and based on their first and second target regions and the pre-acquired updated expansion parameter values, the first equirectangular projection image obtained by unwrapping the frame in the first fisheye video and the second equirectangular projection image obtained by unwrapping the frame in the second fisheye video; a fourth determination module, configured to determine a seam search result based on the mutually corresponding first and second equirectangular projection images, where the seam search result is determined with the seam search apparatus for video images described above; and a fifth determination module, configured to determine the video stitching result of the video images based on the seam search results of each group of mutually corresponding frames of fisheye video images.
The present disclosure provides an electronic system. The electronic system may include an image acquisition device, a processing device and a storage device. The image acquisition device is configured to acquire preview video frames or image data; the storage device stores a computer program which, when run by the processing device, executes the seam search method for video images described above, or the video image stitching method described above.
The present disclosure provides a computer-readable storage medium storing a computer program which, when run by a processing device, executes the steps of the seam search method for video images described above, or the steps of the video image stitching method described above.
In the seam search method for video images and the video image stitching method and apparatus provided by the present disclosure, the energy map of each frame of video image in the first video is first acquired; then, for the first frame of video image in the first video, the seam search result of the first frame is determined based on its energy map; for each frame of video image in the first video other than the first frame, the seam search region range of the current video image is determined based on the seam search result of the preceding frame, and the seam search result of the current video image is determined within that range based on the energy map of the current video image. This approach determines the seam search result from the energy map of the video image and, for frames other than the first frame, first bounds the seam search region by the seam search result of the preceding frame before searching within it. Constraining the seam search region in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter of the stitched video during playback, and thereby improves the stitching quality of the panoramic video.
Brief Description of the Drawings
To describe the technical solutions of the specific embodiments of the present disclosure or of the related art more clearly, the drawings needed for describing the specific embodiments or the related art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of an electronic system provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a seam search method for video images provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of another seam search method for video images provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of yet another seam search method for video images provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a neural network model training process provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of a video image stitching method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a manner of unwrapping into an equirectangular projection image provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a cross-correlation computation provided by an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of image alignment provided by an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a seam search apparatus for video images provided by an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a video image stitching apparatus provided by an embodiment of the present disclosure.
Detailed Description of the Embodiments
The technical solutions of the present disclosure will be described clearly and completely below in conjunction with the embodiments. Evidently, the described embodiments are a part, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In recent years, research on artificial-intelligence-based computer vision, deep learning, machine learning, image processing, image recognition and related technologies has made important progress. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques and application systems for simulating and extending human intelligence. AI is a comprehensive discipline involving many kinds of technologies, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning and neural networks. As an important branch of AI, computer vision specifically aims to let machines perceive the world; computer vision technologies typically include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, optical character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of AI technology, it has been applied in many fields, such as security, city management, traffic management, building management, campus management, face-based access, face-based attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, intelligent medical care, face payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart TVs, cameras, the mobile Internet, live streaming, beautification, cosmetics, medical aesthetics, and intelligent temperature measurement.
At present, panoramic video stitching means stitching multiple videos with overlapping fields of view to obtain a video with a 360° panoramic field of view; panoramic video stitching can be applied in scenarios such as action cameras, remote conferencing and security monitoring. Panoramic video stitching is generally achieved by combining multiple wide-angle cameras whose fields of view together must cover more than 360°; fisheye cameras can be used to reduce the number of wide-angle cameras, and stitching two fisheye cameras, each with a field of view greater than 180°, suffices to generate a 360° panoramic video. During panoramic video stitching, a seam usually needs to be searched for each frame of image in the multiple videos, and the frames are then stitched along the searched seams. The image stitching algorithms in the related art are mainly designed for stitching single images; when such an algorithm is applied to different fisheye cameras, the quality of the resulting stitched images is unstable, that is, the algorithm adapts poorly to different fisheye cameras. Moreover, when such an algorithm is used to stitch multiple videos, the seam regions of consecutive frames tend to differ considerably, and the stitched video jitters during playback, degrading the panoramic stitching quality. On this basis, the embodiments of the present disclosure provide a seam search method for video images and a video image stitching method and apparatus. The technology can be applied to stitching multiple videos and can be implemented with corresponding software and hardware. The embodiments of the present disclosure are described in detail below.
Embodiment One
First, an example electronic system 100 for implementing the seam search method for video images and the video image stitching method and apparatus of the embodiments of the present disclosure is described with reference to FIG. 1.
As shown in the schematic structural diagram of FIG. 1, the electronic system 100 may include one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108 and one or more image acquisition devices 110, which are interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in FIG. 1 are merely exemplary rather than limiting, and the electronic system may have other components and structures as required.
The processing device 102 may be a gateway, an intelligent terminal, or a device including a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability; it may process data from other components in the electronic system 100 and may also control other components in the electronic system 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processing device 102 may run the program instructions to implement the client functions (as implemented by the processing device) in the embodiments of the present disclosure described below and/or other desired functions. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various kinds of information (for example, images or sounds) to the outside (for example, a user), and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may acquire preview video frames or image data and store the acquired preview video frames or image data in the storage device 104 for use by other components.
Exemplarily, the devices in the example electronic system for implementing the seam search method for video images and the video image stitching method and apparatus according to the embodiments of the present disclosure may be arranged in an integrated or distributed manner; for instance, the processing device 102, the storage device 104, the input device 106 and the output device 108 may be integrated into one body, while the image acquisition device 110 is arranged at a designated position where the target images can be captured. When the devices of the electronic system are integrated, the electronic system may be implemented as an intelligent terminal such as a camera, a smartphone, a tablet computer, a computer or a vehicle-mounted terminal.
Embodiment Two
This embodiment provides a seam search method for video images. As shown in FIG. 2, the method may include the following steps:
Step S202: acquiring an energy map of each frame of video image in a first video, where the energy map indicates the location regions and edges of specified objects in the video image.
The first video may be a video captured by a device such as a camera, for example a wide-angle camera or a fisheye camera. The energy map may be represented as, for example, a grayscale image; in such a representation, a higher gray value of a pixel generally indicates a higher energy of that pixel and a higher energy value, while a lower gray value indicates a lower energy and a lower energy value. The energy values in the energy map may be normalized, that is, each energy value may lie between 0 and 1, giving an energy distribution over 0 to 1. The specified object may be any object in the video image, for example a person or an animal in the video image; the location region may be understood as the area occupied by the specified object in the video image, and the edge as the outer contour of the specified object in the video image. In actual implementation, when the seams of video images need to be searched, the energy map corresponding to each frame of video image in the first video usually needs to be acquired first; the energy map of a frame indicates the location regions and edge contours of the specified objects in that frame.
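Purely for illustration, the normalized 0-to-1 energy distribution described above can be obtained from a grayscale energy image as follows (a minimal sketch, not part of the claimed method):

import numpy as np

def normalize_energy(gray):
    # Map an 8-bit grayscale energy image to the normalized 0..1 energy
    # distribution described above: brighter pixels carry higher energy.
    g = gray.astype(np.float32)
    return (g - g.min()) / max(float(g.max() - g.min()), 1e-6)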
Step S204: for the first frame of video image in the first video, determining a seam search result of the first frame based on the energy map of the first frame, where the seam search result includes the seam region between the video image and a target image, and the target image is the video image in a second video that corresponds to the video image.
The second video may likewise be a video captured by a device such as a camera, for example a wide-angle camera or a fisheye camera. The seam region may be understood as the region corresponding to the stitching seam line along which the video image and the target image are stitched. The video images in the first video and those in the second video may correspond one to one; for example, two fisheye cameras with different fields of view simultaneously capture two videos of the same scene, namely the first video and the second video, and in the second video the video image corresponding to the first frame of the first video is the first frame of the second video, which corresponds to the target image. In actual implementation, for the first frame of video image in the first video, the target image corresponding to it in the second video may be determined first, and the seam region between the first frame and the corresponding target image may then be determined based on the energy map of the first frame to obtain the seam search result of the first frame; for example, based on the energy map of the first frame, the seam region between the video image and the corresponding target image may be searched in relatively static regions with low energy.
Step S206: for each frame of video image in the first video other than the first frame, determining the seam search region range of the current video image based on the seam search result of the preceding frame, and determining, within that range, the seam search result of the current video image based on the energy map of the current video image.
The seam search region range may be understood as the constrained range within which the seam search result is searched. In actual implementation, for each frame of video image in the first video other than the first frame, the seam search region range of the current frame may be constrained by the seam search result of the preceding frame, and the seam search result of the current frame is then determined within that range based on the energy map of the current frame. For example, preset constraint conditions may be added on top of the seam search result of the preceding frame to bound the seam search range of the current frame; within the determined range, the seam region between the current frame and the corresponding target image may be searched, based on the energy map of the current frame, in relatively static regions with low energy.
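A minimal sketch of one way to realize such a constrained search region, assuming the seam is stored as one column index per row and the margin is a preset constraint value:

import numpy as np

def seam_search_band(prev_seam_cols, height, width, margin=50):
    # prev_seam_cols: column index of the previous frame's seam in every row.
    # Returns a boolean mask limiting the current frame's seam search to a
    # band of +/- margin pixels around the previous frame's seam.
    mask = np.zeros((height, width), dtype=bool)
    for y in range(height):
        lo = max(prev_seam_cols[y] - margin, 0)
        hi = min(prev_seam_cols[y] + margin + 1, width)
        mask[y, lo:hi] = True
    return mask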
In the above seam search method for video images, the energy map of each frame of video image in the first video is first acquired; for the first frame of video image in the first video, the seam search result of the first frame is then determined based on its energy map; for each frame other than the first frame, the seam search region range of the current video image is determined based on the seam search result of the preceding frame, and the seam search result of the current video image is determined within that range based on its energy map. This approach determines the seam search result from the energy map of the video image and, for frames other than the first frame, first bounds the seam search region by the seam search result of the preceding frame before searching within it. Constraining the seam search region in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter of the stitched video during playback, and thereby improves the stitching quality of the panoramic video.
Embodiment Three
This embodiment provides another seam search method for video images, implemented on the basis of the method of the above embodiment. As shown in FIG. 3, the method may include the following steps:
Step S302: acquiring a saliency target energy map, a moving target energy map and an edge energy map of each frame of video image in the first video.
The saliency target energy map may indicate the location region of a specified first object in the video image, that is, the area occupied by the first object in the video image. The first object is usually the most eye-catching object in the video image and may also be understood as the salient target object, the object of attention or the subject in the video image; for example, in a video image containing a person, the person is usually the first object. In the saliency target energy map, the energy values in the location region of the first object are usually relatively high.
The moving target energy map may indicate the location region of a specified second object in the video image, that is, the area occupied by the second object in the video image. The second object is usually a moving target in the video image, for example a moving vehicle contained in the video image. In the moving target energy map, the energy values in the location region of the second object are usually relatively high.
The edge energy map may indicate the edges of specified third objects in the video image, that is, the outer contours of the third objects. The third objects usually include the above first object and second object, and may also include other objects contained in the video image. In the edge energy map, the energy values on the edges of the third objects are usually relatively high.
In actual implementation, when the seams of video images need to be searched, the saliency target energy map, the moving target energy map and the edge energy map corresponding to each frame of video image in the first video usually need to be acquired first, so that the seam search result of each frame can be determined based on the three target energy maps.
Specifically, step S302 may be implemented through the following steps one to three:
Step one: for each frame of video image in the first video, inputting the video image into a preset neural network model, so that the preset neural network model outputs the saliency target energy map of that frame.
The preset neural network model may be implemented with various convolutional neural networks, such as a residual network or a VGG network, and may be a convolutional neural network model of any size, for example resnet34_05x. In actual implementation, when the saliency target energy map of each frame of video image in the first video needs to be constructed, this can be done with a neural network: for example, each frame of video image may be input into the preset neural network, which detects the specified first object in each frame and outputs the saliency target energy map of each frame to indicate the location region of the specified first object. Of course, the saliency target energy map of each frame may also be determined in other ways; for details, reference may be made to implementations in the related art, which are not repeated here.
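For illustration, the sketch below shows only the input/output contract of such a preset neural network model; the toy architecture is a placeholder assumption and in no way the resnet34_05x or VGG-style backbone the disclosure mentions:

import torch
import torch.nn as nn

class TinySaliencyNet(nn.Module):
    # Placeholder for the preset neural network model: an RGB frame in, a
    # single-channel saliency energy map of the same spatial size out.
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame):    # frame: (N, 3, H, W) in [0, 1]
        return self.body(frame)  # saliency energies in [0, 1]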
Step two: determining the moving target energy map of that frame based on the moving targets in that frame.
In actual implementation, the moving target energy map of each frame may be determined based on the moving targets in the frame, for example by optical flow computation: for each frame of video image in the first video other than the first frame, the moving targets may be determined based on the current frame and the preceding frame, and the moving target energy map of the current frame is then determined from the detected moving targets. Here, moving target detection may be understood as the process of extracting and marking, as foreground, the objects whose spatial positions change in an image sequence or video. Of course, the moving target energy map of each frame may also be determined in other ways; for details, reference may be made to implementations in the related art, which are not repeated here.
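A minimal sketch of this idea using dense optical flow; OpenCV's Farneback method is one possible choice, the disclosure does not mandate a particular flow algorithm:

import cv2
import numpy as np

def motion_energy(prev_gray, cur_gray):
    # prev_gray / cur_gray: 8-bit single-channel versions of the preceding and
    # current frames. Dense optical flow between them is computed, and the flow
    # magnitude serves as the moving-target energy of the current frame.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    return mag / max(float(mag.max()), 1e-6)  # normalized 0..1 energy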
Step three: performing edge detection on each object in that frame to obtain the edge energy map of that frame.
The purpose of edge detection is to find the set of pixels in the video image where the brightness changes sharply, which usually manifests as contours. In actual implementation, edge detection may be performed on each object contained in each frame of video image to determine the contour of each object and obtain the edge energy map of each frame. Of course, the edge energy map of each frame may also be determined in other ways; for details, reference may be made to implementations in the related art, which are not repeated here.
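Purely as an illustration, an edge energy map can be obtained with a standard gradient operator such as Sobel; this is one possible realization under the "other ways" the paragraph above leaves open:

import cv2
import numpy as np

def edge_energy(gray):
    # Edge energy map: pixels on object contours (sharp brightness changes)
    # receive high energy. Sobel gradient magnitude is one common realization.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    mag = cv2.magnitude(gx, gy)
    return mag / max(float(mag.max()), 1e-6)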
Step S304: for each frame of video image, fusing the saliency target energy map, the moving target energy map and the edge energy map corresponding to that frame to obtain the energy map of that frame.
For each frame of video image, after its saliency target energy map, moving target energy map and edge energy map are obtained, the three energy maps may be fused to obtain the fused energy map of that frame, namely the energy map of the frame. The fused energy map contains the location regions and edges of the salient target objects in the frame, the location regions and edges of the moving targets, and the edges of the other objects.
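A sketch of this fusion step; the weighted sum and the weights are assumptions, since the disclosure does not fix a fusion rule:

import numpy as np

def fuse_energy(saliency, motion, edge, weights=(1.0, 1.0, 1.0)):
    # Fuse the three per-frame maps into a single energy map and renormalize.
    ws, wm, we = weights
    fused = ws * saliency + wm * motion + we * edge
    return fused / max(float(fused.max()), 1e-6)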
Step S306: for the first frame of video image in the first video, computing the seam search result of the first frame with a dynamic programming algorithm based on the energy map of the first frame, where the seam search result includes the seam region between the video image and the target image, and the target image is the video image in the second video that corresponds to the video image.
A dynamic programming algorithm splits a problem and defines problem states and the relations between states so that the problem can be solved recursively; equivalently, the problem to be solved is decomposed into several subproblems which are solved in order, the solution of an earlier subproblem providing useful information for solving the later ones. When solving any subproblem, the possible local solutions are enumerated and, by decision-making, those that may lead to the optimum are kept while the others are discarded; the subproblems are solved in turn, and the last subproblem yields the solution of the original problem. In this embodiment, for the first frame of video image in the first video, multiple candidate seam search results of the first frame may be searched with a dynamic programming algorithm based on the energy map of the first frame, and the optimal seam search result is selected from them. The optimal seam search result is usually a seam between the first frame and the corresponding target image found in a relatively static, low-energy background; the shape of the seam may be irregular, and the optimal seam usually avoids the location regions and edges of the salient target objects, the location regions and edges of the moving targets, and the edges of the other objects contained in the first frame.
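As an illustrative sketch, such a dynamic-programming seam search can be realized in the classic seam-carving style below, returning the minimum-energy top-to-bottom path through the energy map; the vertical-seam orientation is an assumption:

import numpy as np

def dp_seam(energy):
    # Dynamic-programming seam search: among all top-to-bottom 8-connected
    # paths, return the one with minimal accumulated energy, so the seam stays
    # in static, low-energy background and avoids salient or moving objects.
    h, w = energy.shape
    cost = energy.astype(np.float64)
    for y in range(1, h):
        left = np.concatenate(([np.inf], cost[y - 1, :-1]))
        up = cost[y - 1]
        right = np.concatenate((cost[y - 1, 1:], [np.inf]))
        cost[y] += np.minimum(np.minimum(left, up), right)
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):  # backtrack through the cumulative costs
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam  # column of the seam in every row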
Step S308: for each frame of video image in the first video other than the first frame, adding preset constraint conditions on top of the seam search result of the preceding frame to determine the seam search region range of the current video image.
The preset constraint conditions may be set according to actual needs. For example, for each frame of video image in the first video other than the first frame, the constraint may require, on the basis of the seam search result of the preceding frame, that the gap at the upper and lower edges is less than 50 pixels; that is, the gap between the upper and lower edges of the seam search result of the current frame and those of the preceding frame must not exceed 50 pixels, and the seam search range of the current frame is determined based on this constraint condition.
Step S310: within the seam search region range, determining the seam search result of the current video image with a dynamic programming algorithm based on the energy map of the current video image.
For each frame of video image in the first video other than the first frame, after the seam search region range of the current frame is determined, the seam search result of the current frame may be computed with a dynamic programming algorithm based on the energy map of the current frame. This seam search result usually avoids the location regions and edges of the salient target objects, the location regions and edges of the moving targets, and the edges of the other objects contained in the current frame; it is usually a seam between the current frame and the corresponding target image found in a relatively static, low-energy background. In addition, the first video may contain video images for which the search cannot proceed normally; for such video images, the single-frame search manner of step S306 may be used to determine the seam search result of each such frame.
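A sketch of the constrained variant, reusing the dp_seam and seam_search_band sketches given earlier; pixels outside the search region are assigned infinite energy so the dynamic programming cannot select them:

import numpy as np

def constrained_seam(energy, band_mask):
    # band_mask: boolean mask derived from the previous frame's seam search
    # result (see seam_search_band above). Forbidden pixels get infinite energy.
    restricted = np.where(band_mask, energy, np.inf)
    return dp_seam(restricted)  # dp_seam: the dynamic-programming sketch above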
In the above seam search method for video images, the saliency target energy map, the moving target energy map and the edge energy map of each frame of video image in the first video are first acquired, and for each frame the three maps corresponding to that frame are fused to obtain the energy map of that frame. Then, for the first frame of video image in the first video, the seam search result of the first frame is computed with a dynamic programming algorithm based on its energy map; for each frame other than the first frame, preset constraint conditions are added on top of the seam search result of the preceding frame to determine the seam search region range of the current video image; finally, within that range, the seam search result of the current video image is determined with a dynamic programming algorithm based on its energy map. This approach determines the seam search result from the energy map of the video image and, for frames other than the first frame, first bounds the seam search region by the seam search result of the preceding frame before searching within it. Constraining the seam search region in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter of the stitched video during playback, and thereby improves the stitching quality of the panoramic video.
Embodiment Four
This embodiment provides yet another seam search method for video images, implemented on the basis of the methods of the above embodiments. In this method, in the first video, each frame of video image and its corresponding target image have an overlapping region; the part of the overlapping region in the video image is a first overlapping region, and the part in the target image is a second overlapping region. In actual implementation, each frame of video image in the first video has an overlapping region with the corresponding target image in the second video; the first overlapping region may be understood as the region, in each frame of video image of the first video, corresponding to the part of the picture lying in the overlapping region, and the second overlapping region as the region, in each target image of the second video, corresponding to the part of the picture lying in the overlapping region. As shown in FIG. 4, the method may include the following steps:
Step S402, acquire the energy map of each frame of video image in the first video, where the energy map indicates the location regions and edges of specified objects in the video image.
Step S404, for the first frame of video image in the first video, determine the seam search result of the first frame based on its energy map, where the seam search result includes the seam region between the video image and the target image, and the target image is the video image in the second video corresponding to that video image.
Step S406, for each frame of video image in the first video other than the first frame, determine the seam search region of the current video image based on the seam search result of the previous frame; within that seam search region, determine the seam search result of the current video image based on its energy map.
Step S408, for each frame of video image, input the image corresponding to the first overlapping region of that frame and the image corresponding to the second overlapping region of its target image into a pre-trained neural network model to obtain the seam prediction result of that frame, where the seam prediction result includes the predicted seam region between the video image and the corresponding target image.
The pre-trained neural network model can be implemented with various convolutional neural networks, such as a UNet, a residual network or a VGG network, and can be a convolutional neural network model of any size, for example resnet34_05x. In actual implementation, the dynamic programming algorithm of the procedure above can determine the seam search result of every frame in the first video, but dynamic programming is time-consuming and its execution efficiency is low; the seam search procedure can therefore be distilled into a neural network, which permits hardware acceleration, speeds up processing, and improves processing efficiency. Concretely, for each frame of video image, the image corresponding to the first overlapping region of that frame and the image corresponding to the second overlapping region of its target image are input into the pre-trained neural network model, which outputs the seam prediction result containing the predicted seam region between that frame and its corresponding target image.
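A minimal PyTorch sketch of such a distilled seam predictor. A tiny encoder-decoder stands in for the resnet34_05x-style backbone mentioned above; following the training procedure described below, the input stacks the two RGB overlap crops with the previous frame's seam mask. All names and sizes are illustrative assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn

class SeamNet(nn.Module):
    """Tiny encoder-decoder standing in for the distilled seam model."""
    def __init__(self):
        super().__init__()
        # input channels: overlap_left (3) + overlap_right (3) + previous mask (1)
        self.encoder = nn.Sequential(
            nn.Conv2d(7, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, left, right, prev_mask):
        x = torch.cat([left, right, prev_mask], dim=1)
        return torch.sigmoid(self.decoder(self.encoder(x)))  # seam mask in [0, 1]

model = SeamNet().eval()
left = torch.rand(1, 3, 256, 128)    # first overlapping region of the frame
right = torch.rand(1, 3, 256, 128)   # second overlapping region of the target image
prev = torch.ones(1, 1, 256, 128)    # all-ones preset template for the first frame
with torch.no_grad():
    seam_mask = model(left, right, prev)  # predicted seam region
```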
The pre-trained neural network model is determined through steps four to nine below:
Step four, obtain training samples containing consecutive groups of image pairs to be stitched, together with the seam search result of each group of image pairs.
Each image pair to be stitched usually contains the two images to be stitched; the training samples may also be called a multi-frame dataset, which usually contains consecutive groups of image pairs to be stitched. Concretely, the training samples can be constructed as follows: first obtain a single image, which may be denoted overlap_left, and apply a geometric transform to it to obtain the paired image to be stitched, which may be denoted overlap_right; the single image and the image to be stitched form one group of images to be stitched. A sequence of pictures is then generated from geometric transforms, i.e. multiple groups of image pairs are generated to simulate a video scene, and the overlap_right and overlap_left of different groups can be distinguished by index. The geometric transform usually includes random translation or rotation of the image, so each image to be stitched in the generated groups may have black borders. In actual implementation, to obtain the trained neural network model, it is usually necessary to first obtain the training samples containing the groups of image pairs to be stitched together with the seam search result of each group, where the seam search result of each group can be determined by the method of the related steps above; the seam search result of each group is used as the GT (ground truth) label for supervised training of the initial neural network, so that it learns to predict the seam region of each frame of video image.
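A minimal OpenCV sketch of this sample construction. Small random translations and rotations stand in for the geometric transform, and each pair reuses the same overlap_left while overlap_right drifts frame by frame to simulate video; all names are illustrative.

```python
import cv2
import numpy as np

def random_transform(img, max_shift=10, max_angle=3.0, rng=None):
    """Random rotation + translation; black borders appear at the edges."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    angle = rng.uniform(-max_angle, max_angle)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    M[:, 2] += (tx, ty)
    return cv2.warpAffine(img, M, (w, h))  # border defaults to black

def make_pairs(overlap_left, n_frames=8):
    """Simulate a video: consecutive (overlap_left_i, overlap_right_i) pairs."""
    pairs, cur = [], overlap_left
    for _ in range(n_frames):
        cur = random_transform(cur)        # drift a little more each frame
        pairs.append((overlap_left, cur))  # one group of images to be stitched
    return pairs
```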
Step five, obtain a preset seam template, where the preset seam template includes a preset seam region.
The preset seam region in the preset seam template can be set as a seam mask whose left half is 1 and right half is 0, or as an all-ones seam mask, where a mask value of 0 denotes black and a mask value of 1 denotes white. In actual implementation, the first group of image pairs to be stitched can be understood as the first frames overlap_right_1 and overlap_left_1 of the simulated video; for this first group, a preset seam template containing a preset seam region usually needs to be provided in advance.
Step six, for the first group of image pairs to be stitched, input the first group together with the preset seam template into the initial neural network model, so that the initial neural network model outputs the seam prediction result of the first group.
The initial neural network model can be implemented with various convolutional neural networks, such as a residual network or a VGG network. In actual implementation, referring to the schematic diagram of the training process shown in Figure 5, once the preset seam template is obtained, overlap_right_1 and overlap_left_1 of the first group together with the mask preset template (corresponding to the preset seam template) can be input into the NN (neural network), i.e. the initial neural network model; with the seam search result of the first group used as GT, the model outputs seam_mask_left_1, the seam prediction result of the first group, which includes the predicted seam region of overlap_right_1 and overlap_left_1 of the first group.
Step seven, for each group of image pairs other than the first group, input that group together with the seam prediction result of the adjacent previous group into the initial neural network model, so that the initial neural network model outputs the seam prediction result of that group.
In actual implementation, the groups of image pairs other than the first group can be understood as including every frame overlap_right of the simulated video except the first frame overlap_right_1, and every frame overlap_left except the first frame overlap_left_1. For each such group, the overlap_right and overlap_left of the current group, together with the seam prediction result of the adjacent previous group, are input into the initial neural network model; with the seam search result of the current group used as GT, the model outputs the seam prediction result of the current group, which includes the predicted seam region of the current group's overlap_right and overlap_left. As shown in Figure 5, the second frames overlap_right_2 and overlap_left_2, together with seam_mask_left_1 of the first frame, i.e. the seam prediction result of the first group, are input into the NN, i.e. the initial neural network model; with the seam search result of the second group used as GT, the model outputs seam_mask_left_2, the seam prediction result of the second group; and so on, outputting the seam prediction result of every group other than the first.
Step eight, compute the loss value of the seam prediction result of that group based on the seam search result of that group and a preset loss function.
The loss function evaluates the degree of inconsistency between the seam prediction result output by the initial neural network model and the corresponding true seam search result; that degree is expressed by the loss value. The loss function can be denoted Loss, and in actual implementation it can be designed as follows:
loss = loss_continue * scale + loss_gt;
where loss_continue = Max(L1(mask_cur - mask_prev), margin);
loss_gt = L1(mask_cur - mask_gt);
Here loss_continue denotes the continuity loss between the output for the current group of image pairs and the output for the previous group; scale is the weight of the continuity loss, generally set to 0.1; loss_gt is the L1 loss computed between the output for the current group and the corresponding GT; mask_cur is the seam mask predicted for the current group, i.e. the seam prediction result of the current group; mask_prev is the seam mask predicted for the previous group, i.e. the seam prediction result of the previous group; margin is the allowed inconsistency threshold, generally set to 0.2; and mask_gt is the seam ground truth of the current group, i.e. the true seam search result of the current group.
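A minimal PyTorch sketch of this loss, implementing the formulas above as written (the max with margin keeps the continuity term from being driven below the allowed inconsistency threshold); names are illustrative.

```python
import torch
import torch.nn.functional as F

def seam_loss(mask_cur, mask_prev, mask_gt, scale=0.1, margin=0.2):
    """loss = Max(L1(mask_cur - mask_prev), margin) * scale + L1(mask_cur - mask_gt)."""
    loss_continue = F.l1_loss(mask_cur, mask_prev).clamp(min=margin)
    loss_gt = F.l1_loss(mask_cur, mask_gt)
    return loss_continue * scale + loss_gt
```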
Step nine, update the weight parameters of the initial neural network model based on the loss value; continue performing the step of obtaining training samples containing consecutive groups of image pairs to be stitched until the initial neural network model converges, obtaining the neural network model.
The weight parameters may include all parameters of the initial neural network model, such as convolution kernel parameters. When training the initial neural network model, all of its parameters usually need to be updated based on the loss value of the seam prediction result of the current group of image pairs. The step of obtaining training samples containing consecutive groups of image pairs is then repeated until the initial neural network model converges, or the loss value converges, finally yielding the trained neural network model.
In the above seam search method for video images, for each frame of video image, the image corresponding to the first overlapping region of that frame and the image corresponding to the second overlapping region of its target image are input into a pre-trained neural network model to obtain the seam prediction result of that frame; the training process of the neural network model is also disclosed. Predicting the seam region with the neural network speeds up the prediction of the seam region and improves processing efficiency.
Embodiment Five
This embodiment provides a video image stitching method; as shown in Figure 6, the method includes the following steps:
Step S602, acquire a first fisheye video and a second fisheye video, where the fisheye video images in the first fisheye video and the second fisheye video have overlapping regions.
A fisheye lens is a special camera lens with a short focal length and a large field of view, whose field of view can approach or exceed 180 degrees. In actual implementation, in the same scene and at the same time, the first fisheye video can be captured through a first fisheye lens and the second fisheye video through a second fisheye lens. The fields of view of the two lenses usually differ, and the first and second fisheye videos have overlapping fields of view; since the two lenses shoot simultaneously, the corresponding fisheye video images in the first and second fisheye videos usually have overlapping regions. For example, the first frame of the first fisheye video overlaps the first frame of the second fisheye video, the second frame of the first overlaps the second frame of the second, and so on.
Step S604, extract the first target region of each frame of fisheye video image in the first fisheye video, and the second target region of each frame of fisheye video image in the second fisheye video.
The target region can be understood as the valid region of the fisheye video image, i.e. the region containing the photographed subject. In actual implementation, the raw fisheye video image captured through a fisheye lens is generally square or rectangular and contains a circular region corresponding to the valid region of the image; the part of the square or rectangular region outside the circle is usually a black background region. After the first and second fisheye videos are acquired, the first target region corresponding to the circular region of each frame of the first fisheye video, and the second target region corresponding to the circular region of each frame of the second fisheye video, need to be extracted. Concretely, a preset pixel value can be used as a threshold to filter out the black background region of the fisheye video image, e.g. a pixel value of 20, so as to extract the circular valid region of the fisheye video image.
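A minimal numpy/OpenCV sketch of this extraction, thresholding away the black background and cropping to the bounding box of the circular valid region; the threshold of 20 follows the example above, and all names are illustrative.

```python
import cv2
import numpy as np

def extract_valid_region(fisheye_img, threshold=20):
    """Crop the circular valid region out of the black background."""
    gray = cv2.cvtColor(fisheye_img, cv2.COLOR_BGR2GRAY)
    mask = gray > threshold
    xs = np.where(mask.any(axis=0))[0]  # columns containing valid pixels
    ys = np.where(mask.any(axis=1))[0]  # rows containing valid pixels
    x0, x1, y0, y1 = xs[0], xs[-1] + 1, ys[0], ys[-1] + 1
    return fisheye_img[y0:y1, x0:x1], mask[y0:y1, x0:x1]
```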
Step S606, for two mutually corresponding frames of fisheye video images, based on the first target region and the second target region of the two frames and the updated expansion parameter values acquired in advance, determine the first equidistant projection picture obtained by expanding that frame of the first fisheye video and the second equidistant projection picture obtained by expanding that frame of the second fisheye video.
The equidistant projection picture may also be called an equirectangular (equidistant cylindrical) projection picture. The updated expansion parameters usually include fisheye lens parameters related to unwarping the fisheye video image, for example the field-of-view parameter value, parameters related to the optical center, or the fisheye rotation angle parameter value. In actual implementation, a fisheye video image is usually expanded into an equidistant projection picture as follows: the fisheye video image from which the target region, i.e. the circular valid region, has been extracted is first converted from two-dimensional fisheye coordinates into three-dimensional coordinates, and the spherical coordinates of that image are then unfolded by longitude-latitude mapping, giving the expanded equidistant projection picture corresponding to the fisheye video image.
The process of expanding a fisheye video image into an equidistant projection picture is further described below with reference to the schematic diagram of an expansion method shown in Figure 7. In the normalised fisheye coordinates corresponding to the target region of the fisheye video image, take a pixel at position (x, y); its distance from the origin of the coordinate system is r and its angle to the x-axis direction is θ. Through the conversion formulas φ = r·aperture/2 and θ = atan2(y, x), the fisheye video image with the circular valid region extracted is converted from normalised fisheye coordinates into three-dimensional coordinates, i.e. the "2D fisheye to 3D vector" step in Figure 7, where the pixel (x, y) of the normalised fisheye coordinate system maps to the point P = (Px, Py, Pz) of the three-dimensional coordinate system. The corresponding longitude and latitude values are then obtained through the conversion formulas longitude = atan2(Py, Px) and latitude = atan2(Pz, √(Px² + Py²)), i.e. the "3D vector to longitude/latitude" step in Figure 7. Finally, through x = longitude/π and y = 2·latitude/π, the three-dimensional coordinate system is unfolded by longitude-latitude mapping, giving the position in the equidistant projection picture that corresponds to the pixel (x, y) of the target region, i.e. the "3D vector to 2D equirectangular" step in Figure 7. The equidistant projection picture corresponding to the fisheye video image is obtained from the corresponding position and pixel value, in the equidistant projection picture, of every pixel of the target region.
As can be seen from the above process, the fisheye video image with the target region extracted can be input into a corresponding equidistant projection expansion module, which outputs the equidistant projection picture. The conversion formulas require the fisheye field-of-view parameter aperture; that is, once the field of view of the fisheye lens is known, the expansion can proceed according to the above formulas. The value of r in the formulas can be determined from the coordinates (x, y); for details, reference can be made to the determination methods of the related art, which are not repeated here. The above expansion computes the remap parameters of the picture transformation, i.e. the coordinate mapping expressed by the conversion formulas. Since remap processes pixels serially one by one, it is time-consuming and its processing efficiency is low; processing with block warp improves the efficiency of a hardware implementation, where block warp partitions the target region of the fisheye video image into blocks so that the resulting small block images can be processed in parallel, improving processing efficiency.
Correspondingly, an equidistant projection picture can also be converted back into the target region of the fisheye video image, as shown in Figure 7. Concretely, this is achieved through the conversion formulas longitude = x·π, latitude = y·π/2, Px = cos(latitude)·cos(longitude), Py = cos(latitude)·sin(longitude), Pz = sin(latitude), r = 2·atan2(√(Px² + Pz²), Py)/aperture, θ = atan2(Pz, Px), x = r·cos(θ) and y = r·sin(θ).
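A minimal numpy/OpenCV sketch of this inverse mapping, which is also the practical way to build the remap tables: for every output equidistant projection pixel, the source fisheye pixel is looked up with the formulas of the preceding paragraph. The aperture argument and all names are illustrative; pixels outside the lens field of view map outside the source image and come out black.

```python
import cv2
import numpy as np

def equirect_from_fisheye(fisheye, out_w, out_h, aperture):
    """Unwarp the circular fisheye valid region to an equidistant projection.

    aperture: fisheye field of view in radians, e.g. np.radians(195).
    """
    h, w = fisheye.shape[:2]
    xs = np.linspace(-1, 1, out_w, dtype=np.float32)   # x = longitude / pi
    ys = np.linspace(-1, 1, out_h, dtype=np.float32)   # y = 2 * latitude / pi
    x, y = np.meshgrid(xs, ys)
    lon, lat = x * np.pi, y * np.pi / 2
    px = np.cos(lat) * np.cos(lon)
    py = np.cos(lat) * np.sin(lon)
    pz = np.sin(lat)
    r = 2.0 * np.arctan2(np.sqrt(px**2 + pz**2), py) / aperture
    theta = np.arctan2(pz, px)
    fx, fy = r * np.cos(theta), r * np.sin(theta)      # normalised fisheye coords
    map_x = ((fx + 1) * 0.5 * (w - 1)).astype(np.float32)
    map_y = ((fy + 1) * 0.5 * (h - 1)).astype(np.float32)
    return cv2.remap(fisheye, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```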
In actual implementation, because of process errors in the manufacture and mounting of fisheye cameras, and to avoid the influence of those errors on the expansion process, the corresponding expansion parameters, such as the field-of-view parameter value, the optical-center parameters or the fisheye rotation angle parameter value, need to be updated to obtain the updated expansion parameters. Then, for two mutually corresponding frames of fisheye video images, based on the first target region and the second target region of the two frames and the updated expansion parameters, and with reference to the expansion procedure above, the first target region is expanded into the first equidistant projection picture and the second target region into the second equidistant projection picture.
Step S608, determine the seam search result based on the mutually corresponding first equidistant projection picture and second equidistant projection picture, where the seam search result is determined by the methods of the foregoing embodiments.
In actual implementation, once the first and second equidistant projection pictures are obtained, the corresponding seam search result can be determined from the mutually corresponding pictures. For example, based on the first equidistant projection picture expanded from the first frame of the first fisheye video and the second equidistant projection picture expanded from the first frame of the second fisheye video, the seam search result of the two pictures can be determined according to the seam search schemes of the foregoing embodiments.
Step S610, determine the video stitching result of the video images based on the seam search results corresponding to each group of two mutually corresponding frames of fisheye video images.
In actual implementation, the seam search result of each group of two mutually corresponding frames of fisheye video images can be determined according to the above steps, and the video stitching result of the video images can be determined based on the multiple seam search results so obtained.
In the above video image stitching method, the first fisheye video and the second fisheye video are first acquired, and the first target region of each frame of fisheye video image in the first fisheye video and the second target region of each frame in the second fisheye video are extracted. Then, for two mutually corresponding frames of fisheye video images, based on the first and second target regions of the two frames and the updated expansion parameter values acquired in advance, the first equidistant projection picture obtained by expanding that frame of the first fisheye video and the second equidistant projection picture obtained by expanding that frame of the second fisheye video are determined. Finally, the seam search results are determined based on the mutually corresponding first and second equidistant projection pictures, and the video stitching result of the video images is determined based on the seam search results of each group of two mutually corresponding frames. This approach determines the seam search result from the energy map of the video image, and, for frames other than the first, first determines the seam search region from the previous frame's seam search result and only then determines the seam within that region; constraining the seam search region in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter of the stitched video during playback, and thereby improves the panoramic stitching effect.
Embodiment Six
This embodiment provides another video image stitching method, implemented on the basis of the method of the foregoing embodiment; the method includes the following steps:
Step 802, acquire a first fisheye video and a second fisheye video, where the fisheye video images in the first fisheye video and the second fisheye video have overlapping regions.
Step 804, extract the first target region of each frame of fisheye video image in the first fisheye video, and the second target region of each frame of fisheye video image in the second fisheye video.
Step 806, for two mutually corresponding frames of fisheye video images, based on the first target region and the second target region of the two frames and the updated expansion parameter values acquired in advance, determine the first equidistant projection picture obtained by expanding that frame of the first fisheye video and the second equidistant projection picture obtained by expanding that frame of the second fisheye video.
The updated expansion parameter values include: the field-of-view parameter value, the parameter value of the optical center in the x-axis direction, the parameter value of the optical center in the y-axis direction, and the fisheye rotation angle parameter value. The field-of-view parameter value can be the combination of the known nominal field of view of the fisheye lens and a field-of-view offset, which can be denoted aperture_shift; for example, if the known nominal field of view of the fisheye lens is 190 degrees, and the actual field of view may be offset owing to manufacturing and mounting process errors of the fisheye camera, with the offset estimated at +5 degrees, then the field-of-view parameter value is 190 + 5 = 195 degrees. The optical-center parameter values in the x-axis and y-axis directions can be understood as the coordinates of the optical center in the x-axis and y-axis directions after the optical-center offsets are taken into account, where the offset in the x-axis direction can be denoted x_shift and the offset in the y-axis direction y_shift. The fisheye rotation angle parameter value can be understood as the fisheye rotation angle after the rotation angle offset, which can be denoted rotation_shift, is taken into account. The updated expansion parameter values are determined in advance through steps eleven to fifteen below:
Step eleven, obtain the initial expansion parameter value and the preset offset range of each expansion parameter.
The initial expansion parameter values usually include the initial field-of-view value, the initial optical-center values in the x-axis and y-axis directions, and the initial fisheye rotation angle value, and can usually be obtained from the camera parameters given for the fisheye camera. The preset offset range usually includes an offset range for each expansion parameter, estimated for each parameter by the technician from empirical values. For ease of description, take a binocular fisheye camera as an example: in actual implementation, for an existing binocular fisheye camera, the initial values of the above four expansion parameters can be obtained from the given camera parameters, and a suitable offset range is set empirically for each parameter; the range generally should not be too large, e.g. for an initial field-of-view value of 190 degrees, a preset offset range of ±10 degrees may be set empirically.
Step twelve, sample each expansion parameter based on its initial expansion parameter value and preset offset range to obtain the sampled value of each expansion parameter; based on the sampled values, determine the third equidistant projection picture obtained by expanding that frame of the first fisheye video and the fourth equidistant projection picture obtained by expanding that frame of the second fisheye video.
In actual implementation, after the initial expansion parameter value and preset offset range of each expansion parameter are obtained, the four expansion parameters can be randomly sampled within their initial values and preset offset ranges; each of the four parameters takes one sampled value, forming a group of sampled values. For example, if the offset of each of the four parameters had two possible values, 0 and 1, the sampled values of the four parameters would have 2⁴ = 16 combinations, each combination corresponding to one group of sampled values. Based on each group of sampled values, each frame of fisheye video image of the first fisheye video with its target region extracted is expanded to obtain the expanded third equidistant projection picture, and each frame of the second fisheye video with its target region extracted is expanded to obtain the expanded fourth equidistant projection picture.
Step thirteen, extract the fourth overlapping region of the third equidistant projection picture and the fourth equidistant projection picture.
The fourth overlapping region is extracted from the mutually corresponding expanded third and fourth equidistant projection pictures, and usually includes the partial picture region it corresponds to in the third equidistant projection picture and the partial picture region it corresponds to in the fourth equidistant projection picture.
Step fourteen, perform a cross-correlation computation on the fourth overlapping region to obtain the first cross-correlation result.
After the fourth overlapping region is extracted, cross-correlation can be computed between the partial picture region it corresponds to in the third equidistant projection picture and the partial picture region it corresponds to in the fourth, giving the first cross-correlation result, which reflects a measure of the similarity between the two partial picture regions. Referring to the cross-correlation schematic shown in Figure 8, which contains two signals f and g, with f*g denoting their cross-correlation: the larger the value of the cross-correlation result, the more similar the two partial picture regions usually are; conversely, the smaller the value, the less similar they usually are.
Step fifteen, determine the updated expansion parameter values based on the first cross-correlation results and a preset number of iterations.
The preset number of iterations can be set according to actual needs, e.g. 10000; in actual implementation, the updated expansion parameters can be determined based on the first cross-correlation results and the preset number of iterations.
Concretely, step fifteen can be implemented through steps A to C below:
Step A, repeat, for the preset number of iterations, the step of sampling each expansion parameter based on its initial expansion parameter value and preset offset range, obtaining multiple first cross-correlation results.
For ease of understanding, suppose the preset number of iterations is 10000; the step of sampling each expansion parameter based on its initial value and preset offset range is then repeated 10000 times, yielding 10000 first cross-correlation results.
Step B, select, from the multiple first cross-correlation results, the first cross-correlation result with the largest value.
Since a larger first cross-correlation result indicates higher similarity between the two partial picture regions, and hence more suitable sampled values of the expansion parameters, the first cross-correlation result with the largest value can be selected from the computed results.
Step C, determine the sampled values of the expansion parameters corresponding to the largest first cross-correlation result as the updated expansion parameter values.
The largest first cross-correlation result indicates the highest similarity between the two partial picture regions; its corresponding sampled values of the expansion parameters are therefore the most suitable and can be determined as the updated expansion parameter values.
This offline adaptive optimization of the expansion parameters allows the algorithm to adaptively obtain the optimal expansion parameters of the binocular fisheye camera; for a binocular fisheye camera with a fixed assembly, this offline computation only needs to be performed once.
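A minimal sketch of this offline random search, assuming normalized cross-correlation over the overlap crops as the similarity score and uniform sampling within each preset offset range. The unwarp and overlap_of callables stand in for the equidistant projection expansion and overlap extraction described above; all names are illustrative.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two overlap crops (similarity score)."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def search_expansion_params(fish1, fish2, init, ranges, unwarp, overlap_of,
                            n_iters=10000, rng=None):
    """Random search over expansion parameters, keeping the best sample.

    init:   dict of initial values, e.g. aperture, x_shift, y_shift, rotation.
    ranges: dict of preset offset ranges for the same keys.
    """
    rng = rng or np.random.default_rng()
    best_score, best_params = -np.inf, dict(init)
    for _ in range(n_iters):                           # step A
        params = {k: init[k] + rng.uniform(-ranges[k], ranges[k]) for k in init}
        eq3, eq4 = unwarp(fish1, params), unwarp(fish2, params)
        crop3, crop4 = overlap_of(eq3, eq4)            # fourth overlapping region
        score = ncc(crop3, crop4)                      # first cross-correlation
        if score > best_score:                         # steps B and C
            best_score, best_params = score, params
    return best_params                                 # updated expansion values
```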
Step 808, align the mutually corresponding first equidistant projection picture and second equidistant projection picture.
Since the updated expansion parameters obtained by offline optimization can generally align well only scenes at a certain fixed depth of field, the pictures need to be aligned dynamically when the depth of field changes noticeably. Dynamic alignment can be performed in two ways: one is dynamic alignment based on feature point matching, the other is an affine parameter search based on cross-correlation maximization; the two implementations are introduced below.
The dynamic alignment based on feature point matching can be implemented through steps twenty to twenty-two below:
Step twenty, extract first feature points from the first equidistant projection picture, and second feature points from the second equidistant projection picture.
For better picture alignment, representative regions of the equidistant projection picture usually need to be selected, such as corner points, edge points, bright points in dark regions or dark points in bright regions. The first feature points may be corner points, edge points, bright points in dark regions or dark points in bright regions extracted from the first equidistant projection picture; the second feature points may be such points extracted from the second equidistant projection picture. In actual implementation, the feature points of the equidistant projection pictures can be extracted by methods such as SIFT (Scale-Invariant Feature Transform), a scale-invariant local feature descriptor used in image processing that can detect key points in an image.
Step twenty-one, determine matched feature point pairs based on the first feature points and the second feature points.
After the first feature points of the first equidistant projection picture and the second feature points of the second equidistant projection picture are extracted through the above steps, feature point matching can be performed in a blockwise manner. Referring to the picture alignment schematic shown in Figure 9, where img1 corresponds to the first equidistant projection picture and img2 to the second: concretely, the left half of img1 is matched against the right half of img2, and the right half of img1 against the left half of img2, giving the corresponding matched feature point pairs, based on which img1 and img2 are aligned. The alignment may minimize the gap between the matched feature points of img1 and img2 along the w (width) dimension, so that the overlapping regions after stitching are aligned as closely as possible. For example, matching can be realized by computing the Euclidean distance between the 128-dimensional keypoint descriptors of the two feature point sets; the smaller the Euclidean distance, the higher the similarity, and when the Euclidean distance is below a set threshold the match can be judged successful.
Step twenty-two, align the first equidistant projection picture and the second equidistant projection picture based on the matched feature point pairs.
Based on the matched feature point pairs determined above, the first equidistant projection picture and the second equidistant projection picture are aligned. Every mutually corresponding pair of first and second equidistant projection pictures is dynamically aligned in this feature-point-matching manner.
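A minimal OpenCV sketch of this feature-point alignment. SIFT keypoints are matched by Euclidean descriptor distance, matches above a distance threshold are dropped, and a RANSAC-estimated partial affine transform warps img2 onto img1; the threshold value and the use of estimateAffinePartial2D are assumptions, and all names are illustrative.

```python
import cv2
import numpy as np

def match_and_align(img1, img2, dist_thresh=200.0):
    """Align img2 (second picture) to img1 (first picture) via SIFT matches."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)  # Euclidean distance
    good = [m for m in matches if m.distance < dist_thresh]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    # transform that minimizes the gap between matched points, then warp img2
    M, _ = cv2.estimateAffinePartial2D(pts2, pts1, method=cv2.RANSAC)
    h, w = img1.shape[:2]
    return cv2.warpAffine(img2, M, (w, h))
```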
The affine parameter search based on cross-correlation maximization can be implemented through steps thirty to thirty-three below:
Step thirty, move the second equidistant projection picture in a preset direction.
The preset direction may be a lateral movement of the second equidistant projection picture. For example, referring to the picture alignment schematic shown in Figure 9, for the expanded images img1 and img2, img2 can be moved laterally to align it with img1, e.g. first to the left and then to the right; during the movement, the degree of match between img2 and img1 usually changes.
Step thirty-one, during the movement, extract multiple fifth overlapping regions of the first equidistant projection picture and the second equidistant projection picture.
The overlapping regions are extracted by moving img2 in Figure 9; concretely, the corresponding overlapping region can be estimated from the aperture field-of-view parameter. For example, for a binocular fisheye camera, if the aperture of each of the two fisheye lenses is 180 degrees, the overlapping region is 0; if the aperture of each fisheye lens exceeds 180 degrees, the corresponding overlapping region can be estimated from the overlap angle of the apertures of the two lenses. In actual implementation, multiple fifth overlapping regions of the first and second equidistant projection pictures can be extracted while the second equidistant projection picture is being moved.
Step thirty-two, perform cross-correlation computations on the multiple fifth overlapping regions respectively to obtain multiple second cross-correlation results.
After the multiple fifth overlapping regions are extracted, cross-correlation can be computed between the partial picture region each fifth overlapping region corresponds to in the first equidistant projection picture and the partial picture region it corresponds to in the second, giving the corresponding second cross-correlation results; the second cross-correlation results reflect the similarity between the two partial picture regions.
Step thirty-three, align the first equidistant projection picture and the second equidistant projection picture based on the multiple second cross-correlation results.
Concretely, step thirty-three can be implemented through steps H to K below:
Step H, select, from the multiple second cross-correlation results, the second cross-correlation result with the largest value.
Since a larger second cross-correlation result indicates higher similarity between the two partial picture regions, the second cross-correlation result with the largest value can be selected from the computed results.
Step I, obtain the position coordinates of the first boundary pixel points that the fifth overlapping region corresponding to the largest second cross-correlation result corresponds to in the first equidistant projection picture, and the position coordinates of the second boundary pixel points it corresponds to in the second equidistant projection picture.
Step J, compute an affine transformation matrix based on the position coordinates of the first boundary pixel points and the position coordinates of the second boundary pixel points.
Step K, align the first equidistant projection picture and the second equidistant projection picture based on the affine transformation matrix.
The alignment parameters that maximize the cross-correlation of the overlapping region are searched for. These alignment parameters may also be called image boundary points, namely the position coordinates of the first boundary pixel points that the fifth overlapping region corresponding to the largest second cross-correlation result corresponds to in the first equidistant projection picture, and the position coordinates of the second boundary pixel points it corresponds to in the second equidistant projection picture. Based on the searched position coordinates of the first and second boundary pixel points, the affine parameters of the second equidistant projection picture are computed; these affine parameters may also be called an affine transformation matrix, based on which the second equidistant projection picture is warped into good alignment with the first.
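A minimal sketch of steps thirty to thirty-three and H to K: a lateral shift search scores each candidate overlap by normalized cross-correlation, and three boundary point pairs of the best overlap define the affine transform (OpenCV's getAffineTransform takes exactly three point pairs). The assumed base overlap width and all names are illustrative, not the patent's exact procedure.

```python
import cv2
import numpy as np

def align_by_max_ncc(img1, img2, max_shift=64):
    """Search the lateral shift maximizing overlap cross-correlation, warp img2."""
    h, w = img1.shape[:2]
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY).astype(np.float32)
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY).astype(np.float32)
    base = w // 4                               # assumed overlap width (aperture prior)
    best_s, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):  # step thirty: lateral moves
        ov = base + s
        if ov <= 0 or ov >= w:
            continue
        c1, c2 = g1[:, w - ov:], g2[:, :ov]     # a fifth overlapping region
        c1n = (c1 - c1.mean()) / (c1.std() + 1e-8)
        c2n = (c2 - c2.mean()) / (c2.std() + 1e-8)
        score = float((c1n * c2n).mean())       # second cross-correlation result
        if score > best_score:                  # step H: keep the largest
            best_s, best_score = s, score
    ov = base + best_s
    # steps I-J: three boundary point pairs of the best overlap -> affine matrix
    pts1 = np.float32([[w - ov, 0], [w - 1, 0], [w - ov, h - 1]])  # first picture
    pts2 = np.float32([[0, 0], [ov - 1, 0], [0, h - 1]])           # second picture
    M = cv2.getAffineTransform(pts2, pts1)
    return cv2.warpAffine(img2, M, (w, h))      # step K: align img2 to img1
```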
Step 810, extract the third overlapping region based on the aligned first equidistant projection picture and second equidistant projection picture.
The third overlapping region is extracted based on the aligned first and second equidistant projection pictures. Concretely, it can be extracted by feature point matching: for example, after the matched feature point pairs are determined, the first and second equidistant projection pictures can be registered according to the matched pairs and the third overlapping region extracted. It can also be determined from aperture prior information: for example, for a binocular fisheye camera, if the aperture of each of the two fisheye lenses is 180 degrees, the overlapping region is 0; if the aperture of each lens exceeds 180 degrees, the corresponding third overlapping region can be estimated from the overlap angle of the apertures of the two lenses.
Step 812: perform illumination compensation on the second equidistant projection picture based on the third overlapping region, so that the pixel value of each pixel in the illumination-compensated second equidistant projection picture matches the pixel value of the corresponding pixel in the first equidistant projection picture.
Based on the extracted third overlapping region, illumination compensation is applied to the second equidistant projection picture. For example, histogram matching can be used to map the pixel-value distribution of the second equidistant projection picture onto a distribution similar to that of the first equidistant projection picture. Other illumination compensation methods can of course be used as well; for details, refer to the related art, which is not repeated here.
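A minimal single-channel histogram-matching sketch, assuming 8-bit images; in practice the mapping would be estimated on the third overlapping region and a color image handled by looping over channels:

```python
import numpy as np

def match_histogram(source, reference):
    """Map the pixel-value distribution of `source` (second equidistant
    projection picture) onto that of `reference` (first picture) by
    matching their cumulative distribution functions."""
    src_vals, src_idx, src_cnt = np.unique(source.ravel(),
                                           return_inverse=True,
                                           return_counts=True)
    ref_vals, ref_cnt = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_cnt).astype(np.float64) / source.size
    ref_cdf = np.cumsum(ref_cnt).astype(np.float64) / reference.size
    # For each source grey level, pick the reference level with the
    # closest cumulative probability.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape).astype(source.dtype)
```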
Step 814: determine a seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
In actual implementation, after the first equidistant projection picture and the illumination-compensated second equidistant projection picture are obtained, the seam search result for this pair of pictures can be determined according to the scheme for determining the seam search result described in the foregoing embodiments.
Step 816: for each group of two mutually corresponding frames of fisheye video images, determine the fused overlapping region of the two frames in the group based on the group's seam search result.
In actual implementation, for each group of two mutually corresponding frames of fisheye video images, once the seam search result for the two frames in the group is obtained, a fusion operation can be applied to produce the fused result of the overlapping region of the two frames.
Step 818: replace the third overlapping region of the two frames of fisheye video images in the group with the fused overlapping region, obtaining the image stitching result of the two frames in the group.
In actual implementation, for the two frames of fisheye video images in each group, the corresponding aligned third overlapping region is replaced with the corresponding fused overlapping region, which yields the image stitching result of the two frames in the group.
In addition, it should be noted that the image stitching result of the two frames in each group can also be obtained by an optical-flow approach. Specifically, the optical flow of the overlapping region between the first equidistant projection picture and the illumination-compensated second equidistant projection picture is computed; from this optical flow, the remap (remapping) transformation parameters of the illumination-compensated second equidistant projection picture are derived; using these remap parameters, the overlapping region of the illumination-compensated second equidistant projection picture is fused with the overlapping region of the first equidistant projection picture, thereby realizing the fusion of the overlapping regions of the two pictures. When per-frame updates are not needed, these remap transformation parameters can be merged into the remap parameters used when unwarping the fisheye video pictures (after target-region extraction) into equidistant projection pictures, which reduces the amount of computation. For pictures fused by the optical-flow approach, the fused picture, i.e. the image stitching result, is obtained directly.
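A rough sketch of flow-guided remap fusion on one overlapping region, assuming OpenCV's Farneback flow; the half-warp blend factor and the flow parameters are illustrative choices, not values from the patent:

```python
import cv2
import numpy as np

def flow_fuse(overlap_a, overlap_b, blend=0.5):
    """Warp overlap_b part-way toward overlap_a along the estimated
    optical flow, then cross-fade the two overlap images."""
    gray_a = cv2.cvtColor(overlap_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(overlap_b, cv2.COLOR_BGR2GRAY)
    # flow[y, x] displaces a pixel of gray_a onto gray_b, so sampling
    # overlap_b at (x, y) + flow backward-warps it onto a's grid.
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (xs + blend * flow[..., 0]).astype(np.float32)
    map_y = (ys + blend * flow[..., 1]).astype(np.float32)
    warped_b = cv2.remap(overlap_b, map_x, map_y, cv2.INTER_LINEAR)
    return cv2.addWeighted(overlap_a, blend, warped_b, 1.0 - blend, 0)
```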
Step 820: determine the video stitching result of the video images based on the image stitching results of the two frames of fisheye video images in each group.
The image stitching results of the two frames of fisheye video images in each group are assembled in temporal order to form the video stitching result of the video images.
In the above video image stitching method, the mutually corresponding first and second equidistant projection pictures are first aligned; a third overlapping region is extracted from the aligned pictures; illumination compensation is then applied to the second equidistant projection picture based on the third overlapping region, so that the pixel value of each pixel in the compensated second picture matches that of the corresponding pixel in the first picture; a seam search result is determined from the first picture and the compensated second picture; finally, for each group of two mutually corresponding frames of fisheye video images, the fused overlapping region is determined from the group's seam search result, the third overlapping region is replaced by the fused overlapping region to obtain the group's image stitching result, and the video stitching result is determined from the per-group image stitching results. This approach determines the seam search result from the energy map of the video image and, for every frame other than the first, first derives the seam search region from the previous frame's seam search result and then searches within that region. Constraining the seam search region in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter during playback of the stitched video, and thereby improves the stitching quality of the panoramic video.
Moreover, in the above video image stitching method, the offline adaptive expansion-parameter optimization improves the alignment quality of the stitched video images and the adaptability to different binocular fisheye modules; the dynamic fine-alignment algorithm improves the stitching quality for scenes with different depths of field; seam prediction and fusion based on video images improves the stability of video stitching and the quality of the panoramic result; and hardware acceleration is achieved by neural network distillation and by replacing the remap operation with block-wise warping.
Embodiment Seven
This embodiment provides a seam search apparatus for video images. As shown in FIG. 10, the apparatus includes: a first acquisition module 100, which may be configured to acquire an energy map of each frame of video image in a first video, where the energy map indicates the location region and edges of specified objects in the video image; a first determination module 101, which may be configured to determine, for the first frame of video image in the first video, the seam search result of the first frame based on its energy map, where the seam search result includes the seam region between the video image and a target image, the target image being the video image in a second video that corresponds to the video image; and a second determination module 102, which may be configured to determine, for each frame of video image in the first video other than the first frame, the seam search region of the current video image based on the seam search result of the previous frame, and to determine, within that seam search region, the seam search result of the current video image based on its energy map.
The above seam search apparatus first acquires the energy map of each frame of video image in the first video; for the first frame, it determines the seam search result based on the frame's energy map; for every other frame, it determines the seam search region of the current frame based on the seam search result of the previous frame and, within that region, determines the current frame's seam search result based on its energy map. The apparatus determines the seam search result from the energy map of the video image and, for frames other than the first, constrains the seam search region using the previous frame's result before searching within it. This constraint reduces the difference between the seam regions of consecutive frames, alleviates jitter during playback of the stitched video, and thereby improves the stitching quality of the panoramic video.
Optionally, the first acquisition module 100 may further be configured to: acquire a salient-object energy map, a moving-object energy map and an edge energy map for each frame of video image in the first video; and, for each frame, fuse the salient-object energy map, moving-object energy map and edge energy map of that frame to obtain the frame's energy map.
Optionally, the first acquisition module 100 may further be configured to: for each frame of video image in the first video, input the video image into a preset neural network model so that the model outputs the frame's salient-object energy map; determine the frame's moving-object energy map based on the moving objects in the frame; and perform edge detection on each object in the frame to obtain the frame's edge energy map. A sketch of such an energy-map fusion follows.
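A minimal sketch of the three energy terms and their fusion, assuming BGR frames; the saliency network is abstracted behind a callable (with a spectral-residual fallback from opencv-contrib), and the fusion weights are illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

def frame_energy(frame, prev_frame, saliency_net=None,
                 weights=(0.4, 0.3, 0.3)):
    """Build the per-frame energy map as a weighted sum of saliency,
    motion and edge terms."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

    if saliency_net is not None:
        sal = saliency_net(frame)              # assumed: HxW map in [0, 1]
    else:
        sr = cv2.saliency.StaticSaliencySpectralResidual_create()
        _ok, sal = sr.computeSaliency(gray)
    # Moving-object term from simple frame differencing.
    motion = cv2.absdiff(gray, prev).astype(np.float32) / 255.0
    # Edge term from Canny edge detection.
    edges = cv2.Canny(gray, 100, 200).astype(np.float32) / 255.0

    w_s, w_m, w_e = weights
    return w_s * sal.astype(np.float32) + w_m * motion + w_e * edges
```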
Optionally, the first determination module 101 may further be configured to: for the first frame of video image in the first video, compute the seam search result of the first frame with a dynamic programming algorithm based on the frame's energy map.
Optionally, the second determination module 102 may further be configured to: for each frame of video image in the first video other than the first frame, determine the seam search region of the current video image by adding preset constraints on top of the seam search result of the previous frame; and, within that seam search region, determine the current frame's seam search result with a dynamic programming algorithm based on the current frame's energy map. A sketch of such a constrained dynamic-programming search follows.
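A minimal dynamic-programming seam sketch over an HxW energy map; the optional band constraint around the previous frame's seam is one plausible realization of the preset constraints mentioned above, with the band half-width as an assumed parameter:

```python
import numpy as np

def seam_dp(energy, prev_seam=None, band=8):
    """Top-to-bottom minimum-energy seam by dynamic programming.

    Without prev_seam this is a plain DP seam search (first frame);
    with prev_seam (one column index per row), columns more than
    `band` pixels from last frame's seam are masked out, constraining
    the search region for subsequent frames."""
    h, w = energy.shape
    cost = energy.astype(np.float64)
    if prev_seam is not None:
        cols = np.arange(w)
        for y in range(h):
            cost[y, np.abs(cols - prev_seam[y]) > band] = np.inf
    # Accumulate minimal cost, allowing moves to the 3 neighbours below.
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    # Backtrack from the cheapest bottom cell.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(x - 1, 0), min(x + 2, w)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    return seam
```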
Optionally, in the first video, each frame of video image and its corresponding target image have an overlapping region; the part of this region lying in the video image is a first overlapping region, and the part lying in the target image is a second overlapping region. The apparatus is further configured to: for each frame of video image, input the image of the first overlapping region in that frame and the image of the second overlapping region in the corresponding target image into a pre-trained neural network model, to obtain the frame's seam prediction result, where the seam prediction result includes the predicted seam region between the video image and the target image.
Optionally, the apparatus further includes a neural network model determination module, which may be configured to: acquire training samples containing multiple consecutive groups of image pairs to be stitched, together with the seam search result of each group; for each group other than the first, input the group's image pair and the seam prediction result of the adjacent previous group into an initial neural network model, so that the model outputs the group's seam prediction result; compute the loss value of the group's seam prediction result based on the group's seam search result and a preset loss function; update the weight parameters of the initial model based on the loss value; and repeat the step of acquiring training samples until the initial model converges, yielding the neural network model. A schematic training loop follows.
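A schematic PyTorch-style training loop under the assumptions that the loader yields (image pair, previous group's seam prediction, target seam mask) and that the seam is supervised as a soft mask; the model, loss choice and batching are placeholders, not the patent's exact recipe:

```python
import torch

def train_seam_predictor(model, loader, epochs=10, lr=1e-4):
    """Schematic training loop for the seam-prediction network.

    loader yields (pair, prev_pred, target): the image pair to stitch,
    the previous group's seam prediction, and a target seam mask taken
    from the DP seam search result."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()   # seam region as a soft mask
    for _ in range(epochs):
        for pair, prev_pred, target in loader:
            # Condition the prediction on the previous group's seam.
            pred = model(torch.cat([pair, prev_pred], dim=1))
            loss = loss_fn(pred, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```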
Optionally, the neural network model determination module may further be configured to: acquire a preset seam template containing a preset seam region; and, for the first group of image pairs to be stitched, input the first group and the preset seam template into the initial neural network model so that the model outputs the first group's seam prediction result.
The implementation principle and technical effects of the seam search apparatus for video images provided in this embodiment of the present disclosure are the same as those of the foregoing seam search method embodiments. For brevity, where this apparatus embodiment is silent, refer to the corresponding content of the foregoing method embodiments.
Embodiment Eight
This embodiment provides a video image stitching apparatus. As shown in FIG. 11, the apparatus includes: a second acquisition module 110, which may be configured to acquire a first fisheye video and a second fisheye video, where the fisheye video images in the two videos have an overlapping region; an extraction module 111, which may be configured to extract a first target region from each frame of fisheye video image in the first fisheye video and a second target region from each frame of fisheye video image in the second fisheye video; a third determination module 112, which may be configured to determine, for two mutually corresponding frames of fisheye video images, based on the frames' first and second target regions and pre-acquired updated expansion parameter values, the first equidistant projection picture obtained by expanding the frame from the first fisheye video and the second equidistant projection picture obtained by expanding the frame from the second fisheye video; a fourth determination module 113, which may be configured to determine a seam search result based on the mutually corresponding first and second equidistant projection pictures, the seam search result being determined by the above seam search apparatus for video images; and a fifth determination module 114, which may be configured to determine the video stitching result of the video images based on the seam search results of each group of two mutually corresponding frames of fisheye video images.
The above video image stitching apparatus first acquires the first and second fisheye videos and extracts the first target region of each frame in the first fisheye video and the second target region of each frame in the second fisheye video. Then, for two mutually corresponding frames of fisheye video images, it determines, based on the frames' first and second target regions and the pre-acquired updated expansion parameter values, the first equidistant projection picture obtained by expanding the frame from the first fisheye video and the second equidistant projection picture obtained by expanding the frame from the second fisheye video. Finally, it determines a seam search result based on the mutually corresponding first and second equidistant projection pictures, and determines the video stitching result of the video images from the seam search results of each group of two mutually corresponding frames. The apparatus determines the seam search result from the energy map of the video image and, for frames other than the first, constrains the seam search region using the previous frame's seam search result before searching within it. This constraint reduces the difference between the seam regions of consecutive frames, alleviates jitter during playback of the stitched video, and thereby improves the stitching quality of the panoramic video.
Optionally, the fourth determination module 113 may further be configured to: align the mutually corresponding first and second equidistant projection pictures; extract a third overlapping region from the aligned pictures; perform illumination compensation on the second equidistant projection picture based on the third overlapping region, so that the pixel value of each pixel in the compensated second picture matches that of the corresponding pixel in the first picture; and determine the seam search result based on the first picture and the compensated second picture.
Optionally, the apparatus further includes a parameter value determination module. The updated expansion parameter values include: a field-of-view parameter value, a parameter value of the optical center along the x axis, a parameter value of the optical center along the y axis, and a fisheye rotation angle parameter value. The parameter value determination module may be configured to: acquire an initial expansion parameter value and a preset offset range for each expansion parameter; sample each expansion parameter based on its initial value and preset offset range to obtain a sampled value; based on the sampled values, determine the third equidistant projection picture obtained by expanding the frame from the first fisheye video and the fourth equidistant projection picture obtained by expanding the frame from the second fisheye video; extract the fourth overlapping region of the third and fourth equidistant projection pictures; perform a cross-correlation computation on the fourth overlapping region to obtain a first cross-correlation result; and determine the updated expansion parameter values based on the first cross-correlation results and a preset number of iterations.
Optionally, the parameter value determination module may further be configured to: repeat the sampling step, based on each expansion parameter's initial value and preset offset range, for the preset number of iterations, obtaining multiple first cross-correlation results; select the first cross-correlation result with the largest value; and take the sampled expansion parameter values corresponding to that result as the updated expansion parameter values. A sketch of such a search follows.
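A minimal random-search sketch of this sampling loop; `unwrap_and_score` is a hypothetical callable that expands both fisheye frames with the sampled parameter vector and returns the cross-correlation of their overlap:

```python
import numpy as np

def refine_expansion_params(init, offsets, unwrap_and_score,
                            iters=200, seed=0):
    """Sample each expansion parameter within its preset offset range
    around the initial value and keep the best-scoring sample.

    init, offsets: length-4 vectors [fov, centre_x, centre_y, rotation].
    """
    rng = np.random.default_rng(seed)
    init = np.asarray(init, dtype=np.float64)
    offsets = np.asarray(offsets, dtype=np.float64)
    best, best_score = init.copy(), unwrap_and_score(init)
    for _ in range(iters):
        sample = init + rng.uniform(-offsets, offsets)
        score = unwrap_and_score(sample)  # overlap cross-correlation
        if score > best_score:
            best, best_score = sample, score
    return best
```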
Optionally, the fourth determination module 113 may further be configured to: extract first feature points from the first equidistant projection picture and second feature points from the second equidistant projection picture; determine matching feature point pairs based on the first and second feature points; and align the two pictures based on the matching pairs.
Optionally, the fourth determination module 113 may further be configured to: move the second equidistant projection picture along a preset direction; during the movement, extract multiple fifth overlapping regions of the first and second equidistant projection pictures; perform cross-correlation computations on the fifth overlapping regions respectively, obtaining multiple second cross-correlation results; and align the two pictures based on those results.
Optionally, the fourth determination module 113 may further be configured to: select, from the multiple second cross-correlation results, the one with the largest value; obtain the position coordinates of the first boundary pixels in the first equidistant projection picture and of the second boundary pixels in the second equidistant projection picture that correspond to the fifth overlapping region of that result; compute an affine transformation matrix from these coordinates; and align the two pictures based on the affine transformation matrix. A sketch of the translation sweep follows.
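A one-dimensional sketch of the translation sweep, assuming a horizontal preset direction and normalised cross-correlation as the score; the band width, shift range and crop geometry are assumptions that simplify the fifth-overlap extraction described above:

```python
import numpy as np

def ncc(a, b):
    """Normalised cross-correlation of two equally sized patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    return float((a * b).sum() /
                 (np.sqrt((a * a).sum() * (b * b).sum()) + 1e-12))

def best_horizontal_shift(pic_a, pic_b, band_w=256, max_shift=64):
    """Slide pic_b's left band over pic_a's right border and keep the
    shift whose candidate overlap scores highest."""
    w = pic_a.shape[1]
    scores = []
    for dx in range(max_shift):
        region_a = pic_a[:, w - band_w - dx:w - dx]  # shifted band in a
        region_b = pic_b[:, :band_w]                 # fixed band in b
        scores.append(ncc(region_a, region_b))
    return int(np.argmax(scores))
```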
Optionally, the fourth determination module 113 may further be configured to: for each group of two mutually corresponding frames of fisheye video images, determine the fused overlapping region of the two frames based on the group's seam search result; replace the third overlapping region of the two frames with the fused overlapping region to obtain the group's image stitching result; and determine the video stitching result of the video images from the per-group image stitching results.
The implementation principle and technical effects of the video image stitching apparatus provided in this embodiment of the present disclosure are the same as those of the foregoing stitching method embodiments. For brevity, where this apparatus embodiment is silent, refer to the corresponding content of the foregoing method embodiments.
Embodiment Nine
This embodiment provides an electronic system comprising an image acquisition device, a processing device and a storage apparatus. The image acquisition device is configured to acquire preview video frames or image data; the storage apparatus stores a computer program that, when run by the processing device, performs the above seam search method for video images or the above video image stitching method.
Those skilled in the art will clearly appreciate that, for convenience and brevity of description, the specific working process of the electronic system described above may refer to the corresponding process in the foregoing method embodiments and is not repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program that, when run by a processing device, performs the steps of the above seam search method for video images or of the above video image stitching method.
The computer program product of the seam search method and the stitching method and apparatus for video images provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, the instructions of which can be used to perform the methods described in the foregoing method embodiments; for specific implementations, refer to the method embodiments, which are not repeated here.
If the functions are implemented as software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present disclosure, or the part contributing over the related art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.
Industrial Applicability
The present disclosure provides a seam search method for video images and a stitching method and apparatus for video images. An energy map of each frame of video image in a first video is acquired; for the first frame, its seam search result is determined based on its energy map; for every other frame, a seam search region is determined based on the seam search result of the previous frame, and within that region the current frame's seam search result is determined based on its energy map. This approach determines the seam search result from the energy map of the video image and, for frames other than the first, constrains the seam search region using the previous frame's result before searching within it, which reduces the difference between the seam regions of consecutive frames, alleviates jitter during playback of the stitched video, and improves the stitching quality of the panoramic video.
In addition, it will be appreciated that the seam search method and the stitching method and apparatus for video images of the present application are reproducible and can be used in a variety of industrial applications, for example in the technical field of video processing.

Claims (20)

  1. A seam search method for video images, characterized in that the method comprises:
    acquiring an energy map of each frame of video image in a first video, wherein the energy map indicates the location region and edges of specified objects in the video image;
    for a first frame of video image in the first video, determining a seam search result of the first frame based on the energy map of the first frame, wherein the seam search result comprises a seam region between the video image and a target image, the target image being the video image in a second video that corresponds to the video image; and
    for each frame of video image in the first video other than the first frame, determining a seam search region of the current video image based on the seam search result of the previous frame of video image, and determining, within the seam search region, the seam search result of the current video image based on the energy map of the current video image.
  2. The method according to claim 1, characterized in that the step of acquiring an energy map of each frame of video image in the first video comprises:
    acquiring a salient-object energy map, a moving-object energy map and an edge energy map of each frame of video image in the first video; and
    for each frame of video image, fusing the salient-object energy map, the moving-object energy map and the edge energy map of that frame to obtain the energy map of that frame.
  3. The method according to claim 2, characterized in that the step of acquiring the salient-object energy map, the moving-object energy map and the edge energy map of each frame of video image in the first video comprises:
    for each frame of video image in the first video, inputting the video image into a preset neural network model, so that the preset neural network model outputs the salient-object energy map of that frame;
    determining the moving-object energy map of that frame based on the moving objects in that frame; and
    performing edge detection on each object in that frame to obtain the edge energy map of that frame.
  4. The method according to any one of claims 1 to 3, characterized in that the step of determining, for the first frame of video image in the first video, the seam search result of the first frame based on the energy map of the first frame comprises:
    for the first frame of video image in the first video, computing the seam search result of the first frame with a dynamic programming algorithm based on the energy map of the first frame.
  5. The method according to any one of claims 1 to 4, characterized in that the step of determining, for each frame of video image in the first video other than the first frame, the seam search region of the current video image based on the seam search result of the previous frame, and determining, within the seam search region, the seam search result of the current video image based on the energy map of the current video image comprises:
    for each frame of video image in the first video other than the first frame, determining the seam search region of the current video image by adding preset constraints on the basis of the seam search result of the previous frame of video image; and
    within the seam search region, determining the seam search result of the current video image with a dynamic programming algorithm based on the energy map of the current video image.
  6. The method according to any one of claims 1 to 5, characterized in that, in the first video, each frame of video image and its corresponding target image have an overlapping region, the part of the overlapping region in the video image being a first overlapping region and the part in the target image being a second overlapping region; the method further comprises:
    for each frame of video image, inputting the image corresponding to the first overlapping region in that frame and the image corresponding to the second overlapping region in the corresponding target image into a pre-trained neural network model, to obtain a seam prediction result of that frame, wherein the seam prediction result comprises a predicted seam region between the video image and the corresponding target image.
  7. The method according to claim 6, characterized in that the pre-trained neural network model is determined as follows:
    acquiring training samples containing multiple consecutive groups of image pairs to be stitched, and the seam search result of each group of image pairs to be stitched;
    for each group of image pairs to be stitched other than the first group, inputting the group of image pairs and the seam prediction result of the adjacent previous group into an initial neural network model, so that the initial neural network model outputs the seam prediction result of the group;
    computing a loss value of the group's seam prediction result based on the group's seam search result and a preset loss function; and
    updating weight parameters of the initial neural network model based on the loss value, and continuing to perform the step of acquiring training samples containing multiple consecutive groups of image pairs to be stitched until the initial neural network model converges, to obtain the neural network model.
  8. The method according to claim 7, characterized in that, after the step of acquiring training samples containing multiple consecutive groups of image pairs to be stitched and the seam search result of each group, the method further comprises:
    acquiring a preset seam template, wherein the preset seam template comprises a preset seam region; and
    for the first group of image pairs to be stitched, inputting the first group of image pairs and the preset seam template into the initial neural network model, so that the initial neural network model outputs the seam prediction result of the first group.
  9. A stitching method for video images, characterized in that the method comprises:
    acquiring a first fisheye video and a second fisheye video, wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region;
    extracting a first target region of each frame of fisheye video image in the first fisheye video and a second target region of each frame of fisheye video image in the second fisheye video;
    for two mutually corresponding frames of fisheye video images, determining, based on the first target region and the second target region of the two frames and pre-acquired updated expansion parameter values, a first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and a second equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video;
    determining a seam search result based on the mutually corresponding first equidistant projection picture and second equidistant projection picture, wherein the seam search result is determined by the method according to any one of claims 1 to 8; and
    determining a video stitching result of the video images based on the seam search results of each group of two mutually corresponding frames of fisheye video images.
  10. The method according to claim 9, characterized in that the step of determining a seam search result based on the mutually corresponding first equidistant projection picture and second equidistant projection picture comprises:
    aligning the mutually corresponding first equidistant projection picture and second equidistant projection picture;
    extracting a third overlapping region based on the aligned first and second equidistant projection pictures;
    performing illumination compensation on the second equidistant projection picture based on the third overlapping region, so that the pixel value of each pixel in the illumination-compensated second equidistant projection picture matches the pixel value of the corresponding pixel in the first equidistant projection picture; and
    determining the seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
  11. The method according to claim 9 or 10, characterized in that the updated expansion parameter values comprise: a field-of-view parameter value, a parameter value of the optical center along the x axis, a parameter value of the optical center along the y axis, and a fisheye rotation angle parameter value; and the updated expansion parameter values are determined in advance as follows:
    acquiring an initial expansion parameter value and a preset offset range of each expansion parameter;
    sampling each expansion parameter based on its initial expansion parameter value and preset offset range to obtain a sampled value of each expansion parameter, and determining, based on the sampled values, a third equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and a fourth equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video;
    extracting a fourth overlapping region of the third equidistant projection picture and the fourth equidistant projection picture;
    performing a cross-correlation computation on the fourth overlapping region to obtain a first cross-correlation result; and
    determining the updated expansion parameter values based on the first cross-correlation result and a preset number of iterations.
  12. The method according to claim 11, characterized in that the step of determining the updated expansion parameter values based on the first cross-correlation result and the preset number of iterations comprises:
    repeating, for the preset number of iterations, the step of sampling each expansion parameter based on its initial expansion parameter value and preset offset range, to obtain multiple first cross-correlation results;
    selecting, from the multiple first cross-correlation results, the first cross-correlation result with the largest value; and
    determining the sampled expansion parameter values corresponding to the first cross-correlation result with the largest value as the updated expansion parameter values.
  13. The method according to any one of claims 10 to 12, characterized in that the step of aligning the mutually corresponding first equidistant projection picture and second equidistant projection picture comprises:
    extracting first feature points from the first equidistant projection picture and second feature points from the second equidistant projection picture;
    determining matching feature point pairs based on the first feature points and the second feature points; and
    aligning the first equidistant projection picture and the second equidistant projection picture based on the matching feature point pairs.
  14. The method according to any one of claims 10 to 12, characterized in that the step of aligning the mutually corresponding first equidistant projection picture and second equidistant projection picture comprises:
    moving the second equidistant projection picture along a preset direction;
    during the movement, extracting multiple fifth overlapping regions of the first equidistant projection picture and the second equidistant projection picture;
    performing cross-correlation computations on the multiple fifth overlapping regions respectively, to obtain multiple second cross-correlation results; and
    aligning the first equidistant projection picture and the second equidistant projection picture based on the multiple second cross-correlation results.
  15. The method according to claim 14, characterized in that the step of aligning the first equidistant projection picture and the second equidistant projection picture based on the multiple second cross-correlation results comprises:
    selecting, from the multiple second cross-correlation results, the second cross-correlation result with the largest value;
    obtaining the position coordinates of the first boundary pixels in the first equidistant projection picture and the position coordinates of the second boundary pixels in the second equidistant projection picture that correspond to the fifth overlapping region of the second cross-correlation result with the largest value;
    computing an affine transformation matrix based on the position coordinates of the first boundary pixels and the position coordinates of the second boundary pixels; and
    aligning the first equidistant projection picture and the second equidistant projection picture based on the affine transformation matrix.
  16. The method according to any one of claims 10 to 15, characterized in that the step of determining a video stitching result of the video images based on the seam search results of each group of two mutually corresponding frames of fisheye video images comprises:
    for each group of two mutually corresponding frames of fisheye video images, determining the fused overlapping region of the two frames in the group based on the group's seam search result;
    replacing the third overlapping region of the two frames of fisheye video images in the group with the fused overlapping region, to obtain the image stitching result of the two frames in the group; and
    determining the video stitching result of the video images based on the image stitching results of the two frames of fisheye video images in each group.
  17. A seam search apparatus for video images, characterized in that the apparatus comprises:
    a first acquisition module, configured to acquire an energy map of each frame of video image in a first video, wherein the energy map indicates the location region and edges of specified objects in the video image;
    a first determination module, configured to determine, for a first frame of video image in the first video, a seam search result of the first frame based on the energy map of the first frame, wherein the seam search result comprises a seam region between the video image and a target image, the target image being the video image in a second video that corresponds to the video image; and
    a second determination module, configured to determine, for each frame of video image in the first video other than the first frame, a seam search region of the current video image based on the seam search result of the previous frame of video image, and to determine, within the seam search region, the seam search result of the current video image based on the energy map of the current video image.
  18. A stitching apparatus for video images, wherein the apparatus comprises:
    a second acquisition module, configured to acquire a first fisheye video and a second fisheye video, wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping area;
    an extraction module, configured to extract a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video;
    a third determination module, configured to determine, for two mutually corresponding frames of fisheye video images, based on the first target area and the second target area corresponding to the two frames of fisheye video images and on pre-acquired updated expansion parameter values, a first equidistant projection picture obtained by expanding the frame of fisheye video image in the first fisheye video and a second equidistant projection picture obtained by expanding the frame of fisheye video image in the second fisheye video;
    a fourth determination module, configured to determine a splicing seam search result based on the mutually corresponding first equidistant projection picture and second equidistant projection picture, wherein the splicing seam search result is determined by the splicing seam search apparatus for video images according to claim 17; and
    a fifth determination module, configured to determine a video stitching result of the video images based on the splicing seam search results corresponding to each group of two mutually corresponding frames of fisheye video images.
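By way of illustration only, a minimal sketch of the expansion performed by the third determination module, assuming an equidistant (f·θ) fisheye lens model and treating the expansion parameter values as a circle centre, circle radius and field of view derived from the extracted target area; all names, and the lens model itself, are assumptions of this sketch rather than details fixed by the claim.

```python
# Minimal sketch: expand a circular fisheye frame into an equidistant
# projection picture (claim 18). Equidistant lens model assumed; image
# orientation conventions here are illustrative.
import cv2
import numpy as np

def expand_fisheye(img, cx, cy, radius, out_w=1024, out_h=512, fov=np.pi):
    # Longitude/latitude grid of the output equidistant projection.
    lon = (np.arange(out_w) / out_w - 0.5) * 2.0 * np.pi
    lat = (0.5 - np.arange(out_h) / out_h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit viewing direction for every output pixel (+z = optical axis).
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    # Equidistant model: radial distance in the fisheye image grows
    # linearly with the angle theta between the ray and the optical axis.
    theta = np.arccos(np.clip(z, -1.0, 1.0))
    r = radius * theta / (fov / 2.0)
    phi = np.arctan2(y, x)
    map_x = (cx + r * np.cos(phi)).astype(np.float32)
    map_y = (cy + r * np.sin(phi)).astype(np.float32)
    # Sample the fisheye frame; rays outside the lens circle fall on the
    # constant (black) border.
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```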
  19. An electronic system, wherein the electronic system comprises: an image acquisition device, a processing device and a storage device;
    the image acquisition device being configured to acquire preview video frames or image data; and
    the storage device storing a computer program which, when run by the processing device, executes the splicing seam search method for video images according to any one of claims 1 to 8, or the stitching method for video images according to any one of claims 9 to 16.
  20. A computer-readable storage medium storing a computer program, wherein, when the computer program is run by a processing device, the steps of the splicing seam search method for video images according to any one of claims 1 to 8, or the steps of the stitching method for video images according to any one of claims 9 to 16, are executed.
PCT/CN2022/098992 2021-08-04 2022-06-15 Splicing seam search method and apparatus for video image, and video image splicing method and apparatus WO2023011013A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110893253.6 2021-08-04
CN202110893253.6A CN113793382A (en) 2021-08-04 2021-08-04 Video image splicing seam searching method and video image splicing method and device

Publications (1)

Publication Number Publication Date
WO2023011013A1

Family

ID=78877131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/098992 WO2023011013A1 (en) 2021-08-04 2022-06-15 Splicing seam search method and apparatus for video image, and video image splicing method and apparatus

Country Status (2)

Country Link
CN (1) CN113793382A (en)
WO (1) WO2023011013A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113793382A (en) * 2021-08-04 2021-12-14 北京旷视科技有限公司 Video image splicing seam searching method and video image splicing method and device
CN114708354B (en) * 2022-03-04 2023-06-23 广东省国土资源测绘院 Method, equipment, medium and product for drawing embedded line

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103997609A (en) * 2014-06-12 2014-08-20 四川川大智胜软件股份有限公司 Multi-video real-time panoramic fusion splicing method based on CUDA
CN104408701B (en) * 2014-12-03 2018-10-09 中国矿业大学 A kind of large scene video image joining method
KR20160115466A (en) * 2015-03-27 2016-10-06 한국전자통신연구원 Apparatus and method for panoramic video stiching
CN104794683B (en) * 2015-05-05 2016-03-23 中国人民解放军国防科学技术大学 Based on the video-splicing method scanned around gradual change piece area planar
CN105096239B (en) * 2015-07-02 2019-02-22 北京旷视科技有限公司 Method for registering images and its device and image split-joint method and its device
CN106210535A (en) * 2016-07-29 2016-12-07 北京疯景科技有限公司 The real-time joining method of panoramic video and device
CN107333064B (en) * 2017-07-24 2020-11-13 广东工业大学 Spherical panoramic video splicing method and system
CN110519528B (en) * 2018-05-22 2021-09-24 杭州海康威视数字技术股份有限公司 Panoramic video synthesis method and device and electronic equipment
CN110660023B (en) * 2019-09-12 2020-09-29 中国测绘科学研究院 Video stitching method based on image semantic segmentation
CN111709877B (en) * 2020-05-22 2023-05-02 浙江四点灵机器人股份有限公司 Image fusion method for industrial detection
CN111915483B (en) * 2020-06-24 2024-03-19 北京迈格威科技有限公司 Image stitching method, device, computer equipment and storage medium
CN112508849A (en) * 2020-11-09 2021-03-16 中国科学院信息工程研究所 Digital image splicing detection method and device
CN112862685B (en) * 2021-02-09 2024-02-23 北京迈格威科技有限公司 Image stitching processing method, device and electronic system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060115181A1 (en) * 2004-11-30 2006-06-01 Yining Deng System and method of aligning images
CN101533474A (en) * 2008-03-12 2009-09-16 三星电子株式会社 Character and image recognition system based on video image and method thereof
US20180035047A1 (en) * 2016-07-29 2018-02-01 Multimedia Image Solution Limited Method for stitching together images taken through fisheye lens in order to produce 360-degree spherical panorama
CN106651767A (en) * 2016-12-30 2017-05-10 北京星辰美豆文化传播有限公司 Panoramic image obtaining method and apparatus
CN108093221A (en) * 2017-12-27 2018-05-29 南京大学 A kind of real-time video joining method based on suture
CN110009567A (en) * 2019-04-09 2019-07-12 三星电子(中国)研发中心 For fish-eye image split-joint method and device
CN113793382A (en) * 2021-08-04 2021-12-14 北京旷视科技有限公司 Video image splicing seam searching method and video image splicing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUO HONGYI; ZHANG HONGMIN; LI PINGPING: "Video Stitching Method for Minimizing Parallax Artifacts", Computer Engineering and Applications (Huabei Jisuan Jishu Yanjiusuo, CN), vol. 56, no. 20, 15 October 2020 (2020-10-15), pages 186-190, XP093032380, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1907-0308 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452426A (en) * 2023-06-16 2023-07-18 广汽埃安新能源汽车股份有限公司 Panorama stitching method and device
CN116452426B (en) * 2023-06-16 2023-09-05 广汽埃安新能源汽车股份有限公司 Panorama stitching method and device
CN117541764A (en) * 2024-01-09 2024-02-09 北京大学 Image stitching method, electronic equipment and storage medium
CN117544862A (en) * 2024-01-09 2024-02-09 北京大学 Image stitching method based on image moment parallel processing
CN117544862B (en) * 2024-01-09 2024-03-29 北京大学 Image stitching method based on image moment parallel processing
CN117541764B (en) * 2024-01-09 2024-04-05 北京大学 Image stitching method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113793382A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
WO2023011013A1 (en) Splicing seam search method and apparatus for video image, and video image splicing method and apparatus
CN111126304B (en) Augmented reality navigation method based on indoor natural scene image deep learning
CN110515452B (en) Image processing method, image processing device, storage medium and computer equipment
CN112771539B (en) Employing three-dimensional data predicted from two-dimensional images using neural networks for 3D modeling applications
Huang et al. Indoor depth completion with boundary consistency and self-attention
US20230116250A1 (en) Computing images of dynamic scenes
Zhu et al. Generative adversarial frontal view to bird view synthesis
US20180012411A1 (en) Augmented Reality Methods and Devices
CN109815843B (en) Image processing method and related product
CN101422035B (en) Light source estimation device, light source estimation system, light source estimation method, device having increased image resolution, and method for increasing image resolution
Matzen et al. Nyc3dcars: A dataset of 3d vehicles in geographic context
WO2023024697A1 (en) Image stitching method and electronic device
CN111382613B (en) Image processing method, device, equipment and medium
CN114095662B (en) Shooting guide method and electronic equipment
JP2012155391A (en) Posture state estimation device and posture state estimation method
WO2023279799A1 (en) Object identification method and apparatus, and electronic system
Zhang et al. Video extrapolation in space and time
Fu et al. Image Stitching Techniques Applied to Plane or 3D Models: A Review
Han et al. Relating view directions of complementary-view mobile cameras via the human shadow
CN116977804A (en) Image fusion method, electronic device, storage medium and computer program product
CN115620403A (en) Living body detection method, electronic device, and storage medium
Yang et al. Towards generic 3d tracking in RGBD videos: Benchmark and baseline
KR20180069312A (en) Method for tracking of object using light field video and apparatus thereof
CN113592777A (en) Image fusion method and device for double-shooting and electronic system
Lin et al. A Multi‐Person Selfie System via Augmented Reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22851729

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE