WO2024022301A1 - Perspective path acquisition method, device, electronic equipment and medium - Google Patents

Perspective path acquisition method, device, electronic equipment and medium

Info

Publication number
WO2024022301A1
WO2024022301A1 (PCT/CN2023/108962; CN2023108962W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
frame image
perspective
key frame
image
Prior art date
Application number
PCT/CN2023/108962
Other languages
English (en)
French (fr)
Inventor
符峥
龙良曲
姜文杰
Original Assignee
影石创新科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 影石创新科技股份有限公司
Publication of WO2024022301A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T3/047 Fisheye or wide-angle transformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/08 Projecting images onto non-planar surfaces, e.g. geodetic screens
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023 Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/66 Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Definitions

  • the present application relates to the field of video processing technology, and in particular to a perspective path acquisition method, device, electronic equipment and media.
  • Panoramic video refers to all scenes around an observation point in space, consisting of all the light that the observation point can receive.
  • the perspective path of the panoramic video can be obtained.
  • Embodiments of the present invention provide a viewing angle path acquisition method, device, electronic equipment and media, which can acquire the viewing angle path of a panoramic video.
  • embodiments of the present invention provide a method for obtaining a perspective path, which includes: for a first key frame image of a panoramic video, obtaining a perspective target of the first key frame image, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video; obtaining a perspective target of a target frame image of the panoramic video according to the perspective target of the first key frame image, wherein the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and obtaining the perspective path of the panoramic video according to each of the obtained perspective targets.
  • obtaining the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: performing target tracking processing on the target frame image based on the perspective target of the first key frame image to obtain the perspective target of the target frame image.
  • the method further includes: performing frame extraction processing on the frame images located between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of a first image;
  • each frame of the first image is used as a target frame image.
  • obtaining at least one frame of the first image includes: obtaining at least one frame of the first image and the other frame images except the at least one frame of the first image; the method further includes: obtaining the perspective targets of the other frame images according to the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each frame of the first image.
  • the method further includes: obtaining a perspective target of the second key frame image;
  • Obtaining the perspective target of the target frame image of the panoramic video based on the perspective target of the first key frame image includes: obtaining the perspective target of the target frame image based on the perspective target of the first key frame image and the perspective target of the second key frame image.
  • obtaining the perspective target of the first key frame image includes: performing target detection on the first key frame image to obtain each target in the first key frame image; evaluating each of the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each of the targets; and, based on the evaluation results of each of the targets, using the target with the optimal evaluation result as the perspective target of the first key frame image.
  • evaluating each of the targets according to a preset multi-dimensional feature evaluation strategy to obtain the evaluation results of each of the targets includes performing the following operations on each of the targets: for each first evaluation dimension among multiple preset evaluation dimensions, evaluating the target based on the first evaluation dimension to obtain the value of the target for the first evaluation dimension; obtaining the score of the target for the first evaluation dimension according to that value and the preset weight of the first evaluation dimension; and using the sum of the scores of the target over all preset evaluation dimensions as the evaluation result of the target.
  • obtaining the perspective path of the panoramic video according to each obtained perspective target includes: for each obtained perspective target, using the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame image where the perspective target is located; and obtaining the perspective path of the panoramic video according to each obtained viewpoint.
  • the method further includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and obtaining the field of view angle of the frame image where the perspective target is located from the bounding box, wherein the viewing range corresponding to the field of view angle is greater than or equal to the range of the bounding box; generating a planar video frame corresponding to the frame image of the perspective target based on the field of view angle; and obtaining a planar video corresponding to the panoramic video according to the perspective path of the panoramic video and each generated planar video frame.
  • embodiments of the present invention provide a perspective path acquisition device, including: a first acquisition module, configured to acquire the perspective target of the first key frame image of a panoramic video, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video; a second acquisition module, configured to obtain the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image, wherein the target frame image is located between the first key frame image and the second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and a third acquisition module, configured to obtain the perspective path of the panoramic video according to each of the obtained perspective targets.
  • embodiments of the present invention provide an electronic device, including a processor and a memory.
  • the memory is used to store program instructions
  • the processor is used to execute the program instructions to implement the method described in any one of the first aspect.
  • embodiments of the present invention provide a computer-readable storage medium.
  • Program instructions are stored on the computer-readable storage medium; when executed by a processor, the program instructions implement the method described in any one of the first aspect.
  • each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video.
  • the perspective target of the first key frame image is obtained; for the target frame image located between the first key frame image and the second key frame image, the perspective target of the target frame image of the panoramic video is obtained according to the perspective target of the first key frame image, where the second key frame image is the next key frame image after the first key frame image in the panoramic video; the perspective path of the panoramic video is then obtained based on the obtained perspective targets. It can be seen that this embodiment can obtain the perspective path of the panoramic video.
  • Figure 1 is a schematic flow chart of a perspective path acquisition method provided by an embodiment of the present invention
  • Figure 2 is a schematic diagram for explaining panoramic video provided by an embodiment of the present invention.
  • Figure 3 is a schematic diagram of a viewing angle path provided by an embodiment of the present invention.
  • Figure 4 is a schematic diagram of a viewing angle path acquisition method provided by an embodiment of the present invention.
  • Figure 5 is a schematic diagram for explaining a viewing angle target acquisition method provided by an embodiment of the present invention.
  • Figure 6 is another schematic diagram for illustrating a viewing angle target acquisition method provided by an embodiment of the present invention.
  • Figure 7 is another schematic diagram for illustrating a viewing angle target acquisition method provided by an embodiment of the present invention.
  • Figure 8 is a block schematic diagram of a perspective path acquisition device provided by an embodiment of the present invention.
  • Figure 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • At least one of a, b and c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, c can be single or multiple.
  • first, second, etc. may be used to describe the set thresholds in the embodiments of the present invention, these set thresholds should not be limited to these terms. These terms are only used to distinguish set thresholds from each other.
  • the first set threshold may also be called a second set threshold, and similarly, the second set threshold may also be called a first set threshold.
  • Image saliency is an important visual feature in an image, reflecting the importance the human eye attaches to each area of the image.
  • Interpolation can refer to inserting a third pixel between two original pixels and making its color equal to the average of the surrounding pixels.
  • according to the coordinate conversion, the pixels in spherical coordinates can be converted to specified positions in plane coordinates. At this point there are gaps between pixels, and interpolation is required to obtain a rectangular planar image/video.
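As a minimal illustration of the interpolation just described (a simple linear scheme; not code from the patent), a new pixel can be inserted between every two adjacent pixels with a value equal to their average:

```python
def upsample_row(pixels):
    """Double a row's horizontal resolution by inserting, between every
    two adjacent pixels, a new pixel equal to their average (the simple
    interpolation described above)."""
    out = []
    for left, right in zip(pixels, pixels[1:]):
        out.append(left)
        out.append((left + right) / 2.0)  # inserted pixel: average of neighbours
    out.append(pixels[-1])
    return out

print(upsample_row([10.0, 30.0, 50.0]))  # [10.0, 20.0, 30.0, 40.0, 50.0]
```

The same idea applies per colour channel when filling the gaps left by the spherical-to-plane conversion.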
  • Field of view refers to the angle formed by the two edges between the focal point and the largest extent of the picture after panoramic video projection.
  • Panoramic video can be abstracted into a sphere centered on the observation point.
  • the projection focal length can be the distance from the center to the sphere, that is, the value of the projection focal length can be the radius of the sphere. For example, the projection focal length can take the value 1.
  • the two dimensions θ and φ can be used to reflect the change of the viewpoint on the sphere, and the dimension t is used to reflect the change of the video shooting time.
  • T in Figure 3 can represent the T-th frame video image, and T+5 the (T+5)-th frame video image; the frame images between these two frames are not shown in Figure 3.
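The (θ, φ) parameterization above can be sketched in code. This assumes a standard 2:1 equirectangular layout, which the passage does not fix, so the mapping is an illustrative assumption:

```python
import math

def sphere_to_equirect(theta, phi, width, height):
    """Map a viewpoint with longitude theta in [-pi, pi) and latitude phi in
    [-pi/2, pi/2] to pixel coordinates on a 2:1 equirectangular panorama."""
    x = (theta + math.pi) / (2.0 * math.pi) * width
    y = (math.pi / 2.0 - phi) / math.pi * height
    return x, y

# A perspective path is then a sequence of (theta, phi) values indexed by
# the frame time t, e.g. the path from frame T to frame T+5 in Figure 3.
x, y = sphere_to_equirect(0.0, 0.0, 2000, 1000)
print((x, y))  # (1000.0, 500.0): the centre of the panorama
```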
  • the spherical image can be divided into several areas, and perspective path planning is defined as movement between areas, which turns it into a learnable problem; a large number of manually labeled perspective path samples are fed to a model for learning, and ultimately the model automatically infers the optimal path on the panoramic video.
  • an embodiment of the present invention provides a perspective path acquisition method, including steps 101 to 103:
  • Step 101 For the first key frame image of the panoramic video, obtain the perspective target of the first key frame image, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video.
  • the panoramic video is subjected to frame extraction processing to achieve "key frame detection", and the extracted key frames are recorded as key frame images, that is, only the key frame images are detected to determine their perspective targets.
  • the frame extraction process for the panoramic video may be to extract each key frame starting from the first frame of the panoramic video or a specified position.
  • frame extraction can be implemented by extracting according to image content, extracting according to fixed time intervals, etc.
  • the target detection frequency can be, for example, once every 15 frames.
  • the bounding box of the perspective target can be a rectangular area or an area of other shapes, such as circles, ellipses, and free shapes.
  • the data of the bounding box can be the boundary data of the rectangular area or the corner point coordinates of two non-adjacent corner points of the rectangular area.
  • the panoramic video is subjected to frame extraction processing to extract key frame images, and only the extracted key frame images undergo perspective target detection; that is, the key frame images are the video images on which perspective target detection is to be performed, while perspective target detection is not performed on the other frame images, whose perspective targets can be determined by other means. In other words, perspective target detection is not performed frame by frame.
  • the extracted key frame images may include the first key frame image (the n-th frame video image) and the second key frame image (the (n+4)-th frame video image) shown in Figures 5 to 7; the second key frame image is the next key frame image after the first key frame image in the panoramic video, and there are multiple frames of video images (the (n+1)-th to (n+3)-th frame video images) between these two key frame images.
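The fixed-interval variant of the key frame extraction described above can be sketched as follows (the 15-frame interval echoes the example detection frequency mentioned earlier; starting from frame 0 is an assumption):

```python
def key_frame_indices(total_frames, interval=15):
    """Indices of key frames extracted at a fixed interval, so that only
    these frames undergo perspective target detection."""
    return list(range(0, total_frames, interval))

print(key_frame_indices(60))  # [0, 15, 30, 45]
```

Content-based extraction, also mentioned above, would replace the fixed stride with a scene-change test.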
  • Step 102 Obtain the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image, where the target frame image is located between the first key frame image and the second key frame image, and the second key frame image is The next keyframe image of the first keyframe image in the panoramic video.
  • the target frame image may be part or all of the frame video images between two adjacent key frame images.
  • this application can determine the perspective target of any other frame video image based on the perspective target of the most recent key frame image before it. That is, for each other frame video image lying between two adjacent key frame images, its perspective target is determined based on the perspective target of the earlier of those two adjacent key frame images.
  • the perspective target of each frame of video image between the first key frame image and the second key frame image can thus be determined.
  • the specific determination method can be as follows:
  • the perspective target of each frame of video image between the first key frame image and the second key frame image can be determined based on the perspective target of the first key frame image according to method 1: target tracking.
  • method 1 can be used to determine the perspective targets of the (n+1)-th to (n+3)-th frame video images based on the perspective target of the n-th frame video image.
  • method 2 can be used to determine the perspective targets of the (n+1)-th to (n+3)-th frame video images based on the perspective targets of the n-th and (n+4)-th frame video images.
  • method 1 (key frame tracking) and method 2 can also be combined to determine the perspective targets of these frames of video images.
  • method 1 can be used to determine the perspective target of the (n+2)-th frame video image based on the perspective target of the n-th frame video image. Then, through method 2, the perspective target of the (n+1)-th frame video image is determined based on the perspective targets of the n-th and (n+2)-th frame video images, and the perspective target of the (n+3)-th frame video image is determined based on the perspective targets of the (n+2)-th and (n+4)-th frame video images.
  • the viewing angle target can usually be the optimal viewing angle target of the video image.
  • Step 103 Obtain the perspective path of the panoramic video based on the obtained perspective targets.
  • this embodiment can achieve the acquisition of the perspective path of the panoramic video, and can greatly reduce the calculation cost of calculating the optimal perspective of the panoramic video.
  • Based on the above content, the implementation process of step 102 will be further described below with reference to Figures 5 to 7.
  • Obtaining the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: performing target tracking processing on the target frame image based on the perspective target of the first key frame image to obtain the perspective target of the target frame image.
  • method 1 (target tracking) determines the perspective targets of the (n+1)-th to (n+3)-th frame video images based on the perspective target of the n-th frame video image.
  • the (n+1)-th to (n+3)-th frame video images can all be used as target frame images; that is, their perspective targets can all be determined by the above method 1.
  • target tracking can be performed based on the perspective target of the n-th frame video image to obtain the perspective target of the (n+1)-th frame video image; then target tracking is performed based on the perspective target of the (n+1)-th frame video image to obtain the perspective target of the (n+2)-th frame video image, and so on.
  • the method further includes: obtaining the perspective target of the second key frame image; obtaining the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image then includes: obtaining the perspective target of the target frame image based on the perspective target of the first key frame image and the perspective target of the second key frame image.
  • the (n+1)-th to (n+3)-th frame video images can all be used as target frame images; that is, their perspective targets can all be determined by the above method 2.
  • the viewpoint can be interpolated at the middle of the interval by taking the intermediate value. For example, if a rectangular corner point of the perspective target of the m-th frame video image is the pixel in the X-th row and Y-th column of the image, and the corresponding corner point of the perspective target of the (m+2)-th frame video image is the pixel in the (X+a)-th row and (Y+b)-th column, then the corresponding corner point of the perspective target of the (m+1)-th frame video image can be the pixel in the (X+a/2)-th row and (Y+b/2)-th column.
  • likewise, if a rectangular corner point of the perspective target is the pixel in the X-th row and Y-th column in both the m-th and (m+2)-th frame video images, then the corresponding corner point of the perspective target of the (m+1)-th frame video image can also be the pixel in the X-th row and Y-th column of the image.
  • the perspective target of the (n+2)-th frame video image can be obtained by interpolating the viewpoint at the middle of the interval between the two key frame images and taking the intermediate value.
  • the viewing angle targets of the n+1 to n+3 frames of video images have been determined.
  • the viewpoint target of each frame of video image between two key frame images can be accurately determined.
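The midpoint rule above can be written out directly. A small sketch (the helper name is ours, not the patent's):

```python
def interpolate_corner(corner_m, corner_m2):
    """Bounding-box corner for frame m+1, given the corresponding corner in
    frames m and m+2: corners at (X, Y) and (X+a, Y+b) yield
    (X + a/2, Y + b/2), the intermediate value at the middle of the interval."""
    (x0, y0), (x2, y2) = corner_m, corner_m2
    return (x0 + (x2 - x0) / 2.0, y0 + (y2 - y0) / 2.0)

# Corner at row 100, column 40 in frame m and row 110, column 48 in frame m+2:
print(interpolate_corner((100, 40), (110, 48)))  # (105.0, 44.0)
```

Applying this to each corner of the rectangle yields the perspective target of the in-between frame; when both key-frame corners coincide, the interpolated corner coincides with them as well.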
  • the method further includes: performing frame extraction processing on the frame images located between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of a first image; each frame of the first image is used as a target frame image.
  • frame extraction processing is performed between two key frame images to achieve "key frame tracking".
  • the video image of each extracted key frame is recorded as a first image; that is, only the first images are tracked to determine their perspective targets.
  • the tracking frequency can be, for example, once every 3 frames.
  • among the (n+1)-th to (n+3)-th frame video images, only the (n+2)-th frame video image may be used as the target frame image; that is, the perspective target of the (n+2)-th frame video image can be determined by the above method 1, while the perspective targets of the (n+1)-th and (n+3)-th frame video images are not determined by method 1 (they can be determined by the above method 2).
  • obtaining at least one first frame of image includes: obtaining at least one first frame of image and other frames of images except at least one first frame of image;
  • the method also includes: obtaining the perspective targets of each other frame image based on the perspective target of the first key frame image, the perspective target of the second key frame image and the perspective target of the first image of each frame.
  • frame extraction processing is performed between two key frame images.
  • the video image of the extracted key frame is recorded as the first image, and the remaining other frame images are not regarded as the first image.
  • the perspective target of the first image is determined based on the above-mentioned method 1, and the perspective targets of the remaining other frame images are determined based on the above-mentioned method 2.
  • frame extraction processing is performed between the first key frame image and the second key frame image, and the (n+2)-th frame video image can be extracted as a key frame.
  • target tracking is performed based on the perspective target of the n-th frame video image, and the perspective target of the n+2-th frame video image is obtained.
  • based on the perspective target of the n-th frame video image and the perspective target of the (n+2)-th frame video image, the perspective target of the (n+1)-th frame video image is obtained by interpolating the viewpoint at the middle of the interval between the two and taking the intermediate value; based on the perspective target of the (n+2)-th frame video image and the perspective target of the (n+4)-th frame video image, the perspective target of the (n+3)-th frame video image is obtained in the same way. In this way, the perspective targets of the (n+1)-th to (n+3)-th frames of video images have all been determined.
  • This embodiment combines the above method 1 and method 2 to determine the perspective target of each frame of video image between two key frame images. It not only achieves accurate determination of these perspective targets, but also greatly reduces the computational cost of calculating the optimal viewing angle of the panoramic video.
  • the detection can be based on the multi-dimensional features of the video image (ie, multiple evaluation dimensions). That is, this embodiment can comprehensively evaluate the importance of the target in the current video frame from multiple quantifiable dimensions based on a priori or a model.
  • the multi-dimensional features can be features such as area, saliency, expression, action, etc.
  • obtaining the perspective target of the first key frame image includes: performing target detection on the first key frame image to obtain each target in the first key frame image; according to the preset The multi-dimensional feature evaluation strategy evaluates each target to obtain the evaluation results of each target; based on the evaluation results of each target, the target with the optimal evaluation result is used as the perspective target of the first key frame image.
  • target detection is first performed on the video image to detect each candidate target in the video image. These candidate targets can then be evaluated based on the multidimensional feature evaluation strategy in order to evaluate the optimal target.
  • the evaluation results obtained from the target evaluation can include, for example, area size, saliency, and the confidence of expressions or actions.
  • based on the evaluation results of each target, the optimal target can be selected and used as the perspective target of the video image. On this basis, the optimal perspective path of the panoramic video can be obtained.
  • This embodiment uses multi-dimensional features to evaluate the importance of the target, which can accurately evaluate the target, make the target evaluation highly interpretable, and make it easy to formulate different evaluation strategies according to needs.
  • each evaluation dimension can be assigned a weight, which can be manually defined or obtained through machine learning.
  • weighted summation of each evaluation dimension can be performed to obtain the target evaluation result.
  • each target is evaluated according to a preset multi-dimensional feature evaluation strategy, and the evaluation results of each target are obtained, including:
  • each target can be evaluated sequentially or in parallel to obtain corresponding evaluation results.
  • for each evaluation dimension among the multiple evaluation dimensions, the target can be evaluated based on that dimension to obtain the corresponding evaluation value; combined with the preset weight of that dimension, the corresponding weighted evaluation value is obtained. The weighted evaluation values corresponding to all evaluation dimensions are then summed to obtain the evaluation result of the target.
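The weighted evaluation described above can be sketched as follows. The dimension names, weights, and feature values here are illustrative assumptions, not values from the patent:

```python
# Hypothetical preset weights for the evaluation dimensions.
WEIGHTS = {"area": 0.3, "saliency": 0.4, "expression": 0.2, "action": 0.1}

def evaluate(features):
    """Evaluation result of one target: weighted sum of its per-dimension values."""
    return sum(WEIGHTS[dim] * value for dim, value in features.items())

def pick_perspective_target(targets):
    """Target with the optimal (here: highest) evaluation result."""
    return max(targets, key=lambda t: evaluate(t["features"]))

targets = [
    {"id": "target_1", "features": {"area": 0.9, "saliency": 0.5, "expression": 0.2, "action": 0.1}},
    {"id": "target_2", "features": {"area": 0.4, "saliency": 0.9, "expression": 0.8, "action": 0.6}},
]
print(pick_perspective_target(targets)["id"])  # target_2 (score 0.70 vs 0.52)
```

The weights could equally be learned, as the passage notes, without changing this scoring step.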
  • this embodiment can use the methods of "key frame detection target”, “multidimensional feature evaluation target”, “tracking target” and “key frame tracking target” to process the panoramic video to obtain the corresponding optimal perspective path.
  • This not only greatly reduces the computational cost of calculating the optimal viewing angle of panoramic video, but also improves the performance of optimal path planning for panoramic video.
  • the use of multi-dimensional features when evaluating the importance of targets is highly interpretable, making it easy to formulate different optimal perspective path planning strategies according to different needs.
  • this embodiment can be executed after the panoramic video is obtained, or can be executed during the process of obtaining the panoramic video, so that the perspective path of the panoramic video can be obtained in real time.
  • the original panoramic video may be formatted first, and step 101 may be executed based on the processed panoramic video.
  • the original panoramic video may refer to an original spherical video captured using a panoramic camera.
  • This embodiment can be implemented through a panoramic perspective model.
  • the original panoramic video can be converted into a format that can be processed by the panoramic perspective model.
  • the panoramic perspective model can perform part or all of the processing such as "key frame detection of targets", "multi-dimensional feature evaluation of targets", "target tracking" and "key frame tracking of targets"; that is, the panoramic perspective model can include some or all of a target detection model, a feature evaluation model, a tracking model, and the evaluation strategy and its parameters.
  • the original panoramic video can be processed such as projection, splicing, video format conversion, resolution conversion and other video processing methods according to the format requirements of the input data of the panoramic perspective model.
  • when obtaining the perspective target of a video image, the bounding box of the perspective target can be obtained specifically (that is, this embodiment can obtain the series of target bounding boxes corresponding to each frame of video image in the panoramic video). In this way, the center point of the bounding box can be used as the viewpoint of the frame image where the perspective target is located.
  • obtaining the perspective path of the panoramic video based on each obtained perspective target includes: for each obtained perspective target, using the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame image where the perspective target is located; and obtaining the perspective path of the panoramic video according to each obtained viewpoint.
  • the bounding box of the perspective target can be a rectangular area, or an area of other shapes, such as a circle, an ellipse, or a free shape.
  • the data of the bounding box can be the corner point coordinates of two non-adjacent corner points of the rectangular area.
  • the center point of the bounding box can be determined; it is the center point of the perspective target and serves as the viewpoint of the frame image where the perspective target is located. Then, based on the determined viewpoint of each frame of video image and the order of the frames in time, the perspective path of the panoramic video can be obtained.
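A minimal sketch of this step, assuming each bounding box is given by two diagonal corner coordinates as described earlier:

```python
def bbox_center(corner1, corner2):
    """Centre of a rectangular bounding box given two non-adjacent (diagonal)
    corner coordinates; used as the viewpoint of that frame."""
    (x1, y1), (x2, y2) = corner1, corner2
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def perspective_path(bboxes_per_frame):
    """Perspective path: the per-frame viewpoints in time order."""
    return [bbox_center(c1, c2) for c1, c2 in bboxes_per_frame]

path = perspective_path([((10, 20), (50, 60)), ((12, 22), (52, 62))])
print(path)  # [(30.0, 40.0), (32.0, 42.0)]
```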
  • the panoramic video can be intelligently edited.
  • the panoramic video can be converted into a planar video based on the acquired perspective path.
  • the converted planar video can then be played for display.
  • other forms of display based on the determined perspective path are also possible, such as displaying the path on a 2:1 panorama or on the spherical video.
  • the method further includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and obtaining, from that bounding box, the field of view of the frame image in which the perspective target is located, where the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box; generating, based on the field of view, a planar video frame corresponding to that frame image; and obtaining a planar video corresponding to the panoramic video from the perspective path of the panoramic video and the generated planar video frames.
  • for any video frame, the viewpoint of the frame is used as the center point of the field of view, and the size of the field of view can be determined from the specified focal length together with the bounding box of the perspective target of that frame.
  • the specified focal length can be a fixed projection focal length, or it can be adapted to the size of the field of view; the larger the focal length, the wider the field of view, and the wider the viewing angle of the projected planar video.
  • the size of the field of view is not smaller than the range of the bounding box, so that the perspective target at least lies within the viewing range of the determined field of view.
  • the viewing range of the field of view may be equal to the range of the bounding box.
  • the viewing range of the field of view can also be larger than the range of the bounding box, but at most one full circle of arc (that is, not greater than 2π).
  • based on the projection focal length and the field of view of each video frame, a planar video frame of that frame can be generated.
  • the pixels of the panoramic video can be converted from spherical coordinates to planar coordinates according to the field of view and the projection focal length, combined with pixel interpolation, to obtain rectangular planar video frames playable on ordinary devices.
  • a planar video corresponding to the panoramic video can be obtained from the perspective path and the generated planar video frames.
  • the planar video frames can be stitched into a planar video.
  • the resulting planar video can be played on ordinary devices for viewing by users.
  • this embodiment can be executed after the panoramic video is obtained, or during the process of obtaining it, so that the corresponding planar video can be obtained in real time.
  • after the viewpoint path of the panoramic video is obtained, the moving-window average method can be used to smooth the path, yielding a smoothed viewpoint path, and the sequence of fields of view can then be determined from the smoothed viewpoint path.
  • the moving-window average method can also be used to process the determined sequence of fields of view, and the conversion from panoramic video to planar video can then be performed based on the resulting smoothed sequence. In this way, a planar video with better visual quality can be obtained and the user's viewing experience improved.
  • an embodiment of the present invention provides a perspective path acquisition device 10 , which includes a first acquisition module 11 , a second acquisition module 12 , and a third acquisition module 13 .
  • the first acquisition module 11 is used to obtain the perspective target of the first key frame image of the panoramic video, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video.
  • the second acquisition module 12 is configured to acquire the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image, where the target frame image is located between the first key frame image and the second key frame image.
  • the second key frame image is the next key frame image after the first key frame image in the panoramic video.
  • the third acquisition module 13 is used to obtain the perspective path of the panoramic video according to the obtained perspective targets.
  • the second acquisition module 12 is configured to perform target tracking on the target frame image according to the perspective target of the first key frame image to obtain the perspective target of the target frame image.
  • the perspective path acquisition device 10 further includes a first module, which is used to perform frame extraction on the frames of the panoramic video located between the first key frame image and the second key frame image to obtain at least one first image; each first image is used as a target frame image.
  • the first module is used to obtain at least one first image and the other frame images apart from the at least one first image;
  • the perspective path acquisition device 10 also includes a second module, which is used to obtain the perspective targets of the other frame images based on the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each first image.
  • the perspective path acquisition device 10 further includes a third module, which is used to obtain the perspective target of the second key frame image;
  • the second acquisition module 12 is configured to obtain the perspective target of the target frame image based on the perspective target of the first key frame image and the perspective target of the second key frame image.
  • the first acquisition module 11 is used to perform target detection on the first key frame image to obtain the targets in the first key frame image; evaluate each target according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target; and, according to these evaluation results, take the target with the best evaluation result as the perspective target of the first key frame image.
  • the first acquisition module 11 is used to perform the following operations for each target: for each first evaluation dimension among multiple preset evaluation dimensions, evaluate the target on the first evaluation dimension to obtain the target's value for that dimension; obtain the target's score for the first evaluation dimension from that value and the preset weight of the dimension; and take the sum of the target's scores over all preset evaluation dimensions as the target's evaluation result.
  • the third acquisition module 13 is configured, for each obtained perspective target, to take the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame image in which the target is located, and to obtain the perspective path of the panoramic video from the obtained viewpoints.
  • the perspective path acquisition device 10 further includes a fourth module.
  • the fourth module is configured, for each obtained perspective target, to take the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and to obtain, from that bounding box, the field of view of the frame image in which the target is located, where the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box; to generate, based on the field of view, the planar video frame corresponding to that frame image; and to obtain, from the perspective path of the panoramic video and the generated planar video frames, the planar video corresponding to the panoramic video.
  • An embodiment of the present invention also provides an electronic device, including a processor and a memory.
  • the memory is used to store program instructions
  • the processor is used to execute the program instructions to implement the method in any of the above method embodiments.
  • Embodiments of the present invention also provide a computer-readable storage medium.
  • the computer-readable storage medium stores program instructions. When executed by a processor, the program instructions implement the method of any of the above method embodiments.
  • Figure 9 is a schematic diagram of a computer device provided by an embodiment of the present invention.
  • the computer device 20 of this embodiment includes: a processor 21 and a memory 22.
  • the memory 22 is used to store a computer program 23 that can be run on the processor 21.
  • when executed by the processor 21, the computer program 23 implements the steps of the method embodiments of the present invention; to avoid repetition, these steps are not described one by one here.
  • alternatively, when the computer program 23 is executed by the processor 21, the functions of each module/unit in the device embodiments of the present invention are implemented; to avoid repetition, they are not described one by one here.
  • Computer device 20 includes, but is not limited to, processor 21 and memory 22.
  • FIG. 9 is only an example of the computer device 20 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components; for example, the computer device may also include input and output devices, network access equipment, buses, etc.
  • the processor 21 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • the memory 22 may be an internal storage unit of the computer device 20, such as a hard disk or memory of the computer device 20.
  • the memory 22 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media (SM) card, a Secure Digital (SD) card, or a flash card provided on the computer device 20.
  • the memory 22 may also include both an internal storage unit of the computer device 20 and an external storage device.
  • the memory 22 is used to store computer programs 23 as well as other programs and data required by the computer equipment.
  • the memory 22 may also be used to temporarily store data that has been output or is to be output.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation there may be other ways of division.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separate.
  • a component shown as a unit may or may not be a physical unit, that is, it may be located in one place, or it may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in various embodiments of the present invention can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which can be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a perspective path acquisition method and apparatus, an electronic device, and a medium. The method includes: for a first key frame image of a panoramic video, acquiring the perspective target of the first key frame image, where the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video; acquiring, according to the perspective target of the first key frame image, the perspective target of a target frame image of the panoramic video, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and obtaining the perspective path of the panoramic video according to the obtained perspective targets.

Description

Perspective path acquisition method and apparatus, electronic device, and medium
Technical Field
The present application relates to the technical field of video processing, and in particular to a perspective path acquisition method and apparatus, an electronic device, and a medium.
Background
A panoramic video captures all the scenery around an observation point in space, formed by all the light rays that the observation point can receive. For a captured panoramic video, a perspective path of the panoramic video can be acquired.
Summary
Embodiments of the present invention provide a perspective path acquisition method and apparatus, an electronic device, and a medium, capable of acquiring the perspective path of a panoramic video.
In a first aspect, an embodiment of the present invention provides a perspective path acquisition method, including: for a first key frame image of a panoramic video, acquiring the perspective target of the first key frame image, where the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video; acquiring, according to the perspective target of the first key frame image, the perspective target of a target frame image of the panoramic video, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and obtaining the perspective path of the panoramic video according to the obtained perspective targets.
Optionally, acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: performing target tracking on the target frame image according to the perspective target of the first key frame image to obtain the perspective target of the target frame image.
Optionally, the method further includes: performing frame extraction on the frames of the panoramic video located between the first key frame image and the second key frame image to obtain at least one first image;
taking each first image as a target frame image.
Optionally, obtaining at least one first image includes: obtaining at least one first image and the other frame images apart from the at least one first image; the method further includes: obtaining the perspective targets of the other frame images according to the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each first image.
Optionally, the method further includes: acquiring the perspective target of the second key frame image;
acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: obtaining the perspective target of the target frame image according to the perspective target of the first key frame image and the perspective target of the second key frame image.
Optionally, acquiring the perspective target of the first key frame image includes: performing target detection on the first key frame image to obtain the targets in the first key frame image; evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target; and, according to the evaluation results, taking the target with the best evaluation result as the perspective target of the first key frame image.
Optionally, evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target includes: performing the following operations for each target: for each first evaluation dimension among multiple preset evaluation dimensions, evaluating the target on the first evaluation dimension to obtain the target's value for the first evaluation dimension; obtaining the target's score for the first evaluation dimension from that value and a preset weight for the first evaluation dimension; and taking the sum of the target's scores over all preset evaluation dimensions as the target's evaluation result.
Optionally, obtaining the perspective path of the panoramic video according to the obtained perspective targets includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame image in which the perspective target is located; and obtaining the perspective path of the panoramic video from the obtained viewpoints.
Optionally, the method further includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and obtaining, from that bounding box, the field of view of the frame image in which the perspective target is located, where the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box; generating, based on the field of view, a planar video frame corresponding to that frame image; and obtaining a planar video corresponding to the panoramic video from the perspective path of the panoramic video and the generated planar video frames.
In a second aspect, an embodiment of the present invention provides a perspective path acquisition apparatus, including: a first acquisition module, configured to acquire, for a first key frame image of a panoramic video, the perspective target of the first key frame image, where the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video; a second acquisition module, configured to acquire the perspective target of a target frame image of the panoramic video according to the perspective target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and a third acquisition module, configured to obtain the perspective path of the panoramic video according to the obtained perspective targets.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store program instructions and the processor is configured to execute the program instructions to implement the method of any one of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the method of any one of the first aspect.
In the embodiments of the present invention, the key frame images of a panoramic video are obtained by performing frame extraction on the panoramic video; for a first key frame image, its perspective target is acquired, and for a target frame image located between the first key frame image and a second key frame image, the perspective target of the target frame image is acquired according to the perspective target of the first key frame image, the second key frame image being the next key frame image after the first key frame image in the panoramic video; the perspective path of the panoramic video is then obtained from the obtained perspective targets. It can be seen that this embodiment is capable of acquiring the perspective path of a panoramic video.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flowchart of a perspective path acquisition method provided by an embodiment of the present invention;
Figure 2 is a schematic diagram for explaining panoramic video provided by an embodiment of the present invention;
Figure 3 is a schematic diagram of a perspective path provided by an embodiment of the present invention;
Figure 4 is a schematic diagram of a perspective path acquisition approach provided by an embodiment of the present invention;
Figure 5 is a schematic diagram for explaining a perspective target acquisition approach provided by an embodiment of the present invention;
Figure 6 is another schematic diagram for explaining a perspective target acquisition approach provided by an embodiment of the present invention;
Figure 7 is yet another schematic diagram for explaining a perspective target acquisition approach provided by an embodiment of the present invention;
Figure 8 is a block diagram of a perspective path acquisition apparatus provided by an embodiment of the present invention;
Figure 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "at least one" herein means one or more, and "multiple" means two or more. The term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one of a, b and c may mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
It should be understood that although the terms first, second, etc. may be used in the embodiments of the present invention to describe set thresholds, these set thresholds should not be limited to these terms. These terms are only used to distinguish the set thresholds from one another. For example, without departing from the scope of the embodiments of the present invention, a first set threshold may also be called a second set threshold, and similarly, a second set threshold may also be called a first set threshold.
The terms used in the implementation section of the present application are only for explaining specific embodiments of the present application and are not intended to limit the present application.
Before introducing the embodiments of the present application, some technical terms involved in the present application are explained.
Saliency, in the present application, generally refers to image saliency. Image saliency is an important visual feature of an image and reflects how much the human eye attends to each region of the image.
Interpolation may refer to inserting a third pixel between two original pixels and making its color equal to the average of the surrounding pixels. For example, when converting a panoramic video into a planar video, pixels on spherical coordinates can be converted to specified positions on planar coordinates through coordinate transformation. Gaps then exist between pixels, and interpolation is required to obtain a rectangular planar image/video.
Field of view (FOV) refers to the angle formed by the two edges from the focal point, after panoramic-video projection, to the maximum extent of the picture.
Referring to Figure 2, a panoramic video can be abstracted as a sphere centered on the observation point. The projection focal length can be the distance from the center to the sphere, i.e., the value of the projection focal length can be the radius of the sphere. Feasibly, the projection focal length can be 1.
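On this spherical model, a viewpoint given by the two angles (Φ, θ) corresponds to a point on the viewing sphere. A minimal sketch of that correspondence (the function name and the radians convention are illustrative assumptions, not taken from the patent):

```python
import math

def viewpoint_to_cartesian(phi, theta, radius=1.0):
    """Map a viewpoint given by longitude phi and latitude theta (radians)
    on the viewing sphere to 3-D Cartesian coordinates; radius doubles as
    the projection focal length (1.0 in the text above)."""
    x = radius * math.cos(theta) * math.cos(phi)
    y = radius * math.cos(theta) * math.sin(phi)
    z = radius * math.sin(theta)
    return x, y, z
```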
Referring to Figures 2 and 3, the two dimensions Φ and θ can reflect the change of the viewpoint on the sphere, and the dimension Ω reflects the change of the video capture time t.
Referring to Figure 2, T=5 in Figure 2 can represent the corresponding 5 video frames.
Referring to Figure 3, T in Figure 3 can represent the T-th video frame and T+5 the (T+5)-th video frame; the frames between these two are not shown in Figure 3.
As shown in Figure 3, based on the viewpoint in each video frame and the temporal order of the frames, the viewpoint path shown by the dashed line in Figure 3 can be obtained.
Before introducing the embodiments of the present application, an existing perspective path acquisition approach is described.
In one feasible implementation, referring to Figure 3, the spherical image can be divided into several regions, perspective path planning can be defined as motion between these regions and turned into a learnable scheme, and the model can be trained on a large number of manually annotated perspective path samples so that it eventually infers the optimal path on the panoramic video automatically.
However, the final effect of this implementation strongly depends on the manually annotated perspective path samples and is poorly interpretable. The performance of the model is also low.
Below, the technical implementation of the embodiments of the present application is described with reference to the corresponding drawings.
As shown in Figure 1, an embodiment of the present invention provides a perspective path acquisition method, including steps 101 to 103:
Step 101: for a first key frame image of a panoramic video, acquire the perspective target of the first key frame image, where the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video.
In this embodiment, frame extraction is performed on the panoramic video to implement "key frame detection"; the extracted key frames are recorded as key frame images, i.e., only the key frame images are detected to determine their perspective targets.
Feasibly, frame extraction on the panoramic video can start from the first frame of the panoramic video or from a specified position. Frame extraction can be implemented by extracting according to image content, at fixed time intervals, and so on. For example, the target detection frequency can be once every 15 frames.
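The fixed-interval variant of the extraction just described can be sketched as follows (a minimal illustration; the 15-frame interval is only the example given in the text):

```python
def extract_key_frames(num_frames, interval=15, start=0):
    """Indices of the key frames obtained by fixed-interval frame
    extraction, e.g. one detection every 15 frames."""
    return list(range(start, num_frames, interval))
```

For a 31-frame clip starting at frame 0, this yields frames 0, 15 and 30 as key frames.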
The bounding box of the perspective target can be a rectangular area or an area of another shape, such as a circle, an ellipse, or a free shape. When the bounding box of the perspective target is a rectangular area, the bounding box data can be the boundary data of the rectangular area or the coordinates of two non-adjacent corner points of the rectangular area.
In this embodiment, frame extraction is performed on the panoramic video to extract key frame images, and perspective target detection is performed only on the extracted key frame images; that is, the key frame images are the video frames on which perspective target detection is to be performed, while the other frames are not detected. The perspective targets of the other frames can be determined by other means, i.e., perspective target detection is not performed frame by frame.
Referring to Figures 5 to 7, the extracted key frame images can include the first key frame image (the n-th video frame) and the second key frame image (the (n+4)-th video frame) shown in Figures 5 to 7, where the second key frame image is the next key frame image after the first key frame image in the panoramic video. There are multiple video frames between these two key frame images (the (n+1)-th to (n+3)-th video frames).
Since the present application extracts key frame images for perspective target detection through "key frame detection" instead of detecting frame by frame, the computational cost of computing the optimal perspective of the panoramic video can be greatly reduced.
Step 102: acquire, according to the perspective target of the first key frame image, the perspective target of a target frame image of the panoramic video, where the target frame image is located between the first key frame image and the second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video.
The target frame image can be some or all of the video frames between two adjacent key frame images.
For the other video frames on which perspective target detection is not performed, the present application can determine their perspective targets according to the perspective target of the nearest preceding key frame image. That is, for each other video frame between two adjacent key frame images, the perspective target is determined according to the perspective target of the earlier of the two key frame images.
For example, referring to Figures 5 to 7, the perspective targets of the video frames between the first and second key frame images (the (n+1)-th to (n+3)-th video frames) can be determined from the perspective target of the first key frame image (the n-th video frame); the specific determination can be as follows:
Referring to Figure 5, mode 1, target tracking, can be used to determine the perspective targets of the frames between the first and second key frame images based on the perspective target of the first key frame image.
For example, through mode 1, the perspective targets of the (n+1)-th to (n+3)-th video frames can be determined from the perspective target of the n-th video frame.
Referring to Figure 7, mode 2, interpolating the viewpoint in the middle of the interval and taking the intermediate value, can also be used to determine the perspective targets of the frames between the first and second key frame images based on the perspective targets of the first and second key frame images.
For example, through mode 2, the perspective targets of the (n+1)-th to (n+3)-th video frames can be determined from the perspective targets of the n-th and (n+4)-th video frames.
In addition, referring to Figure 6, mode 1 can also be implemented with "key frame tracking" instead of frame-by-frame tracking. For the video frames that are not tracked, mode 2 can be combined to determine their perspective targets.
For example, through mode 1, the perspective target of the (n+2)-th video frame can be determined from the perspective target of the n-th video frame. Then, through mode 2, the perspective target of the (n+1)-th video frame is determined from the perspective targets of the n-th and (n+2)-th video frames, and the perspective target of the (n+3)-th video frame is determined from the perspective targets of the (n+2)-th and (n+4)-th video frames.
In this way, the perspective target of each video frame in the panoramic video can be obtained. This perspective target can usually be the optimal perspective target of the video frame.
Step 103: obtain the perspective path of the panoramic video according to the obtained perspective targets.
It can be seen that this embodiment can acquire the perspective path of a panoramic video and can greatly reduce the computational cost of computing the optimal perspective of the panoramic video.
Based on the above, the implementation of step 102 is further explained below with reference to Figures 5 to 7.
In one embodiment of the present invention, referring to Figure 5, acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: performing target tracking on the target frame image according to the perspective target of the first key frame image to obtain the perspective target of the target frame image.
Referring to Figure 5, mode 1, target tracking, can be used to determine the perspective targets of the (n+1)-th to (n+3)-th video frames from the perspective target of the n-th video frame.
In this embodiment, the (n+1)-th to (n+3)-th video frames can all serve as target frame images, i.e., their viewpoint targets can be determined based on mode 1 above.
As shown in Figure 5, target tracking can be performed based on the perspective target of the n-th video frame to obtain the perspective target of the (n+1)-th video frame; then based on the perspective target of the (n+1)-th video frame to obtain that of the (n+2)-th video frame; and then based on the perspective target of the (n+2)-th video frame to obtain that of the (n+3)-th video frame. Since the next video frame (the (n+4)-th) is the second key frame image, target tracking then stops.
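The frame-by-frame chaining just described can be sketched as follows. `track_one_step` is a placeholder for any single-object tracker (for example a correlation-filter tracker); it is an assumption for illustration, not an API named by the patent:

```python
def propagate_by_tracking(key_bbox, frames_between, track_one_step):
    """Chain single-step tracking from a key frame's bounding box across
    the frames that lie before the next key frame; returns one bounding
    box per intermediate frame."""
    bboxes = []
    bbox = key_bbox
    for frame in frames_between:
        bbox = track_one_step(frame, bbox)  # n -> n+1 -> n+2 -> n+3 ...
        bboxes.append(bbox)
    return bboxes
```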
It can be seen that through target tracking, this embodiment can accurately determine the perspective target of each video frame between two key frame images.
In one embodiment of the present invention, referring to Figure 7, the method further includes: acquiring the perspective target of the second key frame image; acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image includes: obtaining the perspective target of the target frame image according to the perspective targets of the first and second key frame images.
Referring to Figure 7, mode 2, interpolating the viewpoint in the middle of the interval and taking the intermediate value, can determine the perspective targets of the (n+1)-th to (n+3)-th video frames from the perspective targets of the n-th and (n+4)-th video frames.
In this embodiment, the (n+1)-th to (n+3)-th video frames can all serve as target frame images, i.e., their viewpoint targets can be determined based on mode 2 above.
Taking a rectangular bounding box of the perspective target as an example, for mode 2: if a rectangular corner of the perspective target of the m-th video frame is the pixel at row X, column Y of the image, and the corresponding corner of the perspective target of the (m+2)-th video frame is the pixel at row X+a, column Y+b, then the corresponding corner of the perspective target of the (m+1)-th video frame can be the pixel at row X+a/2, column Y+b/2.
If a rectangular corner of the perspective target of the m-th video frame and the corresponding corner of the (m+2)-th video frame are both the pixel at row X, column Y, then the corresponding corner of the perspective target of the (m+1)-th video frame can also be the pixel at row X, column Y.
Referring to Figure 7, based on the perspective targets of the n-th and (n+4)-th video frames, interpolating the viewpoint in the middle of the interval and taking the intermediate value yields the perspective target of the (n+2)-th video frame; then, based on the perspective targets of the n-th and (n+2)-th video frames, the perspective target of the (n+1)-th video frame is obtained in the same way; and based on the perspective targets of the (n+2)-th and (n+4)-th video frames, the perspective target of the (n+3)-th video frame is obtained. In this way, the perspective targets of the (n+1)-th to (n+3)-th video frames are all determined.
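The corner-wise midpoint rule above amounts to linear interpolation of the bounding-box coordinates; a minimal sketch (the function name is assumed for illustration):

```python
def interpolate_bbox(bbox_a, bbox_b, t=0.5):
    """Linearly interpolate two bounding boxes given as (x1, y1, x2, y2)
    corner coordinates; t=0.5 gives the midpoint used for the frame
    halfway between two key frames."""
    return tuple(a + t * (b - a) for a, b in zip(bbox_a, bbox_b))
```

With corners (X, Y) and (X+a, Y+b), t=0.5 reproduces the (X+a/2, Y+b/2) example from the text, and identical boxes interpolate to themselves.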
It can be seen that by interpolating the viewpoint in the middle of the interval, this embodiment can accurately determine the perspective target of each video frame between two key frame images.
In one embodiment of the present invention, referring to Figure 6, the method further includes: performing frame extraction on the frames of the panoramic video located between the first key frame image and the second key frame image to obtain at least one first image; and taking each first image as a target frame image.
In this embodiment, frame extraction is performed between two key frame images to implement "key frame tracking"; the extracted key video frames are recorded as first images, i.e., only the first images are tracked to determine their perspective targets. For example, the tracking frequency can be once every 3 frames.
Referring to Figure 6, among the (n+1)-th to (n+3)-th video frames, only the (n+2)-th video frame may serve as the target frame image; that is, the viewpoint target of the (n+2)-th video frame can be determined based on mode 1 above, while the viewpoint targets of the (n+1)-th and (n+3)-th video frames are determined not based on mode 1 but, for example, based on mode 2 above.
Based on the above, in one embodiment of the present invention, obtaining at least one first image includes: obtaining at least one first image and the other frame images apart from the at least one first image;
the method further includes: obtaining the perspective targets of the other frame images according to the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each first image.
In this embodiment, frame extraction is performed between two key frame images; the extracted key video frames are recorded as first images, and the remaining frames are not first images.
The perspective target of a first image is determined based on mode 1 above, while the perspective targets of the remaining frames are determined based on mode 2 above.
As shown in Figure 6, frame extraction is first performed between the first and second key frame images, and the (n+2)-th video frame can be extracted as a key frame. Then, through mode 1, target tracking based on the perspective target of the n-th video frame yields the perspective target of the (n+2)-th video frame. After that, through mode 2, interpolating the viewpoint in the middle of the interval between the n-th and (n+2)-th video frames yields the perspective target of the (n+1)-th video frame, and interpolating between the (n+2)-th and (n+4)-th video frames yields the perspective target of the (n+3)-th video frame. In this way, the perspective targets of the (n+1)-th to (n+3)-th video frames are all determined.
This embodiment combines modes 1 and 2 above to determine the perspective target of each video frame between two key frame images, which not only accurately determines these perspective targets but also greatly reduces the computational cost of computing the optimal perspective of the panoramic video.
When performing target detection on a video frame to obtain the perspective target, detection can be based on multi-dimensional features (i.e., multiple evaluation dimensions) of the video frame. That is, this embodiment can comprehensively evaluate the importance of a target in the current video frame from multiple quantifiable dimensions, based on priors or a model.
The multi-dimensional features can be features such as area, saliency, expression, and action.
Based on this, in one embodiment of the present invention, acquiring the perspective target of the first key frame image includes: performing target detection on the first key frame image to obtain the targets in the first key frame image; evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target; and, according to the evaluation results, taking the target with the best evaluation result as the perspective target of the first key frame image.
In this embodiment, target detection is first performed on the video frame to detect each candidate target in the frame. The candidate targets can then be evaluated based on the multi-dimensional feature evaluation strategy so as to identify the optimal target.
Taking multi-dimensional features such as area, saliency, expression and action as an example, the evaluation results can include area size, saliency, and the confidence of highlight expressions and actions.
Based on the evaluation results of the targets, the optimal target can be identified and used as the perspective target of the video frame, from which the optimal perspective path of the panoramic video can then be obtained.
This embodiment uses multi-dimensional features when evaluating target importance, which evaluates targets accurately, gives the evaluation strong interpretability, and makes it easy to formulate different evaluation strategies according to requirements.
Feasibly, each evaluation dimension can be assigned a weight, which can be defined manually or obtained through machine learning, so that a weighted sum over the evaluation dimensions yields the target's evaluation result.
Based on this, in one embodiment of the present invention, evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target includes:
performing the following operations for each target: for each first evaluation dimension among multiple preset evaluation dimensions, evaluating the target on the first evaluation dimension to obtain the target's value for that dimension; obtaining the target's score for the first evaluation dimension from that value and a preset weight for the dimension; and taking the sum of the target's scores over all preset evaluation dimensions as the target's evaluation result.
In this embodiment, each target can be evaluated sequentially or in parallel to obtain its evaluation result.
For each of the multiple evaluation dimensions, the target can be evaluated on that dimension to obtain the corresponding value, which, combined with the preset weight of the dimension, gives the corresponding weighted value; the weighted values over all evaluation dimensions can then be summed to obtain the target's evaluation result.
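The weighted-sum evaluation can be sketched as follows (the dimension names and example weights are illustrative assumptions, not values from the patent):

```python
def evaluate_target(values, weights):
    """Weighted sum over evaluation dimensions such as area, saliency,
    expression and action; weights may be hand-defined or learned."""
    return sum(values[dim] * weights[dim] for dim in weights)

def pick_view_target(targets, weights):
    """Return the candidate target with the best (highest) weighted score."""
    return max(targets, key=lambda t: evaluate_target(t["values"], weights))
```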
From the above, this embodiment can process the panoramic video using "target detection on key frames", "multi-dimensional feature evaluation of targets", "target tracking" and "target tracking on key frames" to obtain the corresponding optimal perspective path. This not only greatly reduces the computational cost of computing the optimal perspective of the panoramic video and improves the performance of optimal path planning for panoramic video, but the use of multi-dimensional features when evaluating target importance is also highly interpretable, making it easy to formulate different optimal perspective path planning strategies for different requirements.
It should be noted that this embodiment can be executed after the panoramic video is obtained, or during the process of obtaining it, so that the perspective path of the panoramic video can be obtained in real time.
In one embodiment of the present invention, before step 101, format processing can first be performed on the original panoramic video, and step 101 can then be executed based on the processed panoramic video.
The original panoramic video can refer to the original spherical video captured with a panoramic camera.
This embodiment can be implemented through a panoramic perspective model; through format processing, the original panoramic video can be converted into a format that the panoramic perspective model can process. The panoramic perspective model can perform some or all of the processing stages such as "target detection on key frames", "multi-dimensional feature evaluation of targets", "target tracking" and "target tracking on key frames"; that is, the panoramic perspective model can include some or all of a target detection model, a feature evaluation model, a tracking model, and the evaluation strategy and its parameters.
Feasibly, the original panoramic video can be processed by projection, stitching, video format conversion, resolution conversion and other video processing methods according to the input format requirements of the panoramic perspective model.
In this embodiment, when obtaining the perspective target of a video frame, the bounding box of the perspective target can specifically be obtained (i.e., this embodiment can obtain the series of target bounding boxes corresponding to each frame of the panoramic video). The center point of the bounding box can then serve as the viewpoint of the frame in which the perspective target is located.
Based on this, in one embodiment of the present invention, obtaining the perspective path of the panoramic video according to the obtained perspective targets includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame in which the target is located; and obtaining the perspective path of the panoramic video from the obtained viewpoints.
The bounding box of the perspective target can be a rectangular area or an area of another shape, such as a circle, an ellipse, or a free shape. When the bounding box of the perspective target is a rectangular area, the bounding box data can be the coordinates of two non-adjacent corner points of the rectangle.
From the bounding box data, the center point of the bounding box can be determined; this is the center point of the perspective target and serves as the viewpoint of the frame in which the target is located. Then, based on the viewpoints determined for the video frames and the temporal order of the frames, the perspective path of the panoramic video can be obtained.
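The viewpoint construction can be sketched as follows (bounding boxes given as two non-adjacent corners, as in the text):

```python
def bbox_center(bbox):
    """Center of an axis-aligned bounding box given by two non-adjacent
    corners (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def viewpoint_path(bboxes_by_frame):
    """One viewpoint (the bounding-box center) per frame, kept in the
    temporal order of the frames: this sequence is the perspective path."""
    return [bbox_center(b) for b in bboxes_by_frame]
```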
Based on the acquired perspective path of the panoramic video, the panoramic video can be intelligently edited. In one feasible implementation, the panoramic video can be converted into a planar video based on the acquired perspective path, and the converted planar video can then be played for display. In other feasible implementations, other forms of display based on the determined perspective path are also possible, such as displaying the path on a 2:1 panorama or on the spherical video.
Based on the above, in one embodiment of the present invention, the method further includes: for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and obtaining, from that bounding box, the field of view of the frame in which the target is located, where the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box; generating, based on the field of view, a planar video frame corresponding to that frame; and obtaining a planar video corresponding to the panoramic video from the perspective path of the panoramic video and the generated planar video frames.
In this embodiment, for any video frame, the viewpoint of the frame is used as the center point of the field of view, and the size of the field of view can be determined based on the specified focal length together with the bounding box of the perspective target of the frame.
The specified focal length can be a fixed projection focal length, or it can be adapted to the size of the field of view; the larger the focal length, the wider the field of view, and the wider the viewing angle of the projected planar video.
The size of the field of view is not smaller than the range of the bounding box, so that the perspective target at least lies within the viewing range of the determined field of view. Thus, in one implementation, the viewing range of the field of view can equal the range of the bounding box; in other implementations it can exceed the range of the bounding box, but at most one full circle of arc (i.e., not greater than 2π).
Based on the projection focal length and the field of view of each video frame, a planar video frame of the video frame can be generated. Feasibly, pixels of the panoramic video can be converted from spherical coordinates to planar coordinates according to the field of view and the projection focal length, combined with pixel interpolation, to obtain rectangular planar video frames playable on ordinary devices.
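The spherical-to-planar conversion can be sketched with a rectilinear (gnomonic) projection over a 2:1 equirectangular panorama. This is a minimal illustration under stated assumptions: the panorama is a list of pixel rows, nearest-neighbour lookup stands in for the pixel interpolation described in the text, and the rotation order (pitch, then yaw) is a choice made for the sketch, not specified by the patent:

```python
import math

def render_plane_frame(pano, yaw, pitch, fov, out_w, out_h):
    """For each output pixel, compute the viewing-ray direction, convert
    it to spherical coordinates, and sample the equirectangular panorama
    `pano` (a list of rows of pixel values)."""
    ph, pw = len(pano), len(pano[0])
    f = (out_w / 2.0) / math.tan(fov / 2.0)  # focal length from the FOV
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    out = []
    for j in range(out_h):
        row = []
        for i in range(out_w):
            # ray through pixel (i, j) in camera coordinates
            x, y, z = i - out_w / 2.0, j - out_h / 2.0, f
            # rotate by pitch (around x), then yaw (around y)
            y, z = y * cp - z * sp, y * sp + z * cp
            x, z = x * cy + z * sy, -x * sy + z * cy
            lon = math.atan2(x, z)                 # in [-pi, pi]
            lat = math.atan2(y, math.hypot(x, z))  # in [-pi/2, pi/2]
            u = int((lon / (2 * math.pi) + 0.5) * (pw - 1))
            v = int((lat / math.pi + 0.5) * (ph - 1))
            row.append(pano[v][u])
        out.append(row)
    return out
```

A production version would replace the nearest-neighbour lookup with bilinear interpolation, as the surrounding text describes.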
Then, a planar video corresponding to the panoramic video can be obtained from the perspective path and the generated planar video frames; for example, the planar video frames can be stitched into a planar video. The obtained planar video can be played on ordinary devices for users to watch.
It should be noted that this embodiment can be executed after the panoramic video is obtained, or during the process of obtaining it, so that the corresponding planar video of the panoramic video can be obtained in real time.
Considering that the field of view usually differs from frame to frame, after the viewpoint path of the panoramic video is obtained, the moving-window average method can first be used to smooth the path, yielding a smoothed viewpoint path, and the sequence of fields of view can then be determined from the smoothed viewpoint path. The moving-window average method can also be used to process the determined sequence of fields of view, and the conversion from panoramic video to planar video can then be performed based on the resulting smoothed sequence of fields of view. In this way, a planar video with better visual quality is obtained and the user's viewing experience is improved.
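A moving-window average over the viewpoint path can be sketched as follows (the window size and the clipping at the sequence ends are illustrative choices):

```python
def smooth_path(points, window=5):
    """Replace each viewpoint by the mean of the viewpoints in a window
    centred on it, clipping the window at the ends of the sequence."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        chunk = points[lo:hi]
        smoothed.append(tuple(sum(p[k] for p in chunk) / len(chunk)
                              for k in range(len(points[0]))))
    return smoothed
```

The same function can also be applied to the sequence of fields of view before the panoramic-to-planar conversion.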
As shown in Figure 8, an embodiment of the present invention provides a perspective path acquisition apparatus 10, including a first acquisition module 11, a second acquisition module 12, and a third acquisition module 13.
The first acquisition module 11 is configured to acquire, for a first key frame image of a panoramic video, the perspective target of the first key frame image, where the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video. The second acquisition module 12 is configured to acquire the perspective target of a target frame image of the panoramic video according to the perspective target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video. The third acquisition module 13 is configured to obtain the perspective path of the panoramic video according to the obtained perspective targets.
In one embodiment of the present invention, the second acquisition module 12 is configured to perform target tracking on the target frame image according to the perspective target of the first key frame image to obtain the perspective target of the target frame image.
In one embodiment of the present invention, the perspective path acquisition apparatus 10 further includes a first module, configured to perform frame extraction on the frames of the panoramic video located between the first key frame image and the second key frame image to obtain at least one first image, and to take each first image as a target frame image.
In one embodiment of the present invention, the first module is configured to obtain at least one first image and the other frame images apart from the at least one first image;
the perspective path acquisition apparatus 10 further includes a second module, configured to obtain the perspective targets of the other frame images according to the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each first image.
In one embodiment of the present invention, the perspective path acquisition apparatus 10 further includes a third module, configured to acquire the perspective target of the second key frame image;
the second acquisition module 12 is configured to obtain the perspective target of the target frame image according to the perspective target of the first key frame image and the perspective target of the second key frame image.
In one embodiment of the present invention, the first acquisition module 11 is configured to perform target detection on the first key frame image to obtain the targets in the first key frame image; evaluate the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target; and, according to the evaluation results, take the target with the best evaluation result as the perspective target of the first key frame image.
In one embodiment of the present invention, the first acquisition module 11 is configured to perform the following operations for each target: for each first evaluation dimension among multiple preset evaluation dimensions, evaluate the target on the first evaluation dimension to obtain the target's value for that dimension; obtain the target's score for the first evaluation dimension from that value and a preset weight for the dimension; and take the sum of the target's scores over all preset evaluation dimensions as the target's evaluation result.
In one embodiment of the present invention, the third acquisition module 13 is configured, for each obtained perspective target, to take the center point of the bounding box corresponding to the perspective target as the viewpoint of the frame in which the target is located, and to obtain the perspective path of the panoramic video from the obtained viewpoints.
In one embodiment of the present invention, the perspective path acquisition apparatus 10 further includes a fourth module, configured, for each obtained perspective target, to take the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and to obtain, from that bounding box, the field of view of the frame in which the target is located, where the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box; to generate, based on the field of view, the planar video frame corresponding to that frame; and to obtain, from the perspective path of the panoramic video and the generated planar video frames, the planar video corresponding to the panoramic video.
An embodiment of the present invention further provides an electronic device, including a processor and a memory, where the memory is configured to store program instructions and the processor is configured to execute the program instructions to implement the method of any of the above method embodiments.
An embodiment of the present invention further provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the method of any of the above method embodiments.
Figure 9 is a schematic diagram of a computer device provided by an embodiment of the present invention. As shown in Figure 9, the computer device 20 of this embodiment includes a processor 21 and a memory 22, where the memory 22 is configured to store a computer program 23 that can run on the processor 21. When executed by the processor 21, the computer program 23 implements the steps of the method embodiments of the present invention; to avoid repetition, they are not described one by one here. Alternatively, when executed by the processor 21, the computer program 23 implements the functions of each module/unit in the device embodiments of the present invention; likewise, to avoid repetition, they are not described one by one here.
The computer device 20 includes, but is not limited to, the processor 21 and the memory 22. Those skilled in the art will understand that Figure 9 is only an example of the computer device 20 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components; for example, the computer device may also include input/output devices, network access devices, buses, etc.
The processor 21 can be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
The memory 22 may be an internal storage unit of the computer device 20, such as the hard disk or memory of the computer device 20. The memory 22 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media (SM) card, a Secure Digital (SD) card, or a flash card provided on the computer device 20. Further, the memory 22 may include both an internal storage unit and an external storage device of the computer device 20. The memory 22 is used to store the computer program 23 and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses and methods can be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The above software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (12)

  1. A perspective path acquisition method, characterized by comprising:
    for a first key frame image of a panoramic video, acquiring the perspective target of the first key frame image, wherein the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video;
    acquiring, according to the perspective target of the first key frame image, the perspective target of a target frame image of the panoramic video, wherein the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video;
    obtaining the perspective path of the panoramic video according to the obtained perspective targets.
  2. The method according to claim 1, characterized in that acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image comprises:
    performing target tracking on the target frame image according to the perspective target of the first key frame image to obtain the perspective target of the target frame image.
  3. The method according to claim 2, characterized in that the method further comprises:
    performing frame extraction on the frames of the panoramic video located between the first key frame image and the second key frame image to obtain at least one first image;
    taking each first image as the target frame image.
  4. The method according to claim 3, characterized in that obtaining at least one first image comprises: obtaining at least one first image and the other frame images apart from the at least one first image;
    the method further comprises: obtaining the perspective targets of the other frame images according to the perspective target of the first key frame image, the perspective target of the second key frame image, and the perspective target of each first image.
  5. The method according to claim 1, characterized in that the method further comprises: acquiring the perspective target of the second key frame image;
    acquiring the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image comprises:
    obtaining the perspective target of the target frame image according to the perspective target of the first key frame image and the perspective target of the second key frame image.
  6. The method according to claim 1, characterized in that acquiring the perspective target of the first key frame image comprises:
    performing target detection on the first key frame image to obtain the targets in the first key frame image;
    evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target;
    taking, according to the evaluation results of the targets, the target with the best evaluation result as the perspective target of the first key frame image.
  7. The method according to claim 6, characterized in that evaluating the targets according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result for each target comprises:
    performing the following operations for each target:
    for each first evaluation dimension among multiple preset evaluation dimensions, evaluating the target on the first evaluation dimension to obtain the target's value corresponding to the first evaluation dimension;
    obtaining the target's score corresponding to the first evaluation dimension according to the target's value for the first evaluation dimension and a preset weight for the first evaluation dimension;
    taking the sum of the target's scores corresponding to each preset evaluation dimension as the target's evaluation result.
  8. The method according to claim 1, characterized in that obtaining the perspective path of the panoramic video according to the obtained perspective targets comprises:
    for each obtained perspective target, taking, according to the bounding box corresponding to the perspective target, the center point of the bounding box as the viewpoint of the frame image in which the perspective target is located;
    obtaining the perspective path of the panoramic video according to the obtained viewpoints.
  9. The method according to claim 8, characterized in that the method further comprises:
    for each obtained perspective target, taking the center point of the bounding box corresponding to the perspective target as the center point of the field of view, and obtaining, according to the bounding box corresponding to the perspective target, the field of view of the frame image in which the perspective target is located, wherein the viewing range corresponding to the field of view is greater than or equal to the range of the bounding box;
    generating, according to the field of view, a planar video frame corresponding to the frame image in which the perspective target is located;
    obtaining a planar video corresponding to the panoramic video according to the perspective path of the panoramic video and the generated planar video frames.
  10. A perspective path acquisition apparatus, characterized by comprising:
    a first acquisition module, configured to acquire, for a first key frame image of a panoramic video, the perspective target of the first key frame image, wherein the key frame images of the panoramic video are obtained by performing frame extraction on the panoramic video;
    a second acquisition module, configured to acquire the perspective target of a target frame image of the panoramic video according to the perspective target of the first key frame image, wherein the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video;
    a third acquisition module, configured to obtain the perspective path of the panoramic video according to the obtained perspective targets.
  11. An electronic device, characterized by comprising a processor and a memory, wherein the memory is configured to store program instructions and the processor is configured to execute the program instructions to implement the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium, characterized in that program instructions are stored on the computer-readable storage medium and, when executed by a processor, implement the method according to any one of claims 1 to 9.
PCT/CN2023/108962 2022-07-26 2023-07-24 视角路径获取方法、装置、电子设备及介质 WO2024022301A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210882895.0 2022-07-26
CN202210882895.0A CN115294493A (zh) 2022-07-26 2022-07-26 视角路径获取方法、装置、电子设备及介质

Publications (1)

Publication Number Publication Date
WO2024022301A1 true WO2024022301A1 (zh) 2024-02-01

Family

ID=83823781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/108962 WO2024022301A1 (zh) 2022-07-26 2023-07-24 视角路径获取方法、装置、电子设备及介质

Country Status (2)

Country Link
CN (1) CN115294493A (zh)
WO (1) WO2024022301A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294493A (zh) * 2022-07-26 2022-11-04 影石创新科技股份有限公司 视角路径获取方法、装置、电子设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084001A1 (en) * 2015-09-22 2017-03-23 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
CN111163267A (zh) * 2020-01-07 2020-05-15 影石创新科技股份有限公司 一种全景视频剪辑方法、装置、设备及存储介质
CN113747138A (zh) * 2021-07-30 2021-12-03 杭州群核信息技术有限公司 虚拟场景的视频生成方法和装置、存储介质及电子设备
US20220006995A1 (en) * 2020-07-06 2022-01-06 Canon Kabushiki Kaisha Information processing apparatus, method of controlling information processing apparatus, and storage medium
CN114598810A (zh) * 2022-01-18 2022-06-07 影石创新科技股份有限公司 全景视频的自动剪辑方法、全景相机、计算机程序产品及可读存储介质
CN115294493A (zh) * 2022-07-26 2022-11-04 影石创新科技股份有限公司 视角路径获取方法、装置、电子设备及介质


Also Published As

Publication number Publication date
CN115294493A (zh) 2022-11-04

Similar Documents

Publication Publication Date Title
CN109035304B (zh) Target tracking method, medium, computing device, and apparatus
US10832069B2 (en) Living body detection method, electronic device and computer readable medium
US10254845B2 (en) Hand gesture recognition for cursor control
US9305240B2 (en) Motion aligned distance calculations for image comparisons
US7554575B2 (en) Fast imaging system calibration
WO2022156640A1 (zh) Image gaze correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
US11600008B2 (en) Human-tracking methods, systems, and storage media
US11145080B2 (en) Method and apparatus for three-dimensional object pose estimation, device and storage medium
Tau et al. Dense correspondences across scenes and scales
US20210124928A1 (en) Object tracking methods and apparatuses, electronic devices and storage media
WO2018082308A1 (zh) Image processing method and terminal
CN111091590A (zh) Image processing method and apparatus, storage medium, and electronic device
CN112381071A (zh) Behavior analysis method for a target in a video stream, terminal device, and medium
WO2024022301A1 (zh) Viewing angle path acquisition method and apparatus, electronic device, and medium
CN111209774A (zh) Target behavior recognition and display method, apparatus, device, and readable medium
WO2021098587A1 (zh) Gesture analysis method, apparatus, and device, and computer-readable storage medium
CN111325107A (zh) Detection model training method and apparatus, electronic device, and readable storage medium
CN111667504A (zh) Face tracking method, apparatus, and device
CN108229281B (zh) Neural network generation method, face detection method and apparatus, and electronic device
Xiong et al. Snap angle prediction for 360 panoramas
CN114255493A (zh) Image detection method, face detection method and apparatus, device, and storage medium
US20220122341A1 (en) Target detection method and apparatus, electronic device, and computer storage medium
CN111259702A (zh) Method and apparatus for estimating user interest
CN111401285B (zh) Target tracking method and apparatus, and electronic device
CN109493349B (zh) Image feature processing module, augmented reality device, and corner detection method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23845511

Country of ref document: EP

Kind code of ref document: A1