CN115103125B - Guide broadcasting method and device - Google Patents

Guide broadcasting method and device

Info

Publication number
CN115103125B
CN115103125B (application number CN202210826557.5A)
Authority
CN
China
Prior art keywords
video
global
local
scene
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210826557.5A
Other languages
Chinese (zh)
Other versions
CN115103125A (en)
Inventor
袁潮
(Name withheld at the inventor's request)
肖占中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhuohe Technology Co Ltd
Original Assignee
Beijing Zhuohe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuohe Technology Co Ltd filed Critical Beijing Zhuohe Technology Co Ltd
Priority to CN202210826557.5A priority Critical patent/CN115103125B/en
Publication of CN115103125A publication Critical patent/CN115103125A/en
Application granted granted Critical
Publication of CN115103125B publication Critical patent/CN115103125B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4084Scaling of whole images or parts thereof, e.g. expanding or contracting in the transform domain, e.g. fast Fourier transform [FFT] domain scaling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/262Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23412Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a guide broadcasting method and device. The guide broadcasting method comprises the following steps: acquiring a global video; searching for a scene matched with an instruction based on the instruction; acquiring a local video matched with the scene based on the scene; transforming the local video onto the corresponding block of the global video to obtain a global fusion video matched with the scene; and playing the global fusion video in a first window while playing the local video in a second window. This solves the technical problem in the prior art that guide-broadcast video is processed slowly and inefficiently.

Description

Guide broadcasting method and device
Technical Field
The present disclosure relates to the field of computer devices, and in particular, to a guide broadcasting method and apparatus.
Background
As the computing power of computers and the resolution and field of view of cameras improve, people's demands on image and video quality keep rising: the goal is a high-resolution panoramic image that offers a wide field of view without losing the detailed information of the images and videos. To this end, a global video is shot with a global camera and local videos are shot with local cameras. The global video has low resolution and cannot capture local details; a local video has high resolution but cannot show its own position within the global video. In the prior art, in order to present the position of a specific local video in the global video, all the local videos are usually fused with the global video, which results in a large processing load and low video processing efficiency.
Disclosure of Invention
The embodiment of the invention provides a guide broadcasting method and device, which aim to solve the technical problem of low guide-broadcast video processing speed and efficiency in the prior art.
The invention provides a guide broadcasting method, which comprises the following steps:
acquiring a global video;
searching a scene matched with the instruction in the global video based on the instruction;
acquiring a local video matched with the scene based on the scene;
transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene;
and playing the global fusion video in a first window, and playing the local video in a second window.
Optionally, the step of transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene specifically includes: a block matching algorithm is adopted to find out the corresponding block of the local video in the global video; and registering the local video and the global video based on the corresponding blocks to obtain a matched global fusion video of the scene.
Optionally, the method further comprises: acquiring a light field video, sampling the light field video by a set magnification factor to obtain a sampled light field video, and performing Fourier transform on the sampled light field video to obtain a first video; after the step of transforming the local video to the corresponding block of the global video to obtain the global fusion video matched with the scene, the method further includes: performing high-pass filtering on the global fusion video to obtain a second video; performing linear addition on the first video and the second video, and performing a Fourier transform to obtain a third video; and playing the third video in a third window.
Optionally, the first window and the second window are displayed simultaneously under the same interface.
Optionally, a region corresponding to the local video is visually identified in the global fusion video.
Optionally, the instruction includes at least one of a voice instruction, a touch instruction, and a gesture instruction.
The embodiment of the application also provides a guide broadcasting device, which comprises:
the first acquisition module is used for acquiring a global video;
the searching module is used for searching a scene matched with the instruction in the global video based on the instruction;
the second acquisition module acquires a local video matched with the scene based on the scene;
the transformation module is used for transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene; and
and the playing module is used for playing the global fusion video in a first window and playing the local video in a second window.
Optionally, the transformation module is further adapted to: a block matching algorithm is adopted to find out the corresponding block of the local video in the global video; and registering the local video and the global video based on the corresponding blocks to obtain a matched global fusion video of the scene.
Optionally, the present application also proposes a computer readable storage medium, on which a computer program is stored, characterized in that the computer program when executed implements the steps of the method as described above.
Optionally, the application further proposes a computer device comprising a processor, a memory and a computer program stored on said memory, characterized in that said processor implements the steps of the method as described above when executing said computer program.
In the method, a scene matched with an instruction is acquired in a global video through the instruction, and then a local video matched with the scene is acquired in a searching mode; and then the local video is transformed to a corresponding block in the global video to obtain the global fusion video matched with the scene. At this time, the processing amount of the local video and the global video is small, and the processing efficiency is improved. Meanwhile, the global fusion video only presents high-resolution video on the corresponding block, and the video of other areas of the global fusion video still keeps low resolution, so that a user can capture the position of the local video related to the instruction in the global video in terms of visual effect. And simultaneously, playing the global fusion video in the first window and playing the local video in the second window, so that a user can compare the global fusion video with the local video, and the user can acquire interested information.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an application scenario schematic diagram of a guide broadcasting device provided in an embodiment of the present application;
fig. 2 is a schematic flow chart of a guide broadcasting method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a guide broadcasting device according to an embodiment of the present application;
fig. 4 is an internal structural diagram of a computer device provided in an embodiment of the present application.
Description of the embodiments
In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.
It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.
As used in this specification and the claims, the terms "a," "an," "the," and/or "the" are not specific to a singular, but may include a plurality, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the steps and elements are explicitly identified, and they do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
Fig. 1 is a schematic view of an application scenario of a guide broadcasting device according to some embodiments of the present application. As shown in fig. 1, the guide broadcasting apparatus 100 may include a server 110, a network 120, a group of image capturing devices 130, and a memory 140.
Server 110 may process data and/or information acquired from at least one component of the guide broadcasting apparatus 100 (e.g., image acquisition device group 130 and memory 140) or an external data source (e.g., a cloud data center). For example, the server 110 may obtain interaction instructions from the image capture device group 130. As another example, server 110 may also retrieve historical data from memory 140.
In some embodiments, server 110 may include a processing device 112. The processing device 112 may process information and/or data related to the human-machine interaction system to perform one or more of the functions described in this specification. For example, the processing device 112 may determine the imaging control strategy based on the interaction instructions and/or historical data. In some embodiments, the processing device 112 may include at least one processing unit (e.g., a single core processing engine or a multi-core processing engine). In some embodiments, the processing device 112 may be part of the image acquisition device group 130.
The network 120 may provide a channel for information exchange. In some embodiments, network 120 may include one or more network access points. One or more components of the guide broadcasting apparatus 100 may connect to the network 120 through an access point to exchange data and/or information. In some embodiments, at least one component of the guide broadcasting apparatus 100 may access data or instructions stored in the memory 140 via the network 120.
The image capturing device group 130 may be composed of a plurality of image capturing devices, and the types of the image capturing devices are not limited, and may be, for example, a camera, a light field camera, or a mobile terminal having an image capturing function.
In some embodiments, memory 140 may store data and/or instructions that processing device 112 may execute or use to implement the exemplary methods described herein. For example, the memory 140 may store historical data. In some embodiments, memory 140 may be directly connected to server 110 as back-end memory. In some embodiments, memory 140 may be part of server 110 or of the image capture device group 130.
Fig. 2 shows a flow chart of a guide broadcasting method according to an embodiment of the application. Referring to fig. 2, the application further provides a guide broadcasting method, which includes the following steps:
acquiring a global video;
acquiring a scene matched with an instruction based on the instruction;
acquiring a local video matched with the scene based on the scene;
transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene;
and playing the global fusion video in a first window, and playing the local video in a second window.
In the embodiment of the application, a scene matched with the instruction is acquired in the global video through the instruction, and then a local video matched with the scene is acquired through a searching mode; and then the local video is transformed to a corresponding block in the global video to obtain the global fusion video matched with the scene. At this time, the processing amount of the local video and the global video is small, and the processing efficiency is improved. Meanwhile, the global fusion video only presents high-resolution video on the corresponding block, and the video of other areas of the global fusion video still keeps low resolution, so that a user can capture the position of the local video related to the instruction in the global video in terms of visual effect. And simultaneously, playing the global fusion video in the first window and playing the local video in the second window, so that a user can compare the global fusion video with the local video, and the user can acquire interested information.
It should be noted that the global video is obtained by shooting with a global camera, and one global video corresponds to a plurality of local videos. If the global video and the local videos are offline videos, the local videos and their corresponding scenes are marked in advance, and each mark corresponds to a different instruction. If the global video and the local videos are live videos, the association between each scene and a local camera can be preset, and each mark likewise corresponds to a different instruction. When an instruction is acquired, the corresponding scene is obtained, and the local video matched with that scene is then acquired based on the scene.
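As an illustration of this instruction-to-scene-to-local-video lookup, the following is a minimal sketch in which all names (SCENE_BY_INSTRUCTION, LOCAL_SOURCE_BY_SCENE, the example instructions and URIs) are hypothetical and not part of the patent:

```python
# Hypothetical sketch of the instruction -> scene -> local video lookup described above.

# Offline case: local videos and their scenes are marked in advance,
# and every mark corresponds to a different instruction.
SCENE_BY_INSTRUCTION = {
    "voice:show goal area": "goal_area",
    "touch:region_3": "stage_left",
}

# Each scene is associated either with a marked offline clip or,
# for live broadcasting, with a preset local camera.
LOCAL_SOURCE_BY_SCENE = {
    "goal_area": {"type": "offline", "uri": "remote://local_clips/goal_area.mp4"},
    "stage_left": {"type": "live", "camera_id": 7},
}


def resolve_local_video(instruction: str):
    """Return the scene matched with the instruction and the local video source for it."""
    scene = SCENE_BY_INSTRUCTION.get(instruction)
    if scene is None:
        raise KeyError(f"no scene is marked for instruction {instruction!r}")
    return scene, LOCAL_SOURCE_BY_SCENE[scene]


scene, source = resolve_local_video("voice:show goal area")
```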
In order to improve the streaming rate of the video and to avoid data overflow on the serving machine itself, in the embodiment of the application the global video is stored in a local memory, while the local videos are stored in a remote memory or a cloud memory. The global video is acquired; a scene matched with an instruction is acquired based on the instruction; based on the scene, the local video matched with the scene is acquired from the remote memory or the cloud memory; and the local video is transformed onto the corresponding block of the global video to obtain the global fusion video matched with the scene.
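As a simple illustration of this storage split (the file path, the URL, and the use of OpenCV for decoding are assumptions, and a network-capable OpenCV/FFmpeg build is assumed for the remote stream):

```python
import cv2

# The global (low-resolution) video is kept in local storage so it can be opened quickly,
# while a matched local (high-resolution) video is pulled from remote or cloud storage
# only when its scene is requested. Both locations below are hypothetical.
global_cap = cv2.VideoCapture("/var/broadcast/global_video.mp4")
local_cap = cv2.VideoCapture("https://cloud.example.com/local_clips/goal_area.mp4")

ok_global, global_frame = global_cap.read()
ok_local, local_frame = local_cap.read()
```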
In the embodiment of the present application, the instruction may be at least one of a voice instruction, a touch instruction, and a gesture instruction.
As an optional implementation manner of the foregoing embodiment, the step of transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene specifically includes:
and adopting a block matching algorithm to find out the corresponding block of the local video in the global video. In general, a zero-mean normalized cross-correlation block matching algorithm (abbreviated as "ZNCC algorithm") is adopted to perform block matching, preferably performing twice ZNCC iterations, to find a corresponding block of the local video in the global video, and obtain a pixel matching relationship between the local video and the global reference video.
The local video and the global video are then registered based on the corresponding block to obtain the matched global fusion video of the scene. In general, the local video and the global video are registered using an overall transformation, a mesh-based transformation, and a temporally and spatially smooth transformation to obtain the matched global fusion video of the scene.
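As an illustration of the block search itself (not the patent's implementation; the block size and exhaustive search strategy here are assumptions), a minimal NumPy sketch of ZNCC block matching is:

```python
import numpy as np


def zncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two equally sized gray blocks."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_matching_block(local_block, global_frame, center, search_width):
    """Search a window of +/- search_width around `center` (y, x) in the global frame
    for the block position that maximizes ZNCC with `local_block`."""
    h, w = local_block.shape
    cy, cx = center
    best_score, best_pos = -2.0, (cy, cx)
    for y in range(cy - search_width, cy + search_width + 1):
        for x in range(cx - search_width, cx + search_width + 1):
            candidate = global_frame[y:y + h, x:x + w]
            if candidate.shape != (h, w):
                continue  # skip positions that fall outside the frame
            score = zncc(local_block, candidate)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```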
Overall transformation: the found local video and its corresponding block are taken as a pair, and the ZNCC algorithm is used to extract and match feature points of the local video and its corresponding block, so as to extract the corresponding (matched) feature-point pairs in the local image and its corresponding block. In a preferred embodiment of the present invention, two ZNCC iterations are performed to extract the feature-point pairs and compute a homography matrix; the two-iteration process can be represented by the following iteration formula:
$$\mathcal{M}(I_l, I_r) = \Big\{ (p_l, p_r) \;\Big|\; p_r = \mathop{\arg\max}_{\lVert p - \pi(H\tilde{p}_l) \rVert \le \varepsilon} \mathrm{ZNCC}\big(I_l(p_l),\, I_r(p)\big),\; p_l \in [0, w)^2 \Big\},$$

$$H \leftarrow \mathop{\arg\min}_{H} \sum_{(p_l, p_r) \in \mathcal{M}(I_l, I_r)} \big\lVert \pi(H\tilde{p}_l) - p_r \big\rVert^2,$$

wherein $\mathcal{M}(I_l, I_r)$ denotes the matching blocks between the local video and its corresponding block (i.e., the matching relation between the two); $I_l$ and $I_r$ denote the local video and its corresponding block in the global reference video; $p_l$ and $p_r$ are corresponding feature points of the local video $I_l$ and the corresponding block $I_r$, respectively, i.e., $(p_l, p_r)$ is a feature-point pair; $\mathrm{ZNCC}(\,)$ represents the energy function that scores the local video against its corresponding block with the ZNCC algorithm; $H$ represents the homography matrix and is initialized to the identity matrix; $\tilde{p}_l$ is the homogeneous-coordinate form of $p_l$; $\pi(\,)$ represents the central-projection and de-homogenization function; $w$ is the size of the local video (the local video is square, and $w$ represents the side length of the square); and $\varepsilon$ is the search width.
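A rough sketch of such a two-pass procedure (matching under the current homography estimate, then re-estimating the homography from the matches) is given below. It is illustrative only: the block half-size, the two search widths, and the use of OpenCV's TM_CCOEFF_NORMED template matching (OpenCV's zero-mean normalized cross-correlation) and findHomography are assumptions rather than the patent's own code.

```python
import cv2
import numpy as np


def zncc_pass(local_gray, global_gray, pts_local, H, half=8, search=32):
    """One ZNCC pass: project each local feature point with H, then search a window
    around the projection in the global frame for the block that maximizes
    zero-mean normalized cross-correlation (cv2.TM_CCOEFF_NORMED).
    Border handling is omitted for brevity."""
    pts_global = []
    for (x, y) in pts_local:
        template = local_gray[int(y) - half:int(y) + half, int(x) - half:int(x) + half]
        u, v = cv2.perspectiveTransform(np.float32([[[x, y]]]), H)[0, 0]
        x0, y0 = max(int(u) - half - search, 0), max(int(v) - half - search, 0)
        window = global_gray[y0:int(v) + half + search, x0:int(u) + half + search]
        scores = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
        _, _, _, (bx, by) = cv2.minMaxLoc(scores)            # location of the best ZNCC score
        pts_global.append((x0 + bx + half, y0 + by + half))  # back to global coordinates
    return np.float32(pts_global)


def two_pass_registration(local_gray, global_gray, pts_local):
    """Two iterations as described above: identity homography with a wide search first,
    then the re-estimated homography with a narrower search width (epsilon)."""
    H = np.eye(3, dtype=np.float64)
    for search_width in (32, 8):
        pts_global = zncc_pass(local_gray, global_gray, pts_local, H, search=search_width)
        H, _ = cv2.findHomography(np.float32(pts_local), pts_global, cv2.RANSAC)
    return H, pts_global
```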
Then, a mesh-based transformation is performed: based on the preliminary globally transformed video obtained in the previous step, the feature-point pairs extracted during the overall transformation are warped with an ASAP (as-similar-as-possible) mesh warping framework, and an optical-flow-based transformation is applied to the mesh warping result to optimize the pixel matching relationship. This yields more reliable feature-point pairs, namely the feature points in the local video that are matched more successfully at this moment together with the updated optical flow. The distortion of the optical-flow transformation is then combined with the stability of the local video, and the homography matrix is recomputed to complete the mesh-based and optical-flow-based transformations, giving the transformation result. Color calibration is performed on the local video after the transformation and registration are completed.
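The patent gives no code for this stage; as one possible illustration, dense optical flow between the warped local frame and its corresponding global block can be used to refine the per-point matches. The use of OpenCV's Farneback flow and the parameter values below are assumptions:

```python
import cv2
import numpy as np


def refine_matches_with_flow(warped_local_gray, global_block_gray, pts):
    """Refine matched points by adding the dense optical flow computed between the
    homography-warped local frame and its corresponding block in the global video."""
    flow = cv2.calcOpticalFlowFarneback(
        warped_local_gray, global_block_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
    refined = []
    for (x, y) in pts:
        dx, dy = flow[int(y), int(x)]        # flow vector at the feature point
        refined.append((x + dx, y + dy))
    return np.float32(refined)
```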
In a specific embodiment, a temporal and spatial smooth transformation is performed by introducing a temporal stability constraint, the energy function of the smooth transformation being:
$$E(V) = \lambda_r E_r(V) + \lambda_t E_t(V) + \lambda_s E_s(V),$$

where $V$ denotes the transformation defined by the mesh vertices. $E_r(V)$ is the sum of the distances of each feature-point pair between each local video in the globally transformed video and the global reference video:

$$E_r(V) = \sum_{(p_l, p_r)} \big\lVert \alpha_{p_l}^{\top} V_{p_l} - p_r \big\rVert^2,$$

where $V_{p_l}$ are the grid vertices of the cell containing $p_l$ and $\alpha_{p_l}$ are the bilinear interpolation weights of $p_l$ with respect to those vertices. $E_t(V)$ is the time stability constraint:

$$E_t(V) = \sum_{p_l} B(p_l)\, \big\lVert \alpha_{p_l}^{\top} V_{p_l} - S(\hat{p}_l) \big\rVert^2,$$

where $\hat{p}_l$ is the feature point in the temporal prior map corresponding to the feature point $p_l$ in the local video; $B$ is an indicator function for checking whether the pixel point $p_l$ lies on a static background, and $B(p_l) = 0$ denotes that $p_l$ lies on a moving background; and $S$ is the global transformation between the local video and its temporal prior map. $E_s(V)$ is a spatial smoothing term defined on the spatial deformation between adjacent vertices; $\lambda_r$, $\lambda_t$ and $\lambda_s$ are all constants greater than 0.
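A compact least-squares sketch of minimizing an energy of this form over the grid vertices follows. It is only one possible realization under the definitions above: the λ values, grid layout, and the use of SciPy's least_squares solver are assumptions, and the inputs (feature-point pairs, temporal-prior correspondences, static-background flags) are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares


def bilinear_weights(p, grid_x, grid_y):
    """Return the 4 enclosing vertex indices (row, col) and bilinear weights of point p."""
    x, y = p
    i = int(np.clip(np.searchsorted(grid_x, x) - 1, 0, len(grid_x) - 2))
    j = int(np.clip(np.searchsorted(grid_y, y) - 1, 0, len(grid_y) - 2))
    tx = (x - grid_x[i]) / (grid_x[i + 1] - grid_x[i])
    ty = (y - grid_y[j]) / (grid_y[j + 1] - grid_y[j])
    idx = [(j, i), (j, i + 1), (j + 1, i), (j + 1, i + 1)]
    w = [(1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty]
    return idx, w


def warp_point(V, idx, w):
    """Bilinearly interpolate the warped position of a point from its 4 grid vertices."""
    return sum(wk * V[j, i] for (j, i), wk in zip(idx, w))


def solve_mesh(grid_x, grid_y, pairs, prior_pairs, static_flags,
               lam_r=1.0, lam_t=0.5, lam_s=1.0):
    """Minimize E(V) = lam_r*E_r + lam_t*E_t + lam_s*E_s over the grid vertex positions.
    `pairs` is a list of (p_l, p_r); `prior_pairs` is a list of (p_l, S(p_hat_l));
    `static_flags` holds B(p_l) for the prior pairs (1 = static background)."""
    V0 = np.stack(np.meshgrid(grid_x, grid_y), axis=-1).astype(np.float64)  # initial grid
    shape = V0.shape

    def residuals(v):
        V = v.reshape(shape)
        res = []
        for p_l, p_r in pairs:                                  # E_r: feature-point alignment
            idx, w = bilinear_weights(p_l, grid_x, grid_y)
            res.append(np.sqrt(lam_r) * (warp_point(V, idx, w) - np.asarray(p_r)))
        for (p_l, s_p), b in zip(prior_pairs, static_flags):    # E_t: temporal stability
            if b:                                               # only static-background points
                idx, w = bilinear_weights(p_l, grid_x, grid_y)
                res.append(np.sqrt(lam_t) * (warp_point(V, idx, w) - np.asarray(s_p)))
        D = V - V0                                              # E_s: deformation of neighbours
        res.append(np.sqrt(lam_s) * (D[1:, :] - D[:-1, :]).ravel())
        res.append(np.sqrt(lam_s) * (D[:, 1:] - D[:, :-1]).ravel())
        return np.concatenate([np.atleast_1d(r).ravel() for r in res])

    solution = least_squares(residuals, V0.ravel())
    return solution.x.reshape(shape)
```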
After this series of transformations and registrations, a global high-resolution video is obtained. Considering that the different colors and illumination of the local cameras would otherwise leave the local videos inconsistently colored within the global high-resolution video, each local video can be color-corrected until it is consistent with the global reference video, so that the global high-resolution video has a uniform color style as a whole. In addition, the global high-resolution video can be further optimized: a graph-cut method is used to remove the overlapping parts between the transformed local videos so as to minimize the video registration error.
As an alternative implementation of the above embodiment, the method further includes: acquiring a light field video, up-sampling the light field video by a set magnification factor to obtain a sampled light field video, and performing a Fourier transform on the sampled light field video to obtain a first video; after the step of transforming the local video onto the corresponding block of the global video to obtain the global fusion video matched with the scene, the method further includes: performing high-pass filtering on the global fusion video to obtain a second video; linearly adding the first video and the second video and performing an inverse Fourier transform to obtain a third video; and playing the third video in a third window. After the global high-resolution video is obtained, video super-resolution needs to be carried out on the global light field video to overcome its low spatial resolution. The specific method is as follows: up-sample the low-(spatial-)resolution global light field video by the set magnification factor to obtain a sampled low-resolution light field video, perform a Fourier transform on it to obtain a first spectrum video, and apply low-pass filtering to the first spectrum video; apply high-pass filtering to the global high-resolution video to obtain a second spectrum video. Then linearly add the low-pass-filtered first spectrum video to the second spectrum video and perform an inverse Fourier transform to obtain a global high-resolution light field video. The set magnification factor is f_h / f_l, where f_h and f_l are the focal lengths of the local camera and the global camera, respectively.
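A minimal NumPy/OpenCV sketch of this frequency-domain fusion for a single pair of grayscale frames follows; the cut-off frequency, the complementary ideal low-/high-pass masks, and the interpolation choices are assumptions made for illustration:

```python
import cv2
import numpy as np


def fuse_lightfield_frame(lf_lowres, global_highres, scale, cutoff=0.125):
    """Fuse one low-resolution light-field frame with the global high-resolution frame
    in the Fourier domain: upsample by `scale` (= f_h / f_l), low-pass its spectrum,
    add the complementary high-pass spectrum of the high-resolution frame, and invert.
    Grayscale (single-channel) frames are assumed."""
    up = cv2.resize(lf_lowres, None, fx=scale, fy=scale, interpolation=cv2.INTER_CUBIC)
    up = cv2.resize(up, (global_highres.shape[1], global_highres.shape[0]))  # match sizes exactly

    F_lf = np.fft.fftshift(np.fft.fft2(up.astype(np.float64)))
    F_hi = np.fft.fftshift(np.fft.fft2(global_highres.astype(np.float64)))

    h, w = global_highres.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt(((yy - h / 2.0) / h) ** 2 + ((xx - w / 2.0) / w) ** 2)
    lowpass = (radius <= cutoff).astype(np.float64)     # ideal low-pass mask (illustrative)

    fused = F_lf * lowpass + F_hi * (1.0 - lowpass)     # linear addition in the transform domain
    out = np.fft.ifft2(np.fft.ifftshift(fused)).real    # inverse Fourier transform
    return np.clip(out, 0, 255).astype(np.uint8)
```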
As an alternative implementation of the foregoing embodiment, the first window and the second window are displayed simultaneously under the same interface. To make it easier for the user to capture the information of interest in the video, the first window and the second window are presented under the same interface. In general, the second window floats on the first window, close to the corresponding position of the local video in the global video, so that the user can observe it conveniently.
Further, the first window, the second window and the third window are displayed simultaneously under the same interface. To make it easier for the user to capture the information of interest in the video, the first window, the second window and the third window are presented under the same interface. In general, the second window floats on the first window and the third window, close to the corresponding position of the local video in the global video, so that the user can observe it conveniently.
As an optional implementation manner of the foregoing embodiment, the region corresponding to the local video is visually identified in the global fusion video. Typically, the region may be identified by circling it, pointing to it with an indicator, or the like.
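For instance, the corresponding block found during registration can simply be outlined on each global-fusion frame (a trivial OpenCV sketch; the color, thickness, and the assumption that the block corners are known are illustrative):

```python
import cv2


def mark_local_region(global_fusion_frame, top_left, bottom_right):
    """Visually identify the local-video region in the global fusion video
    by drawing a rectangle around its corresponding block."""
    marked = global_fusion_frame.copy()
    cv2.rectangle(marked, top_left, bottom_right, color=(0, 255, 0), thickness=3)
    return marked
```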
As shown in fig. 3, the embodiment of the present application further provides a guide broadcasting device, including:
a first obtaining module 100, configured to obtain a global video;
the searching module 200 searches a scene matched with the instruction based on the instruction;
a second obtaining module 300, based on the scene, obtaining a local video matched with the scene;
the transformation module 400 is configured to transform the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene; and
and the playing module 500 is configured to play the global fusion video in a first window and play the local video in a second window.
The transformation module 400 is further adapted to: a block matching algorithm is adopted to find out the corresponding block of the local video in the global video; and registering the local video and the global video based on the corresponding blocks to obtain a matched global fusion video of the scene.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes of the modules/units/sub-units/components in the above-described apparatus may refer to corresponding processes in the foregoing method embodiments, which are not described herein again.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing relevant data of the image acquisition device. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the guide broadcasting method described above.
In some embodiments, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements the guide broadcasting method described above. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the structures shown in FIG. 4 are block diagrams only and do not constitute a limitation of the computer device on which the present aspects apply, and that a particular computer device may include more or less components than those shown, or may combine some of the components, or have a different arrangement of components.
In some embodiments, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration, and not limitation, RAM can take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples merely express several embodiments of the present application, which are described in greater detail but are not to be construed as limiting the scope of the invention. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the present application, and such modifications and improvements fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
In summary, the present application further provides a method for guiding broadcast, including:
acquiring a global video;
searching a scene matched with the instruction in the global video based on the instruction;
acquiring a local video matched with the scene based on the scene;
transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene;
and playing the global fusion video in a first window, and playing the local video in a second window.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, any person skilled in the art may still modify the technical solutions described in the foregoing embodiments, easily conceive of changes to them, or make equivalent substitutions for some of the technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A guide broadcasting method, characterized by comprising the following steps:
acquiring a global video;
searching a scene matched with an instruction based on the instruction;
acquiring a local video matched with the scene based on the scene;
transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene;
playing the global fusion video in a first window, and playing the local video in a second window;
the step of transforming the local video to the corresponding block of the global video to obtain the global fusion video matched with the scene specifically includes:
a block matching algorithm is adopted to find out the corresponding block of the local video in the global video;
registering the local video and the global video based on the corresponding blocks to obtain a matched global fusion video of the scene;
the obtaining the matched global fusion video of the scene comprises the following steps:
transforming the local video and the global video by adopting integral transformation, grid-based transformation and time and space smoothing transformation to obtain a matched global fusion video of the scene;
wherein the step of integrally transforming comprises:
taking the searched local video and the corresponding block thereof as a pair, and adopting a ZNCC algorithm to extract and match characteristic points of the local video and the corresponding block thereof so as to extract the matched characteristic point pair in the local image and the corresponding block thereof;
performing two ZNCC iterations to extract the characteristic point pairs and calculate a homography matrix, wherein the process of the two iterations can be expressed by adopting the following iteration formula:
$$\mathcal{M}(I_l, I_r) = \Big\{ (p_l, p_r) \;\Big|\; p_r = \mathop{\arg\max}_{\lVert p - \pi(H\tilde{p}_l) \rVert \le \varepsilon} \mathrm{ZNCC}\big(I_l(p_l),\, I_r(p)\big),\; p_l \in [0, w)^2 \Big\},$$

$$H \leftarrow \mathop{\arg\min}_{H} \sum_{(p_l, p_r) \in \mathcal{M}(I_l, I_r)} \big\lVert \pi(H\tilde{p}_l) - p_r \big\rVert^2,$$

wherein $\mathcal{M}(I_l, I_r)$ represents the matching blocks between the local video and its corresponding block; $I_l$ and $I_r$ represent the local video and its corresponding block in the global reference video; $p_l$ and $p_r$ are corresponding feature points of the local video $I_l$ and the corresponding block $I_r$, respectively, i.e., $(p_l, p_r)$ is a feature-point pair; $\mathrm{ZNCC}(\,)$ represents the energy function that scores the local video against its corresponding block with the ZNCC algorithm; $H$ represents the homography matrix and is initialized to the identity matrix; $\tilde{p}_l$ is the homogeneous-coordinate form of $p_l$; $\pi(\,)$ represents the central-projection and de-homogenization function; $w$ is the size of the local video, the local video being square and $w$ being the side length of the square; and $\varepsilon$ is the search width.
2. The method of claim 1, wherein the method further comprises:
acquiring a light field video, sampling the light field video by a set magnification factor to obtain a sampled light field video, and performing Fourier transform on the sampled light field video to obtain a first video;
after the step of transforming the local video to the corresponding block of the global video to obtain the global fusion video matched with the scene, the method further includes:
performing high-pass filtering on the global fusion video to obtain a second video;
performing linear addition on the first video and the second video, and performing a Fourier transform to obtain a third video;
and playing the third video in a third window.
3. The method of claim 1, wherein the first window and the second window are presented simultaneously under the same interface.
4. The method of claim 1, wherein the region to which the local video corresponds is visually identified in the global fusion video.
5. The method of claim 1, wherein the instructions comprise at least one of voice instructions, touch instructions, and gesture instructions.
6. A guide broadcasting device, comprising:
the first acquisition module is used for acquiring a global video;
the searching module is used for searching a scene matched with the instruction based on the instruction;
the second acquisition module acquires a local video matched with the scene based on the scene;
the transformation module is used for transforming the local video to a corresponding block of the global video to obtain a global fusion video matched with the scene; and
the playing module is used for playing the global fusion video in a first window and playing the local video in a second window;
wherein the transformation module is further to:
a block matching algorithm is adopted to find out the corresponding block of the local video in the global video;
registering the local video and the global video based on the corresponding blocks to obtain a matched global fusion video of the scene;
the obtaining the matched global fusion video of the scene comprises the following steps:
transforming the local video and the global video by adopting integral transformation, grid-based transformation and time and space smoothing transformation to obtain a matched global fusion video of the scene;
wherein the step of integrally transforming comprises:
taking the searched local video and the corresponding block thereof as a pair, and adopting a ZNCC algorithm to extract and match characteristic points of the local video and the corresponding block thereof so as to extract the matched characteristic point pair in the local image and the corresponding block thereof;
performing two ZNCC iterations to extract the characteristic point pairs and calculate a homography matrix, wherein the process of the two iterations can be expressed by adopting the following iteration formula:
$$\mathcal{M}(I_l, I_r) = \Big\{ (p_l, p_r) \;\Big|\; p_r = \mathop{\arg\max}_{\lVert p - \pi(H\tilde{p}_l) \rVert \le \varepsilon} \mathrm{ZNCC}\big(I_l(p_l),\, I_r(p)\big),\; p_l \in [0, w)^2 \Big\},$$

$$H \leftarrow \mathop{\arg\min}_{H} \sum_{(p_l, p_r) \in \mathcal{M}(I_l, I_r)} \big\lVert \pi(H\tilde{p}_l) - p_r \big\rVert^2,$$

wherein $\mathcal{M}(I_l, I_r)$ represents the matching blocks between the local video and its corresponding block; $I_l$ and $I_r$ represent the local video and its corresponding block in the global reference video; $p_l$ and $p_r$ are corresponding feature points of the local video $I_l$ and the corresponding block $I_r$, respectively, i.e., $(p_l, p_r)$ is a feature-point pair; $\mathrm{ZNCC}(\,)$ represents the energy function that scores the local video against its corresponding block with the ZNCC algorithm; $H$ represents the homography matrix and is initialized to the identity matrix; $\tilde{p}_l$ is the homogeneous-coordinate form of $p_l$; $\pi(\,)$ represents the central-projection and de-homogenization function; $w$ is the size of the local video, the local video being square and $w$ being the side length of the square; and $\varepsilon$ is the search width.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1-5.
8. A computer device comprising a processor, a memory and a computer program stored on the memory, characterized in that the processor implements the steps of the method according to any of claims 1-5 when the computer program is executed.
CN202210826557.5A 2022-07-13 2022-07-13 Guide broadcasting method and device Active CN115103125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210826557.5A CN115103125B (en) 2022-07-13 2022-07-13 Guide broadcasting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210826557.5A CN115103125B (en) 2022-07-13 2022-07-13 Guide broadcasting method and device

Publications (2)

Publication Number Publication Date
CN115103125A CN115103125A (en) 2022-09-23
CN115103125B true CN115103125B (en) 2023-05-12

Family

ID=83297324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210826557.5A Active CN115103125B (en) 2022-07-13 2022-07-13 Guide broadcasting method and device

Country Status (1)

Country Link
CN (1) CN115103125B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781350A (en) * 2019-09-26 2020-02-11 武汉大学 Pedestrian retrieval method and system oriented to full-picture monitoring scene

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7973834B2 (en) * 2007-09-24 2011-07-05 Jianwen Yang Electro-optical foveated imaging and tracking system
CN107959805B (en) * 2017-12-04 2019-09-13 深圳市未来媒体技术研究院 Light field video imaging system and method for processing video frequency based on Hybrid camera array
CN110086994A (en) * 2019-05-14 2019-08-02 宁夏融媒科技有限公司 A kind of integrated system of the panorama light field based on camera array
CN112367474B (en) * 2021-01-13 2021-04-20 清华大学 Self-adaptive light field imaging method, device and equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781350A (en) * 2019-09-26 2020-02-11 武汉大学 Pedestrian retrieval method and system oriented to full-picture monitoring scene

Also Published As

Publication number Publication date
CN115103125A (en) 2022-09-23

Similar Documents

Publication Publication Date Title
Chen et al. Camera lens super-resolution
CN107959805B (en) Light field video imaging system and method for processing video frequency based on Hybrid camera array
Yu et al. Towards efficient and scale-robust ultra-high-definition image demoiréing
CN111402139B (en) Image processing method, apparatus, electronic device, and computer-readable storage medium
RU2706891C1 (en) Method of generating a common loss function for training a convolutional neural network for converting an image into an image with drawn parts and a system for converting an image into an image with drawn parts
Wang et al. Dual-camera super-resolution with aligned attention modules
CN112367459B (en) Image processing method, electronic device, and non-volatile computer-readable storage medium
JP7264310B2 (en) Image processing method, apparatus, non-transitory computer readable medium
Liu et al. Exploit camera raw data for video super-resolution via hidden markov model inference
CN109035138B (en) Conference recording method, device, equipment and storage medium
Pang et al. FAN: Frequency aggregation network for real image super-resolution
CN114418853B (en) Image super-resolution optimization method, medium and equipment based on similar image retrieval
DE102018125739A1 (en) Dynamic calibration of multi-camera systems using a variety of Multi View image frames
KR20100065918A (en) A method for geo-tagging of pictures and apparatus thereof
Zhang et al. EDGAN: motion deblurring algorithm based on enhanced generative adversarial networks
CN115103125B (en) Guide broadcasting method and device
CN111818298B (en) High-definition video monitoring system and method based on light field
Schaffland et al. An interactive web application for the creation, organization, and visualization of repeat photographs
CN106558021A (en) Video enhancement method based on super-resolution technique
Peng Super-resolution reconstruction using multiconnection deep residual network combined an improved loss function for single-frame image
CN108427935B (en) Street view comparison image generation method and device
CN115222776A (en) Matching auxiliary visual target tracking method and device, electronic equipment and storage medium
Li et al. MVStylizer: An efficient edge-assisted video photorealistic style transfer system for mobile phones
CN113284127A (en) Image fusion display method and device, computer equipment and storage medium
Jiang et al. Low-resolution and low-quality face super-resolution in monitoring scene via support-driven sparse coding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant