CN114500871A - Multi-channel video analysis method, equipment and medium - Google Patents

Multi-channel video analysis method, equipment and medium

Info

Publication number
CN114500871A
CN114500871A (application CN202111532772.6A)
Authority
CN
China
Prior art keywords
video data, real-time video, picture, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111532772.6A
Other languages
Chinese (zh)
Other versions
CN114500871B (en)
Inventor
韩振
蔡富东
孔志强
陈雷
李在学
王海慧
马景行
朱朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Senter Electronic Co Ltd
Original Assignee
Shandong Senter Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Senter Electronic Co Ltd
Priority to CN202111532772.6A
Publication of CN114500871A
Application granted
Publication of CN114500871B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums (under H04N 23/00, Cameras or camera modules comprising electronic image sensors; control thereof)
    • H04N 19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N 21/25891: Management of end-user data, the end-user data being end-user preferences
    • H04N 21/4316: Generation of visual interfaces for content selection or interaction, for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4858: End-user interface for client configuration, for modifying screen layout parameters, e.g. fonts, size of the windows
    • H04N 5/265: Mixing (studio circuits, e.g. for mixing or switching-over of video signals)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a multi-channel video analysis method, device, and medium. The method comprises: acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encoding the marked second real-time video data to obtain compressed video data; and receiving a request for the second real-time video data from a user, and pushing the compressed video data to that user. In this way, a device with limited computing power can analyze multiple video channels simultaneously.

Description

Multi-channel video analysis method, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for multi-channel video analysis.
Background
Front-end intelligent analysis refers to performing intelligent analysis on video or image data directly at a front-end device (such as a network camera) in order to identify target objects. This functionality requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels. Because devices equipped with front-end intelligent analysis have limited computing power, they cannot analyze all channels simultaneously and can only analyze a single channel at a time. Consequently, only target objects in that one channel's picture can be identified, and target objects captured by the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
Disclosure of Invention
The embodiments of the present application provide a multi-channel video analysis method, device, and medium, which are intended to solve the following technical problem: analyzing multiple video channels simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
The embodiment of the application adopts the following technical scheme:
An embodiment of the present application provides a multi-channel video analysis method. The method comprises: acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encoding the marked second real-time video data to obtain compressed video data; and receiving a request for the second real-time video data sent by a user, and pushing the compressed video data to the user.
In the embodiments of the present application, picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
In one implementation of the present application, encoding the marked second real-time video data to obtain compressed video data specifically comprises: transmitting YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H.264 or H.265 data; and transmitting YUV data corresponding to the plurality of first real-time video data to the encoder to obtain encoded H.264 or H.265 data.
In one implementation of the present application, after the YUV data corresponding to the plurality of first real-time video data are transmitted to the encoder to obtain encoded H.264 or H.265 data, the method further comprises: receiving first real-time video data requests sent by different users, each request specifying any one or more video channels; and simultaneously sending the requested channel or channels to the corresponding users via a streaming media transport protocol.
In one implementation of the present application, performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises: when the picture-in-picture synthesis mode is a multi-frame side-by-side or grid tiling mode, determining the size of each frame image of the first real-time video data according to the video parameters of the first real-time video data; generating a blank image; and determining a set of coordinate positions of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of the first real-time video data, and the size of each frame image.
Determining the size of each video channel from the video data parameters makes it possible to arrange the multiple channels evenly, which improves the visual effect of the generated picture-in-picture video data in the tiled mode.
In one implementation of the present application, after the sets of coordinate positions of the plurality of first real-time video data in the blank image are determined, the method further comprises: acquiring display timestamps corresponding to the plurality of first real-time video data; and if the differences between the display timestamps are smaller than a preset threshold, combining the plurality of first real-time video data into a multi-cell stitched image in a preset stitching mode, where each cell in the stitched image corresponds to one first real-time video data stream.
In one implementation of the present application, transmitting the second real-time video data to the preset intelligent analysis module, so that target images in the second real-time video data are marked by the module, specifically comprises: inputting the current frame image of the second real-time video data into a preset target recognition model to obtain the label box of each target image; extracting the motion features of each target image in the previous frame with a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame; performing Hungarian-algorithm calculation on the label boxes and the predicted coordinate position information to cascade-match them and obtain a matching set; and deriving the motion trajectory of each target image from the matching set, then tracking and labeling the target images in the second real-time video data along those trajectories.
In the embodiments of the present application, the label box of each target image in the current frame is obtained from the preset target recognition model, and the predicted coordinate position information in the current frame is obtained from the Kalman filter, thereby realizing position tracking of the target images. Even when several target images appear in the same frame, the different targets can be tracked and located individually, so multiple target images can be tracked and labeled.
In one implementation of the present application, performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises: when the current picture-in-picture synthesis mode is a large-and-small-frame overlay mode, extracting with a Kalman filter the motion features of the target images in the current frames of the plurality of first real-time video data to obtain the predicted coordinate position information of the target images in the next frame; and resizing the plurality of first real-time video data based on that predicted coordinate position information to obtain picture-in-picture video data in the overlaid form.
In one implementation of the present application, after the plurality of first real-time video data are resized to obtain the overlaid picture-in-picture video data, the method further comprises: re-counting, after every preset time interval, the number of predicted coordinate position information items in the first real-time video data; and updating the stacking order of the plurality of first real-time video data according to the re-counted numbers.
In the embodiments of the present application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked accordingly: channels with many targets are placed on the upper layers, which reduces occlusion of the target images and improves the accuracy of target image recognition.
An embodiment of the present application provides a multi-channel video analysis device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to: acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions configured to: acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects: picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; for those skilled in the art, other drawings can be derived from them without creative effort.
In the drawings:
fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a multi-channel video analysis process according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a multi-channel video analysis method, device, and medium.
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only a part, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Front-end intelligent analysis refers to performing intelligent analysis on video or image data directly at a front-end device (such as a network camera) in order to identify target objects. This functionality requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels. Because devices equipped with front-end intelligent analysis have limited computing power, they cannot analyze all channels simultaneously and can only analyze a single channel at a time. Consequently, only target objects in that one channel's picture can be identified, and target objects captured by the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
To solve the above problem, embodiments of the present application provide a multi-channel video analysis method, device, and medium. Picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure. As shown in fig. 1, the multi-channel video analysis method includes the following steps:
s101, acquiring first real-time video data corresponding to a plurality of lenses of a multi-view camera by a multi-path video analysis device.
In an embodiment of the application, the multi-channel video analysis device obtains YUV real-time video data corresponding to a plurality of lenses of the multi-view camera through physical interfaces such as MIPI and USB.
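As an illustration only (the patent does not prescribe any capture API), the following minimal Python sketch shows how such multi-lens YUV capture might look if each lens were exposed as a separate video device readable through OpenCV; the device indices, resolution, and I420 conversion are assumptions rather than part of the disclosure.

```python
# Hypothetical capture sketch: each lens of the multi-view camera is assumed
# to appear as its own video device (e.g. via a MIPI/USB-to-V4L2 driver).
import cv2

LENS_DEVICES = [0, 1, 2, 3]  # assumed device indices, one per lens

def open_lenses(devices=LENS_DEVICES, width=1920, height=1080):
    caps = []
    for dev in devices:
        cap = cv2.VideoCapture(dev)
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        caps.append(cap)
    return caps

def read_yuv_frames(caps):
    """Grab one frame per lens and convert it to planar YUV 4:2:0 (I420)."""
    frames = []
    for cap in caps:
        ok, bgr = cap.read()
        if not ok:
            return None  # a lens produced no frame; caller decides how to react
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420))
    return frames
```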
S102, the multi-channel video analysis device performs picture-in-picture synthesis on the plurality of first real-time video data and takes the synthesized picture-in-picture video data as second real-time video data.
In an embodiment of the present application, when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, the size of each frame image of the first real-time video data is determined from the video parameters of each stream, a blank image is generated, and the set of coordinate positions of the plurality of first real-time video data in the blank image is determined from the size of the blank image, the number of the first real-time video data, and the size of each frame image.
Specifically, the picture-in-picture synthesis modes in the embodiments of the present application include at least two: multi-frame tiling (side-by-side or grid) and large-and-small-frame overlay, although the application is not limited to these two modes. In the tiling mode, the video positions are ordered according to the image size of each first real-time video data stream and the number of streams, so that the streams are distributed evenly; this enhances the visual effect of the picture-in-picture video, reduces mutual occlusion between the videos, and improves the accuracy of target image recognition.
Further, the image size of each video and the number of first real-time video data streams can be obtained from the parameters of the first real-time video data. The sizes of the individual streams may be the same or different. Each stream is labeled with a sequence number in advance, for example according to the position of its camera.
Furthermore, the multi-channel video analysis device generates a blank image, determines its size, and from that size and the number of streams determines the size each stream will occupy in the picture-in-picture. Positions on the blank image are then allocated according to these sizes, yielding the coordinate position of each first real-time video data stream within the blank image. Each stream is resized to its allotted size, and the streams are arranged in the order of their labeled sequence numbers to produce the tiled picture-in-picture video data.
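For concreteness, a minimal sketch of this tiling step is given below. It operates on packed BGR arrays rather than the YUV planes the device actually handles, and the canvas size and near-square grid heuristic are illustrative assumptions.

```python
import math
import cv2
import numpy as np

def compose_grid(frames, canvas_w=1920, canvas_h=1080):
    """Tile the per-lens frames onto a blank image in labeled order."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))            # e.g. 4 channels -> a 2x2 grid
    rows = math.ceil(n / cols)
    cell_w, cell_h = canvas_w // cols, canvas_h // rows
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)  # blank image
    positions = []                            # coordinate position set
    for i, frame in enumerate(frames):
        r, c = divmod(i, cols)                # row/column from sequence number
        x, y = c * cell_w, r * cell_h
        positions.append((x, y))
        canvas[y:y + cell_h, x:x + cell_w] = cv2.resize(frame, (cell_w, cell_h))
    return canvas, positions
```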
In an embodiment of the present application, display timestamps corresponding to the plurality of first real-time video data are acquired. If the differences between the display timestamps are smaller than a preset threshold, the first real-time video data are combined into a multi-cell stitched image in a preset stitching mode, with each cell corresponding to one first real-time video data stream.
Specifically, since the lenses shoot video separately and independently, after the multiple first real-time video data streams are received their timestamps must be compared in order to guarantee time synchronization of the channels in the synthesized picture-in-picture video. Only when the timestamp differences between the streams are smaller than the preset threshold are the streams combined into the multi-cell stitched image.
For example, the preset threshold may be set to 0.1 s: when the timestamps of the received streams differ by less than 0.1 s, the streams are combined into the picture-in-picture.
Note that 0.1 s is merely the preferred value of the preset threshold in the embodiments of the present application; in practice the threshold may be adjusted to the actual situation.
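A sketch of this synchronization gate, under the assumption that every channel delivers (timestamp, frame) pairs, might look as follows; compose_grid refers to the illustrative tiling function above.

```python
def maybe_compose(latest, threshold_s=0.1):
    """Compose the stitched image only when all channels are in sync.

    'latest' is assumed to map channel id -> (display_timestamp_s, frame).
    """
    stamps = [t for t, _ in latest.values()]
    if max(stamps) - min(stamps) < threshold_s:  # preset threshold, e.g. 0.1 s
        # Keep the labeled channel order when extracting the frames.
        frames = [frame for _, (_, frame) in sorted(latest.items())]
        return compose_grid(frames)
    return None  # timestamps diverge; wait for the lagging channel
```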
In an embodiment of the present application, when the current picture-in-picture synthesis mode is the large-and-small-frame overlay mode, the motion features of the target images in the current frames of the plurality of first real-time video data are extracted with a Kalman filter to obtain the predicted coordinate position information of the target images in the next frame. The plurality of first real-time video data are then resized based on that information to obtain overlaid picture-in-picture video data.
Specifically, in the overlay mode the picture-in-picture effect superimposes video frames of smaller size onto a larger video frame, so part of a target object may be occluded. To keep as few target images occluded as possible, the Kalman filter extracts the motion features of the target images in the current frame of each first real-time video data stream and predicts their positions in the next frame. Since one frame may contain one or several target images, the streams can then be stacked according to the predicted number of targets.
In an embodiment of the present application, after every preset time interval the number of predicted coordinate position information items in the first real-time video data is counted again, and the stacking order of the plurality of first real-time video data is updated according to the new counts.
Specifically, the pictures captured by the cameras change in real time and the number of target images changes with them. To reduce the number of targets occluded in the overlay layout, the number of target images in each first real-time video data stream is re-predicted at intervals, and the stacking order of the videos is changed according to the newly predicted counts: the video with the most target images is placed on the top layer of the picture-in-picture video data, and the video with the fewest on the bottom layer.
In an embodiment of the present application, after the numbers of predicted coordinate position information items for the next frame are obtained, the plurality of first real-time video data are stacked from top to bottom according to those counts. The region of the next frame outside the predicted coordinate positions is treated as the background region, and the streams are resized so that each upper-layer video image is placed within the background region of the video image below it, the lower layer always being larger than the layer above. Occlusion of the target images in the lower-layer video data is thereby minimized.
In this way, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, the channels are stacked accordingly, channels with many targets sit on the upper layers, occlusion of the target images is reduced, and the accuracy of target image recognition improves.
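The sketch below illustrates one way such an overlay layout could be realized; the shrink factor, margin, and corner placement are simplifying stand-ins for the background-region computation described above, which the patent derives from the predicted target coordinates.

```python
import cv2

def stack_order(frames, predicted_counts):
    """Bottom-to-top layer order: fewest predicted targets at the bottom
    (largest, partly covered layer), most targets on top (fully visible)."""
    order = sorted(range(len(frames)), key=lambda i: predicted_counts[i])
    return [frames[i] for i in order]

def compose_overlay(layers, canvas_w=1920, canvas_h=1080, shrink=0.35, margin=16):
    canvas = cv2.resize(layers[0], (canvas_w, canvas_h))   # bottom layer
    w, h = int(canvas_w * shrink), int(canvas_h * shrink)
    x, y = margin, margin
    for frame in layers[1:]:                               # smaller upper layers
        if y + h > canvas_h:
            break                    # out of room in this simplified layout
        canvas[y:y + h, x:x + w] = cv2.resize(frame, (w, h))
        y += h + margin
    return canvas
```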
S103, the multi-channel video analysis device transmits the second real-time video data to the preset intelligent analysis module, which marks the target images in the second real-time video data.
In an embodiment of the present application, the current frame image of the second real-time video data is input into a preset target recognition model, which outputs the label box of each target image. The motion features of each target image in the previous frame are extracted with a Kalman filter to obtain its predicted coordinate position information in the current frame. Hungarian-algorithm calculation is performed on the label boxes and the predicted coordinate position information to cascade-match them into a matching set. The motion trajectory of each target image is derived from the matching set, and the target images in the second real-time video data are tracked and labeled along those trajectories.
Specifically, a neural network model is trained on a sample set collected in advance from second real-time video data, yielding the preset target recognition model, which can recognize and label the target images in the second real-time video data it receives. The current frame image of the current second real-time video data is fed to this model to obtain the label-box position of each target image. Tracking is then performed with Kalman filtering, i.e. a Kalman filter predicts the motion state of each target: from the label-box position of each target in the previous frame image, its position in the current frame image is predicted. For example, a standard Kalman filter based on a constant-velocity model with a linear observation model may be used to predict each target's motion state and obtain its predicted position in the current frame.
Further, the cosine distances between the features of each target image and the depth features in the stored depth-feature set are computed to form a cosine distance matrix, and the Mahalanobis distance between each target image's predicted position in the current frame and the corresponding detection-box position is calculated. The cosine distance matrix is then preprocessed: elements whose Mahalanobis distance exceeds a first preset threshold are set to infinity, and elements whose cosine distance exceeds a second preset threshold are set to a large value; both thresholds can be set according to the scene. Based on the preprocessed cosine distance matrix, the Hungarian algorithm performs a first round of matching between the label boxes and the predicted coordinate position information, producing similarity scores between them and hence the matching set. The motion trajectory of each target image is obtained from the matching set and used to track and label the target images in the second real-time video data. The recognition results may also be uploaded to the client platform.
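A stripped-down sketch of this predict-then-match step is shown below. To stay self-contained it replaces the cosine/Mahalanobis gating described above with a plain pixel-distance gate; the state layout, noise levels, and gate value are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

# Constant-velocity model: state x = [cx, cy, vx, vy]^T, position advances
# by one velocity step per frame.
F = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])

def kalman_predict(x, P, Q=np.eye(4) * 1e-2):
    """Predict a track's state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def match_tracks(predicted_centers, detected_centers, gate=100.0):
    """Assign label-box centers of the current frame to predicted positions."""
    cost = np.linalg.norm(predicted_centers[:, None, :]
                          - detected_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```

In a fuller tracker, matched pairs would update their Kalman states while unmatched detections would typically start new tracks; that bookkeeping is omitted here.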
In this way, the label box of each target image in the current frame is obtained from the preset target recognition model, and the predicted coordinate position information in the current frame is obtained from the Kalman filter, realizing position tracking of the target images. Even when several target images appear in the same frame, the different targets can be tracked and located individually, so multiple target images can be tracked and labeled.
S104, the multi-channel video analysis device encodes the marked second real-time video data to obtain compressed video data.
In an embodiment of the present application, the YUV data corresponding to the marked second real-time video data is sent to an encoder to obtain encoded H.264 or H.265 data, and the YUV data corresponding to the first real-time video data is likewise sent to the encoder to obtain encoded H.264 or H.265 data.
Specifically, the YUV data of the multiple single-lens video channels is fed continuously to the encoder, which continuously outputs encoded H.264 or H.265 data; the synthesized picture-in-picture YUV data is fed to the encoder in the same way, and its encoded H.264 or H.265 data is continuously acquired.
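The patent does not name a particular encoder; as one possibility, the PyAV bindings to FFmpeg could drive this step roughly as follows (the file name, frame rate, and resolution are placeholders).

```python
import av  # PyAV: Python bindings to FFmpeg

def encode_h264(frames_bgr, path="pip.mp4", fps=25, width=1920, height=1080):
    """Encode annotated frames as H.264 with the YUV 4:2:0 pixel format."""
    container = av.open(path, mode="w")
    stream = container.add_stream("h264", rate=fps)
    stream.width, stream.height = width, height
    stream.pix_fmt = "yuv420p"
    for img in frames_bgr:
        frame = av.VideoFrame.from_ndarray(img, format="bgr24")
        for packet in stream.encode(frame):   # the encoder may buffer frames
            container.mux(packet)
    for packet in stream.encode():            # flush buffered packets
        container.mux(packet)
    container.close()
```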
S105, a request for the second real-time video data sent by a user is received, and the compressed video data is pushed to that user.
In one embodiment of the present application, first real-time video data requests sent by different users are received, each specifying any one or more video channels, and the requested channel or channels are simultaneously sent to the corresponding users via a streaming media transport protocol.
Specifically, the embodiments of the present application support multiple clients requesting different video streams at the same time. The channel or channels a client requests are determined from the parameters specified in the request, and the real-time video data specified by each client is pushed to the clients simultaneously over a streaming media transport protocol such as RTP or RTCP. Because the picture-in-picture result is served as an ordinary single video channel, a client can also request that channel by itself and thus see the picture-in-picture effect.
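As a rough illustration of the push step (the patent names only RTP/RTCP in general terms), an already-encoded stream could be handed to an external FFmpeg process for RTSP delivery; the server URL is a placeholder.

```python
import subprocess

def push_stream(source="pip.mp4", url="rtsp://media.example/pip"):
    """Relay an encoded file or pipe to an RTSP endpoint via FFmpeg."""
    cmd = ["ffmpeg", "-re",          # read input at its native frame rate
           "-i", source,
           "-c", "copy",             # no re-encode: the data is already H.264
           "-f", "rtsp", url]
    return subprocess.Popen(cmd)
```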
Fig. 2 is a block diagram of the multi-channel video analysis process according to an embodiment of the present application. As shown in Fig. 2, the multi-channel video analysis process is as follows:
In an embodiment of the present application, the YUV data corresponding to each lens of the multi-view camera is obtained, and the multiple YUV streams are combined into one new YUV stream, i.e. the picture-in-picture video data is synthesized. The YUV data corresponding to each lens is also encoded to generate the H.264 or H.265 data of each individual video channel.
In one embodiment of the present application, the synthesized picture-in-picture video data is delivered to the intelligent analysis module, which analyzes the target images in the picture-in-picture video data; at this point the recognition results can be uploaded directly to the user platform. The recognized target objects are marked according to the analysis results, producing YUV data carrying label boxes or marks, which is then encoded into the H.264 or H.265 data corresponding to the picture-in-picture video data.
In one embodiment of the present application, one or more video channels can be sent simultaneously to the respective clients upon their request.
Fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application. As shown in fig. 3, the multi-channel video analysis apparatus includes:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data;
and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
The embodiments in the present application are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, for the apparatus, device, and non-volatile computer storage medium embodiments, the description is relatively brief since they are substantially similar to the method embodiments; for the relevant points, reference may be made to the corresponding parts of the method embodiments.
The foregoing description presents specific embodiments of the present application. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the embodiments of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. A method for multi-channel video analysis, the method comprising:
acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encoding the marked second real-time video data to obtain compressed video data;
and receiving a request for the second real-time video data sent by a user, and pushing the compressed video data to the user.
2. The method according to claim 1, wherein encoding the marked second real-time video data to obtain compressed video data specifically comprises:
transmitting YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H.264 or H.265 data; and
transmitting YUV data corresponding to the plurality of first real-time video data to the encoder to obtain encoded H.264 or H.265 data.
3. The method of claim 2, wherein after the YUV data corresponding to the first real-time video data are transmitted to the encoder to obtain encoded H.264 or H.265 data, the method further comprises:
receiving first real-time video data requests sent by different users, wherein each first real-time video data request specifies at least any one or more video channels;
and simultaneously sending the one or more video channels corresponding to each first real-time video data request to the corresponding users via a streaming media transport protocol.
4. The method according to claim 1, wherein performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises:
when the picture-in-picture synthesis mode is a multi-frame side-by-side or grid tiling mode, determining the size of each frame image of the first real-time video data according to the video parameters of the first real-time video data;
generating a blank image;
and determining a set of coordinate positions of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of the first real-time video data, and the size of each frame image.
5. The method of claim 4, wherein after determining the set of coordinate positions of the first real-time video data in the blank image, the method further comprises:
acquiring display timestamps corresponding to the plurality of first real-time video data;
and if the differences between the display timestamps are smaller than a preset threshold, combining the first real-time video data into a multi-cell stitched image in a preset stitching mode, wherein each cell in the stitched image corresponds to one of the first real-time video data.
6. The method according to claim 1, wherein transmitting the second real-time video data to a preset intelligent analysis module so that a target image in the second real-time video data is marked by the preset intelligent analysis module specifically comprises:
inputting a current frame image corresponding to the second real-time video data into a preset target recognition model, and obtaining a label box corresponding to the target image through the preset target recognition model;
extracting motion features of the target image in the previous frame through a Kalman filter to obtain predicted coordinate position information of the target image in the current frame;
performing Hungarian-algorithm calculation based on the label box and the predicted coordinate position information to cascade-match them and obtain a matching set;
and obtaining a motion trajectory of the target image according to the matching set, and tracking and labeling the target image in the second real-time video data along the motion trajectory.
7. The method according to claim 1, wherein performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises:
when the current picture-in-picture synthesis mode is a large-and-small-frame overlay mode, extracting motion features of the target images in the current frames of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target images in the next frame;
and resizing the plurality of first real-time video data based on the predicted coordinate position information of the target images in the next frame to obtain picture-in-picture video data in the large-and-small-frame overlay form.
8. The method of claim 7, wherein after resizing the first real-time video data to obtain the picture-in-picture video data in the large-and-small-frame overlay form, the method further comprises:
re-counting, after every preset time interval, the number of predicted coordinate position information items in the first real-time video data;
and updating the stacking order of the plurality of first real-time video data according to the re-counted number of predicted coordinate position information items.
9. A multi-channel video analysis apparatus comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data;
and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
10. A non-transitory computer storage medium storing computer-executable instructions configured to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
CN202111532772.6A, filed 2021-12-15: Multipath video analysis method, equipment and medium. Active; granted as CN114500871B.

Priority Applications (1)

CN202111532772.6A (priority date 2021-12-15, filing date 2021-12-15): Multipath video analysis method, equipment and medium


Publications (2)

CN114500871A (published 2022-05-13)
CN114500871B (granted 2023-11-14)

Family

ID=81493135

Family Applications (1)

CN202111532772.6A (priority date 2021-12-15, filing date 2021-12-15): Multipath video analysis method, equipment and medium; status: Active

Country Status (1)

CN: CN114500871B (granted)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550608A (en) * 2022-09-19 2022-12-30 国网智能科技股份有限公司 Multi-user high-concurrency AI video real-time fusion display control method and system
CN115988258A (en) * 2023-03-17 2023-04-18 广州佰锐网络科技有限公司 IoT (Internet of things) -based video communication method, storage medium and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194443A (en) * 2010-03-04 2011-09-21 腾讯科技(深圳)有限公司 Display method and system for window of video picture in picture and video processing equipment
WO2017005096A2 (en) * 2015-07-06 2017-01-12 阿里巴巴集团控股有限公司 Method and device for encoding multiple video streams
WO2017211250A1 (en) * 2016-06-08 2017-12-14 深圳创维数字技术有限公司 Image overlay display method and system
US20180122144A1 (en) * 2015-05-13 2018-05-03 Aim Sport Vision Ag Digitally overlaying an image with another image
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110321806A (en) * 2019-06-12 2019-10-11 浙江大华技术股份有限公司 Object detection method, image processing equipment and the equipment with store function
CN111010605A (en) * 2019-11-26 2020-04-14 杭州东信北邮信息技术有限公司 Method for displaying video picture-in-picture window
CN112637550A (en) * 2020-11-18 2021-04-09 合肥市卓迩无人机科技服务有限责任公司 PTZ moving target tracking method for multi-path 4K quasi-real-time spliced video
US20210125639A1 (en) * 2019-10-28 2021-04-29 Shanghai Bilibili Technology Co., Ltd. Method and system of clipping a video, computing device, and computer storage medium
CN112884811A (en) * 2021-03-18 2021-06-01 中国人民解放军国防科技大学 Photoelectric detection tracking method and system for unmanned aerial vehicle cluster
CN113612922A (en) * 2021-07-29 2021-11-05 重庆赛迪奇智人工智能科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
司佳伟: "Research on pointer image recognition in video surveillance systems", Outstanding Master's Theses *

Also Published As

Publication number Publication date
CN114500871B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant