CN114500871B - Multipath video analysis method, equipment and medium


Info

Publication number
CN114500871B
Authority
CN
China
Prior art keywords
video data
real-time video
picture
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111532772.6A
Other languages
Chinese (zh)
Other versions
CN114500871A (en)
Inventor
韩振
蔡富东
孔志强
陈雷
李在学
王海慧
马景行
朱朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Senter Electronic Co Ltd
Original Assignee
Shandong Senter Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shandong Senter Electronic Co Ltd
Priority to CN202111532772.6A
Publication of CN114500871A
Application granted
Publication of CN114500871B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H04N 19/42: Methods or arrangements for coding or decoding digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another, e.g. for inserting or substituting an advertisement
    • H04N 21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N 21/25891: Management of end-user data being end-user preferences
    • H04N 21/4316: Generation of visual interfaces for content selection or interaction, for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4858: End-user interface for client configuration for modifying screen layout parameters, e.g. fonts, size of the windows
    • H04N 5/265: Studio circuits; Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a multi-path video analysis method, equipment, and medium: acquiring first real-time video data corresponding to each of a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so that the module marks target images in the second real-time video data; encoding the marked second real-time video data to obtain compressed video data; and receiving a second real-time video data request sent by a user and pushing the compressed video data to that user. By this method, equipment with low computing power can analyze multiple video channels at the same time.

Description

Multipath video analysis method, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for multi-path video analysis.
Background
Front-end intelligent analysis refers to intelligent analysis of video or image data on a front-end device (e.g., a webcam) to identify target objects. This requires the front-end equipment to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels, but equipment using front-end intelligent analysis has limited computing power and cannot analyze all channels simultaneously; it can only analyze a single channel, so only target objects in that one channel's picture are identified, while targets seen by the other lenses are missed.
Analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
Disclosure of Invention
The embodiment of the application provides a multi-path video analysis method, equipment, and medium to solve the following technical problem: analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
The embodiment of the application adopts the following technical scheme:
the embodiment of the application provides a multipath video analysis method. Acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
According to the embodiment of the application, the first real-time video data acquired from the plurality of lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
In one implementation of the present application, encoding the marked second real-time video data to obtain compressed video data specifically includes: transmitting the YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H264 or H265 data; and transmitting the YUV data corresponding to each first real-time video data to an encoder to obtain encoded H264 or H265 data.
In one implementation of the present application, after the YUV data corresponding to each of the plurality of first real-time video data is transmitted to the encoder to obtain encoded H264 or H265 data, the method further includes: receiving first real-time video data requests sent by different users, wherein each request specifies any one or more video channels; and transmitting the channel or channels corresponding to each request to the corresponding user through a streaming media transmission protocol.
In one implementation of the present application, the performing a picture-in-picture synthesis process on the plurality of first real-time video data specifically includes: when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, determining the size of each frame of image corresponding to the first real-time video data according to the video parameters corresponding to the first real-time video data respectively; generating a blank image; and determining coordinate position sets of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of first real-time video data channels, and the size of each frame of image.
According to the embodiment of the application, the size of each video channel is determined from the video data parameters, so the arrangement positions of the multiple channels can be set and the channels laid out uniformly, enhancing the visual effect of the generated picture-in-picture video data in the multi-frame tiled case.
In one implementation of the present application, after determining the coordinate position sets of the plurality of first real-time video data in the blank images, the method further includes: acquiring display time stamps corresponding to the first real-time video data respectively; if the difference value among the display time stamps is smaller than a preset threshold value, synthesizing a plurality of first real-time video data into a multi-grid spliced image in a preset splicing mode; wherein each cell in the multi-cell stitched image corresponds to a first real-time video data.
In one implementation of the present application, the second real-time video data is sent to a preset intelligent analysis module so that the module marks the target image in the second real-time video data, which specifically includes: inputting a current frame image corresponding to the second real-time video data into a preset target recognition model and obtaining an annotation frame corresponding to the target image through the model; extracting the motion characteristics of the target image in the previous frame through a Kalman filter to obtain predicted coordinate position information of the target image in the current frame; running the Hungarian algorithm on the annotation frames and the predicted coordinate position information to cascade-match them and obtain a matching set; and obtaining a motion trail of the target image from the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
According to the embodiment of the application, the annotation frame of the target image in the current frame is obtained through the preset target recognition model, and the predicted coordinate position information for the current frame is obtained from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can each be tracked and located, enabling tracking and labeling of multiple target images.
In one implementation of the present application, the performing a picture-in-picture synthesis process on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is size-frame superposition, extracting the motion characteristics of the target images in the current frame of the plurality of first real-time video data with a Kalman filter to obtain predicted coordinate position information of the target images in the next frame; and resizing the plurality of first real-time video data on the basis of that predicted information to obtain picture-in-picture video data in the size-frame superposition form.
In one implementation of the present application, after the plurality of first real-time video data are resized to obtain the superimposed picture-in-picture video data, the method further includes: re-counting the number of pieces of predicted coordinate position information in each first real-time video data after every preset time interval; and updating the stacking order of the plurality of first real-time video data according to the new counts.
According to the embodiment of the application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked according to those counts, with the channels containing more targets placed on the upper layers; this reduces occlusion of target images and improves the accuracy of target identification.
The embodiment of the application provides a multi-path video analysis device, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to: acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
Embodiments of the present application provide a non-volatile computer storage medium storing computer-executable instructions configured to: acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
The above technical solutions adopted by the embodiments of the present application can achieve at least the following beneficial effects: the first real-time video data acquired from the plurality of lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of a multi-channel video analysis method according to an embodiment of the present application;
FIG. 2 is a block diagram of a multi-channel video analysis flow provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-channel video analysis device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a multi-path video analysis method, equipment and medium.
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
Front-end intelligent analysis refers to intelligent analysis of video or image data on a front-end device (e.g., a webcam) to identify target objects. This requires the front-end equipment to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels, but equipment using front-end intelligent analysis has limited computing power and cannot analyze all channels simultaneously; it can only analyze a single channel, so only target objects in that one channel's picture are identified, while targets seen by the other lenses are missed.
Analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
In order to solve the above problems, embodiments of the present application provide a multi-path video analysis method, equipment, and medium: the first real-time video data corresponding to each of the acquired lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
The following describes the technical scheme provided by the embodiment of the application in detail through the attached drawings.
FIG. 1 is a flowchart of a multi-path video analysis method according to an embodiment of the present application. As shown in FIG. 1, the multi-path video analysis method includes the following steps:
s101, a multi-path video analysis device acquires first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera.
In one embodiment of the application, the multi-path video analysis device obtains YUV real-time video data corresponding to each of the multi-view camera's lenses through physical interfaces such as MIPI or USB.
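As an illustrative sketch of this acquisition step (not the patent's implementation), the Python code below opens one capture per lens with OpenCV and converts each frame to the YUV I420 layout; the device indices and the four-lens count are assumptions, and a real front-end device would typically receive the MIPI/USB streams through a vendor SDK rather than cv2.VideoCapture.

```python
import cv2

# One capture per lens of the multi-view camera; indices 0..3 are placeholders.
caps = [cv2.VideoCapture(i) for i in range(4)]

def grab_yuv_frames():
    """Read one frame per lens and convert it to YUV (I420) layout."""
    frames = []
    for cap in caps:
        ok, bgr = cap.read()
        if not ok:
            return None  # a lens dropped a frame; the caller decides what to do
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420))
    return frames
```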
S102, the multipath video analysis equipment performs picture-in-picture synthesis processing on the plurality of first real-time video data, and takes the synthesized picture-in-picture video data as second real-time video data.
In one embodiment of the present application, when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, the size of each frame of image corresponding to each of the plurality of first real-time video data is determined according to the video parameters of each. A blank image is generated, and the coordinate position sets of the plurality of first real-time video data in the blank image are determined according to the size of the blank image, the number of channels, and the size of each frame of image.
Specifically, the picture-in-picture synthesis modes in the embodiment of the application include at least two: multi-frame tiling (frames arranged side by side or in a grid) and size-frame superposition, though the synthesis is not limited to these two modes. When the synthesis mode is multi-frame tiling, the positions of the videos are ordered according to the image size corresponding to each first real-time video data and the number of channels, so that the plurality of first real-time videos are laid out uniformly; this enhances the visual effect of the picture-in-picture video, reduces occlusion among the videos, and improves the accuracy of target-image identification.
Further, the image size of each channel and the number of channels can be obtained from the parameters of the first real-time video data. The sizes of the channels may be the same or different. Each channel is numbered in advance, for example according to the position of its camera.
Further, the multi-path video analysis equipment generates a blank image, determines its size, and from that size and the number of channels determines the in-picture size of each channel. Positions on the blank image are then allocated according to these sizes, yielding the coordinate position of each first real-time video data in the blank image. Each video is resized to its in-picture size, and the channels are laid out in sequence by their assigned numbers, producing picture-in-picture video data in the multi-frame tiled form.
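For illustration only (the patent fixes the layout rule, not an implementation), the following sketch computes such a tiled layout; the 1920x1080 canvas, row-major ordering, BGR frames, and the use of NumPy/OpenCV are assumptions.

```python
import math
import cv2
import numpy as np

def compose_grid(frames, canvas_w=1920, canvas_h=1080):
    """Tile N frames onto a blank canvas in the multi-frame tiled form."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))                 # e.g. 4 channels -> 2x2 grid
    rows = math.ceil(n / cols)
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)  # the blank image
    for i, frame in enumerate(frames):             # channels pre-numbered by camera
        r, c = divmod(i, cols)
        x, y = c * tile_w, r * tile_h              # coordinate position in canvas
        canvas[y:y + tile_h, x:x + tile_w] = cv2.resize(frame, (tile_w, tile_h))
    return canvas
```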
In one embodiment of the present application, the display timestamps corresponding to each of the first real-time video data are acquired. If the differences between the display timestamps are smaller than a preset threshold, the plurality of first real-time video data are synthesized into a multi-grid spliced image in a preset splicing mode, each cell of which corresponds to one channel of first real-time video data.
Specifically, since the lenses capture independently of one another, after the plurality of first real-time video data are received, their timestamps must be compared to keep the channels time-synchronized in the synthesized picture-in-picture video. When the timestamp differences among the channels are smaller than the preset threshold, the channels are synthesized into a multi-grid mosaic.
For example, the preset threshold may be set to 0.1 s: if the timestamps of the channels differ by less than 0.1 s, the received first real-time video data are synthesized into the picture-in-picture.
It should be noted that 0.1 s is merely the preferred setting in the embodiment of the application; in practice the preset threshold may be adjusted to the actual situation.
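A minimal sketch of this synchronization check, assuming each channel exposes a display timestamp in seconds:

```python
def frames_synchronized(display_timestamps, threshold_s=0.1):
    """True when all channels' display timestamps lie within the preset
    threshold of one another, so the frames may be stitched into one
    multi-grid image; 0.1 s mirrors the example threshold above."""
    return max(display_timestamps) - min(display_timestamps) < threshold_s

# e.g. frames_synchronized([12.30, 12.33, 12.31, 12.36]) -> True
```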
In one embodiment of the present application, when the current picture-in-picture synthesis mode is size-frame superposition, the motion characteristics of the target images in the current frame of each first real-time video data are extracted with a Kalman filter to obtain predicted coordinate position information of the target images in the next frame, and the plurality of first real-time video data are resized on that basis to obtain picture-in-picture video data in the size-frame superposition form.
Specifically, in this mode the synthesis effect is that video frames of smaller size are superimposed on video frames of larger size, so part of a target object may be occluded. To minimize the number of occluded target images, the motion characteristics of the current frame's target images can be extracted by the Kalman filter to predict each target's position in the next frame. The same frame may contain one or several target images, so the plurality of first real-time video data can be stacked according to the number of predicted targets.
In one embodiment of the present application, the number of pieces of predicted coordinate position information in each first real-time video data is re-counted after every preset time interval, and the stacking order of the plurality of first real-time video data is updated according to the new counts.
Specifically, the number of target images captured by each camera keeps changing, so to reduce the number of occluded targets in the stacked layout, the target count of each first real-time video data can be re-predicted after a period of time and the stacking order changed accordingly. The video image with the largest target count is thus placed on the uppermost layer of the picture-in-picture video data, and the one with the smallest count on the lowermost layer.
In one embodiment of the present application, after the counts of predicted coordinate position information for the next frame are obtained, the plurality of first real-time video data are stacked from top to bottom in descending order of those counts. The region of the next frame outside the predicted coordinate positions is treated as a background region, and the channels are resized so that each upper-layer video image is superimposed on a background region of the video image below it, each lower layer being larger than the layer above. Occlusion of target images in the lower-layer video data is thereby minimized.
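The stacking logic just described can be sketched as follows; the BGR frames, the fixed shrink factor per layer, and the top-left placement (standing in for the background-region placement above) are all assumptions.

```python
import cv2

def compose_overlay(frames, predicted_boxes, scale=0.75):
    """Stack channels in the size-frame superposition form: the channel with
    the fewest predicted targets becomes the large bottom layer, and channels
    with more predicted targets sit on smaller, upper layers so that fewer
    targets end up occluded. predicted_boxes[i] holds the next-frame boxes
    the Kalman filter predicted for channel i."""
    order = sorted(range(len(frames)), key=lambda i: len(predicted_boxes[i]))
    canvas = frames[order[0]].copy()           # fewest targets: bottom layer
    base_h, base_w = canvas.shape[:2]
    for layer, i in enumerate(order[1:], start=1):
        w = int(base_w * scale ** layer)       # each upper layer is smaller
        h = int(base_h * scale ** layer)
        canvas[0:h, 0:w] = cv2.resize(frames[i], (w, h))
    return canvas
```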
According to the embodiment of the application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked according to those counts, with the channels containing more targets placed on the upper layers; this reduces occlusion of target images and improves the accuracy of target identification.
And S103, the multipath video analysis equipment transmits the second real-time video data to a preset intelligent analysis module so as to mark the target image in the second real-time video data through the preset intelligent analysis module.
In one embodiment of the application, the current frame image corresponding to the second real-time video data is input into a preset target recognition model, and the annotation frame corresponding to the target image is obtained through the preset target recognition model. And extracting the motion characteristics of the target image of the previous frame through a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame. And performing Hungary calculation based on the annotation frame and the predicted coordinate position information so as to perform cascade matching on the annotation frame and the predicted coordinate position information to obtain a matching set. And obtaining a motion trail of the target image according to the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
Specifically, a neural network model is trained on a sample set gathered in advance from the second real-time video data to obtain the preset target recognition model, which can recognize and mark target images in the input second real-time video data. The current frame image of the second real-time video data is input into this model to obtain the annotation-frame position of each target image. The target is then tracked with Kalman filtering, i.e., a Kalman filter predicts the motion state of each target: from the annotation-frame position of each target in the previous frame, its position in the current frame is predicted, yielding the predicted coordinate position information. For example, a standard Kalman filter based on a constant-velocity model and a linear observation model may be used to predict each target's motion state and thus its predicted position in the current frame.
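For illustration, a constant-velocity Kalman prediction step of the kind mentioned above can be sketched as follows; the time step and noise level are assumed values, not parameters taken from the patent.

```python
import numpy as np

dt = 1.0                                       # one frame between predictions
F = np.array([[1.0, 0.0, dt,  0.0],            # x += vx * dt
              [0.0, 1.0, 0.0, dt ],            # y += vy * dt
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])           # state is (x, y, vx, vy)
Q = np.eye(4) * 1e-2                           # process noise (assumed)

def kalman_predict(state, cov):
    """One predict step: the target's expected current-frame position
    (and updated uncertainty) from its previous-frame state."""
    return F @ state, F @ cov @ F.T + Q
```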
Further, cosine distances between the features of each target image and the depth features in the stored depth-feature set are computed to form a cosine distance matrix, and the Mahalanobis distance between each target's predicted current-frame position and its detection-frame position is calculated. The cosine distance matrix is then preprocessed: elements whose Mahalanobis distance exceeds a first preset threshold are set to infinity, and elements whose cosine distance exceeds a second preset threshold are set to a large value; both thresholds can be chosen per scene. Based on the preprocessed cosine distance matrix, the annotation frames and the predicted coordinate position information are matched with the Hungarian algorithm, giving a similarity score between each annotation frame and each prediction and yielding the matching set. The motion trail of each target image is obtained from the matching set so that targets in the second real-time video data can be tracked and marked through their trails, and the recognition results can be uploaded to the client platform.
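The gated Hungarian matching described above can be sketched with SciPy; the cosine and Mahalanobis matrices are assumed to be precomputed as in the preceding paragraph, and the gate values are illustrative (9.4877 is the chi-square 0.95 quantile for 4 degrees of freedom used by DeepSORT-style trackers).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cascade_match(cosine_dist, maha_dist, cos_gate=0.2, maha_gate=9.4877):
    """Match tracks (rows) to detections (columns) with gating.
    cosine_dist[i, j] compares track i's stored appearance features with
    detection j; maha_dist[i, j] is the Mahalanobis distance between
    track i's predicted position and detection j's box."""
    INF = 1e5
    cost = cosine_dist.copy()
    cost[maha_dist > maha_gate] = INF          # "set to infinity" above
    cost[cosine_dist > cos_gate] = INF         # "set to a large value" above
    rows, cols = linear_sum_assignment(cost)
    # Pairs that survive both gates form the matching set.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < INF]
```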
According to the embodiment of the application, the annotation frame of the target image in the current frame is obtained through the preset target recognition model, and the predicted coordinate position information for the current frame is obtained from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can each be tracked and located, enabling tracking and labeling of multiple target images.
S104, the multipath video analysis equipment encodes the second real-time video data with the marks to obtain compressed video data.
In one embodiment of the present application, the YUV data corresponding to the marked second real-time video data is fed to an encoder to obtain encoded H264 or H265 data, and the YUV data corresponding to each first real-time video data is likewise fed to an encoder to obtain encoded H264 or H265 data.
Specifically, the YUV data of each single-lens channel is continuously fed to an encoder, which continuously produces encoded H264 or H265 data; the YUV data of the synthesized picture-in-picture is continuously fed to an encoder in the same way.
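As a sketch of this encoding step (a real front-end device would use a hardware encoder), raw I420 frames can be piped through the ffmpeg CLI; pass codec="libx265" for H265 output.

```python
import subprocess

def start_encoder(width, height, fps=25, codec="libx264"):
    """Pipe raw I420 frames into ffmpeg on stdin and read an H264/H265
    elementary stream back from stdout."""
    out_fmt = "h264" if codec == "libx264" else "hevc"
    return subprocess.Popen(
        ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "yuv420p",
         "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
         "-c:v", codec, "-f", out_fmt, "-"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Usage: write each marked PiP frame with enc.stdin.write(yuv.tobytes()),
# then read compressed packets from enc.stdout.
```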
S105, receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
In one embodiment of the application, first real-time video data requests sent by different users are received, each request specifying any one or more video channels, and the requested channel or channels are transmitted to the corresponding users through a streaming media transmission protocol.
Specifically, the embodiment of the application supports multiple clients requesting different video streams at the same time. The channel or channels a client requests are determined from parameters in the request and transmitted through a streaming protocol such as RTP or RTCP, pushing each client its requested real-time video data simultaneously. Because the picture-in-picture composite is served as a single video channel, a client can also request it on its own and thus see the picture-in-picture effect.
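A minimal sketch of this per-client routing, with the stream ids and the push callable as placeholders for a real RTP/RTSP stack:

```python
from typing import Callable, Dict, Iterable

def handle_request(streams: Dict[str, Iterable[bytes]],
                   requested_ids: Iterable[str],
                   push: Callable[[bytes], None]) -> None:
    """Serve one client's request. `streams` maps a stream id (the
    single-lens channels plus "pip" for the composite) to its encoded
    packet source; `push` stands in for an RTP sender."""
    for sid in requested_ids:                  # e.g. ["lens1", "pip"]
        for packet in streams[sid]:            # encoded H264/H265 packets
            push(packet)
```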
FIG. 2 is a flow chart of multi-channel video analysis according to an embodiment of the present application. As shown in FIG. 2, the multi-path video analysis flow includes:
in one embodiment of the present application, YUN data corresponding to each lens in the multi-view camera is obtained, and the obtained plurality of YUN data are combined into one new YUN data, that is, the picture-in-picture video data is combined. And coding the YUN data corresponding to each lens in the multi-view camera to generate H264 or H265 data corresponding to each path of video.
In one embodiment of the application, the synthesized picture-in-picture video data is delivered to an intelligent analysis module for analysis of the target image in the picture-in-picture video data by the intelligent analysis module. At this time, the recognition result may be directly uploaded to the user platform. And marking the identified target object according to the analysis result to form YUN data with marking frames or marks. And coding the YUN data with the label frame or the label to form H264 or H265 data corresponding to the picture-in-picture video data.
In one embodiment of the application, one or more channels of video data may be simultaneously transmitted to respective clients upon request by the clients.
FIG. 3 is a schematic structural diagram of a multi-channel video analysis device according to an embodiment of the present application. As shown in FIG. 3, the multi-path video analysis apparatus includes:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
Embodiments of the present application provide a non-volatile computer storage medium storing computer-executable instructions configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
The embodiments of the present application are described in a progressive manner; identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; refer to the relevant parts of the method embodiments.
The foregoing describes certain embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the embodiments of the application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A method of multi-path video analysis, the method comprising:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
2. The method for multi-channel video analysis according to claim 1, wherein the encoding the second real-time video data with the tag to obtain compressed video data specifically comprises:
transmitting YUV data corresponding to the second real-time video data with the mark to an encoder to obtain encoded H264 or H265 data; and
and transmitting the YUV data corresponding to the first real-time video data to an encoder to obtain encoded H264 or H265 data.
3. The method according to claim 2, wherein after the YUV data corresponding to each of the first real-time video data is transmitted to the encoder to obtain encoded H264 or H265 data, the method further comprises:
receiving the first real-time video data requests respectively sent by different users; wherein the first real-time video data request at least comprises any one or more paths of video data;
and transmitting one or more paths of videos corresponding to the first real-time video data request to corresponding users through a streaming media transmission protocol.
4. The method of claim 1, wherein the performing a picture-in-picture synthesis process on the plurality of first real-time video data comprises:
when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, determining the size of each frame of image corresponding to the plurality of first real-time video data according to the video parameters corresponding to the plurality of first real-time video data respectively;
generating a blank image;
and determining coordinate position sets of the plurality of first real-time video data in the blank images respectively according to the size of the blank images, the numbers of the first real-time video data and the size of each frame of image.
5. The method of claim 4, wherein after determining the respective sets of coordinate positions of the plurality of first real-time video data in the blank image, the method further comprises:
acquiring display time stamps corresponding to the plurality of first real-time video data respectively;
if the difference value among the display time stamps is smaller than a preset threshold value, synthesizing a plurality of first real-time video data into a multi-grid spliced image in a preset splicing mode; wherein each cell in the multi-cell stitched image corresponds to one of the first real-time video data.
6. The method of claim 1, wherein the delivering the second real-time video data to a preset intelligent analysis module to mark the target image in the second real-time video data by the preset intelligent analysis module specifically comprises:
inputting the current frame image corresponding to the second real-time video data into a preset target recognition model, and obtaining a labeling frame corresponding to a target image through the preset target recognition model;
extracting the motion characteristics of a target image of a previous frame through a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame;
performing Hungary calculation based on the annotation frame and the predicted coordinate position information to perform cascade matching on the annotation frame and the predicted coordinate position information to obtain a matching set;
and obtaining the motion trail of the target image according to the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
7. A multi-path video analysis device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
8. A non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
CN202111532772.6A 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium Active CN114500871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111532772.6A CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111532772.6A CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Publications (2)

Publication Number Publication Date
CN114500871A CN114500871A (en) 2022-05-13
CN114500871B 2023-11-14

Family

Family ID: 81493135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111532772.6A Active CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Country Status (1)

Country Link
CN (1) CN114500871B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550608A (en) * 2022-09-19 2022-12-30 国网智能科技股份有限公司 Multi-user high-concurrency AI video real-time fusion display control method and system
CN115988258B (en) * 2023-03-17 2023-06-23 广州佰锐网络科技有限公司 Video communication method, storage medium and system based on internet of things (IoT) device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3094082A1 (en) * 2015-05-13 2016-11-16 AIM Sport Vision AG Digitally overlaying an image with another image
CN112800805A (en) * 2019-10-28 2021-05-14 上海哔哩哔哩科技有限公司 Video editing method, system, computer device and computer storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194443A (en) * 2010-03-04 2011-09-21 腾讯科技(深圳)有限公司 Display method and system for window of video picture in picture and video processing equipment
WO2017005096A2 (en) * 2015-07-06 2017-01-12 阿里巴巴集团控股有限公司 Method and device for encoding multiple video streams
WO2017211250A1 (en) * 2016-06-08 2017-12-14 深圳创维数字技术有限公司 Image overlay display method and system
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110321806A (en) * 2019-06-12 2019-10-11 浙江大华技术股份有限公司 Object detection method, image processing equipment and the equipment with store function
CN111010605A (en) * 2019-11-26 2020-04-14 杭州东信北邮信息技术有限公司 Method for displaying video picture-in-picture window
CN112637550A (en) * 2020-11-18 2021-04-09 合肥市卓迩无人机科技服务有限责任公司 PTZ moving target tracking method for multi-path 4K quasi-real-time spliced video
CN112884811A (en) * 2021-03-18 2021-06-01 中国人民解放军国防科技大学 Photoelectric detection tracking method and system for unmanned aerial vehicle cluster
CN113612922A (en) * 2021-07-29 2021-11-05 重庆赛迪奇智人工智能科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on pointer image recognition in video surveillance systems; Si Jiawei; Outstanding Master's Theses; full text *

Also Published As

Publication number Publication date
CN114500871A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN114500871B (en) Multipath video analysis method, equipment and medium
US9392322B2 (en) Method of visually synchronizing differing camera feeds with common subject
CN104137146B (en) For the method and system of the Video coding of the noise filtering of Utilization prospects Object Segmentation
Feng et al. Viewport prediction for live 360-degree mobile video streaming using user-content hybrid motion tracking
CN107004271B (en) Display method, display apparatus, electronic device, computer program product, and storage medium
CN104539929B (en) Stereo-image coding method and code device with motion prediction
CN100542303C (en) A kind of method for correcting multi-viewpoint vedio color
KR20150050172A (en) Apparatus and Method for Selecting Multi-Camera Dynamically to Track Interested Object
EP1315123A3 (en) Scalable architecture for establishing correspondence of multiple video streams at frame rate
US20200045363A1 (en) Gaze-Responsive Advertisement
CN102467661A (en) Multimedia device and method for controlling the same
US20210289145A1 (en) Transporting ultra-high definition video from multiple sources
JP2014116716A (en) Tracking device
Feng et al. LiveROI: region of interest analysis for viewport prediction in live mobile virtual reality streaming
Iashin et al. Sparse in space and time: Audio-visual synchronisation with trainable selectors
CN117221627A (en) Multi-view synchronization method and free view system
CN115174941B (en) Real-time motion performance analysis and real-time data sharing method based on multiple paths of video streams
CN112001224A (en) Video acquisition method and video acquisition system based on convolutional neural network
AU2018230038B2 (en) Transporting ultra-high definition video from multiple sources
JP5864371B2 (en) Still image automatic generation system, worker information processing terminal, instructor information processing terminal, and determination device in still image automatic generation system
CN108428241A (en) The movement locus catching method of mobile target in HD video
CN114040184A (en) Image display method, system, storage medium and computer program product
CN114830674A (en) Transmitting apparatus and receiving apparatus
AU2017392150B2 (en) Method for encoding and processing raw UHD video via an existing HD video architecture
US20180115591A1 (en) Marking Objects of Interest in a Streaming Video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant