CN114500871B - Multipath video analysis method, equipment and medium


Info

Publication number
CN114500871B
Authority
CN
China
Prior art keywords
video data
real-time video
picture
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111532772.6A
Other languages
Chinese (zh)
Other versions
CN114500871A (en)
Inventor
韩振
蔡富东
孔志强
陈雷
李在学
王海慧
马景行
朱朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Senter Electronic Co Ltd
Original Assignee
Shandong Senter Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Shandong Senter Electronic Co Ltd
Priority to CN202111532772.6A
Publication of CN114500871A
Application granted
Publication of CN114500871B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • H04N 19/42: Methods or arrangements for coding or decoding digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another, e.g. for inserting or substituting an advertisement
    • H04N 21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N 21/25891: Management of end-user data being end-user preferences
    • H04N 21/4316: Generation of visual interfaces for content selection or interaction, for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4858: End-user interface for client configuration for modifying screen layout parameters, e.g. fonts, size of the windows
    • H04N 5/265: Studio circuits; Mixing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a multi-path video analysis method, equipment, and medium: acquiring first real-time video data corresponding to each of a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so that the module marks target images in the second real-time video data; encoding the marked second real-time video data to obtain compressed video data; and receiving a second real-time video data request sent by a user and pushing the compressed video data to that user. By this method, equipment with low computing power can analyze multiple video channels at the same time.

Description

Multipath video analysis method, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for multi-path video analysis.
Background
Front-end intelligent analysis refers to intelligent analysis of video or image data on a front-end device (e.g., a webcam) to identify target objects. This requires the front-end equipment to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels, but equipment using front-end intelligent analysis has limited computing power and cannot analyze all channels simultaneously; it can only analyze a single channel, so only target objects in that one channel's picture are identified, while targets seen by the other lenses are missed.
Analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
Disclosure of Invention
The embodiment of the application provides a multi-path video analysis method, equipment, and medium to solve the following technical problem: analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
The embodiment of the application adopts the following technical scheme:
the embodiment of the application provides a multipath video analysis method. Acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
According to the embodiment of the application, the first real-time video data acquired from the plurality of lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
In one implementation of the present application, encoding the marked second real-time video data to obtain compressed video data specifically includes: transmitting the YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H264 or H265 data; and transmitting the YUV data corresponding to each first real-time video data to an encoder to obtain encoded H264 or H265 data.
In one implementation of the present application, after the YUV data corresponding to each of the plurality of first real-time video data is transmitted to the encoder to obtain encoded H264 or H265 data, the method further includes: receiving first real-time video data requests sent by different users, wherein each request specifies any one or more video channels; and transmitting the channel or channels corresponding to each request to the corresponding user through a streaming media transmission protocol.
In one implementation of the present application, the performing a picture-in-picture synthesis process on the plurality of first real-time video data specifically includes: when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, determining the size of each frame of image corresponding to the first real-time video data according to the video parameters corresponding to the first real-time video data respectively; generating a blank image; and determining coordinate position sets of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of first real-time video data channels, and the size of each frame of image.
According to the embodiment of the application, the size of each video channel is determined from the video data parameters, so the arrangement positions of the multiple channels can be set and the channels laid out uniformly, enhancing the visual effect of the generated picture-in-picture video data in the multi-frame tiled case.
In one implementation of the present application, after determining the coordinate position sets of the plurality of first real-time video data in the blank images, the method further includes: acquiring display time stamps corresponding to the first real-time video data respectively; if the difference value among the display time stamps is smaller than a preset threshold value, synthesizing a plurality of first real-time video data into a multi-grid spliced image in a preset splicing mode; wherein each cell in the multi-cell stitched image corresponds to a first real-time video data.
In one implementation of the present application, the second real-time video data is sent to a preset intelligent analysis module so that the module marks the target image in the second real-time video data, which specifically includes: inputting a current frame image corresponding to the second real-time video data into a preset target recognition model and obtaining an annotation frame corresponding to the target image through the model; extracting the motion characteristics of the target image in the previous frame through a Kalman filter to obtain predicted coordinate position information of the target image in the current frame; running the Hungarian algorithm on the annotation frames and the predicted coordinate position information to cascade-match them and obtain a matching set; and obtaining a motion trail of the target image from the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
According to the embodiment of the application, the annotation frame of the target image in the current frame is obtained through the preset target recognition model, and the predicted coordinate position information for the current frame is obtained from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can each be tracked and located, enabling tracking and labeling of multiple target images.
In one implementation of the present application, the performing a picture-in-picture synthesis process on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is size-frame superposition, extracting the motion characteristics of the target images in the current frame of the plurality of first real-time video data with a Kalman filter to obtain predicted coordinate position information of the target images in the next frame; and resizing the plurality of first real-time video data on the basis of that predicted information to obtain picture-in-picture video data in the size-frame superposition form.
In one implementation of the present application, after the plurality of first real-time video data are resized to obtain the superimposed picture-in-picture video data, the method further includes: re-counting the number of pieces of predicted coordinate position information in each first real-time video data after every preset time interval; and updating the stacking order of the plurality of first real-time video data according to the new counts.
According to the embodiment of the application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked according to those counts, with the channels containing more targets placed on the upper layers; this reduces occlusion of target images and improves the accuracy of target identification.
The embodiment of the application provides a multi-path video analysis device, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to: acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
Embodiments of the present application provide a non-volatile computer storage medium storing computer-executable instructions configured to: acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module; encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
The above technical solutions adopted by the embodiments of the present application can achieve at least the following beneficial effects: the first real-time video data acquired from the plurality of lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the present application, and a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow chart of a multi-channel video analysis method according to an embodiment of the present application;
FIG. 2 is a block diagram of a multi-channel video analysis flow provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-channel video analysis device according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a multi-path video analysis method, equipment and medium.
In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
Front-end intelligent analysis refers to intelligent analysis of video or image data on a front-end device (e.g., a webcam) to identify target objects. This requires the front-end equipment to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels, but equipment using front-end intelligent analysis has limited computing power and cannot analyze all channels simultaneously; it can only analyze a single channel, so only target objects in that one channel's picture are identified, while targets seen by the other lenses are missed.
Analyzing multiple video channels simultaneously generally requires increasing the computing power of the equipment, which raises equipment cost and hinders the equipment's popularization and application.
In order to solve the above problems, embodiments of the present application provide a multi-path video analysis method, equipment, and medium: the first real-time video data corresponding to each of the acquired lenses are synthesized into a picture-in-picture, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel analyzes all of the source videos. This avoids the high cost of analyzing each channel independently, reduces the computing-power requirement of the front-end equipment, and thereby lowers equipment cost, facilitating the popularization of front-end analysis equipment in everyday environments.
The following describes the technical scheme provided by the embodiment of the application in detail through the attached drawings.
FIG. 1 is a flowchart of a multi-path video analysis method according to an embodiment of the present application. As shown in FIG. 1, the multi-path video analysis method includes the following steps:
s101, a multi-path video analysis device acquires first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera.
In one embodiment of the application, the multi-path video analysis device obtains YUV real-time video data corresponding to each of the multi-view camera's lenses through physical interfaces such as MIPI or USB.
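As an illustrative sketch of this acquisition step (not the patent's implementation), the Python code below opens one capture per lens with OpenCV and converts each frame to the YUV I420 layout; the device indices and the four-lens count are assumptions, and a real front-end device would typically receive the MIPI/USB streams through a vendor SDK rather than cv2.VideoCapture.

```python
import cv2

# One capture per lens of the multi-view camera; indices 0..3 are placeholders.
caps = [cv2.VideoCapture(i) for i in range(4)]

def grab_yuv_frames():
    """Read one frame per lens and convert it to YUV (I420) layout."""
    frames = []
    for cap in caps:
        ok, bgr = cap.read()
        if not ok:
            return None  # a lens dropped a frame; the caller decides what to do
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420))
    return frames
```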
S102, the multipath video analysis equipment performs picture-in-picture synthesis processing on the plurality of first real-time video data, and takes the synthesized picture-in-picture video data as second real-time video data.
In one embodiment of the present application, when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, the size of each frame of image corresponding to each of the plurality of first real-time video data is determined according to the video parameters of each. A blank image is generated, and the coordinate position sets of the plurality of first real-time video data in the blank image are determined according to the size of the blank image, the number of channels, and the size of each frame of image.
Specifically, the picture-in-picture synthesis modes in the embodiment of the application include at least two: multi-frame tiling (frames arranged side by side or in a grid) and size-frame superposition, though the synthesis is not limited to these two modes. When the synthesis mode is multi-frame tiling, the positions of the videos are ordered according to the image size corresponding to each first real-time video data and the number of channels, so that the plurality of first real-time videos are laid out uniformly; this enhances the visual effect of the picture-in-picture video, reduces occlusion among the videos, and improves the accuracy of target-image identification.
Further, the image size of each channel and the number of channels can be obtained from the parameters of the first real-time video data. The sizes of the channels may be the same or different. Each channel is numbered in advance, for example according to the position of its camera.
Further, the multi-path video analysis equipment generates a blank image, determines its size, and from that size and the number of channels determines the in-picture size of each channel. Positions on the blank image are then allocated according to these sizes, yielding the coordinate position of each first real-time video data in the blank image. Each video is resized to its in-picture size, and the channels are laid out in sequence by their assigned numbers, producing picture-in-picture video data in the multi-frame tiled form.
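For illustration only (the patent fixes the layout rule, not an implementation), the following sketch computes such a tiled layout; the 1920x1080 canvas, row-major ordering, BGR frames, and the use of NumPy/OpenCV are assumptions.

```python
import math
import cv2
import numpy as np

def compose_grid(frames, canvas_w=1920, canvas_h=1080):
    """Tile N frames onto a blank canvas in the multi-frame tiled form."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))                 # e.g. 4 channels -> 2x2 grid
    rows = math.ceil(n / cols)
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)  # the blank image
    for i, frame in enumerate(frames):             # channels pre-numbered by camera
        r, c = divmod(i, cols)
        x, y = c * tile_w, r * tile_h              # coordinate position in canvas
        canvas[y:y + tile_h, x:x + tile_w] = cv2.resize(frame, (tile_w, tile_h))
    return canvas
```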
In one embodiment of the present application, the display timestamps corresponding to each of the first real-time video data are acquired. If the differences between the display timestamps are smaller than a preset threshold, the plurality of first real-time video data are synthesized into a multi-grid spliced image in a preset splicing mode, each cell of which corresponds to one channel of first real-time video data.
Specifically, since the lenses capture independently of one another, after the plurality of first real-time video data are received, their timestamps must be compared to keep the channels time-synchronized in the synthesized picture-in-picture video. When the timestamp differences among the channels are smaller than the preset threshold, the channels are synthesized into a multi-grid mosaic.
For example, the preset threshold may be set to 0.1 s: if the timestamps of the channels differ by less than 0.1 s, the received first real-time video data are synthesized into the picture-in-picture.
It should be noted that 0.1 s is merely the preferred setting in the embodiment of the application; in practice the preset threshold may be adjusted to the actual situation.
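A minimal sketch of this synchronization check, assuming each channel exposes a display timestamp in seconds:

```python
def frames_synchronized(display_timestamps, threshold_s=0.1):
    """True when all channels' display timestamps lie within the preset
    threshold of one another, so the frames may be stitched into one
    multi-grid image; 0.1 s mirrors the example threshold above."""
    return max(display_timestamps) - min(display_timestamps) < threshold_s

# e.g. frames_synchronized([12.30, 12.33, 12.31, 12.36]) -> True
```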
In one embodiment of the present application, when the current picture-in-picture synthesis mode is size-frame superposition, the motion characteristics of the target images in the current frame of each first real-time video data are extracted with a Kalman filter to obtain predicted coordinate position information of the target images in the next frame, and the plurality of first real-time video data are resized on that basis to obtain picture-in-picture video data in the size-frame superposition form.
Specifically, in this mode the synthesis effect is that video frames of smaller size are superimposed on video frames of larger size, so part of a target object may be occluded. To minimize the number of occluded target images, the motion characteristics of the current frame's target images can be extracted by the Kalman filter to predict each target's position in the next frame. The same frame may contain one or several target images, so the plurality of first real-time video data can be stacked according to the number of predicted targets.
In one embodiment of the present application, the number of pieces of predicted coordinate position information in each first real-time video data is re-counted after every preset time interval, and the stacking order of the plurality of first real-time video data is updated according to the new counts.
Specifically, the number of target images captured by each camera keeps changing, so to reduce the number of occluded targets in the stacked layout, the target count of each first real-time video data can be re-predicted after a period of time and the stacking order changed accordingly. The video image with the largest target count is thus placed on the uppermost layer of the picture-in-picture video data, and the one with the smallest count on the lowermost layer.
In one embodiment of the present application, after the counts of predicted coordinate position information for the next frame are obtained, the plurality of first real-time video data are stacked from top to bottom in descending order of those counts. The region of the next frame outside the predicted coordinate positions is treated as a background region, and the channels are resized so that each upper-layer video image is superimposed on a background region of the video image below it, each lower layer being larger than the layer above. Occlusion of target images in the lower-layer video data is thereby minimized.
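The stacking logic just described can be sketched as follows; the BGR frames, the fixed shrink factor per layer, and the top-left placement (standing in for the background-region placement above) are all assumptions.

```python
import cv2

def compose_overlay(frames, predicted_boxes, scale=0.75):
    """Stack channels in the size-frame superposition form: the channel with
    the fewest predicted targets becomes the large bottom layer, and channels
    with more predicted targets sit on smaller, upper layers so that fewer
    targets end up occluded. predicted_boxes[i] holds the next-frame boxes
    the Kalman filter predicted for channel i."""
    order = sorted(range(len(frames)), key=lambda i: len(predicted_boxes[i]))
    canvas = frames[order[0]].copy()           # fewest targets: bottom layer
    base_h, base_w = canvas.shape[:2]
    for layer, i in enumerate(order[1:], start=1):
        w = int(base_w * scale ** layer)       # each upper layer is smaller
        h = int(base_h * scale ** layer)
        canvas[0:h, 0:w] = cv2.resize(frames[i], (w, h))
    return canvas
```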
According to the embodiment of the application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked according to those counts, with the channels containing more targets placed on the upper layers; this reduces occlusion of target images and improves the accuracy of target identification.
And S103, the multipath video analysis equipment transmits the second real-time video data to a preset intelligent analysis module so as to mark the target image in the second real-time video data through the preset intelligent analysis module.
In one embodiment of the application, the current frame image corresponding to the second real-time video data is input into a preset target recognition model, and the annotation frame corresponding to the target image is obtained through the preset target recognition model. And extracting the motion characteristics of the target image of the previous frame through a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame. And performing Hungary calculation based on the annotation frame and the predicted coordinate position information so as to perform cascade matching on the annotation frame and the predicted coordinate position information to obtain a matching set. And obtaining a motion trail of the target image according to the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
Specifically, a neural network model is trained on a sample set gathered in advance from the second real-time video data to obtain the preset target recognition model, which can recognize and mark target images in the input second real-time video data. The current frame image of the second real-time video data is input into this model to obtain the annotation-frame position of each target image. The target is then tracked with Kalman filtering, i.e., a Kalman filter predicts the motion state of each target: from the annotation-frame position of each target in the previous frame, its position in the current frame is predicted, yielding the predicted coordinate position information. For example, a standard Kalman filter based on a constant-velocity model and a linear observation model may be used to predict each target's motion state and thus its predicted position in the current frame.
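For illustration, a constant-velocity Kalman prediction step of the kind mentioned above can be sketched as follows; the time step and noise level are assumed values, not parameters taken from the patent.

```python
import numpy as np

dt = 1.0                                       # one frame between predictions
F = np.array([[1.0, 0.0, dt,  0.0],            # x += vx * dt
              [0.0, 1.0, 0.0, dt ],            # y += vy * dt
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])           # state is (x, y, vx, vy)
Q = np.eye(4) * 1e-2                           # process noise (assumed)

def kalman_predict(state, cov):
    """One predict step: the target's expected current-frame position
    (and updated uncertainty) from its previous-frame state."""
    return F @ state, F @ cov @ F.T + Q
```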
Further, cosine distances between the features of each target image and the depth features in the stored depth-feature set are computed to form a cosine distance matrix, and the Mahalanobis distance between each target's predicted current-frame position and its detection-frame position is calculated. The cosine distance matrix is then preprocessed: elements whose Mahalanobis distance exceeds a first preset threshold are set to infinity, and elements whose cosine distance exceeds a second preset threshold are set to a large value; both thresholds can be chosen per scene. Based on the preprocessed cosine distance matrix, the annotation frames and the predicted coordinate position information are matched with the Hungarian algorithm, giving a similarity score between each annotation frame and each prediction and yielding the matching set. The motion trail of each target image is obtained from the matching set so that targets in the second real-time video data can be tracked and marked through their trails, and the recognition results can be uploaded to the client platform.
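The gated Hungarian matching described above can be sketched with SciPy; the cosine and Mahalanobis matrices are assumed to be precomputed as in the preceding paragraph, and the gate values are illustrative (9.4877 is the chi-square 0.95 quantile for 4 degrees of freedom used by DeepSORT-style trackers).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cascade_match(cosine_dist, maha_dist, cos_gate=0.2, maha_gate=9.4877):
    """Match tracks (rows) to detections (columns) with gating.
    cosine_dist[i, j] compares track i's stored appearance features with
    detection j; maha_dist[i, j] is the Mahalanobis distance between
    track i's predicted position and detection j's box."""
    INF = 1e5
    cost = cosine_dist.copy()
    cost[maha_dist > maha_gate] = INF          # "set to infinity" above
    cost[cosine_dist > cos_gate] = INF         # "set to a large value" above
    rows, cols = linear_sum_assignment(cost)
    # Pairs that survive both gates form the matching set.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < INF]
```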
According to the embodiment of the application, the annotation frame of the target image in the current frame is obtained through the preset target recognition model, and the predicted coordinate position information for the current frame is obtained from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can each be tracked and located, enabling tracking and labeling of multiple target images.
S104, the multipath video analysis equipment encodes the second real-time video data with the marks to obtain compressed video data.
In one embodiment of the present application, the YUV data corresponding to the marked second real-time video data is fed to an encoder to obtain encoded H264 or H265 data, and the YUV data corresponding to each first real-time video data is likewise fed to an encoder to obtain encoded H264 or H265 data.
Specifically, the YUV data of each single-lens channel is continuously fed to an encoder, which continuously produces encoded H264 or H265 data; the YUV data of the synthesized picture-in-picture is continuously fed to an encoder in the same way.
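As a sketch of this encoding step (a real front-end device would use a hardware encoder), raw I420 frames can be piped through the ffmpeg CLI; pass codec="libx265" for H265 output.

```python
import subprocess

def start_encoder(width, height, fps=25, codec="libx264"):
    """Pipe raw I420 frames into ffmpeg on stdin and read an H264/H265
    elementary stream back from stdout."""
    out_fmt = "h264" if codec == "libx264" else "hevc"
    return subprocess.Popen(
        ["ffmpeg", "-f", "rawvideo", "-pix_fmt", "yuv420p",
         "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
         "-c:v", codec, "-f", out_fmt, "-"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Usage: write each marked PiP frame with enc.stdin.write(yuv.tobytes()),
# then read compressed packets from enc.stdout.
```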
S105, receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
In one embodiment of the application, first real-time video data requests sent by different users are received, each request specifying any one or more video channels, and the requested channel or channels are transmitted to the corresponding users through a streaming media transmission protocol.
Specifically, the embodiment of the application supports multiple clients requesting different video streams at the same time. The channel or channels a client requests are determined from parameters in the request and transmitted through a streaming protocol such as RTP or RTCP, pushing each client its requested real-time video data simultaneously. Because the picture-in-picture composite is served as a single video channel, a client can also request it on its own and thus see the picture-in-picture effect.
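A minimal sketch of this per-client routing, with the stream ids and the push callable as placeholders for a real RTP/RTSP stack:

```python
from typing import Callable, Dict, Iterable

def handle_request(streams: Dict[str, Iterable[bytes]],
                   requested_ids: Iterable[str],
                   push: Callable[[bytes], None]) -> None:
    """Serve one client's request. `streams` maps a stream id (the
    single-lens channels plus "pip" for the composite) to its encoded
    packet source; `push` stands in for an RTP sender."""
    for sid in requested_ids:                  # e.g. ["lens1", "pip"]
        for packet in streams[sid]:            # encoded H264/H265 packets
            push(packet)
```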
FIG. 2 is a flow chart of multi-channel video analysis according to an embodiment of the present application. As shown in FIG. 2, the multi-path video analysis flow includes:
in one embodiment of the present application, YUN data corresponding to each lens in the multi-view camera is obtained, and the obtained plurality of YUN data are combined into one new YUN data, that is, the picture-in-picture video data is combined. And coding the YUN data corresponding to each lens in the multi-view camera to generate H264 or H265 data corresponding to each path of video.
In one embodiment of the application, the synthesized picture-in-picture video data is delivered to an intelligent analysis module for analysis of the target image in the picture-in-picture video data by the intelligent analysis module. At this time, the recognition result may be directly uploaded to the user platform. And marking the identified target object according to the analysis result to form YUN data with marking frames or marks. And coding the YUN data with the label frame or the label to form H264 or H265 data corresponding to the picture-in-picture video data.
In one embodiment of the application, one or more channels of video data may be simultaneously transmitted to respective clients upon request by the clients.
FIG. 3 is a schematic structural diagram of a multi-channel video analysis device according to an embodiment of the present application. As shown in FIG. 3, the multi-path video analysis apparatus includes:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
Embodiments of the present application provide a non-volatile computer storage medium storing computer-executable instructions configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
The embodiments of the present application are described in a progressive manner; identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; refer to the relevant parts of the method embodiments.
The foregoing describes certain embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the embodiments of the application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A method of multi-path video analysis, the method comprising:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
2. The method for multi-channel video analysis according to claim 1, wherein the encoding the second real-time video data with the tag to obtain compressed video data specifically comprises:
transmitting YUV data corresponding to the second real-time video data with the mark to an encoder to obtain encoded H264 or H265 data; and
and transmitting the YUV data corresponding to the first real-time video data to an encoder to obtain encoded H264 or H265 data.
3. The method according to claim 2, wherein after the YUV data corresponding to each of the first real-time video data is transmitted to the encoder to obtain encoded H264 or H265 data, the method further comprises:
receiving the first real-time video data requests respectively sent by different users; wherein the first real-time video data request at least comprises any one or more paths of video data;
and transmitting one or more paths of videos corresponding to the first real-time video data request to corresponding users through a streaming media transmission protocol.
4. The method of claim 1, wherein the performing a picture-in-picture synthesis process on the plurality of first real-time video data comprises:
when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, determining the size of each frame of image corresponding to the plurality of first real-time video data according to the video parameters corresponding to the plurality of first real-time video data respectively;
generating a blank image;
and determining coordinate position sets of the plurality of first real-time video data in the blank images respectively according to the size of the blank images, the numbers of the first real-time video data and the size of each frame of image.
5. The method of claim 4, wherein after determining the respective sets of coordinate positions of the plurality of first real-time video data in the blank image, the method further comprises:
acquiring display time stamps corresponding to the plurality of first real-time video data respectively;
if the difference value among the display time stamps is smaller than a preset threshold value, synthesizing a plurality of first real-time video data into a multi-grid spliced image in a preset splicing mode; wherein each cell in the multi-cell stitched image corresponds to one of the first real-time video data.
6. The method of claim 1, wherein the delivering the second real-time video data to a preset intelligent analysis module to mark the target image in the second real-time video data by the preset intelligent analysis module specifically comprises:
inputting the current frame image corresponding to the second real-time video data into a preset target recognition model, and obtaining a labeling frame corresponding to a target image through the preset target recognition model;
extracting the motion characteristics of a target image of a previous frame through a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame;
performing Hungary calculation based on the annotation frame and the predicted coordinate position information to perform cascade matching on the annotation frame and the predicted coordinate position information to obtain a matching set;
and obtaining the motion trail of the target image according to the matching set so as to track and mark the target image in the second real-time video data through the motion trail.
7. A multi-path video analysis device comprising:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
8. A non-volatile computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis processing on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
encoding the second real-time video data with the marks to obtain compressed video data;
receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user;
the process of performing the picture-in-picture synthesis processing on the plurality of first real-time video data specifically includes: when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting motion characteristics of a target image of a current frame of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target image in a next frame;
based on the predicted coordinate position information of the target image in the next frame, performing size adjustment on the plurality of first real-time video data to obtain picture-in-picture video data in a size frame superposition form;
counting the number of the predicted coordinate position information in the first real-time video data again after each interval of preset time length;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
CN202111532772.6A 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium Active CN114500871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111532772.6A CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111532772.6A CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Publications (2)

Publication Number Publication Date
CN114500871A CN114500871A (en) 2022-05-13
CN114500871B 2023-11-14

Family

Family ID: 81493135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111532772.6A Active CN114500871B (en) 2021-12-15 2021-12-15 Multipath video analysis method, equipment and medium

Country Status (1)

Country Link
CN (1) CN114500871B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550608A (en) * 2022-09-19 2022-12-30 国网智能科技股份有限公司 Multi-user high-concurrency AI video real-time fusion display control method and system
CN115988258B (en) * 2023-03-17 2023-06-23 广州佰锐网络科技有限公司 Video communication method, storage medium and system based on internet of things (IoT) device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3094082A1 (en) * 2015-05-13 2016-11-16 AIM Sport Vision AG Digitally overlaying an image with another image
CN112800805A (en) * 2019-10-28 2021-05-14 上海哔哩哔哩科技有限公司 Video editing method, system, computer device and computer storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194443A (en) * 2010-03-04 2011-09-21 腾讯科技(深圳)有限公司 Display method and system for window of video picture in picture and video processing equipment
WO2017005096A2 (en) * 2015-07-06 2017-01-12 阿里巴巴集团控股有限公司 Method and device for encoding multiple video streams
WO2017211250A1 (en) * 2016-06-08 2017-12-14 深圳创维数字技术有限公司 Image overlay display method and system
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110321806A (en) * 2019-06-12 2019-10-11 浙江大华技术股份有限公司 Object detection method, image processing equipment and the equipment with store function
CN111010605A (en) * 2019-11-26 2020-04-14 杭州东信北邮信息技术有限公司 Method for displaying video picture-in-picture window
CN112637550A (en) * 2020-11-18 2021-04-09 合肥市卓迩无人机科技服务有限责任公司 PTZ moving target tracking method for multi-path 4K quasi-real-time spliced video
CN112884811A (en) * 2021-03-18 2021-06-01 中国人民解放军国防科技大学 Photoelectric detection tracking method and system for unmanned aerial vehicle cluster
CN113612922A (en) * 2021-07-29 2021-11-05 重庆赛迪奇智人工智能科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on pointer image recognition in video surveillance systems; Si Jiawei; Outstanding Master's Theses; full text *

Also Published As

Publication number Publication date
CN114500871A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN114500871B (en) Multipath video analysis method, equipment and medium
US9392322B2 (en) Method of visually synchronizing differing camera feeds with common subject
CN104137146B (en) For the method and system of the Video coding of the noise filtering of Utilization prospects Object Segmentation
Feng et al. Viewport prediction for live 360-degree mobile video streaming using user-content hybrid motion tracking
CN107004271B (en) Display method, display apparatus, electronic device, computer program product, and storage medium
CN104539929B (en) Stereo-image coding method and code device with motion prediction
CN100542303C (en) A kind of method for correcting multi-viewpoint vedio color
KR20150050172A (en) Apparatus and Method for Selecting Multi-Camera Dynamically to Track Interested Object
EP1315123A3 (en) Scalable architecture for establishing correspondence of multiple video streams at frame rate
US20200045363A1 (en) Gaze-Responsive Advertisement
CN102467661A (en) Multimedia device and method for controlling the same
US20210289145A1 (en) Transporting ultra-high definition video from multiple sources
JP2014116716A (en) Tracking device
Feng et al. LiveROI: region of interest analysis for viewport prediction in live mobile virtual reality streaming
Iashin et al. Sparse in space and time: Audio-visual synchronisation with trainable selectors
CN117221627A (en) Multi-view synchronization method and free view system
CN115174941B (en) Real-time motion performance analysis and real-time data sharing method based on multiple paths of video streams
CN112001224A (en) Video acquisition method and video acquisition system based on convolutional neural network
AU2018230038B2 (en) Transporting ultra-high definition video from multiple sources
JP5864371B2 (en) Still image automatic generation system, worker information processing terminal, instructor information processing terminal, and determination device in still image automatic generation system
CN108428241A (en) The movement locus catching method of mobile target in HD video
CN114040184A (en) Image display method, system, storage medium and computer program product
CN114830674A (en) Transmitting apparatus and receiving apparatus
AU2017392150B2 (en) Method for encoding and processing raw UHD video via an existing HD video architecture
US20180115591A1 (en) Marking Objects of Interest in a Streaming Video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant