CN114500871A - Multi-channel video analysis method, equipment and medium - Google Patents

Multi-channel video analysis method, equipment and medium

Info

Publication number
CN114500871A
CN114500871A (application CN202111532772.6A)
Authority
CN
China
Prior art keywords
video data, real-time video, picture, frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111532772.6A
Other languages
Chinese (zh)
Other versions
CN114500871B (en)
Inventor
韩振
蔡富东
孔志强
陈雷
李在学
王海慧
马景行
朱朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Senter Electronic Co Ltd
Original Assignee
Shandong Senter Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Senter Electronic Co Ltd
Priority to CN202111532772.6A
Publication of CN114500871A
Application granted
Publication of CN114500871B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY; H04 ELECTRIC COMMUNICATION TECHNIQUE; H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/90: Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums (under H04N 23/00, Cameras or camera modules comprising electronic image sensors; control thereof)
    • H04N 19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/236: Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data
    • H04N 21/25891: Management of end-user data, the end-user data being end-user preferences
    • H04N 21/4316: Generation of visual interfaces for content selection or interaction, for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/4858: End-user interface for client configuration, for modifying screen layout parameters, e.g. fonts, size of the windows
    • H04N 5/265: Mixing (studio circuits, e.g. for mixing or switching-over of video signals)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computer Graphics (AREA)
  • Human Computer Interaction (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a multi-channel video analysis method, device, and medium. The method comprises: acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encoding the marked second real-time video data to obtain compressed video data; and receiving a request for the second real-time video data from a user, and pushing the compressed video data to that user. In this way, a device with limited computing power can analyze multiple video channels simultaneously.

Description

Multi-channel video analysis method, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for multi-channel video analysis.
Background
Front-end intelligent analysis refers to performing intelligent analysis on video or image data directly at a front-end device (such as a network camera) in order to identify target objects. This functionality requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels. Because devices equipped with front-end intelligent analysis have limited computing power, they cannot analyze all channels simultaneously and can only analyze a single channel at a time. Consequently, only target objects in that one channel's picture can be identified, and target objects captured by the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
Disclosure of Invention
The embodiments of the present application provide a multi-channel video analysis method, device, and medium, which are intended to solve the following technical problem: analyzing multiple video channels simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
The embodiment of the application adopts the following technical scheme:
An embodiment of the present application provides a multi-channel video analysis method. The method comprises: acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encoding the marked second real-time video data to obtain compressed video data; and receiving a request for the second real-time video data sent by a user, and pushing the compressed video data to the user.
In the embodiments of the present application, picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
In one implementation of the present application, encoding the marked second real-time video data to obtain compressed video data specifically comprises: transmitting YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H.264 or H.265 data; and transmitting YUV data corresponding to the plurality of first real-time video data to the encoder to obtain encoded H.264 or H.265 data.
In one implementation of the present application, after the YUV data corresponding to the plurality of first real-time video data are transmitted to the encoder to obtain encoded H.264 or H.265 data, the method further comprises: receiving first real-time video data requests sent by different users, each request specifying any one or more video channels; and simultaneously sending the requested channel or channels to the corresponding users via a streaming media transport protocol.
In one implementation of the present application, performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises: when the picture-in-picture synthesis mode is a multi-frame side-by-side or grid tiling mode, determining the size of each frame image of the first real-time video data according to the video parameters of the first real-time video data; generating a blank image; and determining a set of coordinate positions of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of the first real-time video data, and the size of each frame image.
Determining the size of each video channel from the video data parameters makes it possible to arrange the multiple channels evenly, which improves the visual effect of the generated picture-in-picture video data in the tiled mode.
In one implementation of the present application, after the sets of coordinate positions of the plurality of first real-time video data in the blank image are determined, the method further comprises: acquiring display timestamps corresponding to the plurality of first real-time video data; and if the differences between the display timestamps are smaller than a preset threshold, combining the plurality of first real-time video data into a multi-cell stitched image in a preset stitching mode, where each cell in the stitched image corresponds to one first real-time video data stream.
In one implementation of the present application, transmitting the second real-time video data to the preset intelligent analysis module, so that target images in the second real-time video data are marked by the module, specifically comprises: inputting the current frame image of the second real-time video data into a preset target recognition model to obtain the label box of each target image; extracting the motion features of each target image in the previous frame with a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame; performing Hungarian-algorithm calculation on the label boxes and the predicted coordinate position information to cascade-match them and obtain a matching set; and deriving the motion trajectory of each target image from the matching set, then tracking and labeling the target images in the second real-time video data along those trajectories.
In the embodiments of the present application, the label box of each target image in the current frame is obtained from the preset target recognition model, and the predicted coordinate position information in the current frame is obtained from the Kalman filter, thereby realizing position tracking of the target images. Even when several target images appear in the same frame, the different targets can be tracked and located individually, so multiple target images can be tracked and labeled.
In one implementation of the present application, performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises: when the current picture-in-picture synthesis mode is a large-and-small-frame overlay mode, extracting with a Kalman filter the motion features of the target images in the current frames of the plurality of first real-time video data to obtain the predicted coordinate position information of the target images in the next frame; and resizing the plurality of first real-time video data based on that predicted coordinate position information to obtain picture-in-picture video data in the overlaid form.
In one implementation of the present application, after the plurality of first real-time video data are resized to obtain the overlaid picture-in-picture video data, the method further comprises: re-counting, after every preset time interval, the number of predicted coordinate position information items in the first real-time video data; and updating the stacking order of the plurality of first real-time video data according to the re-counted numbers.
In the embodiments of the present application, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, so the channels can be stacked accordingly: channels with many targets are placed on the upper layers, which reduces occlusion of the target images and improves the accuracy of target image recognition.
An embodiment of the present application provides a multi-channel video analysis device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to: acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions configured to: acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera; perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module; encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects: picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present application; for those skilled in the art, other drawings can be derived from them without creative effort.
In the drawings:
fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a multi-channel video analysis process according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide a multi-channel video analysis method, device, and medium.
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only a part, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Front-end intelligent analysis refers to performing intelligent analysis on video or image data directly at a front-end device (such as a network camera) in order to identify target objects. This functionality requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera can provide multiple video channels. Because devices equipped with front-end intelligent analysis have limited computing power, they cannot analyze all channels simultaneously and can only analyze a single channel at a time. Consequently, only target objects in that one channel's picture can be identified, and target objects captured by the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the computing power of the device, which raises the device cost and hinders its popularization and application.
To solve the above problem, embodiments of the present application provide a multi-channel video analysis method, device, and medium. Picture-in-picture synthesis is performed on the acquired first real-time video data corresponding to the plurality of lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized video data can be treated as a single video channel, analyzing that one channel amounts to analyzing all of the videos, which avoids the higher cost of analyzing each channel separately. The computing-power requirement of the front-end device is reduced, and with it the device cost, making front-end analysis equipment easier to popularize and apply in production and living environments.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure. As shown in fig. 1, the multi-channel video analysis method includes the following steps:
s101, acquiring first real-time video data corresponding to a plurality of lenses of a multi-view camera by a multi-path video analysis device.
In an embodiment of the application, the multi-channel video analysis device obtains YUV real-time video data corresponding to a plurality of lenses of the multi-view camera through physical interfaces such as MIPI and USB.
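As an illustration only (the patent does not prescribe any capture API), the following minimal Python sketch shows how such multi-lens YUV capture might look if each lens were exposed as a separate video device readable through OpenCV; the device indices, resolution, and I420 conversion are assumptions rather than part of the disclosure.

```python
# Hypothetical capture sketch: each lens of the multi-view camera is assumed
# to appear as its own video device (e.g. via a MIPI/USB-to-V4L2 driver).
import cv2

LENS_DEVICES = [0, 1, 2, 3]  # assumed device indices, one per lens

def open_lenses(devices=LENS_DEVICES, width=1920, height=1080):
    caps = []
    for dev in devices:
        cap = cv2.VideoCapture(dev)
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
        caps.append(cap)
    return caps

def read_yuv_frames(caps):
    """Grab one frame per lens and convert it to planar YUV 4:2:0 (I420)."""
    frames = []
    for cap in caps:
        ok, bgr = cap.read()
        if not ok:
            return None  # a lens produced no frame; caller decides how to react
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV_I420))
    return frames
```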
S102, the multi-channel video analysis device performs picture-in-picture synthesis on the plurality of first real-time video data and takes the synthesized picture-in-picture video data as second real-time video data.
In an embodiment of the present application, when the picture-in-picture synthesis mode is multi-frame side-by-side or grid tiling, the size of each frame image of the first real-time video data is determined from the video parameters of each stream, a blank image is generated, and the set of coordinate positions of the plurality of first real-time video data in the blank image is determined from the size of the blank image, the number of the first real-time video data, and the size of each frame image.
Specifically, the picture-in-picture synthesis modes in the embodiments of the present application include at least two: multi-frame tiling (side-by-side or grid) and large-and-small-frame overlay, although the application is not limited to these two modes. In the tiling mode, the video positions are ordered according to the image size of each first real-time video data stream and the number of streams, so that the streams are distributed evenly; this enhances the visual effect of the picture-in-picture video, reduces mutual occlusion between the videos, and improves the accuracy of target image recognition.
Further, the image size of each video and the number of first real-time video data streams can be obtained from the parameters of the first real-time video data. The sizes of the individual streams may be the same or different. Each stream is labeled with a sequence number in advance, for example according to the position of its camera.
Furthermore, the multi-channel video analysis device generates a blank image, determines its size, and from that size and the number of streams determines the size each stream will occupy in the picture-in-picture. Positions on the blank image are then allocated according to these sizes, yielding the coordinate position of each first real-time video data stream within the blank image. Each stream is resized to its allotted size, and the streams are arranged in the order of their labeled sequence numbers to produce the tiled picture-in-picture video data.
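For concreteness, a minimal sketch of this tiling step is given below. It operates on packed BGR arrays rather than the YUV planes the device actually handles, and the canvas size and near-square grid heuristic are illustrative assumptions.

```python
import math
import cv2
import numpy as np

def compose_grid(frames, canvas_w=1920, canvas_h=1080):
    """Tile the per-lens frames onto a blank image in labeled order."""
    n = len(frames)
    cols = math.ceil(math.sqrt(n))            # e.g. 4 channels -> a 2x2 grid
    rows = math.ceil(n / cols)
    cell_w, cell_h = canvas_w // cols, canvas_h // rows
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)  # blank image
    positions = []                            # coordinate position set
    for i, frame in enumerate(frames):
        r, c = divmod(i, cols)                # row/column from sequence number
        x, y = c * cell_w, r * cell_h
        positions.append((x, y))
        canvas[y:y + cell_h, x:x + cell_w] = cv2.resize(frame, (cell_w, cell_h))
    return canvas, positions
```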
In an embodiment of the present application, display timestamps corresponding to the plurality of first real-time video data are acquired. If the differences between the display timestamps are smaller than a preset threshold, the first real-time video data are combined into a multi-cell stitched image in a preset stitching mode, with each cell corresponding to one first real-time video data stream.
Specifically, since the lenses shoot video separately and independently, after the multiple first real-time video data streams are received their timestamps must be compared in order to guarantee time synchronization of the channels in the synthesized picture-in-picture video. Only when the timestamp differences between the streams are smaller than the preset threshold are the streams combined into the multi-cell stitched image.
For example, the preset threshold may be set to 0.1 s: when the timestamps of the received streams differ by less than 0.1 s, the streams are combined into the picture-in-picture.
Note that 0.1 s is merely the preferred value of the preset threshold in the embodiments of the present application; in practice the threshold may be adjusted to the actual situation.
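A sketch of this synchronization gate, under the assumption that every channel delivers (timestamp, frame) pairs, might look as follows; compose_grid refers to the illustrative tiling function above.

```python
def maybe_compose(latest, threshold_s=0.1):
    """Compose the stitched image only when all channels are in sync.

    'latest' is assumed to map channel id -> (display_timestamp_s, frame).
    """
    stamps = [t for t, _ in latest.values()]
    if max(stamps) - min(stamps) < threshold_s:  # preset threshold, e.g. 0.1 s
        # Keep the labeled channel order when extracting the frames.
        frames = [frame for _, (_, frame) in sorted(latest.items())]
        return compose_grid(frames)
    return None  # timestamps diverge; wait for the lagging channel
```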
In an embodiment of the present application, when the current picture-in-picture synthesis mode is the large-and-small-frame overlay mode, the motion features of the target images in the current frames of the plurality of first real-time video data are extracted with a Kalman filter to obtain the predicted coordinate position information of the target images in the next frame. The plurality of first real-time video data are then resized based on that information to obtain overlaid picture-in-picture video data.
Specifically, in the overlay mode the picture-in-picture effect superimposes video frames of smaller size onto a larger video frame, so part of a target object may be occluded. To keep as few target images occluded as possible, the Kalman filter extracts the motion features of the target images in the current frame of each first real-time video data stream and predicts their positions in the next frame. Since one frame may contain one or several target images, the streams can then be stacked according to the predicted number of targets.
In an embodiment of the present application, after every preset time interval the number of predicted coordinate position information items in the first real-time video data is counted again, and the stacking order of the plurality of first real-time video data is updated according to the new counts.
Specifically, the pictures captured by the cameras change in real time and the number of target images changes with them. To reduce the number of targets occluded in the overlay layout, the number of target images in each first real-time video data stream is re-predicted at intervals, and the stacking order of the videos is changed according to the newly predicted counts: the video with the most target images is placed on the top layer of the picture-in-picture video data, and the video with the fewest on the bottom layer.
In an embodiment of the present application, after the numbers of predicted coordinate position information items for the next frame are obtained, the plurality of first real-time video data are stacked from top to bottom according to those counts. The region of the next frame outside the predicted coordinate positions is treated as the background region, and the streams are resized so that each upper-layer video image is placed within the background region of the video image below it, the lower layer always being larger than the layer above. Occlusion of the target images in the lower-layer video data is thereby minimized.
In this way, counting the predicted coordinate position information yields the number of target images expected in the next frame of each video channel, the channels are stacked accordingly, channels with many targets sit on the upper layers, occlusion of the target images is reduced, and the accuracy of target image recognition improves.
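The sketch below illustrates one way such an overlay layout could be realized; the shrink factor, margin, and corner placement are simplifying stand-ins for the background-region computation described above, which the patent derives from the predicted target coordinates.

```python
import cv2

def stack_order(frames, predicted_counts):
    """Bottom-to-top layer order: fewest predicted targets at the bottom
    (largest, partly covered layer), most targets on top (fully visible)."""
    order = sorted(range(len(frames)), key=lambda i: predicted_counts[i])
    return [frames[i] for i in order]

def compose_overlay(layers, canvas_w=1920, canvas_h=1080, shrink=0.35, margin=16):
    canvas = cv2.resize(layers[0], (canvas_w, canvas_h))   # bottom layer
    w, h = int(canvas_w * shrink), int(canvas_h * shrink)
    x, y = margin, margin
    for frame in layers[1:]:                               # smaller upper layers
        if y + h > canvas_h:
            break                    # out of room in this simplified layout
        canvas[y:y + h, x:x + w] = cv2.resize(frame, (w, h))
        y += h + margin
    return canvas
```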
S103, the multi-channel video analysis device transmits the second real-time video data to the preset intelligent analysis module, which marks the target images in the second real-time video data.
In an embodiment of the present application, the current frame image of the second real-time video data is input into a preset target recognition model, which outputs the label box of each target image. The motion features of each target image in the previous frame are extracted with a Kalman filter to obtain its predicted coordinate position information in the current frame. Hungarian-algorithm calculation is performed on the label boxes and the predicted coordinate position information to cascade-match them into a matching set. The motion trajectory of each target image is derived from the matching set, and the target images in the second real-time video data are tracked and labeled along those trajectories.
Specifically, a neural network model is trained on a sample set collected in advance from second real-time video data, yielding the preset target recognition model, which can recognize and label the target images in the second real-time video data it receives. The current frame image of the current second real-time video data is fed to this model to obtain the label-box position of each target image. Tracking is then performed with Kalman filtering, i.e. a Kalman filter predicts the motion state of each target: from the label-box position of each target in the previous frame image, its position in the current frame image is predicted. For example, a standard Kalman filter based on a constant-velocity model with a linear observation model may be used to predict each target's motion state and obtain its predicted position in the current frame.
Further, the cosine distances between the features of each target image and the depth features in the stored depth-feature set are computed to form a cosine distance matrix, and the Mahalanobis distance between each target image's predicted position in the current frame and the corresponding detection-box position is calculated. The cosine distance matrix is then preprocessed: elements whose Mahalanobis distance exceeds a first preset threshold are set to infinity, and elements whose cosine distance exceeds a second preset threshold are set to a large value; both thresholds can be set according to the scene. Based on the preprocessed cosine distance matrix, the Hungarian algorithm performs a first round of matching between the label boxes and the predicted coordinate position information, producing similarity scores between them and hence the matching set. The motion trajectory of each target image is obtained from the matching set and used to track and label the target images in the second real-time video data. The recognition results may also be uploaded to the client platform.
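A stripped-down sketch of this predict-then-match step is shown below. To stay self-contained it replaces the cosine/Mahalanobis gating described above with a plain pixel-distance gate; the state layout, noise levels, and gate value are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Hungarian algorithm

# Constant-velocity model: state x = [cx, cy, vx, vy]^T, position advances
# by one velocity step per frame.
F = np.array([[1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])

def kalman_predict(x, P, Q=np.eye(4) * 1e-2):
    """Predict a track's state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def match_tracks(predicted_centers, detected_centers, gate=100.0):
    """Assign label-box centers of the current frame to predicted positions."""
    cost = np.linalg.norm(predicted_centers[:, None, :]
                          - detected_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```

In a fuller tracker, matched pairs would update their Kalman states while unmatched detections would typically start new tracks; that bookkeeping is omitted here.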
In this way, the label box of each target image in the current frame is obtained from the preset target recognition model, and the predicted coordinate position information in the current frame is obtained from the Kalman filter, realizing position tracking of the target images. Even when several target images appear in the same frame, the different targets can be tracked and located individually, so multiple target images can be tracked and labeled.
S104, the multi-channel video analysis device encodes the marked second real-time video data to obtain compressed video data.
In an embodiment of the present application, the YUV data corresponding to the marked second real-time video data is sent to an encoder to obtain encoded H.264 or H.265 data, and the YUV data corresponding to the first real-time video data is likewise sent to the encoder to obtain encoded H.264 or H.265 data.
Specifically, the YUV data of the multiple single-lens video channels is fed continuously to the encoder, which continuously outputs encoded H.264 or H.265 data; the synthesized picture-in-picture YUV data is fed to the encoder in the same way, and its encoded H.264 or H.265 data is continuously acquired.
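The patent does not name a particular encoder; as one possibility, the PyAV bindings to FFmpeg could drive this step roughly as follows (the file name, frame rate, and resolution are placeholders).

```python
import av  # PyAV: Python bindings to FFmpeg

def encode_h264(frames_bgr, path="pip.mp4", fps=25, width=1920, height=1080):
    """Encode annotated frames as H.264 with the YUV 4:2:0 pixel format."""
    container = av.open(path, mode="w")
    stream = container.add_stream("h264", rate=fps)
    stream.width, stream.height = width, height
    stream.pix_fmt = "yuv420p"
    for img in frames_bgr:
        frame = av.VideoFrame.from_ndarray(img, format="bgr24")
        for packet in stream.encode(frame):   # the encoder may buffer frames
            container.mux(packet)
    for packet in stream.encode():            # flush buffered packets
        container.mux(packet)
    container.close()
```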
S105, a request for the second real-time video data sent by a user is received, and the compressed video data is pushed to that user.
In one embodiment of the present application, first real-time video data requests sent by different users are received, each specifying any one or more video channels, and the requested channel or channels are simultaneously sent to the corresponding users via a streaming media transport protocol.
Specifically, the embodiments of the present application support multiple clients requesting different video streams at the same time. The channel or channels a client requests are determined from the parameters specified in the request, and the real-time video data specified by each client is pushed to the clients simultaneously over a streaming media transport protocol such as RTP or RTCP. Because the picture-in-picture result is served as an ordinary single video channel, a client can also request that channel by itself and thus see the picture-in-picture effect.
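As a rough illustration of the push step (the patent names only RTP/RTCP in general terms), an already-encoded stream could be handed to an external FFmpeg process for RTSP delivery; the server URL is a placeholder.

```python
import subprocess

def push_stream(source="pip.mp4", url="rtsp://media.example/pip"):
    """Relay an encoded file or pipe to an RTSP endpoint via FFmpeg."""
    cmd = ["ffmpeg", "-re",          # read input at its native frame rate
           "-i", source,
           "-c", "copy",             # no re-encode: the data is already H.264
           "-f", "rtsp", url]
    return subprocess.Popen(cmd)
```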
Fig. 2 is a block diagram of the multi-channel video analysis process according to an embodiment of the present application. As shown in Fig. 2, the multi-channel video analysis process is as follows:
In an embodiment of the present application, the YUV data corresponding to each lens of the multi-view camera is obtained, and the multiple YUV streams are combined into one new YUV stream, i.e. the picture-in-picture video data is synthesized. The YUV data corresponding to each lens is also encoded to generate the H.264 or H.265 data of each individual video channel.
In one embodiment of the present application, the synthesized picture-in-picture video data is delivered to the intelligent analysis module, which analyzes the target images in the picture-in-picture video data; at this point the recognition results can be uploaded directly to the user platform. The recognized target objects are marked according to the analysis results, producing YUV data carrying label boxes or marks, which is then encoded into the H.264 or H.265 data corresponding to the picture-in-picture video data.
In one embodiment of the present application, one or more video channels can be sent simultaneously to the respective clients upon their request.
Fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application. As shown in fig. 3, the multi-channel video analysis apparatus includes:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data;
and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
The embodiments in the present application are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, for the apparatus, device, and non-volatile computer storage medium embodiments, the description is relatively brief since they are substantially similar to the method embodiments; for the relevant points, reference may be made to the corresponding parts of the method embodiments.
The foregoing description presents specific embodiments of the present application. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the embodiments of the present application shall be included in the scope of the claims of the present application.

Claims (10)

1. A method for multi-channel video analysis, the method comprising:
acquiring first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
performing picture-in-picture synthesis on the plurality of first real-time video data, and taking the synthesized picture-in-picture video data as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encoding the marked second real-time video data to obtain compressed video data;
and receiving a request for the second real-time video data sent by a user, and pushing the compressed video data to the user.
2. The method according to claim 1, wherein encoding the marked second real-time video data to obtain compressed video data specifically comprises:
transmitting YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H.264 or H.265 data; and
transmitting YUV data corresponding to the plurality of first real-time video data to the encoder to obtain encoded H.264 or H.265 data.
3. The method of claim 2, wherein after the YUV data corresponding to the first real-time video data are transmitted to the encoder to obtain encoded H.264 or H.265 data, the method further comprises:
receiving first real-time video data requests sent by different users, wherein each first real-time video data request specifies at least any one or more video channels;
and simultaneously sending the one or more video channels corresponding to each first real-time video data request to the corresponding users via a streaming media transport protocol.
4. The method according to claim 1, wherein performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises:
when the picture-in-picture synthesis mode is a multi-frame side-by-side or grid tiling mode, determining the size of each frame image of the first real-time video data according to the video parameters of the first real-time video data;
generating a blank image;
and determining a set of coordinate positions of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of the first real-time video data, and the size of each frame image.
5. The method of claim 4, wherein after determining the set of coordinate positions of the first real-time video data in the blank image, the method further comprises:
acquiring display timestamps corresponding to the plurality of first real-time video data;
and if the differences between the display timestamps are smaller than a preset threshold, combining the first real-time video data into a multi-cell stitched image in a preset stitching mode, wherein each cell in the stitched image corresponds to one of the first real-time video data.
6. The method according to claim 1, wherein transmitting the second real-time video data to a preset intelligent analysis module so that a target image in the second real-time video data is marked by the preset intelligent analysis module specifically comprises:
inputting a current frame image corresponding to the second real-time video data into a preset target recognition model, and obtaining a label box corresponding to the target image through the preset target recognition model;
extracting motion features of the target image in the previous frame through a Kalman filter to obtain predicted coordinate position information of the target image in the current frame;
performing Hungarian-algorithm calculation based on the label box and the predicted coordinate position information to cascade-match them and obtain a matching set;
and obtaining a motion trajectory of the target image according to the matching set, and tracking and labeling the target image in the second real-time video data along the motion trajectory.
7. The method according to claim 1, wherein performing picture-in-picture synthesis on the plurality of first real-time video data specifically comprises:
when the current picture-in-picture synthesis mode is a large-and-small-frame overlay mode, extracting motion features of the target images in the current frames of the plurality of first real-time video data based on a Kalman filter to obtain predicted coordinate position information of the target images in the next frame;
and resizing the plurality of first real-time video data based on the predicted coordinate position information of the target images in the next frame to obtain picture-in-picture video data in the large-and-small-frame overlay form.
8. The method of claim 7, wherein after resizing the first real-time video data to obtain the picture-in-picture video data in the large-and-small-frame overlay form, the method further comprises:
re-counting, after every preset time interval, the number of predicted coordinate position information items in the first real-time video data;
and updating the stacking order of the plurality of first real-time video data according to the re-counted number of predicted coordinate position information items.
9. A multi-channel video analysis apparatus comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data;
and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
10. A non-transitory computer storage medium storing computer-executable instructions configured to:
acquire first real-time video data corresponding respectively to a plurality of lenses of a multi-view camera;
perform picture-in-picture synthesis on the plurality of first real-time video data, and take the synthesized picture-in-picture video data as second real-time video data;
transmit the second real-time video data to a preset intelligent analysis module, so that target images in the second real-time video data are marked by the preset intelligent analysis module;
encode the marked second real-time video data to obtain compressed video data; and receive a request for the second real-time video data sent by a user, and push the compressed video data to the user.
CN202111532772.6A, filed 2021-12-15: Multipath video analysis method, equipment and medium. Active; granted as CN114500871B.

Priority Applications (1)

CN202111532772.6A (priority date 2021-12-15, filing date 2021-12-15): Multipath video analysis method, equipment and medium


Publications (2)

CN114500871A (published 2022-05-13)
CN114500871B (granted 2023-11-14)

Family

ID=81493135

Family Applications (1)

CN202111532772.6A (priority date 2021-12-15, filing date 2021-12-15): Multipath video analysis method, equipment and medium; status: Active

Country Status (1)

CN: CN114500871B (granted)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550608A (en) * 2022-09-19 2022-12-30 国网智能科技股份有限公司 Multi-user high-concurrency AI video real-time fusion display control method and system
CN115988258A (en) * 2023-03-17 2023-04-18 广州佰锐网络科技有限公司 IoT (Internet of things) -based video communication method, storage medium and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194443A (en) * 2010-03-04 2011-09-21 腾讯科技(深圳)有限公司 Display method and system for window of video picture in picture and video processing equipment
WO2017005096A2 (en) * 2015-07-06 2017-01-12 阿里巴巴集团控股有限公司 Method and device for encoding multiple video streams
WO2017211250A1 (en) * 2016-06-08 2017-12-14 深圳创维数字技术有限公司 Image overlay display method and system
US20180122144A1 (en) * 2015-05-13 2018-05-03 Aim Sport Vision Ag Digitally overlaying an image with another image
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110321806A (en) * 2019-06-12 2019-10-11 浙江大华技术股份有限公司 Object detection method, image processing equipment and the equipment with store function
CN111010605A (en) * 2019-11-26 2020-04-14 杭州东信北邮信息技术有限公司 Method for displaying video picture-in-picture window
CN112637550A (en) * 2020-11-18 2021-04-09 合肥市卓迩无人机科技服务有限责任公司 PTZ moving target tracking method for multi-path 4K quasi-real-time spliced video
US20210125639A1 (en) * 2019-10-28 2021-04-29 Shanghai Bilibili Technology Co., Ltd. Method and system of clipping a video, computing device, and computer storage medium
CN112884811A (en) * 2021-03-18 2021-06-01 中国人民解放军国防科技大学 Photoelectric detection tracking method and system for unmanned aerial vehicle cluster
CN113612922A (en) * 2021-07-29 2021-11-05 重庆赛迪奇智人工智能科技有限公司 Video processing method and device, electronic equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
司佳伟: "Research on pointer image recognition in video surveillance systems", Outstanding Master's Theses *

Also Published As

Publication number Publication date
CN114500871B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant