CN114500871A - Multi-channel video analysis method, equipment and medium - Google Patents
- Publication number
- CN114500871A (application CN202111532772.6A)
- Authority
- CN
- China
- Prior art keywords
- video data
- real-time video
- picture
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/90—Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
- H04N21/25891—Management of end-user data being end-user preferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4316—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4858—End-user interface for client configuration for modifying screen layout parameters, e.g. fonts, size of the windows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Computer Graphics (AREA)
- Human Computer Interaction (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of this application disclose a multi-channel video analysis method, device, and medium. The method acquires first real-time video data corresponding to each of the lenses of a multi-view camera; performs picture-in-picture synthesis on the multiple streams of first real-time video data, taking the synthesized picture-in-picture video data as second real-time video data; transmits the second real-time video data to a preset intelligent analysis module, which marks target images within it; encodes the marked second real-time video data to obtain compressed video data; and, on receiving a user's request for the second real-time video data, pushes the compressed video data to the user. In this way, a device with low computing power can analyze multiple video channels simultaneously.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for multi-channel video analysis.
Background
Front-end intelligent analysis refers to performing intelligent analysis of video or image data on a front-end device (such as a network camera) in order to identify target objects. This requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera provides multiple video channels. Because a device with front-end intelligent analysis is limited in computing power, it cannot analyze all channels simultaneously and can only analyze a single channel at a time. As a result, only the target objects in one channel's picture are identified, while targets in the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the device's computing power, which raises the device's cost and hinders its widespread adoption.
Disclosure of Invention
The embodiments of this application provide a multi-channel video analysis method, device, and medium to solve the following technical problem: analyzing multiple video channels simultaneously generally requires increasing the device's computing power, which raises the device's cost and hinders its widespread adoption.
The embodiment of the application adopts the following technical scheme:
An embodiment of this application provides a multi-channel video analysis method. The method comprises: obtaining first real-time video data corresponding to each of the lenses of a multi-view camera; performing picture-in-picture synthesis on the multiple streams of first real-time video data and taking the synthesized picture-in-picture video data as second real-time video data; transmitting the second real-time video data to a preset intelligent analysis module so that the module marks target images in the second real-time video data; encoding the marked second real-time video data to obtain compressed video data; and receiving a user's request for the second real-time video data and pushing the compressed video data to the user.
In the embodiments of this application, picture-in-picture synthesis is performed on the acquired first real-time video data from the multiple lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized stream can be treated as a single video channel, analyzing that one channel effectively analyzes all of the videos. This avoids the higher cost of analyzing each channel separately and lowers the computing-power requirement, and hence the cost, of the front-end device, which in turn favors the adoption of front-end analysis devices in production and everyday environments.
In one implementation of this application, encoding the marked second real-time video data to obtain compressed video data specifically comprises: transmitting the YUV data corresponding to the marked second real-time video data to an encoder to obtain encoded H.264 or H.265 data; and transmitting the YUV data corresponding to each stream of first real-time video data to the encoder to obtain encoded H.264 or H.265 data.
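The patent does not specify an encoder implementation, so the following is only a minimal sketch of the encoding step under stated assumptions: the `StubEncoder` class and its `encode` method are hypothetical stand-ins for the device's hardware or software H.264/H.265 codec, and the placeholder "compressed" output is illustrative. Only the YUV 4:2:0 buffer-size arithmetic is standard.

```python
def yuv420_frame_size(width: int, height: int) -> int:
    """Bytes per YUV 4:2:0 frame: a full-resolution Y plane plus
    quarter-resolution U and V planes (width*height * 3/2)."""
    return width * height * 3 // 2


class StubEncoder:
    """Hypothetical encoder interface; a real device would hand the buffer
    to its SoC codec or a library and receive H.264/H.265 packets back."""

    def __init__(self, width: int, height: int, codec: str = "h264"):
        self.frame_size = yuv420_frame_size(width, height)
        self.codec = codec

    def encode(self, yuv: bytes) -> bytes:
        # Validate the raw buffer, then return a placeholder "packet"
        # standing in for the compressed bitstream.
        assert len(yuv) == self.frame_size, "unexpected YUV buffer size"
        return yuv[: self.frame_size // 50]


enc = StubEncoder(1920, 1080)
frame = bytes(enc.frame_size)   # one blank 1080p YUV 4:2:0 frame
packet = enc.encode(frame)      # placeholder for H.264/H.265 output
```

Both the labelled picture-in-picture stream and each original lens stream would pass through an encoder instance of this shape, so every stream stays individually requestable.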
In one implementation of this application, after the YUV data corresponding to each stream of first real-time video data has been transmitted to the encoder to obtain encoded H.264 or H.265 data, the method further comprises: receiving first real-time video data requests sent by different users, each request specifying any one or more video channels; and simultaneously sending the requested channel or channels to the corresponding users over a streaming-media transport protocol.
In one implementation of this application, performing picture-in-picture synthesis on the multiple streams of first real-time video data specifically comprises: when the picture-in-picture synthesis mode is multi-frame side-by-side or multi-frame tiled, determining the size of each frame image of the first real-time video data according to the corresponding video parameters; generating a blank image; and determining the set of coordinate positions of the streams within the blank image according to the size of the blank image, the number of streams, and the size of each frame image.
Determining the size of each video channel from the video parameters makes it possible to arrange the multiple channels uniformly, improving the visual effect of the generated picture-in-picture video in the side-by-side or tiled modes.
In one implementation of this application, after the sets of coordinate positions of the streams in the blank image are determined, the method further comprises: obtaining the display timestamps of each stream of first real-time video data; and, if the differences between the display timestamps are smaller than a preset threshold, synthesizing the streams into a multi-cell stitched image using a preset stitching mode, where each cell of the stitched image corresponds to one stream of first real-time video data.
In one implementation of this application, transmitting the second real-time video data to the preset intelligent analysis module so that the module marks target images specifically comprises: inputting the current frame of the second real-time video data into a preset target recognition model to obtain a labeling box for each target image; extracting the motion features of each target image in the previous frame with a Kalman filter to obtain predicted coordinate position information for the target in the current frame; performing Hungarian-algorithm matching between the labeling boxes and the predicted coordinate positions so as to cascade-match them into a matching set; and deriving each target's motion trajectory from the matching set, then tracking and labeling the targets in the second real-time video data along those trajectories.
In this way, the labeling box of each target in the current frame is obtained from the preset target recognition model, and the predicted coordinate position in the current frame is obtained from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can each be tracked and located, so that multiple targets are tracked and labeled.
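The matching step above can be sketched as follows, with assumptions stated plainly: detection boxes from the recognition model are paired with Kalman-predicted boxes by maximising total intersection-over-union, and for the handful of targets per frame a brute-force search over permutations stands in for the Hungarian algorithm (which an implementation would typically take from `scipy.optimize.linear_sum_assignment`). The box coordinates are illustrative only.

```python
from itertools import permutations


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


def match(detections, predictions):
    """Return (detection_idx, prediction_idx) pairs maximising total IoU.
    Brute force over permutations; fine for a few targets per frame."""
    best, best_score = [], -1.0
    for perm in permutations(range(len(predictions))):
        pairs = list(zip(range(len(detections)), perm))
        score = sum(iou(detections[d], predictions[p]) for d, p in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best


dets = [(0, 0, 10, 10), (20, 20, 30, 30)]     # labeling boxes, current frame
preds = [(21, 19, 31, 29), (1, 1, 11, 11)]    # Kalman-predicted positions
pairs = match(dets, preds)  # detection 0 pairs with prediction 1, and vice versa
```

Each matched pair extends one target's trajectory; unmatched detections would start new tracks and unmatched predictions would age out, though those bookkeeping details are not spelled out in the patent.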
In one implementation of this application, performing picture-in-picture synthesis on the multiple streams of first real-time video data specifically comprises: when the current picture-in-picture synthesis mode is large-and-small frame superposition, extracting the motion features of the target images in the current frames of the streams with a Kalman filter to obtain the predicted coordinate positions of the targets in the next frame; and resizing the streams based on those predicted positions to obtain picture-in-picture video data in the superposed-frame form.
In one implementation of this application, after the streams are resized to obtain picture-in-picture video data in the superposed-frame form, the method further comprises: after every preset time interval, re-counting the number of predicted coordinate positions in each stream of first real-time video data; and updating the stacking order of the streams according to the new counts.
In the embodiments of this application, counting the predicted coordinate positions gives the number of targets expected in the next frame of each video stream, so the streams can be stacked accordingly: streams with many targets are placed on the upper layers, which reduces occlusion of target images and improves the accuracy of target recognition.
An embodiment of this application provides a multi-channel video analysis device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor that cause it to: acquire first real-time video data corresponding to each of the lenses of a multi-view camera; perform picture-in-picture synthesis on the multiple streams of first real-time video data and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module so that the module marks target images in the second real-time video data; encode the marked second real-time video data to obtain compressed video data; and receive a user's request for the second real-time video data and push the compressed video data to the user.
An embodiment of this application provides a non-volatile computer storage medium storing computer-executable instructions configured to: acquire first real-time video data corresponding to each of the lenses of a multi-view camera; perform picture-in-picture synthesis on the multiple streams of first real-time video data and take the synthesized picture-in-picture video data as second real-time video data; transmit the second real-time video data to a preset intelligent analysis module so that the module marks target images in the second real-time video data; encode the marked second real-time video data to obtain compressed video data; and receive a user's request for the second real-time video data and push the compressed video data to the user.
At least one of the technical solutions adopted by the embodiments of this application achieves the following beneficial effects: picture-in-picture synthesis is performed on the acquired first real-time video data from the multiple lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized stream can be treated as a single video channel, analyzing that one channel effectively analyzes all of the videos. This avoids the higher cost of analyzing each channel separately and lowers the computing-power requirement, and hence the cost, of the front-end device, which favors the adoption of front-end analysis devices in production and everyday environments.
Drawings
To illustrate the embodiments of this application or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are clearly only some of the embodiments of this application; those skilled in the art can derive other drawings from them without creative effort. In the accompanying drawings:
fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure;
fig. 2 is a block diagram of a multi-channel video analysis process according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a multi-channel video analysis method, equipment and medium.
To help those skilled in the art better understand the technical solutions in this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are clearly only a part, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of this application.
Front-end intelligent analysis refers to performing intelligent analysis of video or image data on a front-end device (such as a network camera) in order to identify target objects. This requires the front-end device to have sufficient computing power, and the higher the computing power, the higher the hardware cost.
A multi-view camera provides multiple video channels. Because a device with front-end intelligent analysis is limited in computing power, it cannot analyze all channels simultaneously and can only analyze a single channel at a time. As a result, only the target objects in one channel's picture are identified, while targets in the other lenses may be missed.
Analyzing multiple videos simultaneously generally requires increasing the device's computing power, which raises the device's cost and hinders its widespread adoption.
To solve the above problem, the embodiments of this application provide a multi-channel video analysis method, device, and medium. Picture-in-picture synthesis is performed on the acquired first real-time video data from the multiple lenses, and target analysis and labeling are performed on the synthesized picture-in-picture video data. Because the synthesized stream can be treated as a single video channel, analyzing that one channel effectively analyzes all of the videos. This avoids the higher cost of analyzing each channel separately and lowers the computing-power requirement, and hence the cost, of the front-end device, which favors the adoption of front-end analysis devices in production and everyday environments.
The technical solutions proposed in the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a multi-channel video analysis method according to an embodiment of the present disclosure. As shown in fig. 1, the multi-channel video analysis method includes the following steps:
S101: the multi-channel video analysis device acquires first real-time video data corresponding to each of the lenses of a multi-view camera.
In an embodiment of this application, the multi-channel video analysis device obtains the YUV real-time video data corresponding to the lenses of the multi-view camera through physical interfaces such as MIPI and USB.
S102: the multi-channel video analysis device performs picture-in-picture synthesis on the multiple streams of first real-time video data and takes the synthesized picture-in-picture video data as second real-time video data.
In an embodiment of this application, when the picture-in-picture synthesis mode is multi-frame side-by-side or multi-frame tiled, the size of each frame image of the first real-time video data is determined from the video parameters of each stream; a blank image is generated; and the set of coordinate positions of the streams within the blank image is determined from the size of the blank image, the number of streams, and the size of each frame image.
Specifically, the picture-in-picture synthesis modes in the embodiments of this application include at least two: multi-frame side-by-side (or tiled) placement and large-and-small frame superposition, though the method is not limited to these two. When the mode is side-by-side or tiled, the video positions are ordered according to each stream's image size and the number of streams so that the streams are distributed uniformly. This enhances the visual effect of the picture-in-picture video, reduces occlusion between the videos, and improves the accuracy of target recognition.
Further, the image size of each video and the number of streams can be obtained from the parameters of the first real-time video data. The streams may have the same or different sizes. A sequence number is assigned to each stream in advance, for example according to the position of its camera.
Furthermore, the multi-channel video analysis device generates a blank image, determines its size, and from that size and the number of streams determines the size each stream occupies in the picture-in-picture. Positions within the blank image are then allocated according to those sizes, giving the coordinate position of each stream in the blank image. Each stream is resized to its allocated size, and the streams are arranged in order of their assigned sequence numbers, producing picture-in-picture video data in the multi-frame side-by-side or tiled form.
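The layout step above can be sketched as follows, under assumed parameters: given the blank-canvas size and the number of streams, compute each stream's tile size and top-left coordinate for a uniform grid. The near-square column count and the row-major ordering by sequence number are illustrative choices, not mandated by the method.

```python
import math


def grid_layout(canvas_w: int, canvas_h: int, n_streams: int):
    """Return [(x, y, w, h), ...] tile rectangles in the blank image,
    one per stream, ordered by the stream's assigned sequence number."""
    cols = math.ceil(math.sqrt(n_streams))   # near-square grid
    rows = math.ceil(n_streams / cols)
    tile_w, tile_h = canvas_w // cols, canvas_h // rows
    return [((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
            for i in range(n_streams)]


# Four lenses on a 1920x1080 canvas: four 960x540 tiles, row-major order.
tiles = grid_layout(1920, 1080, 4)
```

Each incoming frame would then be resized to its tile's `(w, h)` and copied to the tile's `(x, y)` in the blank image, yielding the second real-time video data.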
In an embodiment of this application, the display timestamps of the streams of first real-time video data are obtained. If the differences between the display timestamps are smaller than a preset threshold, the streams are synthesized into a multi-cell stitched image using a preset stitching mode, with each cell of the stitched image corresponding to one stream.
Specifically, because the lenses capture video independently, after the multiple streams of first real-time video data are received their timestamps must be compared in order to ensure that the channels in the synthesized picture-in-picture video are time-synchronized. Only when the timestamp differences between the streams are smaller than the preset threshold are the streams synthesized into the multi-cell stitched image.
For example, the preset threshold may be set to 0.1 s; when the timestamp differences between the streams are less than 0.1 s, the received streams are combined into the picture-in-picture.
Note that 0.1 s is the preferred preset threshold in the embodiments of this application; the value may be adjusted to suit actual conditions.
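The synchronization check above amounts to comparing the spread of per-stream display timestamps against the threshold. A minimal sketch, with illustrative timestamp values and the 0.1 s default from the example:

```python
def in_sync(timestamps, threshold: float = 0.1) -> bool:
    """True if all per-stream display timestamps (seconds) fall within
    the preset threshold of one another, so the frames may be stitched."""
    return (max(timestamps) - min(timestamps)) < threshold


# Four lenses within 0.05 s of each other: composite this set of frames.
ok = in_sync([10.00, 10.03, 10.05, 10.02])
# One stream lags by 0.30 s: hold off until its frames catch up.
lagging = in_sync([10.00, 10.30, 10.05, 10.02])
```

What to do with out-of-sync frames (wait, drop, or repeat the last frame) is a design choice the patent leaves open.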
In an embodiment of the application, when the current picture-in-picture synthesis mode is the large/small-frame overlay mode, the motion features of the target images in the current frames of the plurality of first real-time video data are extracted with a Kalman filter to obtain the predicted coordinate positions of the target images in the next frame. The plurality of first real-time video data are then resized based on the predicted coordinate positions of the target images in the next frame, yielding picture-in-picture video data in the large/small-frame overlay form.
Specifically, when the current picture-in-picture synthesis mode is the large/small-frame overlay mode, the synthesis superimposes a smaller video frame onto a larger one, so part of a target object may be occluded. To occlude as few target images as possible, a Kalman filter extracts the motion features of the target images in the current frame of each channel of first real-time video data and predicts the positions of the target images in the next frame. Since the same frame may contain one or more target images, the plurality of first real-time video data can be stacked according to the predicted number of target images.
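The next-frame prediction can be illustrated with a single predict step of a Kalman filter under a constant-velocity model, as the text later suggests. This is a sketch under assumptions of this example: the state layout `[cx, cy, vx, vy]` and the process-noise scale `q` are illustrative choices, not the patent's specification.

```python
import numpy as np

def predict_next(x, P, dt=1.0, q=1e-2):
    """One Kalman predict step under a constant-velocity model.
    State x = [cx, cy, vx, vy] (target centre and velocity);
    returns the predicted state and covariance for the next frame."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = q * np.eye(4)                 # process noise (assumed isotropic)
    return F @ x, F @ P @ F.T + Q
```

The predicted centre `(cx + vx*dt, cy + vy*dt)` is what the overlay layout then avoids covering with the smaller frames.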
In an embodiment of the present application, after every preset time interval, the number of predicted coordinate positions in each channel of first real-time video data is counted again, and the stacking order of the plurality of first real-time video data is updated according to the new counts.
Specifically, the video captured by each camera changes in real time, and the number of target images changes with it. To reduce the number of target images occluded by the stacked layout, the number of target images in each channel of first real-time video data is re-predicted at intervals, and the stacking order of the videos is updated according to the newly predicted counts: the video image with the largest number of targets is placed on the topmost layer of the picture-in-picture video data, and the video image with the smallest number of target images on the bottommost layer.
In an embodiment of the application, after the number of predicted coordinate positions for the next frame is obtained, the plurality of first real-time video data are stacked from top to bottom according to those counts. The region of the next frame outside the predicted coordinate positions is treated as the background region, and the plurality of first real-time video data are resized so that each upper-layer video image lands in the background region of the video image below it; each lower-layer video image is larger than the one above it. This minimizes occlusion of the target images in the lower-layer video data.
In this way, counting the predicted coordinate positions yields the number of target images in the next frame of each video channel, so the channels can be stacked by target count, with the channels containing more targets placed on the upper layers. This reduces occlusion of the target images and improves the accuracy of target image recognition.
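The periodic re-ordering described above reduces to sorting the channels by their predicted target counts. A minimal sketch (the function name and the bottom-to-top list convention are choices of this example):

```python
def stacking_order(predicted_counts):
    """Given the predicted number of targets per channel, return the channel
    indices ordered bottom-to-top: fewest targets at the bottom layer,
    most targets on the top (least occluded) layer."""
    return sorted(range(len(predicted_counts)),
                  key=lambda i: predicted_counts[i])
```

Re-running this every preset interval with fresh counts gives the updated stacking order; the last index in the returned list is the channel to render on top.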
S103, the multi-channel video analysis equipment transmits the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module.
In an embodiment of the application, the current frame image corresponding to the second real-time video data is input into a preset target recognition model, which outputs the labeling box corresponding to each target image. The motion features of the target image in the previous frame are extracted with a Kalman filter to obtain the predicted coordinate position of the target image in the current frame. Hungarian matching is then performed on the labeling boxes and the predicted coordinate positions to cascade-match them and obtain a matching set. The motion track of the target image is derived from the matching set, and the target image in the second real-time video data is tracked and labeled along that track.
Specifically, a neural network model is trained on a sample set collected in advance from the second real-time video data to obtain the preset target recognition model, which can recognize and label target images in the input second real-time video data. The current frame image of the second real-time video data is fed into the model to obtain the position of the labeling box for each target image. Tracking follows Kalman filtering: a Kalman filter predicts the motion state of each target, i.e., the position of each target in the current frame is predicted from the position of its labeling box in the previous frame. For example, a standard Kalman filter with a constant-velocity motion model and a linear observation model may be used to predict each target's motion state and obtain its predicted position in the current frame.
Further, the cosine distance between the features of each target image and the depth features in the stored depth feature set is computed to generate a cosine distance matrix, and the Mahalanobis distance between each target image's predicted position in the current frame and the corresponding detection box position is calculated. The cosine distance matrix is then preprocessed: elements whose Mahalanobis distance exceeds a first preset threshold are set to infinity, and elements whose cosine distance exceeds a second preset threshold are set to a large penalty value. Both thresholds can be set according to the scene. Based on the preprocessed cosine distance matrix, the Hungarian algorithm performs a first matching between the labeling boxes and the predicted coordinate positions, producing similarity scores between them and a matching set. The motion track of each target image is obtained from the matching set and used to track and label the target images in the second real-time video data. The recognition results can also be uploaded to the client platform.
In this way, the labeling box of each target image in the current frame is obtained from the preset target recognition model, and its predicted coordinate position from the Kalman filter, realizing position tracking of the target image. When several target images appear in the same frame, the different targets can be tracked and located separately, enabling tracking and labeling of multiple target images.
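The gating and matching steps above can be sketched as follows. The gating mirrors the text (Mahalanobis failures become infinite, large cosine distances get a big penalty); the assignment here is a brute-force stand-in for the Hungarian algorithm, adequate only for the small matrices of this illustration. All names and threshold values are choices of this example.

```python
import itertools
import numpy as np

INF = 1e9  # stands in for the "infinite" cost after Mahalanobis gating

def gate_cost(cos_dist, maha_dist, maha_gate, cos_gate, large=1e5):
    """Preprocess the cosine-distance matrix: penalize entries above the
    cosine threshold, then invalidate entries failing the Mahalanobis gate."""
    cost = cos_dist.copy()
    cost[cos_dist > cos_gate] = large   # weak appearance match: large penalty
    cost[maha_dist > maha_gate] = INF   # implausible motion: effectively infinite
    return cost

def assign(cost):
    """Minimum-cost one-to-one assignment by brute force (Hungarian-algorithm
    stand-in for small N). Returns (track, detection) pairs, dropping any
    pair whose cost is gated out."""
    n = cost.shape[0]
    best, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        total = sum(cost[i, j] for i, j in enumerate(perm))
        if total < best:
            best, best_perm = total, perm
    return [(i, j) for i, j in enumerate(best_perm) if cost[i, j] < INF]
```

In a production tracker, `assign` would be replaced by a proper Hungarian solver (e.g. `scipy.optimize.linear_sum_assignment`), which runs in polynomial time.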
And S104, the multi-channel video analysis equipment encodes the second real-time video data with the marks to obtain compressed video data.
In an embodiment of the present application, the YUV data corresponding to the marked second real-time video data are sent to an encoder to obtain encoded H264 or H265 data. The YUV data corresponding to the first real-time video data are likewise sent to an encoder to obtain encoded H264 or H265 data.
Specifically, the multiple paths of single-lens video YUV data are continuously transmitted to the encoder, and encoded H264 or H265 data are continuously acquired. And continuously transmitting the synthesized picture-in-picture YUV data to an encoder, and continuously acquiring encoded H264 or H265 data.
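One common way to realize this step outside a dedicated hardware encoder is to pipe raw YUV frames into ffmpeg. The sketch below only builds such a command line from standard ffmpeg options; the actual subprocess wiring, and whether the device uses ffmpeg at all, is an assumption of this example, not something the patent states.

```python
def encoder_cmd(width, height, fps, codec="libx264", out="out.h264"):
    """Build an ffmpeg command that reads raw YUV420p frames from stdin
    ("-i -") and encodes them with the given codec (libx264 for H264,
    libx265 for H265). Flag names are standard ffmpeg options."""
    return ["ffmpeg",
            "-f", "rawvideo",            # input is raw, headerless frames
            "-pix_fmt", "yuv420p",       # YUV 4:2:0 planar layout
            "-s", f"{width}x{height}",   # frame geometry
            "-r", str(fps),              # input frame rate
            "-i", "-",                   # read frames from stdin
            "-c:v", codec,
            out]
```

The command would then be launched with `subprocess.Popen(cmd, stdin=subprocess.PIPE)` and each composed YUV frame written to its stdin, continuously yielding encoded output as the text describes.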
And S105, receiving a second real-time video data request sent by the user, and pushing the compressed video data to the user.
In one embodiment of the present application, first real-time video data requests sent by different users are received; each request specifies at least one channel of video data. The one or more video channels corresponding to each request are then sent simultaneously to the corresponding users over a streaming media transport protocol.
Specifically, the embodiment of the application supports multiple clients requesting different video streams at the same time. The channels a client requests are determined from the parameters in its request, and the specified real-time video data are pushed to the clients simultaneously over a streaming media transport protocol such as RTP/RTCP. Since the synthesized picture-in-picture stream is exposed as a video channel of its own, a client can also request that channel independently and view the picture-in-picture composition.
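The per-client routing implied here can be sketched as a subscription table mapping clients to channel ids, with the picture-in-picture stream as just another channel. The class and channel names are illustrative; the real device would push packets over RTP rather than return a list.

```python
class StreamHub:
    """Minimal sketch: each client subscribes to any subset of channel ids,
    including the synthesized picture-in-picture channel (here "pip")."""

    def __init__(self):
        self.subs = {}  # client_id -> set of subscribed channel ids

    def request(self, client_id, channels):
        """Record a client's request for one or more video channels."""
        self.subs.setdefault(client_id, set()).update(channels)

    def push(self, channel, packet):
        """Return (sorted) clients that should receive this encoded packet;
        a real implementation would transmit `packet` to each of them."""
        return sorted(c for c, chans in self.subs.items() if channel in chans)
```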
Fig. 2 is a block diagram of a multi-channel video analysis process according to an embodiment of the present disclosure. As shown in fig. 2, the multi-channel video analysis process includes:
in an embodiment of the present application, the YUV data corresponding to each lens of the multi-view camera are obtained, and the multiple channels of YUV data are combined into one new channel of YUV data, i.e., the picture-in-picture video data are synthesized. The YUV data corresponding to each lens of the multi-view camera are also encoded to generate the H264 or H265 data for each video channel.
In one embodiment of the present application, the synthesized picture-in-picture video data are conveyed to an intelligent analysis module, which analyzes the target images in the picture-in-picture video data. At this point the recognition results can be uploaded directly to the user platform. The recognized target objects are marked according to the analysis results, forming YUV data with labeling boxes or marks, which are then encoded into the H264 or H265 data corresponding to the picture-in-picture video data.
In one embodiment of the present application, one or more video data may be simultaneously transmitted to respective clients upon request by the clients.
Fig. 3 is a schematic structural diagram of a multi-channel video analysis apparatus according to an embodiment of the present application. As shown in fig. 3, the multi-channel video analysis apparatus includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
picture-in-picture synthesis processing is carried out on the plurality of first real-time video data, and the synthesized picture-in-picture video data is used as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
coding the second real-time video data with the marks to obtain compressed video data;
and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
An embodiment of the present application provides a non-volatile computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
picture-in-picture synthesis processing is carried out on the plurality of first real-time video data, and the synthesized picture-in-picture video data is used as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
coding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
The embodiments in the present application are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, device, and non-volatile computer storage medium embodiments are substantially similar to the method embodiments, so their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the embodiments of the present application pertain. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the claims of the present application.
Claims (10)
1. A method for multi-channel video analysis, the method comprising:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
picture-in-picture synthesis processing is carried out on the plurality of first real-time video data, and the synthesized picture-in-picture video data is used as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
coding the second real-time video data with the marks to obtain compressed video data;
and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
2. The method according to claim 1, wherein said encoding the second real-time video data with the marks to obtain compressed video data comprises:
transmitting the YUV data corresponding to the second real-time video data with the marks to an encoder to obtain encoded H264 or H265 data; and
and transmitting the YUV data corresponding to the plurality of first real-time video data to an encoder to obtain encoded H264 or H265 data.
3. The method of claim 2, wherein after the YUV data corresponding to the first real-time video data are respectively transmitted to the encoder to obtain encoded H264 or H265 data, the method further comprises:
receiving the first real-time video data requests respectively sent by different users; the first real-time video data request at least comprises any one or more paths of video data;
and simultaneously sending the one or more paths of videos corresponding to the first real-time video data request to corresponding users through a streaming media transmission protocol.
4. The method according to claim 1, wherein the step of picture-in-picture synthesizing the plurality of first real-time video data comprises:
when the picture-in-picture synthesis mode is multi-frame side-by-side or multi-frame side-by-side, determining the size of each frame of image corresponding to the first real-time video data according to the video parameters corresponding to the first real-time video data;
generating a blank image;
and determining a coordinate position set of the plurality of first real-time video data in the blank image according to the size of the blank image, the number of the first real-time video data and the size of each frame of image.
5. The method of claim 4, wherein after determining the set of coordinate locations of the first real-time video data in the blank image, the method further comprises:
acquiring display time stamps corresponding to the plurality of first real-time video data respectively;
if the difference value between the display timestamps is smaller than a preset threshold value, combining the first real-time video data into a multi-lattice spliced image in a preset splicing mode; wherein each cell in the multi-cell stitched image corresponds to one of the first real-time video data.
6. The method according to claim 1, wherein the step of sending the second real-time video data to a preset intelligent analysis module to mark a target image in the second real-time video data by the preset intelligent analysis module comprises:
inputting a current frame image corresponding to the second real-time video data into a preset target identification model, and obtaining a labeling frame corresponding to a target image through the preset target identification model;
extracting the motion characteristics of the target image of the previous frame through a Kalman filter to obtain the predicted coordinate position information of the target image in the current frame;
performing Hungarian algorithm calculation based on the labeling box and the predicted coordinate position information to cascade-match the labeling box with the predicted coordinate position information and obtain a matching set;
and obtaining a motion track of the target image according to the matching set, and tracking and labeling the target image in the second real-time video data through the motion track.
7. The method according to claim 1, wherein the step of picture-in-picture synthesizing the plurality of first real-time video data comprises:
when the current picture-in-picture synthesis mode is a large and small frame superposition mode, extracting the motion characteristics of the target images of the current frames of the plurality of first real-time video data based on a Kalman filter to obtain the predicted coordinate position information of the target images in the next frame;
and adjusting the sizes of the plurality of first real-time video data based on the predicted coordinate position information of the target image in the next frame to obtain picture-in-picture video data in a large-frame and small-frame overlapping mode.
8. The method of claim 7, wherein after resizing the first real-time video data to obtain the pip video data in a size-box-overlapped form, the method further comprises:
after every preset time interval, counting the number of the predicted coordinate position information in the first real-time video data again;
and updating the stacking sequence of the plurality of first real-time video data again according to the counted number of the predicted coordinate position information.
9. A multi-channel video analysis apparatus comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
picture-in-picture synthesis processing is carried out on the plurality of first real-time video data, and the synthesized picture-in-picture video data is used as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
coding the second real-time video data with the marks to obtain compressed video data;
and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
10. A non-transitory computer storage medium storing computer-executable instructions configured to:
acquiring first real-time video data respectively corresponding to a plurality of lenses of a multi-view camera;
picture-in-picture synthesis processing is carried out on the plurality of first real-time video data, and the synthesized picture-in-picture video data is used as second real-time video data;
transmitting the second real-time video data to a preset intelligent analysis module so as to mark a target image in the second real-time video data through the preset intelligent analysis module;
coding the second real-time video data with the marks to obtain compressed video data; and receiving a second real-time video data request sent by a user, and pushing the compressed video data to the user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111532772.6A CN114500871B (en) | 2021-12-15 | 2021-12-15 | Multipath video analysis method, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114500871A true CN114500871A (en) | 2022-05-13 |
CN114500871B CN114500871B (en) | 2023-11-14 |
Family
ID=81493135
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111532772.6A Active CN114500871B (en) | 2021-12-15 | 2021-12-15 | Multipath video analysis method, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114500871B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115550608A (en) * | 2022-09-19 | 2022-12-30 | 国网智能科技股份有限公司 | Multi-user high-concurrency AI video real-time fusion display control method and system |
CN115988258A (en) * | 2023-03-17 | 2023-04-18 | 广州佰锐网络科技有限公司 | IoT (Internet of things) -based video communication method, storage medium and system |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102194443A (en) * | 2010-03-04 | 2011-09-21 | 腾讯科技(深圳)有限公司 | Display method and system for window of video picture in picture and video processing equipment |
WO2017005096A2 (en) * | 2015-07-06 | 2017-01-12 | 阿里巴巴集团控股有限公司 | Method and device for encoding multiple video streams |
WO2017211250A1 (en) * | 2016-06-08 | 2017-12-14 | 深圳创维数字技术有限公司 | Image overlay display method and system |
US20180122144A1 (en) * | 2015-05-13 | 2018-05-03 | Aim Sport Vision Ag | Digitally overlaying an image with another image |
CN109711320A (en) * | 2018-12-24 | 2019-05-03 | 兴唐通信科技有限公司 | A kind of operator on duty's unlawful practice detection method and system |
CN110321806A (en) * | 2019-06-12 | 2019-10-11 | 浙江大华技术股份有限公司 | Object detection method, image processing equipment and the equipment with store function |
CN111010605A (en) * | 2019-11-26 | 2020-04-14 | 杭州东信北邮信息技术有限公司 | Method for displaying video picture-in-picture window |
CN112637550A (en) * | 2020-11-18 | 2021-04-09 | 合肥市卓迩无人机科技服务有限责任公司 | PTZ moving target tracking method for multi-path 4K quasi-real-time spliced video |
US20210125639A1 (en) * | 2019-10-28 | 2021-04-29 | Shanghai Bilibili Technology Co., Ltd. | Method and system of clipping a video, computing device, and computer storage medium |
CN112884811A (en) * | 2021-03-18 | 2021-06-01 | 中国人民解放军国防科技大学 | Photoelectric detection tracking method and system for unmanned aerial vehicle cluster |
CN113612922A (en) * | 2021-07-29 | 2021-11-05 | 重庆赛迪奇智人工智能科技有限公司 | Video processing method and device, electronic equipment and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
Si Jiawei: "Research on Pointer Image Recognition in Video Surveillance ***", Outstanding Master's Theses *
Also Published As
Publication number | Publication date |
---|---|
CN114500871B (en) | 2023-11-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114500871A (en) | Multi-channel video analysis method, equipment and medium | |
CN100542303C | A kind of method for correcting multi-viewpoint video color | |
CN104063883B (en) | A kind of monitor video abstraction generating method being combined based on object and key frame | |
US9392322B2 (en) | Method of visually synchronizing differing camera feeds with common subject | |
CN104539929B (en) | Stereo-image coding method and code device with motion prediction | |
US20130279813A1 (en) | Adaptive interest rate control for visual search | |
CN106231349B (en) | Main broadcaster's class interaction platform server method for changing scenes and its device, server | |
US11037308B2 (en) | Intelligent method for viewing surveillance videos with improved efficiency | |
CN106060578A (en) | Producing video data | |
DE102020124815A1 (en) | SYSTEM AND DEVICE FOR USER CONTROLLED VIRTUAL CAMERA FOR VOLUMETRIC VIDEO | |
CN110933461B (en) | Image processing method, device, system, network equipment, terminal and storage medium | |
CN105745937A (en) | Method and apparatus for image frame identification and video stream comparison | |
DE112019000271T5 (en) | METHOD AND DEVICE FOR PROCESSING AND DISTRIBUTION OF LIVE VIRTUAL REALITY CONTENT | |
KR20100073080A (en) | Method and apparatus for representing motion control camera effect based on synchronized multiple image | |
CN110418148B (en) | Video generation method, video generation device and readable storage medium | |
CN106231397B (en) | Main broadcaster's class interaction platform main broadcaster end method for changing scenes and its device, Zhu Boduan | |
EP4096227A1 (en) | Coordinates as ancillary data | |
CN106231350B (en) | Main broadcaster's class interaction platform method for changing scenes and its device | |
US11044399B2 (en) | Video surveillance system | |
CN115315939A (en) | Information processing apparatus, information processing method, and program | |
CN115174941B (en) | Real-time motion performance analysis and real-time data sharing method based on multiple paths of video streams | |
CN114830674A (en) | Transmitting apparatus and receiving apparatus | |
CN108900860A (en) | A kind of instructor in broadcasting's control method and device | |
CN112836635B (en) | Image processing method, device and equipment | |
CN114040184B (en) | Image display method, system, storage medium and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||