CN113794756B

CN113794756B - Multi-video-stream unloading method and system supporting mobile equipment

Info

Publication number: CN113794756B
Application number: CN202110985759.XA
Authority: CN
Inventors: 乔秀全; 黄亚坤; 商彦磊; ***
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: China Unicom Online Information Technology Co Ltd
Priority date: 2021-08-26
Filing date: 2021-08-26
Publication date: 2022-09-06
Anticipated expiration: 2041-08-26
Also published as: CN113794756A

Abstract

The invention discloses a multi-video stream unloading method supporting mobile equipment, which comprises the steps of firstly providing a light weight compression model matched with a deep learning model for video analysis; performing fusion preliminary analysis on the multiple video streams according to the timestamp and the video streams; performing real-time frame unloading scheduling after the preliminary fusion analysis is completed, and making a decision on the fused video frame according to the current context information; analyzing the running of video frames, and running a response deep learning analysis model on the edge server and the edge device on the terminal device according to the result of real-time frame unloading scheduling; and after the operation analysis of the video frame is finished, the video frame is operated on the terminal equipment in a single-thread mode in real time to carry out abnormity detection and abnormity recovery, and the abnormity is fed back in time. The invention realizes the accelerated reasoning calculation on different computing equipment according to different available resource states, thereby improving the real-time analysis efficiency of the multi-video stream.

Description

Multi-video-stream unloading method and system supporting mobile equipment

Technical Field

The invention relates to the technical field of multimedia and video analysis, in particular to a method and a system for unloading multiple video streams supporting mobile equipment.

Background

With the rapid development of chip technology and mobile device manufacturing, the need to equip a plurality of cameras with different functions on a mobile device and to implement deep visual analysis becomes more and more urgent. Enabling multi-stream analysis on a mobile device may process multiple cameras simultaneously to improve perception of complex scenes. Furthermore, current advanced mobile device interactions require efficient multi-video stream processing capabilities. One typical scenario is mobile augmented reality glasses that use multiple cameras to perceive the real-world environment while recognizing the user's gestures to provide better interaction. Another example is a drone, which is typically equipped with more than four cameras to obtain multi-angle video streams for complex task analysis. However, implementing multi-stream processing in a simple multi-threaded manner is challenging because of the contradiction between intensive computation and limited resources in native execution, while offloading high transmission energy consumption, severe bandwidth, and cloud resource dependencies in the scheme. The method has the characteristics and innovation that a new mobile depth visual analysis framework is designed to support multi-stream analysis on mobile equipment. The system utilizes efficient and uniform scheduling by mixing various streams into a mixing processing pipeline, applies key contexts such as bandwidth, frame content and the like to online video frame unloading scheduling, and optimizes the analysis and calculation problem of the multiple video streams on the cooperative equipment.

Disclosure of Invention

In view of the above technical problems in the related art, the present invention provides a method and a system for unloading multiple video streams supporting a mobile device, which can overcome the above disadvantages of the prior art methods.

In order to achieve the technical purpose, the technical scheme of the invention is realized as follows:

a method for supporting multiple video streams offload for a mobile device, comprising the steps of:

s1, firstly, providing a platform-aware lightweight multi-video-stream analysis model by a video provider in an off-line stage, and providing a matched lightweight compression model for a deep learning model for video analysis according to the computing power of equipment used by a user side;

s2, fusing and primarily analyzing the video streams according to the time stamps and the marks of the video streams, and taking the analysis result as the input of unloading and scheduling of the subsequent real-time frames;

s3, when the terminal device completes the preliminary analysis and fusion of a plurality of video streams, the real-time frame unloading scheduling is carried out, and the decision is carried out on the fused video frame according to the current context information;

s4, analyzing the operation of the video frame, and according to the real-time frame unloading and dispatching result, operating a response deep learning analysis model on a responding terminal device including a cooperative edge server and an edge device;

and S5, after the operation analysis of the video frame is completed, the video frame is operated on the terminal equipment of the user in a single thread mode in real time to carry out anomaly detection and anomaly recovery, so that the real-time network state and the connection state of the cooperative equipment and other anomalies of cooperative calculation are detected and monitored, and the anomalies are fed back in time.

Further, in step S1, the lightweight multiple-video-stream deep analysis model is based on a teacher network and a student network, and different video-stream analysis tasks are expressed by learning characteristics in the teacher network, so as to improve the progress of the student network adapted to the terminal size, thereby generating an analysis model for online stage deployment inference.

Further, the lightweight multi-video stream depth analysis model adopts a high-precision target detection network of VGG16-SSD as a teacher network, extracts features from the teacher network and transmits the features to a lightweight student network; the student network adopts a lightweight MobileNet-SSD network, and provides a deep learning visual processing model facing different computational power matched with terminal equipment by controlling the number of network layers.

Further, in step S2, when performing preliminary analysis on the multiple video streams, first calculating the one-dimensional picture information entropy at the time t of the multi-stream video frame after fusion in real time, and providing the two-dimensional information entropy at the time t and the two-dimensional information entropy at the time t-1 as input to the real-time offload scheduling calculation of the two-dimensional information entropy at the time t, and updating the cached two-dimensional information entropy at the time t-1.

Further, in step S4, performing real-time offload scheduling, first, a random forest decision algorithm for online scheduling needs to be established and trained offline, and context information, such as bandwidth between the terminal device and the cooperative computing device, real-time available resource utilization rate of the device, time, location information of the device, and information entropy of video frames, is used as a key feature training decision.

Further, the video frame operation analysis mainly comprises the steps of receiving the video frames needing to be processed in real time from the task initiating device according to the calculation result of the real-time unloading scheduling on different computing devices, and loading the pre-deployed deep learning analysis model according to the calculation capacity and the state of the current device to carry out reasoning calculation.

Further, if the current video frame is dispatched to the local task initiating equipment or the cooperative terminal equipment with weaker computing power, the inference is accelerated in a single-thread batch processing mode; if the current video frame is scheduled on a computationally intensive edge server, the computation of the video frame is accelerated in a multi-threaded batch process.

Further, in step S5, the abnormality detection and monitoring is mainly performed by registering and monitoring the real-time resource states and network bandwidth information of the terminal device and the cooperative device through a monitoring thread running on the terminal device in real time and maintaining a real-time state table, and when the terminal device monitors an abnormal phenomenon, the current task is processed and the state table is updated in time by waking up the recovery thread.

According to another aspect of the present invention, there is provided a multi-video stream offload system supporting a mobile device, comprising a lightweight video analysis deep learning model generation module, a multi-stream video frame pre-analysis module at an online stage, a real-time frame offload scheduling module, a video frame operation analysis module, and an anomaly detection and recovery module, wherein,

the lightweight video analysis deep learning model generation module runs on cloud resources provided with GPU computing resources and is a lightweight compression model matched with a deep learning model for analyzing videos according to the computing capacity of equipment used by a user side;

the multi-stream video frame pre-analysis module in the online stage is used for carrying out preliminary real-time analysis on the contents of a plurality of video streams, namely fusing the plurality of streams according to the time stamps and the marks of the video streams and taking the analysis result as the input of the real-time frame unloading scheduling module;

the real-time frame unloading scheduling module is used for deciding the fused video frame according to the current context information, and the decision result is used for analyzing the frame content of the current video frame on a local terminal or a cooperative terminal device;

the video frame operation analysis module is mainly used for operating a response deep learning analysis model on a responding device and a cooperative edge server according to a real-time frame unloading scheduling result, reasoning on a terminal device and a low-calculation cooperative device by using a single-thread batch processing technology, and accelerating on a high-calculation edge server by using a multi-thread technology;

the abnormity detection and recovery module mainly runs on the terminal equipment of a user in a single-thread mode in real time, monitors the real-time network state, the connection state of the cooperative equipment and the abnormity of other cooperative calculation, feeds back the abnormity in time, and provides the recovery of the cooperative calculation for the module recovered in the abnormal state again.

The invention has the beneficial effects that: the accelerated reasoning calculation is realized on different computing equipment according to different available resource states, so that the real-time analysis efficiency of the multiple video streams is improved, the deep visual analysis service of the multiple video streams is effectively supported to be realized on the mobile equipment at the same time, and a set of feasible solution is provided for the high-quality multimedia video stream analysis service of the mobile equipment; and the real-time multi-stream unloading mechanism based on the frame fusion technology is realized on the ubiquitous mobile equipment, the high-quality multi-video stream analysis service is realized, and the communication and calculation cost of the system is effectively reduced.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic diagram of a multi-video stream offload analysis system for supporting a multi-video stream offload method and system of a mobile device according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a lightweight video stream analysis deep learning model compression for supporting a multi-video stream offloading method and system for a mobile device according to an embodiment of the present invention.

Fig. 3 is a schematic view of a video frame pre-analysis module of the method and system for supporting multi-video stream offloading of a mobile device according to an embodiment of the invention.

Fig. 4 is a schematic diagram of a real-time video frame unloading scheduling module of the method and system for supporting multi-video stream unloading of a mobile device according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of an abnormal situation detection and recovery module of the method and system for supporting multi-video stream offloading of a mobile device according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention belong to the protection scope of the present invention, and for the convenience of understanding the technical solutions of the present invention, the technical solutions of the present invention are described in detail below through specific use modes.

As shown in fig. 1, the method and system for unloading multiple video streams supporting a mobile device according to the embodiment of the present invention include an offline lightweight video analysis deep learning model generation module, an online multi-stream video frame pre-analysis module, a real-time frame unloading scheduling module, a video frame operation analysis module, and an anomaly detection and recovery module.

The video service provider provides a platform-aware lightweight multi-video-stream analysis model generation module in an offline stage, the module runs on cloud resources equipped with GPU computing resources, and a lightweight compression model matched with a deep learning model for video analysis is provided according to the computing capacity of equipment used by a user side.

As shown in fig. 2, the offline model generation module provides a lightweight deep learning model compression method for pervasive video stream target detection based on knowledge distillation technology, and a high-precision target detection network of VGG16-SSD is used as a teacher-aware network, and feature representation is extracted from the teacher-aware network and transmitted to a lightweight student network; the student network adopts a lightweight MobileNet-SSD network, the parameters of the backbone network model are few, the reasoning speed is high, and the student network provides a deep learning visual processing model which is matched with terminal equipment for different computing power by controlling the number of network layers. The key loss functions used in the compression training process comprise the loss of training student networks, the loss of regression and a Hint constraint function, so that the student networks can be converged to the minimum value more easily; based on the teacher network and the student network defined in the previous steps, different video stream analysis tasks are expressed through learning features in the teacher network, and the precision of the student network adapting to the size of the terminal is improved, so that a lightweight deep visual analysis model for a plurality of video stream analyses for online stage deployment and reasoning is generated.

When a user opens an application related to multi-video stream processing on a mobile device (for example, a terminal device needs to detect and identify various objects in a video stream acquired by a camera), gesture actions appearing in the video stream need to be identified, and real-time analysis of a plurality of video stream tasks is involved; therefore, the multi-stream video frame pre-analysis in the online stage is started to perform preliminary analysis on the content of the multiple video streams, that is, the multiple streams are fused according to the timestamp and the video stream identifier, and the analysis result is used as the input of the real-time frame unloading scheduling module.

As shown in fig. 3, the video frame pre-analysis module first calculates the one-dimensional picture information entropy of the merged multi-stream video frame at the time t in real time, and provides the one-dimensional information entropy at the time t and the two-dimensional information entropy at the time t-1 as input to the real-time offload scheduling module; then, the module calculates the two-dimensional information entropy at the time t and updates the cached two-dimensional information entropy at the time t-1, and a video frame one-dimensional entropy calculation formula used in the multi-stream video frame pre-analysis module at the online stage is as follows:

meanwhile, the calculation mode for calculating the two-dimensional entropy of the video frame is as follows:

after the terminal device completes the preliminary analysis and fusion of the multiple video streams, the real-time frame unloading scheduling module makes a decision on the fused video frames according to the current context information (network bandwidth, time, position and content information contained in the frames). The decision result is that the current video frame is analyzed on the local terminal or the cooperative terminal equipment, so that the minimum time delay and mobile energy consumption experience are obtained, and the requirement of video analysis on precision can be met.

As shown in fig. 4, the real-time offload scheduling module first establishes and off-line trains a random forest decision algorithm for online scheduling, and the specific process includes training context information such as bandwidth between the terminal device and the cooperative computing device, real-time available resource utilization rate of the device, time, device location information, and information entropy of video frames as key features.

In order to train the video frame unloading scheduling module, a design sample is defined to represent the characteristics, and the current video frame is unloaded to the local equipment, the cooperative equipment and the edge server respectively to complete cooperative calculation. The algorithm used for training is a classification fusion algorithm based on decision trees, a plurality of decision trees are established through sampling to carry out training prediction, and the average result of prediction is taken as the final result. In the training process, a metric is defined to measure the goodness of the decision classification according to the following calculation mode:

and in the online operation stage, the real-time unloading module operated on the terminal equipment executes a trained random forest decision algorithm according to the current input of the video frame, and schedules the current video to corresponding computing equipment according to the result.

The video frame operation analysis module mainly comprises a cooperative edge server and a deep learning analysis model for operating response on the edge device on the responding device according to the result of real-time frame unloading scheduling; in addition, single-threaded batch processing technology is used for reasoning on the terminal equipment and the low-computing-power cooperative equipment, and multithreading technology is used for accelerating on the computing-power edge server.

The online video frame operation analysis module mainly operates on different computing devices, receives a video frame needing real-time processing from the task initiating device according to a computing result of the real-time unloading scheduling module, and loads a pre-deployed deep learning analysis model according to the computing capability and the state of the current device to perform inference calculation. If the current video frame is scheduled to the cooperative terminal equipment with local or weak computing power of the task initiating equipment, the inference is accelerated in a single-thread batch processing mode, and a large amount of computing resources of other service processes are prevented from being occupied. If the current video frame is scheduled to a computationally intensive edge server, then the analysis calculations for the video frame are accelerated in a multi-threaded batch process.

The acceleration sub-process of batch processing mainly comprises video frame acquisition, pre-analysis, real-time decision scheduling, model inference analysis, frame coding, transmission and result rendering. Especially in the case of sufficient computing power of the edge server, the subtasks can be accelerated in a batch mode by using a plurality of threads, and the different threads execute in parallel.

As shown in fig. 5, the anomaly detection and recovery module mainly runs on the terminal device of the user in a single-threaded manner in real time, and a main function room of the module monitors a real-time network state and a connection state of the cooperative device, and other anomalies of cooperative computation, and feeds back the anomalies in time; in addition, the recovery of the collaborative computation is provided for the module which recovers in the abnormal state, so that the stability and reliability of the multi-stream video analysis collaborative computation are ensured. The abnormity monitoring module registers and monitors information such as real-time resource states and network bandwidth of the terminal equipment and the cooperative equipment through a monitoring thread running on the terminal equipment in real time and maintaining a real-time state table, and processes a current task and updates the state table in time by waking up a recovery thread once an abnormity phenomenon is monitored.

In summary, by means of the above technical solutions of the present invention, accelerated inference calculation is implemented on different computing devices according to different available resource states, so as to improve real-time analysis efficiency of multiple video streams, effectively support the implementation of deep visual analysis services of multiple video streams on a mobile device, and provide a set of feasible solutions for high-quality multimedia video stream analysis services of the mobile device; and the real-time multi-stream unloading mechanism based on the frame fusion technology is realized on the ubiquitous mobile equipment, the high-quality multi-video stream analysis service is realized, and the communication and calculation cost of the system is effectively reduced.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for supporting multiple video streams offload for a mobile device, comprising the steps of:

s2, fusing and primarily analyzing the video streams according to the timestamps and the marks of the video streams, and taking the analysis result as the input of unloading and scheduling of the subsequent real-time frames;

s4, then, analyzing the operation of the video frame, and according to the result of the real-time frame unloading scheduling, operating a responding deep learning analysis model on the responding terminal device including the edge server and the edge device which are cooperated;

2. The method for multi-video streaming offload support for mobile devices of claim 1, wherein in step S1, the lightweight multi-video streaming depth analysis model is based on a teacher network and a student network, and different video streaming analysis tasks are expressed by learning features in the teacher network to improve the progress of the student network adapted to the terminal size, thereby generating an analysis model for online stage deployment inference.

3. The method of multi-video streaming offload for mobile devices of claim 2, wherein the lightweight multi-video streaming depth analysis model employs a high precision target detection network of VGG16-SSD as an instructor network, from which features are extracted and passed to a lightweight student network; the student network adopts a lightweight MobileNet-SSD network, and provides a deep learning visual processing model facing different computational power matched with terminal equipment by controlling the number of network layers.

4. The method according to claim 1, wherein in step S2, when performing preliminary analysis on multiple video streams, first calculating the one-dimensional picture information entropy at time t of the multi-stream video frames after merging in real time, and providing the two-dimensional information entropy at time t and the two-dimensional information entropy at time t-1 as input to a real-time offload schedule to calculate the two-dimensional information entropy at time t, and updating the two-dimensional information entropy at time t-1 of the cache.

5. The method for multi-video streaming offload supporting mobile devices of claim 1, wherein in step S4, performing real-time offload scheduling requires first establishing and offline training a random forest decision algorithm for online scheduling, and using context information, such as bandwidth between the terminal device and the cooperating computing device, real-time available resource utilization of the device, time, device location information, and entropy of video frames, as key feature training decisions.

6. The method of claim 1, wherein the video frame operation analysis is performed by receiving a video frame to be processed in real time from the task initiating device according to the computation result of the real-time offload scheduling, and performing inference computation by loading a pre-deployed deep learning analysis model according to the computation capability and state of the current device.

7. The method of multi-video streaming offload for mobile devices of claim 6, wherein the inference is accelerated in a single-threaded batch if the current video frame is scheduled to a cooperative end-point device local to the task-initiating device or less computationally intensive; if the current video frame is scheduled on a computationally intensive edge server, the computation of the video frame is accelerated in a multi-threaded batch process.

8. The method for unloading multiple video streams supporting a mobile device according to claim 1, wherein in step S5, the abnormality detection and monitoring is performed by registering and monitoring the real-time resource status and network bandwidth information of the terminal device and the cooperating device through a monitoring thread running on the terminal device in real time and maintaining a real-time status table, and when the terminal device detects an abnormality, the current task is processed and the status table is updated in time by waking up the recovery thread.

9. A multi-video stream unloading system supporting mobile equipment comprises a lightweight video analysis deep learning model generation module, a multi-stream video frame pre-analysis module at an online stage, a real-time frame unloading scheduling module, a video frame operation analysis module and an abnormity detection and recovery module, wherein,

the lightweight video analysis deep learning model generation module runs on cloud resources provided with GPU computing resources and is a matched lightweight compression model provided by a deep learning model for analyzing videos according to the computing capacity of equipment used by a user side;