CN110889857A

CN110889857A - Mobile Web real-time video frame segmentation method and system

Info

Publication number: CN110889857A
Application number: CN201911121432.7A
Authority: CN
Inventors: 乔秀全; 黄亚坤; 商彦磊
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2019-11-15
Filing date: 2019-11-15
Publication date: 2020-03-17

Abstract

The invention provides a method and a system for segmenting mobile Web real-time video frames. The method comprises: receiving a video frame segmentation request sent by a Web end; feeding back the current network state and the state information of the edge server to the Web end according to the video frame segmentation request; and, when the real-time segmentation performance of the edge server for the video frame reaches the set segmentation performance, performing video frame segmentation and graph segmentation processing at the edge server to obtain a segmentation processing result. The invention uses the high bandwidth and low latency of the edge server to reduce the transmission delay of video frames acquired in real time by the mobile Web end, and segments the video frames at the edge server, thereby maintaining segmentation efficiency while keeping the segmentation quality high.

Description

Mobile Web real-time video frame segmentation method and system

Technical Field

The invention belongs to the technical field of video frame segmentation, and particularly relates to a mobile Web real-time video frame segmentation method and system.

Background

Video frame segmentation is a process of segmenting video frame images into different regions with similar attributes, and is also a preprocessing step of computer vision tasks. Image segmentation plays an important role in applications such as scene understanding, object detection and 3D reconstruction, and is widely applied to the fields of augmented reality, medical image analysis, automatic driving, safety monitoring and the like. The video frame image segmentation and the Web technology are combined, so that a good foundation can be laid for providing lightweight, cross-platform and universal Web AI application programs for mobile users, and the applications can better perceive and interact in the real world. In addition, the immersive Web workgroup and immersive Web community group of the world wide Web consortium (W3C) are actively creating immersive Web platforms that provide reliable technologies and tools for developing real-time intelligent applications. Therefore, in order to provide smooth, real-time video frame image segmentation on the Web side, the most important evaluation index is the segmentation performance of real-time video frames.

Existing video frame segmentation methods are difficult to apply directly to mobile Web applications and cannot meet real-time requirements. Traditional video frame segmentation is usually performed at the Web end or on a cloud server: segmentation at the Web end is efficient but gives a poor segmentation effect, while segmentation on a cloud server gives a better effect but is inefficient.

Disclosure of Invention

To overcome the existing problems or at least partially solve the problems, embodiments of the present invention provide a method and a system for segmenting a mobile Web real-time video frame.

According to a first aspect of the embodiments of the present invention, a method for segmenting a mobile Web real-time video frame is provided, which includes:

receiving a video frame segmentation request sent by a Web end;

feeding back the current network state and the state information of the edge server end to the Web end according to the video frame segmentation request, so that the Web end judges whether the real-time segmentation performance of the video frame by the edge server end reaches the set segmentation performance or not according to the current network state and the state information of the edge server end;

if so, performing segmentation and graph segmentation processing on the video frame at the edge server side to obtain a segmentation processing result;

and feeding back the segmentation processing result to a Web end.

On the basis of the technical scheme, the invention can be improved as follows.

Further, the state information of the edge server includes a load of the edge server, and the determining whether the real-time segmentation performance of the edge server on the video frame reaches the set segmentation performance includes:

and when the load of the edge server does not exceed the maximum set load and the processing frequency of the edge server to the video frame meets the set frequency, determining that the real-time segmentation performance of the edge server to the video frame reaches the set segmentation performance.

Further, the load of the edge server includes a CPU load and an I/O load of the edge server.

Further, the performing segmentation and graph segmentation processing on the video frame at the edge server end to obtain a segmentation processing result includes:

the edge server divides the video frame to form a plurality of video frame fragments;

and performing parallel stream processing and graph segmentation processing on the video frame fragments to obtain segmentation processing results.

Further, the performing parallel graph segmentation processing on the plurality of video frame fragments to obtain a segmentation processing result includes:

carrying out segmentation processing on a plurality of video frame fragments based on a super-pixel pre-segmentation algorithm to obtain a first segmentation processing result;

carrying out segmentation processing on the video frame fragments based on a non-real-time deep learning DNN segmentation algorithm to obtain a second segmentation processing result;

and correcting the first segmentation result by using the second segmentation result to form a final segmentation result.

Further, the feeding back the segmentation processing result to the Web end includes:

and the edge server returns the final segmentation processing result to the Web end in a JSON format so that the Web end renders and colors the final segmentation processing result and caches the final segmentation processing result to the front end.

Further, if yes, performing video frame segmentation and graph segmentation processing at the edge server, and obtaining a segmentation processing result further includes:

and if the real-time segmentation performance of the edge server side on the video frame cannot reach the set segmentation performance, the Web side adopts a lightweight real-time segmentation algorithm to segment the video frame to obtain a segmentation processing result.

Further, the segmenting the video frame by adopting a lightweight real-time segmentation algorithm at the Web end, and obtaining a segmentation processing result includes:

setting parameters and marking information of the lightweight real-time segmentation algorithm according to the final segmentation processing result cached to the front end;

and carrying out segmentation processing on the video frame by adopting a lightweight real-time segmentation algorithm after setting parameters and marking information to obtain a corresponding segmentation processing result.

Further, after the video frame is segmented by the lightweight real-time segmentation algorithm with the parameters and marking information set, and the corresponding segmentation processing result is obtained, the method further includes:

and performing coloring, distinguishing and rendering on different planes in the video frame of the segmentation processing result.

According to a third aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, by calling the program instructions, is able to execute the mobile Web real-time video frame segmentation method provided in any one of the possible implementations of the first aspect.

The embodiment of the invention provides a mobile Web real-time video frame segmentation method and a mobile Web real-time video frame segmentation system.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic overall flow chart of a mobile Web real-time video segmentation method according to an embodiment of the present invention;

Fig. 2 is a flow chart of video segmentation according to an embodiment of the present invention;

Fig. 3 is a schematic overall flow chart of a mobile Web real-time video segmentation method according to an embodiment of the present invention;

Fig. 4 is a connection block diagram of a mobile Web real-time video segmentation system according to an embodiment of the present invention;

Fig. 5 is a schematic view of an overall structure of an electronic device according to an embodiment of the present invention.

Detailed Description

In an embodiment of the present invention, a mobile Web real-time video segmentation method is provided, and fig. 1 is a schematic overall flow chart of the mobile Web real-time video segmentation method provided in the embodiment of the present invention, where the method includes:

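receiving a video frame segmentation request sent by a Web end;

feeding back the current network state and the state information of the edge server end to the Web end according to the video frame segmentation request, so that the Web end judges whether the real-time segmentation performance of the video frame by the edge server end reaches the set segmentation performance or not according to the current network state and the state information of the edge server end;

if so, performing segmentation and graph segmentation processing on the video frame at the edge server side to obtain a segmentation processing result;

and feeding back the segmentation processing result to a Web end.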

It can be understood that, the traditional video frame segmentation is usually performed on a Web end or a cloud server, and the segmentation of the video frame performed on the Web end has high segmentation efficiency but poor segmentation effect; the cloud server has a good segmentation effect, but the efficiency is not high, so that the embodiment of the invention reduces the transmission delay of the mobile Web end for acquiring the video frame in real time by using the characteristics of high bandwidth and low delay of the edge server, improves the segmentation efficiency of the video frame, and also improves the segmentation effect of the video frame.

Specifically, a mobile user opens any terminal device equipped with a Web browser and sends a real-time video frame segmentation request to the edge server through the browser. The edge server establishes a stable and reliable WebSocket connection with the Web browser and returns the corresponding page information, which is rendered and presented in the browser. The main job of the page is to call the camera: real-time video frame data is sampled from the camera at 60 frames per second through the getUserMedia() interface of WebRTC. After the Web browser acquires the video frames, it preprocesses each frame image, that is, it performs basic processing such as image compression, size and format adjustment, denoising and contrast enhancement, and at the same time sends a real-time video frame segmentation request to the edge server.

The edge server receives the video frame segmentation request sent by the Web end and feeds back the current network state and the state information of the edge server to the Web end. The Web end judges, according to the current network state and the state information fed back by the edge server, whether the real-time segmentation performance of the edge server on the video frame reaches the set segmentation performance. If so, the Web end sends the acquired video frame to the edge server, the segmentation of the video frame is completed on the edge server, and the segmentation processing result is fed back to the Web end.

The embodiment of the invention uses the high bandwidth and low latency of the edge server to reduce the transmission delay of video frames acquired in real time by the mobile Web end, and segments the video frames at the edge server, thereby maintaining segmentation efficiency while keeping the segmentation quality high.

On the basis of the foregoing embodiment, in the embodiment of the present invention, the state information of the edge server includes a load of the edge server, and the determining whether the real-time segmentation performance of the edge server for the video frame reaches the set segmentation performance includes:

The current network state is 3G, 4G or WiFi, and the load of the edge server comprises the CPU load and the I/O load of the edge server. When the Web end judges, according to the current network state and the load of the edge server, whether the real-time segmentation performance of the edge server on the video frame reaches the set segmentation performance, it first checks whether the current load of the edge server exceeds the maximum set load. If the load exceeds the maximum, the result "current server unavailable" is returned directly. If the load does not exceed the maximum set load, the Web end estimates, from the current network conditions, whether the frequency at which the edge server can process video frames reaches the set frequency. For example, if the real-time segmentation performance is required to be no less than 30 FPS, the Web end estimates from the current network state whether the processing delay of the edge server for one video frame is less than 33.3 ms, which corresponds to a frame processing frequency of 30 FPS. If so, "edge server available" is returned; otherwise, "network state unavailable" is returned.
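The two-step check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `EdgeStatus` field names, the 0.8 load ceiling and the returned strings are assumptions made for the example, and the per-frame delay estimate is taken as an input because the text does not say how it is computed.

```python
from dataclasses import dataclass


@dataclass
class EdgeStatus:
    """Status fed back by the edge server (field names are hypothetical)."""
    cpu_load: float            # fraction of CPU in use, 0.0 to 1.0
    io_load: float             # fraction of I/O capacity in use, 0.0 to 1.0
    est_frame_delay_ms: float  # estimated delay for handling one frame


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame at the target rate, e.g. 30 FPS gives about 33.3 ms."""
    return 1000.0 / target_fps


def edge_decision(status: EdgeStatus, max_load: float = 0.8, target_fps: float = 30.0) -> str:
    """Return 'edge_available', 'server_unavailable' or 'network_unavailable'."""
    # Step 1: reject immediately if either load exceeds the configured maximum.
    if status.cpu_load > max_load or status.io_load > max_load:
        return "server_unavailable"
    # Step 2: the estimated per-frame delay must fit inside the frame budget.
    if status.est_frame_delay_ms < frame_budget_ms(target_fps):
        return "edge_available"
    return "network_unavailable"
```

With a 30 FPS target, a lightly loaded server with an estimated 25 ms delay passes both checks, while the same server at 40 ms fails the second one.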

On the basis of the foregoing embodiments, in the embodiments of the present invention, the performing segmentation and graph segmentation processing on a video frame at an edge server end to obtain a segmentation processing result includes:

It can be understood that, when the real-time segmentation performance of the edge server on the video frame meets the set segmentation performance, the Web end sends the video frame to the edge server, and the video frame is segmented on the edge server. First, the edge server divides the video frame into a plurality of video frame fragments; then, it performs parallel stream processing and graph segmentation processing on the fragments to obtain the segmentation processing result.
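The divide-then-process-in-parallel idea can be sketched as below. The text does not say how fragments are shaped, so the sketch assumes horizontal strips; `split_rows`, `process_parallel` and the `threshold` worker are hypothetical names, and the worker is only a placeholder for the real per-fragment segmentation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Frame = List[List[int]]  # grayscale image as rows of pixel values


def split_rows(frame: Frame, parts: int) -> List[Frame]:
    """Split a frame into up to `parts` horizontal strips of near-equal height."""
    height = len(frame)
    parts = max(1, min(parts, height))
    base, extra = divmod(height, parts)
    strips, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        strips.append(frame[start:end])
        start = end
    return strips


def process_parallel(frame: Frame, parts: int, worker: Callable[[Frame], Frame]) -> Frame:
    """Run `worker` on every strip concurrently and stitch the results back in order."""
    strips = split_rows(frame, parts)
    with ThreadPoolExecutor(max_workers=len(strips)) as pool:
        results = list(pool.map(worker, strips))  # map preserves input order
    return [row for strip in results for row in strip]


def threshold(strip: Frame) -> Frame:
    """Placeholder per-fragment step: label each pixel 1 if bright, else 0."""
    return [[1 if px >= 128 else 0 for px in row] for row in strip]
```

Threads keep the sketch short; CPU-bound pure Python would need a process pool or native code to see a real speed-up.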

On the basis of the foregoing embodiments, in an embodiment of the present invention, the performing parallel graph segmentation processing on the plurality of video frame fragments to obtain a segmentation processing result includes:

Referring to Fig. 2, when the real-time segmentation performance of the edge server on the video frame reaches the set segmentation performance, the edge server segments the video frame. Specifically, the edge server performs parallel stream processing on the video frame fragments obtained by dividing the frame, which accelerates segmentation; the parallel stream processing mainly includes decoding, segmenting, encoding and result correction of the video frame. To further improve the efficiency of the graph segmentation algorithm and reduce processing delay, the embodiment of the invention provides a superpixel-based pre-segmentation algorithm: the original picture is first coarsely segmented according to the information of adjacent pixels, and the graph segmentation algorithm is then run on the resulting superpixel segmentation map. This effectively reduces the complexity of the graph segmentation algorithm, so that real-time graph segmentation can be provided on a general-purpose edge server. Segmenting the plurality of video frame fragments with the superpixel pre-segmentation algorithm yields the first segmentation processing result.
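To illustrate why pre-segmentation shrinks the graph problem, the sketch below groups pixels into fixed grid cells and then merges adjacent cells with similar mean intensity. Both steps are stand-ins chosen for brevity: the text names neither a specific superpixel method nor a specific graph segmentation algorithm, so the fixed grid and the threshold merge (with the hypothetical names `grid_superpixels` and `merge_superpixels`) are assumptions.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]


def grid_superpixels(frame: Frame, block: int) -> Tuple[Frame, Dict[int, float]]:
    """Give each block x block cell one label; return the label map and mean intensity per label."""
    height, width = len(frame), len(frame[0])
    cols = (width + block - 1) // block  # superpixels per row
    labels = [[(y // block) * cols + (x // block) for x in range(width)] for y in range(height)]
    sums: Dict[int, List[float]] = {}
    for y in range(height):
        for x in range(width):
            entry = sums.setdefault(labels[y][x], [0.0, 0.0])
            entry[0] += frame[y][x]
            entry[1] += 1
    return labels, {lab: total / count for lab, (total, count) in sums.items()}


def merge_superpixels(labels: Frame, means: Dict[int, float], max_diff: float) -> Frame:
    """Union adjacent superpixels whose original mean intensities differ by at most max_diff."""
    parent = {lab: lab for lab in means}

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            for ny, nx in ((y + 1, x), (y, x + 1)):  # lower and right neighbours
                if ny < height and nx < width:
                    a, b = labels[y][x], labels[ny][nx]
                    if a != b and abs(means[a] - means[b]) <= max_diff:
                        parent[find(a)] = find(b)
    return [[find(lab) for lab in row] for row in labels]
```

For a 4 x 4 frame with 2 x 2 cells, the merge step works on 4 nodes instead of 16 pixels, which is the complexity reduction the paragraph refers to.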

To further improve the segmentation performance of the superpixel-based real-time image segmentation algorithm in non-ideal scenes (dim light and weak contrast), the embodiment of the invention also uses the latest result of the non-real-time DNN segmentation algorithm as prior knowledge for setting the parameters of the superpixel-based algorithm, such as the optimal number of segmentation blocks for the current video frame. The prior knowledge is also used to stitch and correct the small video frames that were segmented in parallel; in particular, its marker information is used to decide which segmentation region the pixels at the stitching edges belong to, which improves the performance of the superpixel-based image segmentation algorithm. The plurality of video frame fragments are segmented with the non-real-time DNN segmentation algorithm to obtain a second segmentation processing result, and the second segmentation processing result is used to correct the first segmentation processing result to form the final segmentation processing result.
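One possible reading of "correcting the first result with the second" is a per-region majority vote: each region from the fast superpixel pass takes the class that the DNN result assigns to most of its pixels. The text does not specify the correction rule, so this rule and the name `correct_with_prior` are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List

LabelMap = List[List[int]]


def correct_with_prior(fast: LabelMap, prior: LabelMap) -> LabelMap:
    """Relabel every region of `fast` with the class that `prior` assigns to most of its pixels."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for fast_row, prior_row in zip(fast, prior):
        for region, cls in zip(fast_row, prior_row):
            votes[region][cls] += 1
    # Pick the most frequent prior class inside each fast region.
    winner = {region: counts.most_common(1)[0][0] for region, counts in votes.items()}
    return [[winner[region] for region in row] for row in fast]
```

Under this rule the region boundaries still come from the fast pass, while over-segmented neighbours that the prior assigns to the same class collapse into one label.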

On the basis of the foregoing embodiments, in an embodiment of the present invention, feeding back the segmentation processing result to the Web side includes:

It can be understood that the final segmentation processing result is obtained after the video frame is segmented on the edge server. The result is returned to the Web end in JSON format, so that the Web end renders and colors it, which completes the segmentation of the video frame, and caches it at the front end.
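The text only states that the result is returned as JSON; it does not give a schema. The sketch below shows one plausible payload, with the keys `frameId`, `height`, `width` and `rows` and the run-length row encoding all being assumptions chosen to keep the message small.

```python
import json
from typing import Dict, List

LabelMap = List[List[int]]


def encode_result(frame_id: int, labels: LabelMap) -> str:
    """Serialize a label map as JSON, run-length encoding each row as [label, count] pairs."""
    rows = []
    for row in labels:
        runs: List[List[int]] = []
        for lab in row:
            if runs and runs[-1][0] == lab:
                runs[-1][1] += 1  # extend the current run
            else:
                runs.append([lab, 1])  # start a new run
        rows.append(runs)
    payload: Dict[str, object] = {
        "frameId": frame_id,
        "height": len(labels),
        "width": len(labels[0]) if labels else 0,
        "rows": rows,
    }
    return json.dumps(payload)


def decode_result(text: str) -> LabelMap:
    """Inverse of encode_result: expand the run-length rows back into a label map."""
    payload = json.loads(text)
    return [[lab for lab, count in runs for _ in range(count)] for runs in payload["rows"]]
```

The decoded label map is what the front end would cache and later hand to its renderer.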

On the basis of the foregoing embodiments, in the embodiments of the present invention, if yes, performing segmentation and graph segmentation processing on a video frame at an edge server, and obtaining a segmentation processing result further includes:

It can be understood that, under normal conditions, the video frame is segmented on the edge server alone. When the real-time segmentation performance of the edge server on the video frame cannot reach the set segmentation performance, that is, when the current network or edge server state cannot support real-time video frame segmentation, the video frame is instead segmented at the Web end, which supplements the segmentation performed by the edge server.

On the basis of the foregoing embodiments, in an embodiment of the present invention, performing segmentation processing on the video frame by using a lightweight real-time segmentation algorithm at a Web end, and obtaining a segmentation processing result includes:

It can be understood that, when a video frame is segmented at the Web end, the lightweight real-time segmentation algorithm running at the Web end is mainly a marker-based watershed algorithm. Because the watershed algorithm provided in opencv.js over-segments complex outdoor scenes, the method caches the result of the most recent edge-server-assisted segmentation at the Web end and derives the corresponding parameters and marker information of the watershed algorithm from that result; that is, the parameters and marker information of the lightweight real-time segmentation algorithm are set according to the final segmentation processing result cached at the front end. Using the parameters and marker information provided by the latest segmentation result, the Web end updates the marker-based watershed algorithm in real time, dynamically sets its parameters and marker information, and completes real-time video frame segmentation to obtain the segmentation result at the Web end.
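How marker information might be derived from the cached edge result can be sketched as follows. The derivation rule is an assumption: pixels fully inside a cached region become seeds, and pixels on a region border are left undecided. The output follows the convention of OpenCV's watershed function, where positive values are seeds and 0 means unknown; `markers_from_cache` is a hypothetical helper, and the watershed call itself is not shown.

```python
from typing import List, Tuple

LabelMap = List[List[int]]


def markers_from_cache(cached: LabelMap) -> Tuple[LabelMap, int]:
    """Build watershed-style markers from a cached label map.

    Interior pixels get their region id renumbered from 1; pixels that touch
    another region become 0 so the watershed step can decide them.
    Returns the marker map and the number of distinct regions.
    """
    height, width = len(cached), len(cached[0])
    ids = {lab: i + 1 for i, lab in enumerate(sorted({l for row in cached for l in row}))}
    markers = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            interior = all(
                cached[ny][nx] == cached[y][x]
                for ny, nx in neighbours
                if 0 <= ny < height and 0 <= nx < width
            )
            if interior:
                markers[y][x] = ids[cached[y][x]]
    return markers, len(ids)
```

The region count returned alongside the markers is one example of a parameter the cached result can supply, and seeding from known regions is what limits over-segmentation.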

On the basis of the foregoing embodiments, in an embodiment of the present invention, after the video frame is segmented by the lightweight real-time segmentation algorithm with the parameters and marking information set, and the corresponding segmentation processing result is obtained, the method further includes:

It can be understood that, no matter the video frame is segmented at the edge server or at the Web end, the segmented video frame needs to be rendered at the Web end finally. The Web end analyzes the final segmentation result of the video frame into a data format required by a rendering stage; and finishing real-time rendering on the video frame according to the segmentation result, coloring and distinguishing different planes in the video frame, and finishing real-time video frame segmentation.
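The coloring step can be sketched as a mapping from region labels to distinct colors. The palette choice (evenly spaced hues) and the names `palette` and `colorize` are assumptions; in a browser the resulting colors would be drawn onto a canvas rather than returned as nested lists.

```python
import colorsys
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def palette(labels: List[int]) -> Dict[int, Color]:
    """Give each label an RGB colour by spacing hues evenly around the colour wheel."""
    ordered = sorted(set(labels))
    colours: Dict[int, Color] = {}
    for i, lab in enumerate(ordered):
        r, g, b = colorsys.hsv_to_rgb(i / len(ordered), 0.8, 1.0)
        colours[lab] = (round(r * 255), round(g * 255), round(b * 255))
    return colours


def colorize(label_map: List[List[int]]) -> List[List[Color]]:
    """Replace every label in the map with its palette colour."""
    colours = palette([lab for row in label_map for lab in row])
    return [[colours[lab] for lab in row] for row in label_map]
```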

Referring to fig. 3, which is an overall flowchart of the method for segmenting the mobile Web real-time video frame according to the embodiment of the present invention, the Web side acquires the real-time video frame, and performs preprocessing (including noise reduction and filtering) on the acquired video frame; and sending a video frame segmentation request to an edge server, and feeding back the current network state and the state information of the edge server to the Web end by the edge server. The Web end judges whether the real-time segmentation performance of the edge server to the video frame meets the requirements or not according to the current network state and the state information of the edge server, if so, the Web end sends the video frame to the edge server, and the edge server segments the video frame. When the real-time segmentation performance of the edge server on the video frame does not meet the requirement, the video frame is segmented at the Web end, the Web end sets parameters of a segmentation algorithm according to the result of the edge server on the video frame, and the video frame is segmented to obtain the segmentation processing result of the video frame.

Finally, the Web end renders the video frame segmentation processing result (whether the video frame was segmented at the edge server or at the Web end), which completes the real-time segmentation of the video frame.
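The overall flow of Fig. 3 can be summarized as a small dispatcher: use the edge server when it is usable and remember its result, otherwise fall back to the local algorithm seeded with the cached result. The `Segmenter` class and its three callables are hypothetical; the actual availability check, edge call and local algorithm are passed in rather than implemented here.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]
LabelMap = List[List[int]]


class Segmenter:
    """Route each frame to the edge server when it is usable, else to the local fallback."""

    def __init__(
        self,
        edge_ok: Callable[[], bool],
        edge_segment: Callable[[Frame], LabelMap],
        local_segment: Callable[[Frame, Optional[LabelMap]], LabelMap],
    ) -> None:
        self.edge_ok = edge_ok
        self.edge_segment = edge_segment
        self.local_segment = local_segment
        self.cache: Optional[LabelMap] = None  # last edge result, kept at the front end

    def segment(self, frame: Frame) -> LabelMap:
        if self.edge_ok():
            self.cache = self.edge_segment(frame)  # refresh the prior for later fallbacks
            return self.cache
        # Edge path unusable: segment locally, seeded by the cached edge result if any.
        return self.local_segment(frame, self.cache)
```

Keeping the cache inside the dispatcher means the fallback always sees the most recent edge result, including `None` when the edge server has never been reachable.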

In another embodiment of the present invention, a mobile Web real-time video frame segmentation system is provided, which is used for implementing the method in the foregoing embodiments. Therefore, the description and definition in the embodiments of the mobile Web real-time video frame division method described above can be used for understanding the execution modules in the embodiments of the present invention. Fig. 4 is a schematic diagram of an overall structure of a mobile Web real-time video frame segmentation system according to an embodiment of the present invention, where the system includes a receiving module 41, a first feedback module 42, a segmentation module 43, and a second feedback module 44.

A receiving module 41, configured to receive a video frame segmentation request sent by a Web end;

a first feedback module 42, configured to feed back, to the Web end, a current network state and state information of the edge server end according to the video frame segmentation request, so that the Web end determines, according to the current network state and the state information of the edge server end, whether the real-time segmentation performance of the video frame by the edge server end reaches a set segmentation performance;

the segmentation module 43 is configured to perform segmentation and graph segmentation on the video frame when the real-time segmentation performance of the video frame at the edge server side reaches a set segmentation performance, so as to obtain a segmentation result;

and the second feedback module 44 is configured to feed back the segmentation processing result to the Web end.

The mobile Web real-time video frame segmentation system provided by the embodiment of the present invention corresponds to the mobile Web real-time video frame segmentation methods provided by the foregoing embodiments, and the relevant technical features of the mobile Web real-time video frame segmentation system may refer to the relevant technical features of the mobile Web real-time video frame segmentation methods provided by the foregoing embodiments, and are not described herein again.

Fig. 5 illustrates the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 01, a communication interface 02, a memory 03 and a communication bus 04, where the processor 01, the communication interface 02 and the memory 03 communicate with one another through the communication bus 04. The processor 01 may call logic instructions in the memory 03 to perform the following method: receiving a video frame segmentation request sent by a Web end; feeding back the current network state and the state information of the edge server end to the Web end according to the video frame segmentation request, so that the Web end judges whether the real-time segmentation performance of the video frame by the edge server end reaches the set segmentation performance or not according to the current network state and the state information of the edge server end; if so, performing segmentation and graph segmentation processing on the video frame at the edge server side to obtain a segmentation processing result; and feeding back the segmentation processing result to a Web end.

In addition, the logic instructions in the memory 03 can be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The present embodiments provide a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example, including: receiving a video frame segmentation request sent by a Web end; feeding back the current network state and the state information of the edge server end to the Web end according to the video frame segmentation request, so that the Web end judges whether the real-time segmentation performance of the video frame by the edge server end reaches the set segmentation performance or not according to the current network state and the state information of the edge server end; if so, performing segmentation and graph segmentation processing on the video frame at the edge server side to obtain a segmentation processing result; and feeding back the segmentation processing result to a Web end.

According to the mobile Web real-time video frame segmentation method, the mobile Web real-time video frame segmentation system, the electronic equipment and the storage medium, the transmission delay of the mobile Web end for acquiring the video frame in real time is reduced by using the characteristics of high bandwidth and low delay of the edge server, meanwhile, the video frame is segmented into a plurality of small video frames, and the segmentation of the video frame is accelerated by using a parallel processing technology.

In order to further improve the efficiency of the traditional graph segmentation algorithm, when the real-time video frame segmentation performance is not lower than 30FPS, the initial video frame is pre-segmented on the basis of the superpixel segmentation algorithm in the edge server, and then the graph segmentation algorithm is executed, so that the complexity of the graph segmentation algorithm is effectively reduced, and the real-time video frame segmentation service can be provided for the mobile Web on the edge server.

In addition, in order to improve the segmentation performance of the traditional graph segmentation algorithm, the segmentation result of the deep learning segmentation algorithm is used as prior knowledge to dynamically provide parameters for the graph segmentation algorithm on the edge server and to merge and correct the segmentation results of the small video frames. This improves the segmentation effect of the graph segmentation algorithm, particularly in dim-light and weak-contrast scenes.

Finally, considering that the real-time segmentation of the edge server depends on stable network conditions and server states, for example, when the network environment is unstable, the video frame transmission delay may be greatly increased, and the migration of the edge server or the service switching may cause a short service stop. At this time, a lightweight real-time video frame segmentation algorithm is performed on the Web side, and continuous and stable services can still be provided for users in an unstable environment.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A mobile Web real-time video frame segmentation method is characterized by comprising the following steps:

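receiving a video frame segmentation request sent by a Web end;

feeding back the current network state and the state information of the edge server end to the Web end according to the video frame segmentation request, so that the Web end judges whether the real-time segmentation performance of the video frame by the edge server end reaches the set segmentation performance or not according to the current network state and the state information of the edge server end;

if so, performing segmentation and graph segmentation processing on the video frame at the edge server side to obtain a segmentation processing result;

and feeding back the segmentation processing result to a Web end.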

2. The method of claim 1, wherein the status information of the edge server includes a load of the edge server, and the determining whether the real-time video frame segmentation performance of the edge server reaches the set segmentation performance comprises:

3. The video frame segmentation method according to claim 2, wherein the load of the edge server includes a CPU load and an I/O load of the edge server.

4. The method of claim 1, wherein the performing the segmentation and graph segmentation processing on the video frame at the edge server side to obtain the segmentation processing result comprises:

5. The method according to claim 4, wherein the performing a parallel graph segmentation process on the plurality of video frame fragments to obtain a segmentation process result comprises:

6. The video frame segmentation method according to claim 5, wherein the feeding back the segmentation processing result to the Web end includes:

7. The method of claim 6, wherein if yes, performing the segmentation and graph segmentation processing on the video frame at the edge server, and obtaining the segmentation processing result further comprises:

8. The video frame segmentation method according to claim 7, wherein the segmenting the video frame by using a lightweight real-time segmentation algorithm at the Web side to obtain a segmentation result comprises:

9. The method of claim 8, wherein the segmenting the video frame by using the lightweight real-time segmentation algorithm with the set parameters and the set tag information to obtain the corresponding segmentation result further comprises:

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the mobile Web real-time video frame segmentation method according to any one of claims 1 to 9.