CN112580481B

CN112580481B - Edge node and cloud collaborative video processing method, device and server

Info

Publication number: CN112580481B
Application number: CN202011466467.7A
Authority: CN
Inventors: 李治军; 韩朴宇; 刘劼; 杨波; 梁兴伟; 杨建祥
Original assignee: Shenzhen Hit Technology Innovation Industry Development Co ltd; Konka Group Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Hit Technology Innovation Industry Development Co ltd; Konka Group Co Ltd; Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2024-05-28
Anticipated expiration: 2040-12-14
Also published as: CN112580481A

Abstract

The invention discloses an edge node and cloud collaborative video processing method, device and server. The method comprises the following steps: acquiring video image pixel data that an edge node has compressed and encoded; decoding the video image pixel data to obtain image decoding data; and performing visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data. In the embodiment, the edge node and the cloud server process the video data cooperatively, so that tasks with lower computing power requirements run on the edge node and tasks with higher computing power requirements run on the cloud server. This combines the high performance of the cloud server with the low delay and privacy of edge node computing, while a pipeline mechanism raises the throughput of the computing tasks and thereby improves computing efficiency.

Description

Edge node and cloud collaborative video processing method, device and server

Technical Field

The invention relates to the technical field of the Internet of things, in particular to a method, a device and a server for processing collaborative video based on edge nodes and cloud.

Background

In recent years, cloud computing, Internet of Things and artificial intelligence technologies have developed continuously, more and more infrastructure is being made intelligent, and widely deployed camera sensors generate large amounts of picture and video data. However, the bandwidth of a cloud data center is limited and there is a certain delay between the sensor and the cloud data center, so processing massive video streams on cloud computing servers alone cannot meet the requirements of low delay and high throughput. Relying on the cloud alone has further drawbacks. Network delay and bandwidth limits are unavoidable: because the end consumer and the cloud platform are connected through the Internet, the network delay is far greater than that of a locally deployed application, and the available bandwidth is smaller than that of a local area network. Public clouds have complex charging models, and an Internet of Things application that uploads large amounts of data can incur great expense. With small probability, the cloud platform may suffer a service interruption caused by software, hardware or network faults; the consumer cannot avoid such faults and, after a fault, can neither determine the recovery time nor carry out maintenance. Finally, security holes or configuration errors in the firewall and permission control of the cloud platform may accidentally leak data on the cloud, causing security and privacy problems.

Relying on edge computing alone places the computation near the data and thus addresses the throughput and bandwidth problems, but edge computing nodes generally have low performance and cannot run a large-scale deep neural network, and a serial processing flow results in low throughput. Building high-performance edge computing nodes requires a large amount of funding, and duplicated construction wastes computing power. The prior-art methods for processing image data therefore cannot meet the requirements for bandwidth, delay, throughput and high computing capacity at the same time.

Accordingly, there is a need for improvement and development in the art.

Disclosure of Invention

The technical problem to be solved by the invention is to provide, in view of the above defects in the prior art, an edge node and cloud collaborative video processing method. The method aims to solve the prior-art problems that processing massive video streams on a cloud computing server alone cannot meet the requirements of low delay and high throughput, that relying on edge computing alone provides too little computing power to run a large-scale deep neural network model, and that a serial processing flow results in low throughput.

The technical scheme adopted by the invention for solving the problems is as follows:

In a first aspect, an embodiment of the present invention provides a method for processing collaborative video based on an edge node and a cloud, where the method includes:

acquiring video image pixel data after video image processing by an edge node;

performing video image decoding on the video image pixel data to obtain image decoding data;

And performing visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data.

In one implementation manner, the generating manner of the video image pixel data after the video image processing by the edge node is as follows:

Acquiring video image pixel data shot by a camera;

filtering image pixel data without change of an object in the video image pixel data to obtain effective video image pixel data;

and performing image scaling and image compression coding on the effective video image pixel data to obtain video image pixel data.

In one implementation manner, after the filtering of image pixel data in which the object does not change from the video image pixel data to obtain the effective video image pixel data, the method further includes:

performing gamma correction, sharpening and fisheye correction on the effective video image pixel data.

In one implementation, the performing image scaling and image compression encoding on the effective video image pixel data to obtain video image pixel data includes:

Adjacent pixel combination is carried out on the effective video image pixel data to obtain scaled video image pixel data;

And according to the coding redundancy and the pixel redundancy between the scaled video image pixel data, binary coding is carried out on the scaled video image pixel data to obtain video image pixel data.

In one implementation, the performing image scaling and image compression encoding on the effective video image pixel data to obtain video image pixel data further includes:

and when the effective video image pixel data is subjected to image scaling and image compression coding, an image processing hardware unit or an image processing software thread is added to improve the speed of image scaling and image compression coding.

In one implementation manner, the performing the visual feature analysis training based on the convolutional neural network model on the image decoding data to obtain the visual feature analysis data of the video image includes:

And inputting the image decoding data into a convolutional neural network model to obtain video image visual characteristic analysis data.

In one implementation manner, the performing the visual feature analysis training based on the convolutional neural network model on the image decoding data to obtain the visual feature analysis data of the video image further includes:

and when the image decoding data is subjected to visual feature analysis training based on a convolutional neural network model, an image processing hardware unit or an image processing software thread is added to increase the rate of the visual feature analysis training.

In one implementation manner, the convolutional neural network model generation manner is as follows:

Acquiring input sample data;

inputting the input sample data into a modeling model to obtain modeling model output data;

Re-inputting the modeling model output data to the modeling model for training iteration;

Repeating the step of inputting the modeling model output data into the modeling model again for training iteration until the modeling model output data meets the preset requirement, stopping training iteration, and obtaining the convolutional neural network model.
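As a non-limiting illustration, the iterate-until-requirement-met structure of the above steps can be sketched as follows. This is only a sketch under stated assumptions: a one-parameter linear model stands in for the convolutional neural network, the mean squared error threshold stands in for the "preset requirement", and the names `train_until`, `requirement` and `learning_rate` are hypothetical and do not appear in the embodiment.

```python
from typing import List, Tuple


def train_until(samples: List[Tuple[float, float]],
                requirement: float = 1e-6,
                learning_rate: float = 0.05,
                max_iterations: int = 10_000) -> Tuple[float, int]:
    """Fit y = w * x by gradient descent and stop once the error is small enough.

    The loop mirrors the described steps: feed the samples to the model,
    evaluate the model output, and repeat the training iteration until the
    output meets the preset requirement (here: mean squared error).
    """
    w = 0.0  # model weight, hypothetical starting value
    for iteration in range(max_iterations):
        # Model output error on the current samples.
        error = sum((w * x - y) ** 2 for x, y in samples) / len(samples)
        if error <= requirement:
            return w, iteration  # preset requirement met: stop iterating
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
    return w, max_iterations


# Hypothetical sample data generated from y = 2 * x.
weight, iterations_used = train_until([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

The upper bound `max_iterations` is an added safeguard so that the loop terminates even if the requirement is never met.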

In a second aspect, an embodiment of the present invention further provides an apparatus for collaborative video processing based on an edge node and a cloud, where the apparatus includes:

the video image pixel data acquisition unit is used for acquiring video image pixel data after video image processing by the edge node;

an image decoding data obtaining unit, configured to perform video image decoding on the video image pixel data to obtain image decoding data;

The video image visual characteristic analysis data acquisition unit is used for performing visual characteristic analysis training based on the convolutional neural network model on the image decoding data to obtain video image visual characteristic analysis data.

In a third aspect, an embodiment of the present invention further provides a server, including a memory, one or more processors, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing the edge node and cloud collaborative video processing method according to any one of the above.

The invention has the following beneficial effects. In the embodiment of the invention, the edge node first obtains the video image pixel data shot by the camera and performs video image compression coding on it to obtain compression-coded video image pixel data, which the edge node sends to the cloud server. Because the edge node has encoded the video image data, the cloud server decodes the compression-coded video image pixel data to obtain image decoding data, and finally performs visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data. Because the edge node and the cloud server process the video data cooperatively, tasks with low computing power requirements run on the edge node and tasks with high computing power requirements run on the cloud server, which combines the high performance of cloud server computing with the low delay of edge node computing. Lossy image compression and image filtering on the edge node reduce the communication bandwidth, and a pipeline mechanism with added hardware units or software threads raises the throughput of the computing tasks, thereby improving operating efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.

Fig. 1 is a schematic flow chart of a collaborative video processing method based on edge nodes and cloud according to an embodiment of the present invention.

Fig. 2 is a schematic block diagram of an apparatus based on edge node and cloud collaborative video processing according to an embodiment of the present invention.

Fig. 3 is a schematic block diagram of an internal structure of a server according to an embodiment of the present invention.

Detailed Description

The invention discloses an edge node and cloud collaborative video processing method, device and server. In order to make the purposes, technical scheme and effects of the invention clearer and more definite, the invention is described in further detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

In the prior art, the bandwidth of a cloud data center is limited and there is a certain delay between the sensor and the cloud data center, so processing massive video streams on a cloud computing server alone cannot meet the requirements of low delay and high throughput. Relying on edge computing alone places the computation near the data and thus addresses the throughput and bandwidth problems, but edge computing nodes generally have low performance and cannot run a large-scale deep neural network. Building high-performance edge computing nodes requires a large amount of funding, and duplicated construction wastes computing power. The prior-art methods for processing image data therefore cannot meet the requirements for bandwidth, delay, throughput and high computing capability at the same time.

In order to solve the problems in the prior art, the embodiment provides a collaborative video processing method based on edge nodes and cloud. In the embodiment of the invention, an edge node is a node on the network edge side close to where the request originates; the edge node may be a server or an intelligent driving automobile and is not particularly limited. After the edge node acquires the video image pixel data shot by the camera, it performs video image compression coding on the video image pixel data to obtain compression-coded video image pixel data and sends this data to the cloud server through RPC. RPC stands for remote procedure call, which resembles a local procedure call except that it works over the Internet. Because the edge node has encoded the video image data, the cloud server decodes the compression-coded video image pixel data to obtain image decoding data, and finally performs visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data. The cloud server sends the video image visual feature analysis data back to the edge node, and the edge node receives it. Because the edge node and the cloud server process the video data cooperatively, tasks with low computing power requirements run on the edge node and tasks with high computing power requirements run on the cloud server, which combines the high performance of cloud server computing with the low delay of edge node computing. Lossy image compression and image filtering on the edge node reduce the communication bandwidth, and a pipeline mechanism with added hardware units or software threads raises the throughput of the computing tasks, thereby improving operating efficiency.

Illustrative examples

Cloud computing refers to a computing mode in which a service provider provides shared software and hardware resources to users as needed. According to the definition of the National Institute of Standards and Technology (Peter Mell; Timothy Grance (September 2011). The NIST Definition of Cloud Computing (Technical report). National Institute of Standards and Technology: U.S. Department of Commerce), cloud computing has three service modes. (1) Infrastructure as a service (IaaS): the cloud computing provider offers "basic computing resources", such as computing resources, storage resources and network resources, which consumers can use to deploy operating systems, applications, firewalls and other programs, but consumers cannot control the underlying infrastructure. Typically IaaS provides computing resources to consumers in the form of virtual machines (e.g., Xen or KVM virtual machines) or containers (e.g., Docker containers), together with cloud orchestration techniques (e.g., OpenStack or Kubernetes) to control the virtual machines or containers programmatically. (2) Platform as a service (PaaS): the cloud computing provider provides the running environment for applications. A consumer may deploy an application (e.g., a blog or forum) but cannot control the operating system, network or hardware environment. Compared with IaaS, PaaS eliminates the need to configure complex software such as an operating system, a firewall and a database; consumers only need to deploy the final application program written in a high-level language (such as Python or Java), and the cloud computing provider runs the submitted program on the cloud platform. In exchange, PaaS gives up control of the underlying software. (3) Software as a service (SaaS): the cloud computing vendor rents software to consumers in the form of services; consumers can use the programs through the Internet but cannot control the operating system or the running environment of the programs. SaaS vendors provide various software, such as office software, instant messaging software, content management systems and enterprise resource planning, as network services, and a user can access the cloud services through clients such as a browser or through application programming interfaces (APIs) published by the SaaS vendor. SaaS speeds up software delivery: because the software is located on the cloud rather than on the consumer's computer, the SaaS vendor only needs to update the software on the cloud to deliver the latest version. According to the deployment mode of the cloud service, cloud computing is divided into the following deployment models:

(1) Public cloud. Public clouds are typically provided by public cloud computing vendors, who are also responsible for the hardware resources and software architecture on the cloud; consumers purchase computing services from these vendors. Because the data of different consumers on a public cloud is stored together at a third-party cloud computing vendor, public clouds carry confidentiality and privacy risks (although public clouds typically do not reveal confidential data). Public clouds also have limitations such as uncontrolled availability and limited network bandwidth. (2) Private cloud. A private cloud is a cloud platform built by the organization (an enterprise, a school or a government agency) to which the cloud user belongs, and the hardware resources and data belong to that organization. Private clouds retain the advantages of cloud computing and offer better privacy. (3) Hybrid cloud. A hybrid cloud uses both a public cloud and a private cloud; key data is usually located in the private cloud and non-key data in the public cloud, which reduces the cost of building private cloud hardware.

Edge computing is defined as a form of computing placed close to where requests originate in order to reduce latency. In Internet of Things applications, sensor devices generate large amounts of real-time data; submitting all of this data to the cloud causes great bandwidth overhead and unavoidable delay, and cannot meet the requirements on cost and application performance.

A pipeline is analogous to a production line in engineering: a computer pipeline is a set of connected computation processes in which the output of each process is the input of the next, and the processes run in parallel.

The throughput of the pipeline depends on its slowest computation process. Because the computation times of the stages are unlikely to be identical, a pipeline slightly increases the delay of a single task; introducing the pipeline is worthwhile as long as the benefit of the increased throughput outweighs the increase in delay.

If a certain computation process is slower, but multiple instances of it can execute at the same time by increasing the number of hardware units or software threads, that process can be prevented from limiting the throughput. Note that increasing the number of execution units does not reduce the latency of a single task.
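As a non-limiting illustration, the relationship between stage time, stage instances, throughput and latency described above can be sketched with a simple steady-state model. The stage names and timings below are hypothetical values chosen only for illustration, and the model ignores queueing and data transfer overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    seconds_per_item: float  # time one instance needs to process one item
    instances: int = 1       # parallel hardware units or software threads


def stage_rate(stage: Stage) -> float:
    """Items per second the stage sustains with all of its instances."""
    return stage.instances / stage.seconds_per_item


def pipeline_throughput(stages: List[Stage]) -> float:
    """Steady-state throughput is bounded by the slowest stage."""
    return min(stage_rate(stage) for stage in stages)


def single_item_latency(stages: List[Stage]) -> float:
    """One item still passes through every stage once, so adding
    instances to a stage does not shorten this value."""
    return sum(stage.seconds_per_item for stage in stages)


# Hypothetical three-stage pipeline: filter -> scale/encode -> CNN.
stages = [
    Stage("filter", 0.010),
    Stage("scale_encode", 0.040),  # slowest stage, limits throughput
    Stage("cnn", 0.020),
]
before = pipeline_throughput(stages)   # limited by scale_encode
stages[1].instances = 2                # add a second unit or thread
after = pipeline_throughput(stages)    # bottleneck is relieved
latency = single_item_latency(stages)  # unchanged by the extra instance
```

In this model, doubling the instances of the slowest stage doubles the throughput, while the latency of a single item stays the sum of the stage times.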

When an intelligent driving vehicle is running, many computations are involved. Some of them, such as those that work directly on road condition identification data, need to run on the edge node of the vehicle. Others, such as face recognition, target detection and target classification on the video image pixel data captured by the camera, place very high demands on computing power, so running them on the edge node is inefficient; these computations can be placed in the cloud to improve operating efficiency. In this embodiment, the edge node first obtains the video image pixel data shot by the camera and then preprocesses it, for example by filtering out invalid data, scaling and compression coding. It then sends the compression-coded video image pixel data to the cloud server, and the cloud server decodes it to obtain image decoding data. Because the edge node and the cloud server process the video data cooperatively, tasks with low computing power requirements run on the edge node and tasks with high computing power requirements run on the cloud server, which combines the high performance of cloud server computing with the low delay and privacy of edge node computing. Lossy image compression and image filtering on the edge node reduce the communication bandwidth, and a pipeline mechanism with added hardware units or software threads raises the throughput of the computing tasks, thereby improving operating efficiency.

Exemplary method

The embodiment provides a collaborative video processing method based on edge nodes and cloud, which can be applied to a server for image processing. As shown in fig. 1, the method includes:

Step S100, acquiring compression-coded video image pixel data that the edge node has compressed and encoded;

Specifically, the edge node calls the camera API to obtain the video image pixel data shot by the camera and then preprocesses it, for example by filtering out invalid data, scaling and compression coding, so as to obtain the compression-coded video image pixel data and prepare for the image computation to be carried out on the cloud server.

In order to obtain the compressed and encoded video image pixel data, the generating mode of the compressed and encoded video image pixel data after the video image processing by the edge node is as follows: acquiring video image pixel data shot by a camera; filtering image pixel data without change of an object in the video image pixel data to obtain effective video image pixel data; and performing image scaling and image compression coding on the effective video image pixel data to obtain video image pixel data.

In this embodiment, for example, a camera on an intelligent driving vehicle shoots video image pixel data in real time and the edge node acquires it. Some of this data is invalid for the computation, and uploading it to the cloud server would increase bandwidth usage. The edge node therefore performs image processing, using traditional machine learning or a smaller-scale CNN to filter out image data that contains no effective data or in which the object does not change; what remains is the effective video image pixel data. The images that the edge node sends to the cloud server thus contain only effective images, which reduces the bandwidth of the collaborative processing between the edge node and the cloud server. In addition, the edge node preprocesses the effective video image pixel data, for example by image scaling and image compression coding, to obtain compression-coded video image pixel data. Image scaling changes the resolution of the image data; image compression further compresses the image data into a file suitable for network transmission; and video compression compresses the differences between frames, which further reduces network traffic and likewise reduces the bandwidth of the collaborative processing between the edge node and the cloud server.
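As a non-limiting illustration, filtering out frames in which nothing changes can be sketched with simple frame differencing. Frame differencing is a stand-in chosen for brevity and is not the traditional machine learning or small CNN filter named in the embodiment; the function names and the `threshold` value are hypothetical, and frames are assumed to be equally sized grayscale images stored as nested lists.

```python
from typing import List, Optional

Frame = List[List[int]]  # grayscale frame: rows of pixel values in 0..255


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def filter_unchanged(frames: List[Frame], threshold: float = 2.0) -> List[Frame]:
    """Keep a frame only if it differs enough from the last frame that was kept."""
    kept: List[Frame] = []
    last_kept: Optional[Frame] = None
    for frame in frames:
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            kept.append(frame)   # effective frame: will be uploaded
            last_kept = frame
        # otherwise the frame is dropped and never sent to the cloud server
    return kept


# Hypothetical 2x2 frames: the scene is static, then one pixel changes.
static = [[10, 10], [10, 10]]
moved = [[10, 200], [10, 10]]
effective = filter_unchanged([static, static, moved, moved])
```

Only the first static frame and the first changed frame survive, so two of the four frames would be uploaded in this example.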

Further, after filtering the image pixel data in which the object does not change, the method further includes applying operations whose image input and image output are video image pixel data of the same size. Specifically, gamma correction, sharpening, fisheye correction and the like may be adopted. Gamma correction, also called gamma nonlinearity or gamma coding, performs a nonlinear operation or its inverse on the luminance or tristimulus values of light in a film or image system. Sharpening quickly brings blurred edges into focus, improving the definition or focus of a part of the image so that the colors of a specific area of the image are clearer. Fisheye correction corrects photographs taken with a fisheye lens.
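As a non-limiting illustration, gamma correction as a same-size, per-pixel nonlinear operation can be sketched as follows. The sketch assumes 8-bit grayscale frames stored as nested lists and uses the common power-law form with exponent 1/gamma; the function name and the default value 2.2 are assumptions and are not specified in the embodiment.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel values in 0..255


def gamma_correct(frame: Frame, gamma: float = 2.2) -> Frame:
    """Apply out = 255 * (in / 255) ** (1 / gamma) to every pixel.

    A 256-entry lookup table is built once so that each pixel costs only
    one table access; the output has the same size as the input.
    """
    table = [round(255 * (value / 255) ** (1.0 / gamma)) for value in range(256)]
    return [[table[pixel] for pixel in row] for row in frame]


# Hypothetical 2x2 frame with black, mid-gray and white pixels.
frame = [[0, 128], [128, 255]]
corrected = gamma_correct(frame)
```

With a gamma greater than 1, black and white stay fixed while mid-tones are brightened, and the frame dimensions are unchanged.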

In order to obtain video image pixel data, the image scaling and image compression encoding are performed on the effective video image pixel data, and the video image pixel data are obtained, which comprises the following steps: adjacent pixel combination is carried out on the effective video image pixel data to obtain scaled video image pixel data; and according to the coding redundancy and the pixel redundancy between the scaled video image pixel data, binary coding is carried out on the scaled video image pixel data to obtain video image pixel data.

Specifically, the edge node performs adjacent pixel merging on the effective video image pixel data to obtain scaled video image pixel data. For example, adjacent pixels are merged directly to reduce the data size, lowering the image resolution from the output resolution of the camera sensor to the input resolution of the CNN (convolutional neural network); because the input resolution of the CNN is lower, the amount of transmitted data is greatly reduced. The scaled video image pixel data is then binary coded according to the coding redundancy and the pixel redundancy among the scaled video image pixel data to obtain video image pixel data. Lossy coding exploits the visual characteristics of the human eye, so that the loss of some information is difficult for the human eye to perceive. Common lossless encodings include PNG and WebP, and common lossy encodings include JPEG, WebP and HEIF. Video coding encodes the pixel data of a sequence of images into binary data and uses the relationship between adjacent image pixel data to achieve higher compression ratios, at the cost of higher encoding complexity.
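As a non-limiting illustration, adjacent pixel merging can be sketched as block averaging. Averaging each block is an assumption, since the embodiment does not specify how merged pixels are combined; the function name and the `factor` parameter are hypothetical, and frames are assumed to be grayscale images stored as nested lists.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel values in 0..255


def merge_adjacent_pixels(frame: Frame, factor: int = 2) -> Frame:
    """Downscale by replacing each factor x factor block with its integer mean.

    Rows and columns that do not fill a complete block are discarded.
    """
    height = len(frame) - len(frame) % factor
    width = len(frame[0]) - len(frame[0]) % factor
    scaled: Frame = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor)
                     for dx in range(factor)]
            row.append(sum(block) // len(block))
        scaled.append(row)
    return scaled


# Hypothetical 4x4 frame made of four uniform 2x2 blocks.
frame = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [50, 50, 200, 200],
    [50, 50, 200, 200],
]
scaled = merge_adjacent_pixels(frame)
```

Merging 2x2 blocks reduces the pixel count to one quarter before any compression coding is applied.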

In addition, in order to improve the efficiency of image scaling and image compression coding, the performing image scaling and image compression coding on the effective video image pixel data to obtain video image pixel data further includes the following operations: and when the effective video image pixel data is subjected to image scaling and image compression coding, an image processing hardware unit or an image processing software thread is added to improve the speed of image scaling and image compression coding.

In this embodiment, when the edge node performs image scaling and image compression encoding on the effective video image pixel data, the computation may be accelerated by adding hardware or software, such as an image processing hardware unit or an image processing software thread, in order to improve computing efficiency. In practice, many SoCs have separate components for image processing tasks, such as ISPs (image signal processors), VPUs (video processing units), NPUs (neural network processing units) and hardware codecs. These image processing hardware units make it unnecessary to implement the related processing subtasks in CPU software, which reduces both the processing time and the CPU occupancy. Combining hardware acceleration and offloading with the pipeline ensures that the related hardware units process tasks with higher throughput. If the image processing hardware unit cannot handle some tasks because of parameter or quantity limits, for example because the input resolution is too high for the SoC (system on chip) to process, or because an SoC that supports only two video channels cannot process three, the image processing hardware unit can be combined with an image processing software thread, such as CPU software, so that the two process the tasks jointly.
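As a non-limiting illustration, the joint use of a limited hardware unit and software threads can be sketched as an assignment rule. All class names, field names and limit values below are hypothetical; a real SoC exposes its limits through vendor-specific interfaces that are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HardwareLimits:
    max_streams: int  # quantity limit: streams the unit handles at once
    max_width: int    # parameter limit: highest supported input width
    max_height: int   # parameter limit: highest supported input height


@dataclass
class Stream:
    name: str
    width: int
    height: int


def assign_streams(streams: List[Stream], limits: HardwareLimits) -> Dict[str, str]:
    """Send each stream to the hardware unit if it fits, else to a software thread."""
    assignment: Dict[str, str] = {}
    hardware_in_use = 0
    for stream in streams:
        fits = (stream.width <= limits.max_width
                and stream.height <= limits.max_height)
        if fits and hardware_in_use < limits.max_streams:
            assignment[stream.name] = "hardware"
            hardware_in_use += 1
        else:
            # Resolution too high or all hardware channels busy:
            # fall back to CPU software processing.
            assignment[stream.name] = "software"
    return assignment


# Hypothetical SoC that supports two 1080p channels, given three cameras.
limits = HardwareLimits(max_streams=2, max_width=1920, max_height=1080)
cameras = [Stream("cam0", 1920, 1080),
           Stream("cam1", 1920, 1080),
           Stream("cam2", 1920, 1080)]
plan = assign_streams(cameras, limits)
```

In this example the first two streams use the hardware unit and the third is handled by a software thread, which matches the two-channel case described above.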

Step S200, performing video image decoding on the video image pixel data to obtain image decoding data;

Specifically, decoding the video image pixel data is the inverse of encoding it: decoding mainly recovers the video image pixel data as it was before encoding, in preparation for the subsequent visual feature analysis training based on a convolutional neural network model.
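As a non-limiting illustration, decoding as the inverse of encoding can be sketched with a lossless round trip. The `zlib` module from the Python standard library stands in for the image or video codec, which the embodiment does not fix; the four-byte size header and the function names are hypothetical. With a lossy codec, decoding would recover only an approximation of the original pixels.

```python
import zlib
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of pixel values in 0..255


def encode_frame(frame: Frame) -> bytes:
    """Edge side: pack height and width, then compress the raw pixel bytes."""
    height, width = len(frame), len(frame[0])
    raw = bytes(pixel for row in frame for pixel in row)
    header = height.to_bytes(2, "big") + width.to_bytes(2, "big")
    return header + zlib.compress(raw)


def decode_frame(payload: bytes) -> Frame:
    """Cloud side: read the header, decompress, and rebuild the pixel rows."""
    height = int.from_bytes(payload[0:2], "big")
    width = int.from_bytes(payload[2:4], "big")
    raw = zlib.decompress(payload[4:])
    return [list(raw[y * width:(y + 1) * width]) for y in range(height)]


# Hypothetical 2x3 frame sent from the edge node to the cloud server.
frame = [[0, 1, 2], [3, 4, 5]]
payload = encode_frame(frame)
restored = decode_frame(payload)
```

Because this stand-in codec is lossless, the restored frame is identical to the frame before encoding.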

and step S300, performing visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data.

Specifically, the convolutional neural network (CNN) is a type of artificial neural network whose weight-sharing network structure significantly reduces the complexity of the model and the number of weights. A convolutional neural network can take a picture directly as the input of the network and extract features automatically, and it is highly invariant to deformations of the picture (such as translation, scaling and tilting). It is commonly used for visual analysis and is widely applied in fields such as face recognition, target detection, target classification, natural language processing and medical treatment. In this embodiment, in order to process the image decoding data efficiently, visual feature analysis training based on a convolutional neural network model is performed on the image decoding data to obtain video image visual feature analysis data.
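The effect of weight sharing on the number of weights can be shown with a short calculation. The layer sizes below (a 224 x 224 RGB input, 64 outputs, a 3 x 3 kernel) are assumed for illustration only, and the two layers are not functionally equivalent; the comparison only indicates the order of magnitude.

```python
def dense_weights(in_h, in_w, in_c, out_units):
    """Weights of a fully connected layer over a flattened image (no biases)."""
    return in_h * in_w * in_c * out_units


def conv_weights(kernel_h, kernel_w, in_c, out_c):
    """Weights of a convolutional layer; each kernel is shared across all positions."""
    return kernel_h * kernel_w * in_c * out_c


# Assumed sizes: 224x224 RGB input, 64 output units / feature maps.
dense = dense_weights(224, 224, 3, 64)   # every pixel has its own weight per unit
conv = conv_weights(3, 3, 3, 64)         # one small kernel per feature map
```

Under these assumptions the fully connected layer needs 9,633,792 weights while the convolutional layer needs 1,728, because the same kernel is reused at every image position.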

Performing the visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain the video image visual feature analysis data comprises the following step:

And step S301, inputting the image decoding data into a convolutional neural network model to obtain video image visual characteristic analysis data.

Specifically, the cloud server inputs the image decoding data into the convolutional neural network model. The weight-sharing network structure of the convolutional neural network model reduces the complexity of the model and the number of weights. In addition, the convolutional neural network can take a picture directly as the input of the network, extract features automatically, and is highly invariant to deformations of the picture (such as translation, scaling and tilting). Performing the convolutional neural network operation on the cloud server yields high-quality image data and improves the calculation efficiency.

In addition, if the cloud server needs a higher operation rate during the visual feature analysis training based on the convolutional neural network model, the number of image processing hardware units or image processing software threads can be increased so that multiple instances of the training execute simultaneously, which prevents this process from lowering the throughput. That is, without changing the delay of a single task, the number of execution units for this process can be increased: adding image processing hardware units, image processing software threads, or both to execute the visual feature analysis training based on the convolutional neural network model eliminates the throughput bottleneck caused by the CPU.
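The relationship between execution units, single-task delay and throughput can be sketched with a simple model. The model assumes ideal parallelism with no scheduling overhead, and the stage latencies are made-up values, so it is an approximation rather than a measurement of the embodiment.

```python
def stage_throughput(units, latency_ms):
    """Tasks per second a stage sustains with `units` parallel instances."""
    return units * 1000 / latency_ms


def pipeline_throughput(stages):
    """The slowest stage bounds the throughput of the whole pipeline.

    stages: list of (units, latency_ms) tuples, one per pipeline stage.
    """
    return min(stage_throughput(units, latency) for units, latency in stages)


# Assumed latencies: decoding 20 ms, CNN analysis 100 ms per frame.
before = pipeline_throughput([(1, 20), (1, 100)])  # one analysis instance
after = pipeline_throughput([(1, 20), (4, 100)])   # four analysis instances
```

In this model each frame still takes 100 ms in the analysis stage, but running four instances raises the pipeline limit from 10 to 40 frames per second, after which another stage may become the bottleneck.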

The convolutional neural network model is generated by the following steps: acquiring input sample data; inputting the input sample data into a modeling model to obtain modeling model output data; inputting the modeling model output data into the modeling model again for a training iteration; and repeating the step of inputting the modeling model output data into the modeling model again for a training iteration until the modeling model output data meets the preset requirement, then stopping the training iteration to obtain the convolutional neural network model.

Specifically, image input sample data is acquired in practice and input into a modeling model, which outputs modeling model output data. The modeling model output data is input into the modeling model again for a training iteration, and these steps are repeated until the modeling model output data meets the preset requirement, that is, until the mean square error between the modeling model output data and the actual sample output data is smaller than a preset value. At that point the modeling model has been trained successfully, the training iteration stops, and the convolutional neural network model is obtained. The generated convolutional neural network model can then perform visual feature analysis training on the image decoding data.
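The stopping rule, iterating until the mean square error falls below a preset value, can be sketched as follows. As a loud assumption, a one-parameter linear model trained by gradient descent stands in for the convolutional neural network, because the embodiment does not specify the update rule; the sample data, threshold and learning rate are made up.

```python
def mean_squared_error(predicted, actual):
    """Mean square error between model outputs and actual sample outputs."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def train(inputs, targets, threshold=1e-4, learning_rate=0.05, max_iters=10000):
    """Iterate until the MSE falls below `threshold` (the preset value)."""
    weight = 0.0  # single parameter standing in for the model's weights
    for iteration in range(1, max_iters + 1):
        outputs = [weight * x for x in inputs]
        error = mean_squared_error(outputs, targets)
        if error < threshold:
            # Preset requirement met: stop the training iteration.
            return weight, error, iteration
        # Gradient of the MSE with respect to the weight (assumed update rule).
        grad = sum(2 * (o - t) * x
                   for o, t, x in zip(outputs, targets, inputs)) / len(inputs)
        weight -= learning_rate * grad
    # Iteration budget exhausted without meeting the preset requirement.
    return weight, error, max_iters


inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]  # made-up samples following y = 2x
weight, error, iterations = train(inputs, targets)
```

The `max_iters` cap is an added safeguard so that the loop ends even if the preset value is never reached.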

Exemplary apparatus

As shown in fig. 2, an embodiment of the present invention provides a collaborative video processing device based on an edge node and a cloud. The device includes a video image pixel data obtaining unit 401, an image decoding data obtaining unit 402, and a video image visual feature analysis data obtaining unit 403, wherein:

A video image pixel data obtaining unit 401, configured to obtain video image pixel data after video image processing by the edge node;

an image decoding data obtaining unit 402, configured to perform video image decoding on the video image pixel data to obtain image decoding data;

The video image visual feature analysis data obtaining unit 403 is configured to perform visual feature analysis training based on a convolutional neural network model on the image decoding data, so as to obtain video image visual feature analysis data.
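The way the three units cooperate can be sketched as a small composition. The class name `CollaborativeVideoDevice`, the use of `zlib` as the codec and the mean-intensity "analysis" are all hypothetical placeholders chosen for illustration; in particular, the analysis callable is not a convolutional neural network.

```python
import zlib


class CollaborativeVideoDevice:
    """Hypothetical composition of units 401, 402 and 403."""

    def __init__(self, receive, decode, analyze):
        self.receive = receive    # stands in for unit 401
        self.decode = decode      # stands in for unit 402
        self.analyze = analyze    # stands in for unit 403

    def process(self):
        """Run the three units in order on one piece of video data."""
        encoded = self.receive()
        decoded = self.decode(encoded)
        return self.analyze(decoded)


device = CollaborativeVideoDevice(
    # Placeholder for data arriving from the edge node.
    receive=lambda: zlib.compress(bytes([0, 50, 100, 150])),
    # Placeholder decoder (lossless stand-in).
    decode=zlib.decompress,
    # Placeholder analysis; a real system would run a CNN here.
    analyze=lambda pixels: {"mean_intensity": sum(pixels) / len(pixels)},
)
result = device.process()
```

Passing the units in as callables keeps each stage replaceable, for example by a hardware-backed decoder, without changing the flow between units.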

Based on the above embodiment, the present invention also provides a server, a functional block diagram of which may be as shown in fig. 3. The server comprises a processor, a memory, a network interface, a display screen and a temperature sensor, which are connected through a system bus. The processor of the server is configured to provide computing and control capabilities. The memory of the server includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the server is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a collaborative pipelined video processing method based on edge nodes and cloud. The display screen of the server can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the server is preset inside the server and is used for detecting the running temperature of the internal equipment.

It will be appreciated by those skilled in the art that the schematic diagram of fig. 3 is merely a block diagram of some of the structures associated with the present invention and is not limiting of the servers to which the present invention may be applied, and that a particular server may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a server is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for:

acquiring video image pixel data after video image processing by an edge node;

In summary, the invention discloses a method, a device and a server for processing collaborative video based on edge nodes and cloud, wherein the method comprises the following steps:

According to the embodiment of the invention, the edge node first obtains the video image pixel data shot by the camera and performs video image compression coding processing on it to obtain compressed coded video image pixel data. The edge node then sends the compressed coded video image pixel data to the cloud server. The cloud server performs video image decoding on the compressed coded video image pixel data to obtain image decoding data, and finally performs visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data. In this method, the edge node and the cloud server cooperate to process video data, so that tasks with low computing power requirements run on the edge node and tasks with high computing power requirements run on the cloud server, combining the high performance of cloud server computing with the low delay of edge node computing. The communication bandwidth is reduced through lossy image compression and image filtering at the edge node. Meanwhile, a pipeline mechanism is used, and the throughput rate of calculation tasks is improved by adding hardware units or software threads, so that the operation efficiency is improved.

It should be understood that the present invention discloses a method for collaborative video processing based on edge nodes and cloud, that the application of the present invention is not limited to the above examples, and that those skilled in the art can make modifications or changes according to the above description, all of which should fall within the scope of the appended claims.

Claims

1. An edge node and cloud collaborative video processing method is characterized by comprising the following steps:

acquiring video image pixel data after video image processing by an edge node;

the video image pixel data generation mode of the edge node after video image processing is as follows:

Acquiring video image pixel data shot by a camera;

performing image scaling and image compression coding on the effective video image pixel data to obtain video image pixel data;

the filtering of the image pixel data without change of the object in the video image pixel data further comprises the following steps of:

Gamma correction, sharpening and fish eye correction are carried out on the effective video image pixel data;

The image scaling and image compression coding are carried out on the effective video image pixel data, and the video image pixel data obtaining comprises the following steps:

Binary encoding is carried out on the scaled video image pixel data according to the encoding redundancy and the pixel redundancy among the scaled video image pixel data to obtain video image pixel data;

Performing visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data; the convolutional neural network directly takes an image as the input of the network, automatically extracts features, and is highly invariant to deformation of the image.

2. The method for collaborative video processing based on an edge node and a cloud as claimed in claim 1, wherein said performing image scaling and image compression encoding on the effective video image pixel data to obtain video image pixel data further comprises:

3. The edge node and cloud collaborative video processing method according to claim 2, wherein the performing visual feature analysis training on the image decoding data based on a convolutional neural network model to obtain video image visual feature analysis data includes:

4. The method for collaborative video processing based on edge nodes and cloud computing according to claim 3, wherein the training the image decoding data for visual feature analysis based on a convolutional neural network model to obtain video image visual feature analysis data further comprises:

5. The method for processing the video based on the edge node and the cloud cooperation as claimed in claim 4, wherein the convolutional neural network model is generated by:

Acquiring input sample data;

6. An edge node and cloud-based collaborative video processing device, wherein the device comprises:

Acquiring video image pixel data shot by a camera;

the video image visual feature analysis data obtaining unit is used for performing visual feature analysis training based on a convolutional neural network model on the image decoding data to obtain video image visual feature analysis data; the convolutional neural network directly takes an image as the input of the network, automatically extracts features, and is highly invariant to deformation of the image.

7. A server comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs comprising instructions for performing the method of any of claims 1-5.