CN114254156A - Video processing method, algorithm bin creating method, device and server - Google Patents

Video processing method, algorithm bin creating method, device and server Download PDF

Info

Publication number
CN114254156A
CN114254156A (application CN202111601358.6A)
Authority
CN
China
Prior art keywords
algorithm
detected
algorithms
processed
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111601358.6A
Other languages
Chinese (zh)
Inventor
黄定江
袁艺
江芸
陈龙
崔江鹤
张宇峰
李忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Digital Intelligence Technology Co Ltd
Original Assignee
China Telecom Digital Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Digital Intelligence Technology Co Ltd filed Critical China Telecom Digital Intelligence Technology Co Ltd
Priority to CN202111601358.6A priority Critical patent/CN114254156A/en
Publication of CN114254156A publication Critical patent/CN114254156A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a video processing method, an algorithm bin creating method, a device, and a server. The method includes: acquiring a task to be processed, where the task to be processed includes video data to be processed and N preset labels to be detected, the N labels to be detected represent the item types the user expects to detect in the video data to be processed, and N is an integer greater than or equal to 1; determining, from a preset algorithm bin, all AI algorithms corresponding to the N labels to be detected; and detecting the video data to be processed with those AI algorithms to obtain detection results corresponding to the N labels. The algorithm bin is obtained by orchestrating a plurality of algorithms; selecting the appropriate AI algorithms from the bin and using them to detect each task enables diversified tasks to be processed and diversified detection services to be provided to users flexibly.

Description

Video processing method, algorithm bin creating method, device and server
Technical Field
The application relates to the technical field of machine vision processing, in particular to a video processing method, an algorithm bin creating method, a device and a server.
Background
In the field of machine vision processing, deep learning techniques can be used to analyze targets such as people, vehicles, and objects in videos. At present, a single target is detected with a single algorithm; this single detection mode neither satisfies diverse scene requirements nor supports the expansion of services, so diversified detection services cannot be provided to users for diversified tasks.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video processing method, an algorithm bin creating method, a device, and a server, which can solve the problem that a single detection mode cannot provide diversified detection services to users.
In order to achieve the above object, embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a video processing method, where the method includes:
acquiring a task to be processed, wherein the task to be processed comprises video data to be processed and N preset labels to be detected, the N labels to be detected represent item types expected by a user to detect the video data to be processed, and N is an integer greater than or equal to 1;
determining all AI algorithms corresponding to the N labels to be detected from a preset algorithm bin, wherein each AI algorithm in the preset algorithm bin is associated with at least one functional label, and each functional label corresponds to the type of the item to be detected;
and detecting the video data to be processed through all AI algorithms to obtain detection results corresponding to the N labels to be detected.
In the above embodiment, since multiple classes of orchestrated AI algorithms are pre-stored in the preset algorithm bin, when diversified tasks are to be executed, the corresponding AI algorithms can be selected from the bin based on each task and its labels to be detected, and the selected AI algorithms then perform the detection, so that diversified tasks can be processed and diversified detection services can be flexibly provided to users.
With reference to the first aspect, in some optional embodiments, before acquiring the task to be processed, the method further includes:
acquiring multi-class AI algorithms of a plurality of manufacturers;
and adding at least one corresponding functional label to each class of AI algorithm among the multiple classes of AI algorithms to form a preset algorithm bin, wherein the functional label comprises at least one label representing target recognition, action recognition, or an applicable scene type.
With reference to the first aspect, in some optional embodiments, the target recognition includes at least one of object recognition, face recognition and pedestrian recognition, and the AI algorithm is used for detecting video data and/or image data.
With reference to the first aspect, in some optional embodiments, the detecting the video data to be processed by using all AI algorithms to obtain detection results corresponding to the N tags to be detected includes:
and in all AI algorithms, detecting the video data to be processed according to a preset sequence by the AI algorithm corresponding to the corresponding label to be detected in the N labels to be detected, so as to obtain a detection result corresponding to each label to be detected.
With reference to the first aspect, in some optional embodiments, the method further comprises:
fusing the detection result corresponding to each label to be detected through a preset fusion algorithm to obtain a fused detection result;
and outputting the fused detection result to a user terminal.
In a second aspect, the present application further provides an algorithm bin creating method, including:
acquiring multi-class AI algorithms of a plurality of manufacturers;
and adding at least one corresponding functional label to each class of AI algorithm among the multiple classes of AI algorithms to form a preset algorithm bin, wherein the functional label comprises at least one label representing target recognition, action recognition, or an applicable scene type.
In a third aspect, the present application further provides a video processing apparatus, including:
a task obtaining unit, configured to obtain a task to be processed, where the task to be processed includes video data to be processed and N preset labels to be detected, the N labels to be detected represent the item types the user expects to detect in the video data to be processed, and N is an integer greater than or equal to 1;
the determining unit is used for determining all AI algorithms corresponding to the N labels to be detected from a preset algorithm bin, wherein each AI algorithm in the preset algorithm bin is associated with at least one functional label, and each functional label corresponds to an item type to be detected;
and the detection unit is used for detecting the video data to be processed through all AI algorithms to obtain detection results corresponding to the N labels to be detected.
In a fourth aspect, the present application further provides an algorithm bin creating apparatus, including:
the algorithm obtaining unit is used for obtaining multi-class AI algorithms of a plurality of manufacturers;
and the label adding unit is used for adding at least one corresponding functional label to each class of AI algorithm among the multiple classes of AI algorithms to form a preset algorithm bin, wherein the functional label comprises at least one label representing target recognition, action recognition, or an applicable scene type.
In a fifth aspect, the present application further provides a server comprising a processor and a memory coupled to each other, the memory storing a computer program which, when executed by the processor, causes the server to perform the video processing method described above or to perform the algorithm bin creation method described above.
In a sixth aspect, the present application further provides a computer-readable storage medium having stored therein a computer program, which, when run on a computer, causes the computer to execute the above-mentioned video processing method or the above-mentioned algorithm bin creation method.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be considered limiting of the scope; those skilled in the art can also derive other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a communication connection between a user terminal and a server according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of a video processing method according to an embodiment of the present application.
Fig. 3 is a functional schematic diagram of the algorithm orchestration provided in the present application.
Fig. 4 is a block diagram of a video processing apparatus according to an embodiment of the present application.
Fig. 5 is a block diagram of an algorithm bin creating apparatus according to an embodiment of the present application.
Reference numerals: 10-detection system; 20-server; 30-user terminal; 200-video processing apparatus; 210-task obtaining unit; 220-determining unit; 230-detecting unit; 400-algorithm bin creating apparatus; 410-algorithm obtaining unit; 420-label adding unit.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that the terms "first," "second," and the like are used merely to distinguish one description from another, and are not intended to indicate or imply relative importance. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, the present application provides a detection system 10 that can establish communication with a user terminal 30 for data interaction. The detection system 10 can perform a variety of vision processing tasks and may include one or more servers. When a vision processing task is performed, a dispatch server in the detection system 10 may select one or more servers 20 from the plurality of servers based on a load-balancing algorithm to execute the task with the corresponding AI (Artificial Intelligence) algorithm. The dispatch server may be one of the plurality of servers in the detection system 10 and may itself also execute vision processing tasks.
The application also provides a server 20, and the server 20 may include a processing module and a storage module. The storage module stores therein a computer program which, when executed by the processing module, enables the server 20 to perform the steps of the video processing method or algorithm bin creation method described below.
In this embodiment, the server 20 may be an executing device in the detection system 10 for executing corresponding vision processing tasks. The visual processing task can be flexibly determined according to actual conditions, and the visual processing task comprises but is not limited to face recognition, flame recognition, vehicle recognition, behavior recognition and the like.
Referring to fig. 2, the present application further provides a video processing method, which can be applied to the server 20, and each step in the method is executed or implemented by the server 20. The method may comprise the steps of:
step S110, acquiring a task to be processed, wherein the task to be processed comprises video data to be processed and N preset labels to be detected, the N labels to be detected represent item types expected by a user to detect the video data to be processed, and N is an integer greater than or equal to 1;
step S120, determining all AI algorithms corresponding to the N labels to be detected from a preset algorithm bin, wherein each AI algorithm in the preset algorithm bin is associated with at least one functional label, and each functional label corresponds to an item type to be detected;
and step S130, detecting the video data to be processed through all the AI algorithms to obtain detection results corresponding to the N labels to be detected.
In the above embodiment, since multiple classes of orchestrated AI algorithms are pre-stored in the preset algorithm bin, when diversified tasks are to be executed, the corresponding AI algorithms can be selected from the bin based on each task and its labels to be detected, and the selected AI algorithms then perform the detection, so that diversified tasks can be processed and diversified detection services can be flexibly provided to users.
The individual steps in the process are explained in detail below, as follows:
prior to step S110, the method may include the step of creating an algorithm bin, for example, the method may include:
step S101, obtaining multi-class AI algorithms of a plurality of manufacturers;
step S102, adding at least one corresponding function label to each type of AI algorithm in the multiple types of AI algorithms to form the preset algorithm bin, wherein the function label comprises at least one label of representation target identification, action identification and applicable scene type.
In step S101, the manner in which the server obtains the multi-class AI algorithms may be flexibly determined, and is not particularly limited herein.
Illustratively, algorithm manufacturers can provide their AI algorithms to developers through cooperation, so that the developers can obtain multiple classes of AI algorithms from multiple manufacturers in advance and store them in the server. The types and number of obtained AI algorithms can be flexibly determined according to the actual situation and are not specifically limited here.
In step S102, the developer can orchestrate the obtained AI algorithms along multiple dimensions of each class, such as function and applicable scene, to realize orchestration of multiple algorithms and thereby obtain the orchestrated algorithm bin, for example by adding corresponding functional labels to each AI algorithm. The type of functional label added to each class of AI algorithm can be flexibly determined according to the actual situation and is not specifically limited here.
Understandably, the user can manage the functional labels in a user-defined manner, flexibly deleting, changing, or adding the functional labels of an AI algorithm. An AI algorithm may be used to detect video data, image data, or both. The types of items an AI algorithm detects can be flexibly determined according to the actual situation; for example, they include but are not limited to target recognition and action recognition. Functional labels include, but are not limited to, labels characterizing face parsing, vehicle parsing, event detection, environmental hygiene detection, clothing compliance detection, and the like.
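The tag management and indexing behavior described above can be sketched as a small tag-indexed registry. This is an illustrative sketch only; the application does not specify an implementation, and all class, method, tag, and vendor names here are hypothetical:

```python
class AlgorithmBin:
    """Stores vendor AI algorithms, each associated with one or more functional labels."""

    def __init__(self):
        self._algorithms = {}  # algorithm id -> set of functional labels
        self._tag_index = {}   # functional label -> set of algorithm ids

    def register(self, algo_id, tags):
        """Add an algorithm with at least one functional label
        (e.g. target recognition, action recognition, applicable scene type)."""
        if not tags:
            raise ValueError("each algorithm needs at least one functional label")
        self._algorithms[algo_id] = set(tags)
        for tag in tags:
            self._tag_index.setdefault(tag, set()).add(algo_id)

    def add_tag(self, algo_id, tag):
        """User-defined label management: labels can be added to an algorithm."""
        self._algorithms[algo_id].add(tag)
        self._tag_index.setdefault(tag, set()).add(algo_id)

    def remove_tag(self, algo_id, tag):
        """Labels can likewise be flexibly deleted."""
        self._algorithms[algo_id].discard(tag)
        self._tag_index.get(tag, set()).discard(algo_id)

    def lookup(self, tag):
        """Index all algorithms associated with a functional label."""
        return sorted(self._tag_index.get(tag, set()))
```

A registry like this keeps label-to-algorithm lookup independent of any single vendor's algorithm, which is what lets the bin be re-orchestrated without touching the algorithms themselves.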
Understandably, target recognition may include, but is not limited to, object recognition, face recognition, and pedestrian recognition. For example, object recognition may be applied to videos or pictures of various objects, such as animals, plants, household articles, and product parts.
In this embodiment, the orchestrated algorithm bin can provide scheduling of multiple algorithm services, satisfy task issuance and analysis for video or picture sources, and output rich structured data.
In step S110, the task to be processed may be a diversified task and can be flexibly created by the user according to actual needs. That is, one pending task may involve multiple item types to be detected. For example, for the same video stream, the task may be to detect items such as whether flames are present or whether people are present in the monitored scene.
The video data to be processed may be a surveillance video stream shot in real time or historical surveillance video data. That is, the server can obtain the real-time video stream and perform real-time detection. Alternatively, the server may retrieve historical video data for detection.
Referring to FIG. 3, the algorithm bin may store multiple classes of orchestration models (orchestration types), including video and picture AI parsing, video frame extraction plus picture AI parsing, multi-video-source AI parsing, multi-picture AI parsing, and picture AI parsing plus video verification, where video frame extraction refers to extracting image frames from a video. The user may select a corresponding orchestration model for the video data to be processed, create a task to be processed, and set the corresponding labels to be detected; such labels include, but are not limited to, area intrusion detection, flame detection, vehicle recognition, pedestrian recognition, and face comparison. In addition, the user can set the processing order of the algorithms, for example pre-processing, post-processing, or other configurations (such as intermediate processing); for instance, the pre-positioned algorithm may be face recognition.
In step S120, the server can find the corresponding AI algorithms from the preset algorithm bin based on each label to be detected in the task to be processed. Understandably, mapping relations between the labels to be detected and the functional labels of the AI algorithms are established in advance, so the server can index the corresponding AI algorithms through each label to be detected.
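The indexing in step S120 can be sketched as follows; the label names, vendor algorithm identifiers, and functional labels are hypothetical placeholders, not part of the claimed implementation:

```python
# Pre-established mapping: label to be detected -> functional label.
LABEL_TO_FUNCTION_TAG = {
    "flame detection": "flame",
    "face detection": "face",
}

# Preset algorithm bin: functional label -> available vendor algorithms.
ALGORITHM_BIN = {
    "flame": ["vendorA.flame_v2"],
    "face": ["vendorB.face_v1", "vendorC.face_v3"],
}

def resolve_algorithms(labels):
    """Return every AI algorithm indexed by the task's labels to be detected."""
    resolved = {}
    for label in labels:
        tag = LABEL_TO_FUNCTION_TAG[label]
        resolved[label] = ALGORITHM_BIN.get(tag, [])
    return resolved
```

Note that one label may resolve to several vendor algorithms, which is what later allows the user to select one, some, or all of them.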
In step S130, the server can detect the item types the user desires to detect by using all the AI algorithms corresponding to the N labels to be detected of the task. During detection, each item may be detected by the AI algorithms the user selected from among all those algorithms.
Step S130 may include: and in all AI algorithms, detecting the video data to be processed according to a preset sequence by the AI algorithm corresponding to the corresponding label to be detected in the N labels to be detected, so as to obtain a detection result corresponding to each label to be detected.
When there are multiple labels to be detected, the user can flexibly set the detection order of the items as the preset order according to the actual situation. During detection, the server detects the video data to be processed in that preset order and obtains a detection result for each label to be detected.
For example, in a warehousing scenario, it is necessary to detect whether a fire has broken out in the warehouse where goods are stored and whether strangers enter or leave it. The server can acquire the surveillance video uploaded in real time by the warehouse camera as the video data to be processed; the user then assigns two labels to be detected, corresponding to the item types "face detection" and "flame detection", with "flame detection" first, thereby creating the task to be processed. Once the task is created, the server can index the flame-detection and face-detection AI algorithms from the labels "face detection" and "flame detection", and then detect in sequence whether flames and strangers appear in the surveillance video, obtaining the corresponding detection results.
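The warehouse example can be sketched as a sequential pipeline. The detector functions below are stand-ins for the vendor AI algorithms (they merely inspect a set of mock frame annotations), and the ordering mechanism is an assumption about one way to realize the preset order:

```python
def flame_detector(frame):
    """Stand-in for a vendor flame-detection AI algorithm."""
    return {"flame": "fire" in frame}

def face_detector(frame):
    """Stand-in for a vendor face-detection AI algorithm."""
    return {"stranger": "unknown_face" in frame}

# Preset order set by the user: "flame detection" runs first.
PRESET_ORDER = [
    ("flame detection", flame_detector),
    ("face detection", face_detector),
]

def run_task(frame):
    """Step S130 sketch: run each label's detector over the frame in the preset order."""
    results = {}
    for label, detector in PRESET_ORDER:
        results[label] = detector(frame)
    return results
```

The insertion order of the result dict mirrors the preset detection order, so downstream fusion can tell which item was detected first.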
Understandably, a label to be detected can typically index one or more AI algorithms. When one label indexes multiple AI algorithms, the user can flexibly select one, some, or all of them according to the actual situation to detect the item type corresponding to that label.
For example, when the detection item of a label to be detected is "face detection" and 5 classes of AI algorithm correspond to that label, the user may select 3 of them to perform face detection, thereby obtaining the face-detection results of those 3 classes of AI algorithm.
As an optional implementation, the method may further include:
fusing the detection result corresponding to each label to be detected through a preset fusion algorithm to obtain a fused detection result;
and outputting the fused detection result to the user terminal 30.
The preset fusion algorithm can be flexibly determined according to the actual situation, and may deduplicate, combine, or otherwise process the detection results. For the same item type, if multiple classes of AI algorithm yield multiple detection results, those results can be arbitrated to determine the final effective result. For example, when 3 classes of AI algorithm are used to detect whether a stranger (such as a face not pre-registered in the system) is present in the scene, an effective result can be obtained by majority rule: if, for the same frame of the surveillance picture, 2 of the 3 results indicate a stranger is present and 1 indicates no stranger, the result that a stranger is present is taken as the final result for that item.
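The majority-rule arbitration can be sketched as follows, assuming each algorithm returns a boolean verdict for the same item. This is one possible realization of the "preset fusion algorithm", which the application deliberately leaves open:

```python
from collections import Counter

def arbitrate(verdicts):
    """Majority vote over per-algorithm verdicts for the same detection item,
    e.g. three vendors' opinions on whether a stranger is present."""
    counts = Counter(verdicts)
    winner, _ = counts.most_common(1)[0]
    return winner
```

With the example from the text, `arbitrate([True, True, False])` settles on the 2-vs-1 majority that a stranger is present.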
In addition, the preset fusion algorithm can directly combine the detection results of different item types. For example, with the detection items "face detection" and "flame detection" above, if face detection indicates a stranger is present in the scene and flame detection indicates no flame is present, the two results can be combined into a single result: a stranger is present in the scene and no flame is present.
In this embodiment, developers can use the server for orchestration-type management. For example, according to the business scene and the AI parsing source, multiple orchestration types can be supported, such as video stream frame extraction plus picture AI parsing, video stream AI parsing plus picture AI parsing, multi-video-stream AI parsing, multi-picture AI parsing, and picture AI parsing plus video verification.
Illustratively, the type of choreography and corresponding business scenario may be as follows:
1) multi-picture AI parsing; video stream frame extraction plus multi-picture AI parsing
For example, open-kitchen food-safety ("bright kitchen, bright stove") violation detection: rodent detection (manufacturer A AI algorithm) + chef-hat wearing detection (manufacturer B AI algorithm) + glove detection (manufacturer C AI algorithm) + mask detection (manufacturer D AI algorithm);
2) multi-video streaming AI parsing
For example, scenic spot passenger flow statistics: video traffic flow statistics (manufacturer A AI algorithm) + video people flow statistics (manufacturer B AI algorithm);
3) video stream AI parsing + picture AI parsing
Here, full-scene images or cropped thumbnails extracted from the video stream are further analyzed by picture AI. For example, smart home: video face snapshot (manufacturer A AI algorithm) + face comparison (manufacturer B AI algorithm);
4) picture AI parsing + video verification
This type is suitable for scenes where the detected content occurs with low probability but high accuracy is required, such as flame recognition, fall recognition, and fight recognition. For example, smart factory area (factory hazard detection): picture flame detection (manufacturer A AI algorithm) + video flame detection (manufacturer B AI algorithm).
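The four orchestration types and their example scenarios above can be sketched as a configuration table; all scenario keys and vendor algorithm names are illustrative placeholders:

```python
# Orchestration configuration: business scenario -> orchestration type and
# the chain of vendor algorithms it invokes (illustrative names only).
ORCHESTRATIONS = {
    "kitchen_compliance": {   # type 1: frame extraction + multi-picture parsing
        "type": "video frame extraction + multi-picture AI parsing",
        "chain": ["vendorA.rodent", "vendorB.chef_hat",
                  "vendorC.glove", "vendorD.mask"],
    },
    "scenic_passenger_flow": {  # type 2
        "type": "multi-video-stream AI parsing",
        "chain": ["vendorA.vehicle_count", "vendorB.people_count"],
    },
    "smart_home": {             # type 3
        "type": "video stream AI parsing + picture AI parsing",
        "chain": ["vendorA.face_snapshot", "vendorB.face_compare"],
    },
    "factory_hazard": {         # type 4
        "type": "picture AI parsing + video verification",
        "chain": ["vendorA.flame_picture", "vendorB.flame_video"],
    },
}

def algorithms_for(scenario):
    """Return the algorithm chain configured for a business scenario."""
    return ORCHESTRATIONS[scenario]["chain"]
```

Keeping the scenario-to-chain mapping in data rather than code is one way such orchestration types could be managed without redeploying the server.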
When the same detection item is detected with algorithms from different manufacturers, the detection results can be combined and screened by the preset fusion algorithm and then packaged and output to the user terminal 30.
In this embodiment, corresponding functional labels can be set for an AI algorithm based on a function dimension and a scene dimension. The function dimension may include, but is not limited to:
face: face recognition and face comparison;
vehicle: vehicle identification and vehicle comparison;
behavior: smoking, fighting, falling down, making a phone call, playing with a mobile phone, and the like;
environment: garbage stacking, ground water accumulation, illegal lane occupation and the like;
article objects: rodent detection and knife detection;
general purpose: video frame extraction and product quality diagnosis.
Scene dimensions may include, but are not limited to:
open far scene: the camera is mounted high and covers a large field of view, for example an expressway or a scenic spot;
narrow close scene: the camera is mounted low for close-range shooting, for example in an elevator, a car, a room, or at an access-control gate.
General purpose: shops, residential communities, parks, etc.
Developers can configure various custom algorithm labels for the AI algorithms managed by the algorithm bin by combining the functional label information, so that the algorithms carry scene attributes. According to the orchestration type and the functional labels of the AI algorithms, the server can orchestrate the capabilities of different algorithms, support superimposed analysis of multiple AI capabilities, and provide diversified machine vision processing services to users as SaaS (Software-as-a-Service) applications.
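Attaching function-dimension and scene-dimension labels so that algorithms carry scene attributes can be sketched as follows; the label vocabulary and algorithm identifiers are hypothetical:

```python
# Each managed algorithm carries function-dimension and scene-dimension labels
# (illustrative names only; the application does not fix a vocabulary).
ALGORITHM_TAGS = {
    "vendorA.face_compare": {"function": ["face", "face comparison"],
                             "scene": ["narrow close scene"]},
    "vendorB.crowd_count":  {"function": ["behavior"],
                             "scene": ["open far scene", "general"]},
}

def algorithms_for_scene(scene):
    """Filter the bin down to algorithms whose scene labels match the deployment."""
    return sorted(algo for algo, tags in ALGORITHM_TAGS.items()
                  if scene in tags["scene"])
```

Filtering by scene attribute before dispatch is one way the server could avoid, say, running a far-scene crowd counter inside an elevator.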
Referring to fig. 4, an embodiment of the present application further provides a video processing apparatus 200, which can be applied to the server described above for executing the steps of the method. The video processing apparatus 200 includes at least one software functional module which can be stored in a memory module in the form of software or Firmware (Firmware) or solidified in a server Operating System (OS). The processing module is used for executing executable modules stored in the storage module, such as software functional modules and computer programs included in the video processing apparatus 200.
The video processing apparatus 200 may include a task obtaining unit 210, a determining unit 220, and a detecting unit 230, and each unit may have the following functions:
the task obtaining unit 210 is configured to obtain a task to be processed, where the task to be processed includes video data to be processed and N preset labels to be detected, the N labels to be detected represent the item types the user expects to detect in the video data to be processed, and N is an integer greater than or equal to 1;
a determining unit 220, configured to determine all AI algorithms corresponding to the N tags to be detected from a preset algorithm bin, where each AI algorithm in the preset algorithm bin is associated with at least one function tag, and each function tag corresponds to an item type to be detected;
the detecting unit 230 is configured to detect the video data to be processed through all the AI algorithms, and obtain detection results corresponding to the N tags to be detected.
Optionally, the video processing apparatus 200 may further include an algorithm obtaining unit and a label adding unit. Before the task obtaining unit 210 obtains the task to be processed, the algorithm obtaining unit is configured to obtain multiple classes of AI algorithms from multiple manufacturers, and the label adding unit is configured to add at least one corresponding functional label to each class of AI algorithm to form the preset algorithm bin, wherein the functional label comprises at least one label representing target recognition, action recognition, or an applicable scene type.
Optionally, the detection unit 230 may be further configured to: and in all AI algorithms, detecting the video data to be processed according to a preset sequence by the AI algorithm corresponding to the corresponding label to be detected in the N labels to be detected, so as to obtain a detection result corresponding to each label to be detected.
Optionally, the video processing apparatus 200 may further include a fusion unit and a transmission unit. The fusion unit is used for fusing the detection result corresponding to each label to be detected through a preset fusion algorithm to obtain a fused detection result; the sending unit is configured to output the fused detection result to the user terminal 30.
The embodiment of the present application further provides an algorithm bin creating method, which can be applied to the server described above; the server executes or implements each step of the method. The method may include:
step S210, obtaining multiple types of AI algorithms from multiple manufacturers;
step S220, adding at least one corresponding function tag to each type of AI algorithm among the multiple types of AI algorithms to form a preset algorithm bin, where the function tag includes at least one tag characterizing target recognition, action recognition, or an applicable scene type.
Understandably, for the specific implementation of the algorithm bin creating method, reference may be made to the above description of step S101 and step S102, which is not repeated here.
Referring to fig. 5, an embodiment of the present application further provides an algorithm bin creating apparatus 400, which can be applied to the above-mentioned server to execute the steps of the method. The algorithm bin creating apparatus 400 includes at least one software functional module that can be stored in the storage module in the form of software or firmware, or solidified in the operating system (OS) of the server. The processing module is configured to execute the executable modules stored in the storage module, such as the software functional modules and computer programs included in the algorithm bin creating apparatus 400.
The algorithm bin creating apparatus 400 may include an algorithm obtaining unit 410 and a tag adding unit 420, and each unit may have the following functions:
an algorithm obtaining unit 410, configured to obtain multiple types of AI algorithms from multiple manufacturers;
a tag adding unit 420, configured to add at least one corresponding function tag to each type of AI algorithm among the multiple types of AI algorithms to form a preset algorithm bin, where the function tag includes at least one tag characterizing target recognition, action recognition, or an applicable scene type.
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, for the specific working processes of the server, the video processing apparatus 200, and the algorithm bin creating apparatus 400 described above, reference may be made to the corresponding steps of the foregoing method, and details are not repeated here.
In this embodiment, the processing module may be an integrated circuit chip having signal processing capability. The processing module may be a general-purpose processor, for example a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Network Processor (NP); the methods, steps, and logic block diagrams disclosed in the embodiments of the present application may also be implemented or executed by a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The storage module may be, but is not limited to, a random access memory, a read-only memory, a programmable read-only memory, an erasable programmable read-only memory, an electrically erasable programmable read-only memory, or the like. In this embodiment, the storage module may be used to store the algorithm bin, the task to be processed, and the like. Of course, the storage module may also be used to store a program, which the processing module executes after receiving an execution instruction.
The embodiment of the application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when run on a computer, causes the computer to execute the video processing method described in the above embodiments, or to execute the algorithm bin creating method.
From the above description of the embodiments, it is clear to those skilled in the art that the present application may be implemented by hardware, or by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments of the present application.
In summary, in the present solution, multiple types of arranged AI algorithms are pre-stored in the preset algorithm bin. When diversified tasks are to be executed, the corresponding AI algorithms can be selected from the algorithm bin based on the tasks and their tags to be detected, and the selected AI algorithms then detect the tasks. Detection processing of diversified tasks can thus be implemented, and diversified detection services can be flexibly provided to users.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, system, and method may be implemented in other ways. The apparatus, system, and method embodiments described above are illustrative only. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions. In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist separately, or two or more modules may be integrated into an independent part.
The above description is only an example of the present application and is not intended to limit its scope; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within its protection scope.

Claims (10)

1. A method of video processing, the method comprising:
acquiring a task to be processed, wherein the task to be processed comprises video data to be processed and N preset labels to be detected, the N labels to be detected represent the item types that a user desires to detect in the video data to be processed, and N is an integer greater than or equal to 1;
determining all AI algorithms corresponding to the N labels to be detected from a preset algorithm bin, wherein each AI algorithm in the preset algorithm bin is associated with at least one functional label, and each functional label corresponds to an item type to be detected;
and detecting the video data to be processed through all AI algorithms to obtain detection results corresponding to the N labels to be detected.
2. The method of claim 1, wherein prior to obtaining the pending task, the method further comprises:
acquiring multi-class AI algorithms of a plurality of manufacturers;
and adding at least one corresponding functional label to each type of AI algorithm in the multiple types of AI algorithms to form the preset algorithm bin, wherein the functional label comprises at least one label characterizing target identification, action identification, or an applicable scene type.
3. The method of claim 2, wherein the target recognition comprises at least one of object recognition, face recognition, and pedestrian recognition, and wherein the AI algorithm is used to detect video data and/or image data.
4. The method according to claim 1, wherein detecting the video data to be processed through all AI algorithms to obtain detection results corresponding to the N tags to be detected comprises:
and among all the AI algorithms, detecting the video data to be processed in a preset order by the AI algorithm corresponding to each of the N labels to be detected, so as to obtain a detection result corresponding to each label to be detected.
5. The method of claim 4, further comprising:
fusing the detection result corresponding to each label to be detected through a preset fusion algorithm to obtain a fused detection result;
and outputting the fused detection result to a user terminal.
6. A method of algorithm bin creation, the method comprising:
acquiring multi-class AI algorithms of a plurality of manufacturers;
and adding at least one corresponding functional label to each type of AI algorithm in the multiple types of AI algorithms to form a preset algorithm bin, wherein the functional label comprises at least one label characterizing target identification, action identification, or an applicable scene type.
7. A video processing apparatus, characterized in that the apparatus comprises:
the task processing device comprises a task obtaining unit, a task processing unit and a task processing unit, wherein the task to be processed comprises video data to be processed and N preset labels to be detected, the N labels to be detected represent item types which are expected by a user and are used for detecting the video data to be processed, and N is an integer which is greater than or equal to 1;
the determining unit is used for determining all AI algorithms corresponding to the N labels to be detected from a preset algorithm bin, wherein each AI algorithm in the preset algorithm bin is associated with at least one functional label, and each functional label corresponds to an item type to be detected;
and the detection unit is used for detecting the video data to be processed through all AI algorithms to obtain detection results corresponding to the N labels to be detected.
8. An algorithm bin creation apparatus, the apparatus comprising:
the algorithm obtaining unit is used for obtaining multi-class AI algorithms of a plurality of manufacturers;
and the label adding unit is used for adding at least one corresponding functional label to each type of AI algorithm in the multiple types of AI algorithms to form a preset algorithm bin, wherein the functional label comprises at least one label characterizing target identification, action identification, or an applicable scene type.
9. A server, characterized in that the server comprises a processor and a memory coupled to each other, wherein the memory stores a computer program which, when executed by the processor, causes the server to carry out the method according to any one of claims 1-5 or the method according to claim 6.
10. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to perform the method of any one of claims 1-5, or to perform the method of claim 6.
CN202111601358.6A 2021-12-24 2021-12-24 Video processing method, algorithm bin creating method, device and server Pending CN114254156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111601358.6A CN114254156A (en) 2021-12-24 2021-12-24 Video processing method, algorithm bin creating method, device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111601358.6A CN114254156A (en) 2021-12-24 2021-12-24 Video processing method, algorithm bin creating method, device and server

Publications (1)

Publication Number Publication Date
CN114254156A true CN114254156A (en) 2022-03-29

Family

ID=80795085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111601358.6A Pending CN114254156A (en) 2021-12-24 2021-12-24 Video processing method, algorithm bin creating method, device and server

Country Status (1)

Country Link
CN (1) CN114254156A (en)

Similar Documents

Publication Publication Date Title
WO2022105243A1 (en) Event detection method, apparatus, electronic device, and storage medium
US11863869B1 (en) Event detection using motion extracted image comparison
WO2012092150A2 (en) Inference engine for video analytics metadata-based event detection and forensic search
CN109377694B (en) Monitoring method and system for community vehicles
KR20220000172A (en) An apparatus and a system for providing a security surveillance service based on edge computing and a method for operating them
KR20220000226A (en) A system for providing a security surveillance service based on edge computing
CN113596158A (en) Scene-based algorithm configuration method and device
CN110716803A (en) Computer system, resource allocation method and image identification method thereof
CN113963316A (en) Target event determination method and device, storage medium and electronic device
CN112329499B (en) Image processing method, device and equipment
KR20220000216A (en) An apparatus for providing a security surveillance service based on deep learning distributed processing
CN112348386A (en) Early warning event processing method and device
CN114254156A (en) Video processing method, algorithm bin creating method, device and server
CN115272924A (en) Treatment system based on modularized video intelligent analysis engine
KR20220000424A (en) A camera system for providing a intelligent security surveillance service based on edge computing
KR20220000221A (en) A camera apparatus for providing a intelligent security surveillance service based on edge computing
KR20220000209A (en) Recording medium that records the operation program of the intelligent security monitoring device based on deep learning distributed processing
KR20220000175A (en) A method for operating of intelligent security surveillance service providing apparatus based on edge computing
CN111698642A (en) Alarm distribution method, equipment and computer storage medium
CN111314652A (en) Video structured analysis processing method, device, equipment and storage medium thereof
CN111814042B (en) Device search method, device search system, and computer-readable storage medium
CN115482421B (en) Target detection method, device, equipment and medium
CN111556131B (en) Help seeking information processing method, device and system
KR20220000189A (en) An apparatus for providing a security surveillance service based on edge computing
KR20220000184A (en) A record media for operating method program of intelligent security surveillance service providing apparatus based on edge computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination