CN112529026A - Method for providing AI model, AI platform, computing device and storage medium

Info

Publication number
CN112529026A
Authority
CN
China
Prior art keywords
image
inference
model
images
image set
Prior art date
Legal status
Granted
Application number
CN201910878323.3A
Other languages
Chinese (zh)
Other versions
CN112529026B (en)
Inventor
杨洁
黄嘉伟
孙井花
陈轶
李鹏飞
白小龙
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202311697270.8A (published as CN117893845A)
Priority to CN201910878323.3A (published as CN112529026B)
Priority to PCT/CN2020/097856 (published as WO2021051918A1)
Publication of CN112529026A
Application granted
Publication of CN112529026B
Status: Active

Classifications

    • G06V10/778: Active pattern-learning of image or video features, e.g. online learning
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/23: Clustering techniques
    • G06N3/04: Neural network architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Neural network learning methods
    • G06V10/82: Image or video recognition or understanding using neural networks

Abstract

The application provides a method for providing an artificial intelligence (AI) model, an AI platform, a computing device, and a storage medium, and belongs to the field of artificial intelligence. The method comprises the following steps: an AI platform receives a plurality of unlabeled images from a first user, where the first user is an entity that has registered an account on the AI platform; the AI platform labels the plurality of images according to an initial AI model; the AI platform determines hard examples among the plurality of images according to the labeling results, and trains the initial AI model with the hard examples to obtain an optimized AI model. Because the initial AI model is trained on hard examples on the AI platform, the AI model provided by the AI platform has stronger inference capability.

Description

Method for providing AI model, AI platform, computing device and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method for providing an AI model, an AI platform, a computing device, and a storage medium.
Background
In a typical AI model acquisition process, an AI model is trained on training data to obtain the final AI model. Because the initial AI model is trained only on the training data and is never further optimized, the resulting AI model has limited inference capability.
Disclosure of Invention
The present application provides a method for providing an artificial intelligence (AI) model that can supply an AI model with stronger inference capability to a developer who has registered an account on an AI platform.
In a first aspect, the present application provides a method of providing an artificial intelligence AI model, the method comprising:
An AI platform receives a plurality of unlabeled images from a first user, where the first user is an entity that has registered an account on the AI platform; the AI platform labels the plurality of images according to an initial AI model; the AI platform determines the hard examples among the plurality of images according to the labeling results; and the AI platform trains the initial AI model with the hard examples to obtain an optimized AI model.
With this method, the AI platform can provide a first user registered on the platform (for example, an AI model developer) with an optimized AI model that has stronger inference capability, so that the first user can obtain the optimized AI model conveniently and quickly, saving time and labor.
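To make the flow concrete, the following is a minimal Python sketch of the four steps; the platform object and every helper name on it (auto-labeling, hard example mining, user confirmation, training) are hypothetical illustrations, not the platform's actual API.
```python
def provide_optimized_model(platform, initial_model, unlabeled_images):
    """Hypothetical sketch of the first-aspect flow; all names are illustrative."""
    # Step 1: label the first user's uploaded images with the initial AI model.
    labels = [initial_model.predict(image) for image in unlabeled_images]

    # Step 2: determine candidate hard examples from the labeling results.
    candidates = platform.mine_hard_examples(unlabeled_images, labels)

    # Step 3: let the first user confirm (and optionally correct) the
    # candidates through the confirmation interface.
    hard_examples = platform.confirm_with_user(candidates)

    # Step 4: train the initial model on the confirmed hard examples.
    return platform.train(initial_model, hard_examples)
```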
In a possible implementation, the AI platform determining the hard examples among the plurality of images according to the labeling results includes: the AI platform provides a confirmation interface to the first user and presents candidate hard examples in the confirmation interface, where a candidate hard example is at least one image among the plurality of images; and the AI platform determines the hard examples among the candidates according to the first user's operations on the confirmation interface. By obtaining user-confirmed hard examples through this interaction, the AI platform improves the accuracy of the hard examples and, in turn, the inference capability of the optimized AI model trained on them.
In a possible implementation, the method further includes: the AI platform receives the first user's corrected labels for the hard examples. Training the initial AI model with the hard examples to obtain the optimized AI model then comprises: the AI platform trains the initial AI model with the hard examples and their corresponding corrected labels. Obtaining the first user's corrected labels through interaction and using them for training further improves the inference capability of the trained optimized AI model.
In a possible implementation, the method further includes: the AI platform obtains one or more labeled images from the first user; and the AI platform obtains the initial AI model using the one or more labeled images.
In a possible implementation, the method further includes: the AI platform provides the optimized AI model to a device of a second user so that the device can perform its task objective with the optimized AI model; or the AI platform receives an inference image sent by the second user's device, performs inference on it with the optimized AI model, and returns the inference result to the device. Offering both options, shipping the optimized AI model to the second user's device or serving inference online with it, makes the optimized AI model convenient to use for inference and adaptable to different task objectives.
In a possible implementation, the AI platform labeling the plurality of unlabeled images according to the initial AI model includes: the AI platform provides a labeling-selection interface to the first user, where the interface offers at least one labeling mode for the first user to choose from; and the AI platform receives the labeling mode selected by the first user and labels the unlabeled images according to the initial AI model corresponding to that mode. By offering different labeling modes, the first user can decide which mode suits the images to be uploaded to the AI platform, which improves the platform's flexibility across different users and scenarios.
In a possible implementation, the AI platform labeling the plurality of images according to the initial AI model includes: classifying the plurality of images according to the initial AI model and/or performing object detection on the plurality of images according to the initial AI model.
In a second aspect, the present application further provides an artificial intelligence AI platform, including: a user input/output (I/O) module configured to receive a plurality of unlabeled images from a first user, where the first user is an entity that has registered an account on the AI platform; a data preprocessing module configured to label the plurality of images according to an initial AI model; a hard example mining module configured to determine the hard examples among the plurality of images according to the labeling results; and a model training module configured to train the initial AI model with the hard examples to obtain an optimized AI model.
In a possible implementation, the user I/O module is further configured to provide a confirmation interface to the first user and present candidate hard examples in the confirmation interface, where a candidate hard example is at least one image among the plurality of images; and the hard example mining module is further configured to determine the hard examples among the candidates according to the first user's operations on the confirmation interface.
In a possible implementation, the user I/O module is further configured to receive the first user's corrected labels for the hard examples; and the model training module is specifically configured to train the initial AI model with the hard examples and their corrected labels to obtain the optimized AI model.
In a possible implementation, the user I/O module is further configured to obtain one or more labeled images from the first user; and the model training module is further configured to obtain the initial AI model using the one or more labeled images.
In a possible implementation, the user I/O module is further configured to provide the optimized AI model to a device of a second user so that the device can perform its task objective with the optimized AI model; or the AI platform further includes an inference module, the user I/O module is further configured to receive an inference image sent by the second user's device, the inference module is configured to perform inference on the inference image with the optimized AI model, and the user I/O module is further configured to return the inference result to the second user's device.
In a possible implementation, the user I/O module is further configured to provide a labeling-selection interface to the first user, where the interface offers at least one labeling mode for the first user to choose from; the user I/O module is further configured to receive the labeling mode selected by the first user; and the data preprocessing module is specifically configured to label the unlabeled images according to the initial AI model corresponding to the selected labeling mode.
In a possible implementation, the data preprocessing module is specifically configured to classify the plurality of images according to the initial AI model and/or perform object detection on the plurality of images according to the initial AI model.
In a third aspect, the present application further provides a method for optimizing an artificial intelligence AI model, the method comprising: training an initial AI model on a training image set to obtain an optimized AI model; receiving an inference image set and performing inference on each inference image in the set with the optimized AI model to obtain inference results; determining the hard examples in the inference image set according to the inference results, where a hard example is an inference image whose inference result from the optimized AI model has an error rate above a target threshold; and training the optimized AI model on the hard examples to obtain a re-optimized AI model. Because the method identifies hard examples from the inference results and retrains the optimized AI model on them, the resulting re-optimized AI model has stronger inference capability.
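As a rough Python sketch, the third-aspect loop can be pictured as follows; `train_fn` and `hardness_score` stand in for the platform's training routine and hard example criterion, which the aspect leaves unspecified.
```python
def optimize(model, training_set, inference_set, target_threshold,
             train_fn, hardness_score):
    """Hypothetical sketch of the third-aspect loop; helper names are stand-ins."""
    model = train_fn(model, training_set)            # initial optimization
    results = [model.predict(img) for img in inference_set]
    # Hard examples: inference images whose results are estimated to be
    # wrong at a rate above the target threshold.
    hard = [img for img, res in zip(inference_set, results)
            if hardness_score(img, res, training_set) > target_threshold]
    return train_fn(model, hard)                     # re-optimization
```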
In a possible implementation, determining the hard examples in the inference image set according to the inference results specifically includes: when the inference image set is a video segment, determining the hard examples in the set according to the inference result of each image in it; or, when the inference image set is not a video segment, determining the hard examples according to the inference result of each image in the set together with the training image set. Using a different hard example mining strategy for each type of inference image set takes the characteristics of the set fully into account, improving the accuracy of the identified hard examples and, in turn, the inference capability of the re-optimized AI model.
In a possible implementation, determining the hard examples in the inference image set according to the inference result of each image in the set includes: determining a target image in the inference image set whose inference result differs from the inference result of its adjacent image in the video segment; and determining the target image to be a hard example in the inference image set.
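A minimal sketch of this rule, under the assumption that "adjacent image" means both temporal neighbours and that predictions arrive one per frame in temporal order:
```python
def video_hard_examples(frames, predictions):
    """Flag frames whose predicted class differs from both temporal
    neighbours; a sketch of the adjacent-image rule described above."""
    hard = []
    for i in range(1, len(frames) - 1):
        if (predictions[i] != predictions[i - 1]
                and predictions[i] != predictions[i + 1]):
            hard.append(frames[i])
    return hard
```
For example, the prediction sequence cat, cat, dog, cat would mark the third frame as a hard example, since a single frame disagreeing with its surrounding frames in a video is likely a misprediction.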
In a possible implementation, determining the hard examples in the inference image set according to the inference result of each image and the training image set includes: obtaining the per-class confidences of each image in the inference image set and determining a first hard example value for each image from its two highest confidences; obtaining the surface-feature distribution of the training image set and determining a second hard example value for each inference image from that distribution and the image's surface features; obtaining the deep features of the training images and of the inference images, and clustering the training images by their deep features to obtain an image clustering result; determining a third hard example value for each inference image from its deep features, the image clustering result, and its inference result; determining a target hard example value for each inference image from one or more of the first, second, and third hard example values; and determining the first number of images with the largest target hard example values in the inference image set to be the hard examples.
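The aspect does not give formulas for the three hard example values; a plausible form of the first one, based on the two highest class confidences, is the classic margin criterion (a small gap between the top two confidences suggests an ambiguous, hence hard, image):
```python
import numpy as np

def first_hard_value(confidences):
    """Margin-based hardness from a per-class confidence vector; this exact
    formula is an assumption, not taken from the patent."""
    top_two = np.sort(np.asarray(confidences))[-2:]  # two highest confidences
    return 1.0 - (top_two[1] - top_two[0])           # small margin -> hard

print(first_hard_value([0.05, 0.90, 0.05]))  # confident prediction, ~0.15
print(first_hard_value([0.45, 0.40, 0.15]))  # ambiguous prediction, ~0.95
```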
In a possible implementation, determining the hard examples in the inference image set according to the inference result of each image in the set includes: for a first target box of a first image in the inference image set, judging whether a similar box corresponding to the first target box exists in the images of the video segment whose temporal distance from the first image is at most a second number; if no similar box corresponding to the first target box exists, determining the first target box to be a hard example box; if a similar box exists and the first image containing the first target box is not adjacent in the video segment to the second image containing the similar box, determining the hard example boxes in the images between the first image and the second image according to the first target box and the similar box; and determining the hard examples in the inference image set according to the number of hard example boxes in each image of the set.
In a possible implementation, judging whether a similar box corresponding to the first target box exists in the images of the video segment whose temporal distance from the first image is at most the second number includes: determining, among those images, the tracking box with the highest similarity to the first target box; determining the overlap rate between the first target box and each bounding box according to the tracking box, all the bounding boxes in those images, and the first target box; if a bounding box whose overlap rate exceeds a second value exists, determining that bounding box to be the similar box corresponding to the first target box; and if no bounding box whose overlap rate exceeds the second value exists, determining that no similar box corresponding to the first target box exists.
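The exact definition of the overlap rate is not fixed in the text; intersection-over-union (IoU), shown below, is the usual choice for this kind of box matching:
```python
def overlap_rate(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2); assuming the 'overlap
    rate' above is realised as intersection-over-union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

# A bounding box whose overlap rate with the first target box exceeds the
# second value would then be taken as the similar box.
```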
In a possible implementation, determining the hard examples in the inference image set according to the inference result of each image and the training image set includes: obtaining the surface-feature distribution of the training images and determining a fourth hard example value for each inference image from that distribution and the image's surface features, where the surface features include surface features of the bounding boxes and surface features of the image; obtaining the deep features of every box in the training images and of every box in the inference images, and clustering the boxes in the training images by their deep features to obtain a box clustering result; determining a fifth hard example value for each inference image from the deep features of its boxes, the box clustering result, and the inference result of each box; determining a target hard example value for each inference image from one or more of the fourth and fifth hard example values; and determining the first number of images with the largest target hard example values in the inference image set to be the hard examples.
In a fourth aspect, the present application further provides an artificial intelligence AI platform, including: a model training module configured to train an initial AI model on a training image set to obtain an optimized AI model; an inference module configured to receive an inference image set and perform inference on each inference image in the set with the optimized AI model to obtain inference results; and a hard example mining module configured to determine the hard examples in the inference image set according to the inference results, where a hard example is an inference image whose inference result from the optimized AI model has an error rate above a target threshold. The model training module is further configured to train the optimized AI model on the hard examples to obtain a re-optimized AI model.
In a possible implementation, the hard example mining module is specifically configured to: when the inference image set is a video segment, determine the hard examples in the set according to the inference result of each image in it; or, when the inference image set is not a video segment, determine the hard examples according to the inference result of each image in the set together with the training image set.
In a possible implementation, the hard example mining module is specifically configured to: determine a target image in the inference image set whose inference result differs from the inference result of its adjacent image in the video segment, and determine the target image to be a hard example in the inference image set.
In a possible implementation, the hard example mining module is specifically configured to: obtain the per-class confidences of each image in the inference image set and determine a first hard example value for each image from its two highest confidences; obtain the surface-feature distribution of the training image set and determine a second hard example value for each inference image from that distribution and the image's surface features; obtain the deep features of the training images and of the inference images, and cluster the training images by their deep features to obtain an image clustering result; determine a third hard example value for each inference image from its deep features, the image clustering result, and its inference result; determine a target hard example value for each inference image from one or more of the first, second, and third hard example values; and determine the first number of images with the largest target hard example values in the inference image set to be the hard examples.
In a possible implementation, the hard example mining module is specifically configured to: for a first target box of a first image in the inference image set, judge whether a similar box corresponding to the first target box exists in the images of the video segment whose temporal distance from the first image is at most a second number; if no similar box exists, determine the first target box to be a hard example box; if a similar box exists and the first image containing the first target box is not adjacent in the video segment to the second image containing the similar box, determine the hard example boxes in the images between the first image and the second image according to the first target box and the similar box; and determine the hard examples in the inference image set according to the number of hard example boxes in each image.
In a possible implementation, the hard example mining module is specifically configured to: determine, among the images whose temporal distance from the first image is at most the second number, the tracking box with the highest similarity to the first target box; determine the overlap rate between the first target box and each bounding box according to the tracking box, all the bounding boxes in those images, and the first target box; if a bounding box whose overlap rate exceeds a second value exists, determine that bounding box to be the similar box corresponding to the first target box; otherwise, determine that no similar box corresponding to the first target box exists.
In a possible implementation, the hard example mining module is specifically configured to: obtain the surface-feature distribution of the training images and determine a fourth hard example value for each inference image from that distribution and the image's surface features, where the surface features include surface features of the bounding boxes and surface features of the image; obtain the deep features of every box in the training images and of every box in the inference images, and cluster the boxes in the training images by their deep features to obtain a box clustering result; determine a fifth hard example value for each inference image from the deep features of its boxes, the box clustering result, and the inference result of each box; determine a target hard example value for each inference image from one or more of the fourth and fifth hard example values; and determine the first number of images with the largest target hard example values in the inference image set to be the hard examples.
In a fifth aspect, the present application further provides a computing device comprising a memory for storing a set of computer instructions and a processor; the processor executes a set of computer instructions stored by the memory to cause the computing device to perform the method provided by the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a computer-readable storage medium storing computer program code that, when executed by a computing device, performs the method provided in the first aspect or any possible implementation of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a seventh aspect, the present application provides a computer program product comprising computer program code that, when executed by a computing device, performs the method provided in the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package that can be downloaded and executed on a computing device whenever the method provided in the first aspect or any possible implementation of the first aspect needs to be used.
In an eighth aspect, the present application further provides a computing device comprising a memory for storing a set of computer instructions and a processor; the processor executes a set of computer instructions stored by the memory to cause the computing device to perform the method provided by the third aspect or any one of the possible implementations of the third aspect.
In a ninth aspect, the present application provides a computer-readable storage medium storing computer program code that, when executed by a computing device, performs the method provided in the third aspect or any possible implementation of the third aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a tenth aspect, the present application provides a computer program product comprising computer program code that, when executed by a computing device, performs the method provided in the third aspect or any possible implementation of the third aspect. The computer program product may be a software installation package that can be downloaded and executed on a computing device whenever the method provided in the third aspect or any possible implementation of the third aspect needs to be used.
Drawings
Fig. 1 is a schematic structural diagram of an AI platform 100 according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario of an AI platform 100 provided in the present application;
fig. 3 is a schematic deployment diagram of an AI platform 100 according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a computing device 400 for deploying the AI platform 100 according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of a method for providing an AI model according to an embodiment of the present application;
fig. 6 is a schematic diagram of an upload interface of data provided in an embodiment of the present application;
fig. 7 is a schematic diagram of an interface for initiating an intelligent annotation according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data annotation interface according to an embodiment of the present disclosure;
fig. 9 is a schematic diagram of a process of reasoning using an optimized AI model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an interface for initiating hard example mining according to an embodiment of the present application;
FIG. 11 is a schematic flowchart of a method for determining hard examples according to an embodiment of the present application;
FIG. 12 is a schematic diagram of a surface-feature distribution provided in an embodiment of the present application;
FIG. 13 is a schematic diagram of determining a hard example value according to an embodiment of the present application;
FIG. 14 is a schematic flowchart of another method for determining hard examples according to an embodiment of the present application;
FIG. 15 is a schematic flowchart of another method for determining hard examples according to an embodiment of the present application;
FIG. 16 is a schematic diagram of determining a hard example value according to an embodiment of the present application;
fig. 17 is a schematic flowchart of AI model optimization according to an embodiment of the present application;
fig. 18 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Artificial intelligence is currently booming. Machine learning, a core means of realizing AI, has permeated industries such as medicine, transportation, education, and finance. Not only professional technicians but also non-AI professionals across these industries expect to use AI and machine learning to complete specific tasks.
To facilitate understanding of the technical solutions and embodiments provided by the present application, the concepts of AI model, AI model training, hard example, hard example mining, AI platform, and the like are explained in detail below:
the model of the AI model is used,the method is a mathematical algorithm model for solving practical problems by using a machine learning idea, the AI model includes a large number of parameters and calculation formulas (or calculation rules), the parameters in the AI model are values obtained by training the AI model through a training image set, and the parameters of the AI model are weights of the calculation formulas or calculation factors in the AI model, for example. The AI model also comprises a plurality of hyper (hyper) parameters, the hyper parameters are parameters which can not be obtained by training the AI model through a training image set, the hyper parameters can be used for guiding the construction of the AI model or the training of the AI model, and the hyper parameters are various. For example, the number of iterations (iteration) of AI model training, learning rate (learning rate), batch size (batch size), number of layers of AI model, number of neurons per layer. In other words, the hyper-parameters of the AI model differ from the parameters in that: the values of the hyper-parameters of the AI model cannot be obtained by analyzing the training images in the training image set, and the values of the parameters of the AI model can be modified and determined by analyzing the training images in the training image set during the training process.
AI models are varied. One widely used class is the neural network model, a mathematical algorithm model that simulates the structure and function of a biological neural network (the central nervous system of an animal). A neural network model may include many neural network layers with different functions, each layer including parameters and calculation formulas. Different layers in a neural network model have different names according to their calculation formulas or functions; for example, a layer that performs convolution calculations is called a convolutional layer and is often used to extract features from an input signal (such as an image). One neural network model may also be composed of a combination of several existing neural network models. Neural network models of different structures may suit different scenarios (e.g., classification, recognition) or provide different effects in the same scenario. Differences in neural network model structure specifically include one or more of the following: different numbers of network layers, different orders of the network layers, and different weights, parameters, or calculation formulas in each layer. The industry already has many different high-accuracy neural network models for recognition or classification scenarios; some of them, once trained on a specific training image set, can be used alone or in combination with other neural network models (or other functional modules) to complete a task.
Like neural network models, most other AI models need to be trained before they can be used to complete a task.
AI model training: determining the parameters in an AI model by fitting the regularities of existing images with a certain method. Training an AI model requires preparing a training image set. According to whether the training images in the set carry labels (that is, whether each image has a specific type or name attached), training can be divided into supervised training (also called supervised learning) and unsupervised training (also called unsupervised learning). In supervised training, the training images in the training image set carry labels. The training images are used as the input of the AI model, the labels corresponding to the training images serve as the reference for the model's output values, a loss function is used to calculate the loss between the model's output values and the labels corresponding to the training images, and the parameters in the model are adjusted according to the loss value. The model is trained iteratively on each training image in the set, and its parameters are continuously adjusted until it can, with high accuracy, output values matching the labels of the input training images. In unsupervised training, the images in the training set carry no labels; the training images are input to the AI model one by one, and the model gradually identifies the associations and latent regularities among them until it can be used to judge or identify the type or features of an input image. For example, in clustering, an AI model for clustering receives a large number of training images, learns the features of each training image and the associations and differences among them, and automatically divides the training images into several types. Different task types can adopt different AI models: some can only be trained by supervised learning, some only by unsupervised learning, and others by either. A trained AI model can be used to complete a specific task. Generally speaking, AI models in machine learning are trained by supervised learning, which lets the model learn, from a labeled training image set, the associations between the training images and their labels in a more targeted way, so that the trained model predicts other input inference images with high accuracy.
An example of training a neural network model for an image classification task by supervised learning is as follows. To train a neural network model for image classification, images are first collected for the task to construct a training image set containing 3 types of images: apple, pear, and banana. The collected training images are stored in 3 folders by type, and each folder's name is the label of all images inside it. Once the training image set is constructed, a neural network model capable of image classification is selected, for example a convolutional neural network (CNN). The training images in the set are input into the CNN; the convolution kernels of each layer perform feature extraction and feature classification on an image; the network finally outputs the confidence that the image belongs to each type; a loss function computes a loss value from these confidences and the image's label; and the parameters of every layer in the CNN are updated according to the loss value and the CNN structure. Training continues until the loss value output by the loss function converges or all the images in the training image set have been used, at which point training ends.
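This example maps naturally onto a short PyTorch sketch; the directory layout mirrors the folders-as-labels scheme above, while the path and hyperparameter values are illustrative assumptions:
```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("fruit_images/", transform=tfm)  # 3 subfolders:
loader = DataLoader(train_set, batch_size=32, shuffle=True)       # apple/pear/banana

cnn = models.resnet18(num_classes=3)          # a CNN for 3 classes
loss_fn = nn.CrossEntropyLoss()               # confidences vs. label -> loss value
opt = torch.optim.SGD(cnn.parameters(), lr=0.01)

for epoch in range(10):                       # iterate over the training set
    for images, labels in loader:
        loss = loss_fn(cnn(images), labels)   # loss from confidences and labels
        opt.zero_grad()
        loss.backward()                       # update CNN parameters via the loss
        opt.step()
```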
Loss function: a function that measures how well an AI model has been trained (that is, it computes the difference between the model's prediction and the true target). During training, because we want the AI model's output to be as close as possible to the value we really want it to predict, the model's current predicted value for an input image can be compared with the truly desired target value (that is, the input image's label), and the parameters of the AI model are then updated according to the difference between the two. (Of course, there is usually an initialization process before the first update, in which initial values are pre-configured for the parameters of the AI model.) In each round of training, the loss function judges the difference between the model's current prediction and the true target value, and the model's parameters are updated until the AI model can predict the truly desired target value, or a value very close to it; the AI model is then considered trained.
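Cross-entropy, one common loss function, makes this "difference between prediction and true target" concrete; the numbers below are illustrative:
```python
import numpy as np

def cross_entropy(predicted_confidences, true_label_index):
    """The further the confidence for the true label is from 1, the larger
    the loss; one common realisation of a loss function."""
    return float(-np.log(predicted_confidences[true_label_index]))

print(cross_entropy(np.array([0.7, 0.2, 0.1]), 0))  # good prediction, ~0.36
print(cross_entropy(np.array([0.1, 0.2, 0.7]), 0))  # poor prediction, ~2.30
```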
After an AI model is trained, it can be used to perform inference on images to obtain inference results. The specific inference process is as follows. In an image classification scenario, an image is input into the AI model, the convolution kernels of each layer extract features from the image, and the category the image belongs to is output based on the extracted features. In an object detection scenario (which may also be called target detection), an image is input into the AI model, features are extracted by the convolution kernels of each layer, and the position and category of the bounding box of every object in the image are output based on the extracted features. In a scenario covering both image classification and object detection, the model outputs both the category of the image and the position and category of the bounding box of every object in it. It should be noted that some AI models have stronger inference capability and some have weaker inference capability. Strong inference capability means that when the AI model performs inference on images, the accuracy of the inference results is greater than or equal to a certain value; weak inference capability means the accuracy of the inference results is below that value.
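For the classification scenario, inference with a trained model reduces to a forward pass followed by reading the per-class confidences; a PyTorch sketch, assuming `model` is any trained classification network:
```python
import torch

@torch.no_grad()
def classify(model, image_tensor):
    """Inference sketch: returns the predicted class and the per-class
    confidences for a single image tensor of shape (C, H, W)."""
    model.eval()                                    # switch to inference mode
    logits = model(image_tensor.unsqueeze(0))       # add the batch dimension
    confidences = torch.softmax(logits, dim=1)[0]   # confidence per class
    return int(confidences.argmax()), confidences
```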
Hard example: input data for which the output result of the initial AI model, or of the trained AI model, is wrong or has a high error rate, whether during training of the initial AI model or during inference with the trained AI model. For example, during AI model training, when unlabeled images are being labeled, the images whose labeling results have an error rate above a target threshold are hard examples. During AI model inference, the images in an image set whose inference results output by the AI model have an error rate above a target threshold are hard examples.
Hard example mining: a method for determining whether an image is a hard example.
AI platform: a platform that provides a convenient AI development environment and convenient development tools for AI developers and users. An AI platform has various built-in AI models or AI submodels for solving different problems, and it can search for and build an applicable AI model according to a user's requirements. A user only needs to specify their requirements on the AI platform and prepare, as prompted, a training image set to upload; the AI platform can then train for the user an AI model that fulfils the user's requirements. Alternatively, the user prepares an algorithm and a training image set as prompted and uploads them, and the AI platform trains an AI model that fulfils the user's requirements based on the user's algorithm and training image set. The user can then use the trained AI model to complete their specific task.
If an AI platform obtains AI models in the traditional AI model training way, the trained AI models have weak inference capability. Embodiments of this application provide an AI platform into which hard example mining is introduced, forming a closed loop of AI model construction, training, inference, hard example mining, and retraining. This improves the accuracy of the AI model (that is, its inference capability) while meeting developers' requirements.
It should be noted that "AI model" above is a generic term covering deep learning models, machine learning models, and the like.
Fig. 1 is a schematic structural diagram of the AI platform 100 in an embodiment of the present application. It should be understood that fig. 1 is only an exemplary structural diagram of the AI platform 100, and the present application does not limit the division of modules within it. As shown in fig. 1, the AI platform 100 includes a user input/output (I/O) module 101, a hard example mining module 102, a model training module 103, an inference module 104, and a data preprocessing module 105. Optionally, the AI platform may further include an AI model storage module 106 and a data storage module 107.
The functions of the various modules in the AI platform 100 are briefly described below:
user I/O module 101: for receiving a task goal input or selected by a user, receiving a training image set of a first user, receiving an inference image set sent by a device of a second user, and the like, wherein the training image set comprises a plurality of unlabeled images (which may be referred to as a plurality of unlabeled training images). The user I/O module 101 is further configured to receive a correction annotation of an awkward case by a first user, obtain one or more images with the annotation from the first user, provide an optimized AI model to a device of a second user, receive an inferred image sent by the device of the second user, and so on. As an example of the user I/O module 101, a Graphical User Interface (GUI) or a Command Line Interface (CLI) may be used. For example, the AI platform 100 displayed on the GUI may provide a variety of AI services to the user, (e.g., image classification services, object detection services, etc.). The user may select a task objective on the GUI, for example, the user selects an image classification service, the user may continue to upload a plurality of unlabeled images in the GUI of the AI platform, and so on. The GUI receives the task object and the plurality of unlabeled images and communicates with the model training module 103. The model training module 103 selects or searches for an AI model for the user that can be used to complete the construction of the user task objective according to the user-determined task objective. The user I/O module 101 is further configured to receive the difficult cases output by the difficult case mining module 102, and provide a GUI for the user to confirm the difficult cases.
Optionally, the user I/O module 101 may also be configured to receive the user's expectation of how well the AI model should accomplish the task objective, for example, a requirement that the accuracy of the final face recognition AI model be higher than 99%.
Optionally, the user I/O module 101 may also be used to receive an AI model input by the user, and the like. For example, a user may input an initial AI model in the GUI based on their task objective.
Optionally, the user I/O module 101 may be further configured to receive user-input surface features and deep features of the inference images in the inference image set. For an image classification scenario, the surface features include one or more of the image's resolution, aspect ratio, red-green-blue (RGB) mean and variance, brightness, saturation, or sharpness, and the deep features are the abstract features of the image extracted by the convolution kernels in a feature extraction model (e.g., a CNN). For an object detection scenario, the surface features include surface features of the bounding boxes and surface features of the image. The surface features of the bounding boxes may include one or more of: the aspect ratio of each bounding box in a single image, the ratio of each bounding box's area to the image's area, the degree of marginalization of each bounding box (how close it lies to the image border), the overlap of the bounding boxes in the image, the brightness of each bounding box, or the blurriness of each bounding box. The surface features of the image may include one or more of: the image's resolution, aspect ratio, RGB mean and variance, brightness, saturation, sharpness, the number of boxes in the image, or the variance of the box areas in the image. The deep features are again the abstract features of the image extracted by the convolution kernels in a feature extraction model (e.g., a CNN).
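A sketch of how some of the listed classification-scenario surface features could be computed with Pillow and NumPy; the exact definitions the platform uses are not specified, so these formulas (brightness as the overall pixel mean, for instance) are assumptions:
```python
import numpy as np
from PIL import Image

def surface_features(path):
    """Compute a subset of the surface features listed above for one image."""
    array = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    height, width = array.shape[:2]
    pixels = array.reshape(-1, 3)
    return {
        "resolution": (width, height),
        "aspect_ratio": width / height,
        "rgb_mean": pixels.mean(axis=0).tolist(),
        "rgb_variance": pixels.var(axis=0).tolist(),
        "brightness": float(array.mean()),   # assumed proxy for brightness
    }
```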
Optionally, the user I/O module 101 may be further configured to provide a GUI for user annotation of training images in the training image set.
Optionally, the user I/O module 101 may also be used to provide various built-in initial AI models for the user to select from. For example, a user may select an initial AI model in the GUI based on their task objective.
Optionally, the user I/O module 101 may be further configured to receive the user's configuration information for the initial AI model, the training images in the training image set, and the like.
Hard example mining module 102: configured to determine the hard examples in the inference image set received by the user I/O module 101. The hard example mining module 102 may communicate with both the inference module 104 and the user I/O module 101. It may obtain from the inference module 104 the inference results produced by performing inference on the inference image set, and mine the hard examples in the set based on those results. The hard example mining module 102 may also provide the mined hard examples to the user I/O module 101.
Optionally, the hard example mining module 102 may be further configured to obtain, from the user I/O module 101, the user-input surface features and deep features of the inference images in the inference image set.
The model training module 103: for training the AI model. The model training module 103 may be in communication with the user I/O module 101, the inference module 104, and the AI model storage module 106. The specific treatment is as follows:
in the present embodiment, the initial AI model includes an untrained AI model, an AI model that is trained but not based on difficult optimization. The untrained AI model means that the constructed AI model is not trained by using the training image set, and parameters in the constructed AI model are preset values. The AI model that is trained but not optimized based on the difficult case refers to an AI model that can be used for reasoning but is not optimized based on the difficult case, and may include two types, namely an initial AI model that is selected directly in the AI model storage module 105 by a user, and an AI model that is obtained by training an AI model constructed by using only labeled training images in a training image set. The visible AI platform may obtain an initial AI model from the AI model storage module 106, or use the model training module 103 to train the training image set to obtain the initial AI model.
The initial AI model is obtained by training the constructed AI model only by using the training images with labels in the training image set, and the specific treatment is as follows: and the AI platform determines an AI model for completing construction of the task target of the user for the user according to the task target of the user. The model training module 103 may communicate with both the user I/O module 101 and the AI model storage module 106. The model training module 103 selects an existing AI model from an AI model library stored in the AI model storage module 106 according to a task target of a user as a constructed AI model, or the model training module 103 searches an AI sub-model structure in the AI model library according to the task target of the user, an expected effect of the user on the task target or some configuration parameters input by the user, and specifies hyper-parameters of some AI models, such as the number of layers of the model, the number of neurons in each layer, and the like, to construct the AI model, and finally obtains a constructed AI model. Notably, some of the hyper-parameters of the AI model may be hyper-parameters that the AI platform determines empirically from the construction and training of the AI model.
The model training module 103 obtains a set of training images from the user I/O module 101. The model training module 103 determines, according to the characteristics of the training image set and the structure of the constructed AI model, some hyper-parameters for training the constructed AI model, such as the number of iterations, the learning rate, and the batch size. After the hyper-parameters are set, the model training module 103 performs automatic training on the constructed AI model using the labeled images in the acquired training image set, continuously updating the parameters in the constructed AI model during training, to obtain an initial AI model. It is noted that some of the hyper-parameters used in training the constructed AI model may be hyper-parameters that the AI platform determines from experience with model training.
The model training module 103 inputs the unlabeled images in the training image set into the initial AI model and outputs the inference results of the unlabeled images; the model training module 103 transmits the inference results to the difficult case mining module 102, and the difficult case mining module 102 mines the difficult cases in the unlabeled images based on the inference results and feeds the difficult cases back to the model training module 103. The model training module 103 continues to perform optimization training on the initial AI model using the difficult cases to obtain an optimized AI model. The model training module 103 provides the optimized AI model to the inference module 104 for inference processing. It should be noted here that if the initial AI model is an initial AI model stored in the AI model storage module 106, the training images in the training image set may all be unlabeled images; if the initial AI model is obtained from the constructed AI model, the training images in the training image set include some unlabeled images and some labeled images.
The inference module 104 performs inference on the inference image set using the optimized AI model and outputs the inference results of the inference images in the inference image set. The difficult case mining module 102 acquires the inference results from the inference module 104 and determines the difficult cases in the inference image set based on the inference results. The model training module 103 continues to train the optimized AI model based on the difficult cases provided by the difficult case mining module 102 to obtain a further optimized AI model. The model training module 103 transmits the further optimized AI model to the AI model storage module 106 for storage and to the inference module 104 for inference processing. It should be noted here that the process of mining difficult cases from the inference images inferred by the inference module 104 and then optimizing the optimized AI model is essentially the same as the process of optimizing the initial AI model using the difficult cases in the training images, with the difficult cases in the inference images serving as training images.
Optionally, the model training module 103 may be further configured to determine an AI model selected by the user on the GUI as the initial AI model, or to determine an AI model input by the user on the GUI as the initial AI model.
Optionally, the initial AI model may further include an AI model trained on AI models in the AI model storage module 106 using images in the training image set.
The inference module 104 is configured to perform inference on the inference image set according to the AI model to obtain inference results. The inference module 104 may communicate with the difficult case mining module 102, the user I/O module 101, and the AI model storage module 106. The inference module 104 acquires the inference images in the inference image set from the user I/O module 101 and performs inference processing on them to obtain the inference results of the inference images in the inference image set. The inference module 104 transmits the inference results to the difficult case mining module 102, so that the difficult case mining module 102 mines the difficult cases in the inference image set based on the inference results.
The data preprocessing module 105 is configured to perform preprocessing operations on the training images in the training image set and the inference images in the inference image set received by the user I/O module 101. The data preprocessing module 105 may read the training image set or the inference image set received by the user I/O module 101 from the data storage module 107, and then preprocess the training images in the training image set or the inference images in the inference image set. Preprocessing the training images or inference images uploaded by the user gives them a consistent size and allows improper data among them to be removed. The preprocessed training image set is suitable for training the constructed AI model or the initial AI model, and yields a better training effect. The preprocessed inference images are suitable for input into the second AI model for inference processing. After the data preprocessing module 105 completes the preprocessing of the training images or inference images, it stores the preprocessed training image set or inference image set in the data storage module 107, or sends the preprocessed training image set to the model training module 103 and the preprocessed inference image set to the inference module 104. It should be appreciated that in another embodiment, the data storage module 107 may also be included as part of the data preprocessing module 105, i.e., the data preprocessing module 105 itself has the capability to store images.
The AI model storage module 106: used for storing initial AI models, optimized AI models, AI sub-model structures, and the like, and may also store AI models determined and constructed according to the AI sub-model structures. The AI model storage module 106 may communicate with both the user I/O module 101 and the model training module 103. The AI model storage module 106 receives and stores the trained initial AI models and optimized AI models transmitted by the model training module 103, provides the constructed AI model or the initial AI model to the model training module 103, and stores initial AI models uploaded by the user and received by the user I/O module 101. It should be appreciated that in another embodiment, the AI model storage module 106 may also be included as part of the model training module 103.
The data storage module 107 (which may be, for example, a data storage resource corresponding to an Object Storage Service (OBS) provided by a cloud service provider): used for storing the training image set and the inference image set uploaded by the user, and for storing the data processed by the data preprocessing module 105.
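For illustration only, the cooperation of the modules described above can be summarized in a short sketch. The following Python code is a minimal, non-normative sketch assuming hypothetical class and method names (none of which are defined in this application); it only shows one mining-and-retraining round wired through modules 101–107.

```python
# Minimal sketch of the module wiring described above. All class and
# method names are illustrative assumptions, not the actual platform API.

class AIPlatform:
    def __init__(self, io, miner, trainer, inferencer,
                 preprocessor, model_store, data_store):
        self.io = io                      # user I/O module 101
        self.miner = miner                # difficult case mining module 102
        self.trainer = trainer            # model training module 103
        self.inferencer = inferencer      # inference module 104
        self.preprocessor = preprocessor  # data preprocessing module 105
        self.model_store = model_store    # AI model storage module 106
        self.data_store = data_store      # data storage module 107

    def optimize(self, initial_model, images):
        """One round: preprocess, infer, mine difficult cases, retrain."""
        images = self.preprocessor.preprocess(images)
        results = self.inferencer.infer(initial_model, images)
        hard_examples = self.miner.mine(images, results)
        confirmed = self.io.confirm_with_user(hard_examples)
        optimized = self.trainer.train(initial_model, confirmed)
        self.model_store.save(optimized)
        return optimized
```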
It should be noted that the AI platform in the present application may be a system capable of interacting with a user, and the system may be a software system, a hardware system, or a system combining software and hardware, which is not limited in the present application.
With the functions of the above modules, the AI platform provided by the embodiment of the present application can provide an AI model training service for the user and thus provide a trained, optimized AI model. The AI platform can mine difficult cases from unlabeled images and then continue to train the initial AI model based on these difficult cases to obtain the optimized AI model, so that the inference results of the AI model are more accurate.
Fig. 2 is a schematic view of an application scenario of the AI platform 100 according to an embodiment of the present disclosure. As shown in fig. 2, in one embodiment, the AI platform 100 may be deployed entirely in a cloud environment. A cloud environment is an entity that provides cloud services to users using basic resources in a cloud computing mode. A cloud environment includes a cloud data center and a cloud service platform, where the cloud data center includes a large number of infrastructure resources owned by the cloud service provider (including computing resources, storage resources, and network resources), and the computing resources may be a large number of computing devices (e.g., servers). The AI platform 100 may be deployed independently on a server or a virtual machine in the cloud data center, or deployed in a distributed manner on multiple servers in the cloud data center, on multiple virtual machines, or on both servers and virtual machines. As shown in fig. 2, the AI platform 100 is abstracted by the cloud service provider into an AI cloud service on the cloud service platform and provided to the user; after the user purchases the cloud service on the cloud service platform (the user may pre-pay and then settle according to the final usage of resources), the cloud environment provides the AI platform cloud service to the user using the AI platform 100 deployed in the cloud data center. When using the AI platform cloud service, the user may determine, through an application program interface (API) or a GUI, the task to be completed by the AI model, and upload the training image set and the inference image set to the cloud environment; the AI platform 100 in the cloud environment receives the user's task information, training image set, and inference image set, performs data preprocessing and AI model training, performs inference on the inference images in the inference image set using the trained AI model, mines difficult cases, and retrains the AI model based on the mined difficult cases. The AI platform returns the mined difficult cases and related content to the user through the API or the GUI, and the user further chooses whether to retrain the AI model based on the difficult cases. The trained AI model may be downloaded by the user or used online to complete a specific task.
In another embodiment of the present application, when the AI platform 100 in the cloud environment is abstracted into an AI cloud service and provided to the user, the AI cloud service may be divided into two parts: a basic AI cloud service and an AI difficult case mining cloud service. The user may first purchase the basic AI cloud service on the cloud service platform, and then purchase the AI difficult case mining cloud service when it is needed; after the purchase, the cloud service provider provides the API of the AI difficult case mining cloud service, and the AI difficult case mining cloud service is charged additionally according to the number of times the API is called.
The AI platform 100 provided by the present application is relatively flexible in deployment. As shown in fig. 3, in another embodiment, the AI platform 100 may also be deployed in a distributed manner across different environments. The AI platform 100 can be logically divided into multiple sections, each with a different function. For example, in one embodiment the AI platform 100 includes the user I/O module 101, the difficult case mining module 102, the model training module 103, the AI model storage module 106, and the data storage module 107. Each portion of the AI platform 100 may be deployed in any two or three of a terminal computing device, an edge environment, and a cloud environment. Terminal computing devices include terminal servers, smartphones, notebook computers, tablet computers, personal desktop computers, smart cameras, and the like. An edge environment is an environment including a collection of edge computing devices close to the terminal computing devices; edge computing devices include edge servers, edge stations with computing capabilities, and the like. The portions of the AI platform 100 deployed in different environments or devices cooperate to implement functions such as determining and training a constructed AI model for the user. For example, in one scenario, the user I/O module 101, the data preprocessing module 105, and the data storage module 107 of the AI platform 100 are deployed in a terminal computing device, while the difficult case mining module 102, the model training module 103, the inference module 104, and the AI model storage module 106 are deployed in an edge computing device of an edge environment. The user sends the training image set and the inference image set to the user I/O module 101 in the terminal computing device; the terminal computing device stores them in the data storage module 107; the data preprocessing module 105 preprocesses the training images in the training image set and the inference images in the inference image set, and stores the preprocessed images in the data storage module 107. The model training module 103 in the edge computing device determines a constructed AI model according to the user's task target, trains it with the training images in the training image set to obtain an initial AI model, and then trains the initial AI model with the difficult cases among the unlabeled images in the training image set to obtain an optimized AI model. Optionally, the difficult case mining module 102 may also mine the difficult cases included in the inference image set based on the optimized AI model, and the model training module 103 may train the optimized AI model based on these difficult cases to obtain a further optimized AI model. It should be understood that the present application does not limit which environment each portion of the AI platform 100 is deployed in; in actual application, adaptive deployment may be performed according to the computing capability of the terminal computing device, the resource occupation of the edge environment and the cloud environment, or specific application requirements.
The AI platform 100 can also be deployed separately on one computing device in any environment (e.g., separately on one edge server of an edge environment). Fig. 4 is a hardware configuration diagram of a computing device 400 in which the AI platform 100 is deployed, and the computing device 400 shown in fig. 4 includes a memory 401, a processor 402, a communication interface 403, and a bus 404. The memory 401, the processor 402 and the communication interface 403 are connected to each other by a bus 404.
The memory 401 may be Read Only Memory (ROM), Random Access Memory (RAM), a hard disk, flash memory, or any combination thereof. The memory 401 may store a program; when the program stored in the memory 401 is executed by the processor 402, the processor 402 and the communication interface 403 are used to perform the method by which the AI platform 100 trains an AI model for the user, mines difficult cases, and further optimizes the AI model based on the difficult cases. The memory may also store image sets. For example, a part of the storage resources in the memory 401 is allocated as the data storage module 107 for storing the data required by the AI platform 100, and another part is allocated as the AI model storage module 106 for storing the AI model library.
The processor 402 may employ a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or any combination thereof. Processor 402 may include one or more chips. The processor 402 may include an AI accelerator, such as a Neural Processing Unit (NPU).
The communication interface 403 enables communication between the computing device 400 and other devices or communication networks using transceiver modules, such as transceivers. For example, the data may be acquired through the communication interface 403.
Bus 404 may include a path that transfers information between components of computing device 400 (e.g., memory 401, processor 402, communication interface 403).
With the development of AI technology, AI is widely applied in many fields. For example, AI is applied in autonomous driving and assisted driving of vehicles, specifically in processes such as lane line recognition, traffic light recognition, automatic parking space recognition, and sidewalk detection. These processes can be summarized as image classification and/or object detection using the AI model in the AI platform; for example, an AI model is used to recognize traffic lights, and an AI model is used to recognize lane lines. Image classification is mainly used to determine the category to which an image belongs (i.e., a frame of image is input and the category to which the image belongs is output). Object detection includes, on the one hand, determining whether an object belonging to a particular category is present in the image, and on the other hand, locating the object (i.e., determining where the object appears in the image). The embodiment of the present application takes image classification and object detection as examples to explain how the AI platform provides an AI model.
The following describes, with reference to fig. 5, a specific flow of a method for providing an AI model in an embodiment; the method is described with the AI platform as the execution body:
In step 501, the AI platform receives a plurality of unlabeled images from a first user.
The first user is a user or entity that has registered an account on the AI platform, for example, an AI model developer.
In this embodiment, when the first user wants to obtain an AI model on the AI platform, the first user may place a plurality of unlabeled images in one folder and then open the image upload interface provided by the AI platform. The upload interface includes an image input position; the first user may add the storage location of the training image set at the input position, and the plurality of unlabeled images are uploaded to the AI platform. In this way, the AI platform receives the plurality of unlabeled images from the first user.
As shown in fig. 6, the upload interface also displays an identifier (for marking the images uploaded this time), a label type (indicating the use of the AI model to be trained with the images, such as object detection or image classification), a creation time, the image input position, the image's tag set (such as person, car, etc.), a name (such as an object), a description, a version name, and the like.
Step 502, the AI platform labels the plurality of images according to an initial AI model.
In this embodiment, the AI platform may obtain the initial AI model and then input the plurality of unlabeled images into it to obtain their labeling results. If the initial AI model is used for image classification, the labeling result of an image is the category to which the image belongs; for example, the category of an apple image is apple. If the initial AI model is used for object detection, the labeling result of an image is the position of the bounding box of each target included in the image and the category to which the target belongs, where a target may be an object in the image, such as a car, a person, or a cat. If the initial AI model is used for both image classification and object detection, the labeling result of an image is the category to which the image belongs, together with the positions of the bounding boxes of the targets in the image and the categories to which those targets belong.
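As a concrete illustration of the three kinds of labeling results, a sketch follows; the field names are assumptions for readability and are not defined in this application.

```python
# Illustrative labeling results; keys are assumed names, not platform fields.
classification_result = {"category": "apple"}

detection_result = {
    "objects": [
        # bounding box as top-left and bottom-right pixel coordinates
        {"category": "car", "bbox": [120, 45, 360, 210]},
        {"category": "person", "bbox": [400, 60, 455, 230]},
    ]
}

combined_result = {
    "category": "street scene",
    "objects": [{"category": "car", "bbox": [120, 45, 360, 210]}],
}
```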
Optionally, the first user may also upload the annotated image to the AI platform.
In step 503, the AI platform determines the difficult cases in the multiple images according to the labeling result.
In this embodiment, after obtaining the labeling results of the plurality of images that are not labeled, the AI platform may determine, according to the labeling results, the difficult cases included in the plurality of images that are not labeled (the concept of the difficult cases is explained in the foregoing, and is not described here again).
In step 504, the AI platform trains the initial AI model using the hard case to obtain an optimized AI model.
In this embodiment, after the AI platform determines the difficult cases, it may continue to train the initial AI model using them. The specific processing is: a part of the difficult cases is input into the initial AI model to obtain an output result; the difference between the output result and the labeling results of those difficult cases is determined; the parameters in the initial AI model are adjusted based on the difference; and the process is repeated with another part of the difficult cases, cycling until all difficult cases have been used for training, or until the difference between the results predicted by the model and the labeling results is less than a certain threshold, at which point the optimized AI model is obtained.
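The training loop described above can be sketched as follows. This is a minimal PyTorch-style sketch under assumed names (`hard_loader`, `loss_fn`); the application does not specify the training framework, hyper-parameters, or the exact stopping rule.

```python
import torch

def train_on_hard_examples(model, hard_loader, loss_fn, lr=1e-4,
                           loss_threshold=0.05, max_epochs=10):
    """Continue training an initial AI model on mined difficult cases.

    Sketch of the loop described above; thresholds and optimizer choice
    are illustrative assumptions.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in hard_loader:   # batches of difficult cases
            optimizer.zero_grad()
            outputs = model(images)
            loss = loss_fn(outputs, labels)  # difference from labeling result
            loss.backward()                  # adjust parameters by the difference
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(hard_loader) < loss_threshold:
            break  # predictions close enough to the labeling results
    return model
```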
Optionally, the present application further provides a method for obtaining the initial AI model in step 502, and a process for labeling a plurality of unlabeled images based on the initial AI model, where the process is as follows:
The AI platform provides an annotation selection interface for the first user; the annotation selection interface includes at least one annotation mode selectable by the first user. The AI platform receives the annotation mode selected by the first user and annotates the plurality of unlabeled images according to the initial AI model corresponding to the selected annotation mode.
In this embodiment, after providing the plurality of unlabeled images, the first user may annotate a part of them; if the first user does not want to annotate the remaining images, the first user can click the intelligent annotation option to trigger entry into the annotation selection interface. Alternatively, after providing the plurality of unlabeled images, the first user may directly click the intelligent annotation option without annotating anything, which likewise triggers entry into the annotation selection interface. One or more annotation modes are provided in the annotation selection interface. If there is only one annotation mode, the interface displays an option for whether to select it: if the first user wants the mode, clicking the "yes" option triggers its selection, while clicking the "no" option leaves it unselected. If multiple annotation modes are provided, they are all displayed in the annotation selection interface, each with a corresponding selection option; the first user selects the desired annotation mode through its selection option and then submits, so that the AI platform receives the annotation mode selected by the user.
As shown in fig. 7, the annotation modes provided in the annotation selection interface may include an active learning mode and a pre-annotation mode. In the active learning mode, the AI platform processes as follows: the AI platform first trains the constructed AI model using a plurality of labeled images provided by the first user to obtain an initial AI model, then annotates the plurality of unlabeled images based on the initial AI model to obtain their annotation results. In the pre-annotation mode, the AI platform processes as follows: the AI platform directly obtains an existing initial AI model and annotates the plurality of unlabeled images based on it to obtain their annotation results. In addition, the annotation selection interface also displays the total number of images, the number of unlabeled images, the number of labeled images, and the number to be confirmed (i.e., the number of difficult cases to be confirmed by the user).
If the annotation mode selected by the first user and received by the AI platform is the active learning mode, the AI platform trains the constructed AI model using the plurality of labeled images provided by the first user to obtain an initial AI model, then inputs the plurality of unlabeled images into the initial AI model to obtain their annotation results. It should be noted here that the plurality of labeled images may be images obtained when the first user annotated some of the plurality of unlabeled images, or a plurality of labeled images provided directly by the first user.
If the annotation mode selected by the first user and received by the AI platform is the pre-annotation mode, the AI platform directly obtains an initial AI model (which may be an AI model uploaded by the first user or an AI model preset in the AI platform), then inputs the plurality of unlabeled images into the initial AI model to obtain their annotation results.
In addition, the embodiment of the present application also provides a process for the first user to annotate training images on the AI platform. The specific processing is: when annotating images, the first user may first determine whether the AI model to be trained will be used in an image classification scenario, an object detection scenario, or a combined scenario of image classification and object detection. If the AI model to be trained applies to an image classification scenario, the AI platform provides an image annotation interface for the image classification scenario. If it applies to an object detection scenario, the AI platform provides an image annotation interface for the object detection scenario. As shown in fig. 8, the image annotation interface for the object detection scenario provides options such as selecting an image, a bounding box, a return key, enlarging the image, and reducing the image; the first user can open a frame of image through the image selection option, then mark a target in the image using a bounding box and add a label to the target. The label may include the category of the target in the bounding box and the position of the bounding box in the image (since a bounding box is generally rectangular, its position can be identified by the coordinates of its upper-left and lower-right corners). After the first user marks a target with a bounding box, the AI platform obtains the target's bounding box, and the image annotation interface may also display an annotation information column for the image. The annotation information column displays information about the targets the first user has annotated, including the label, the bounding box, and the operations, where the label indicates the target's category, the bounding box indicates the shape of the box used, and the operations include a delete option and a modify option. Through these operations, the first user can modify the annotations added to the image.
The bounding box is a rectangular frame that can surround the entire object.
In addition, in the present application, the initial AI model may also be obtained by training with labeled images. The specific processing is as follows:
one or more images with annotations are obtained from a first user. An initial AI model is obtained using the annotated one or more images.
In this embodiment, when providing the plurality of unlabeled images, the first user may also provide one or more labeled images; the two sets may be uploaded together, or the labeled image(s) may be uploaded first and the plurality of unlabeled images afterwards. The AI platform may obtain a preselected AI model, which may be an AI model selected by the user (including an AI model uploaded by the user or one the user selects within the AI platform), or an AI model selected by the AI platform based on the current task target.
The AI platform then trains the preselected AI model using the labeled image or images to obtain the initial AI model (the training process is a supervised training process).
Optionally, after step 503, the AI platform may further provide the candidate difficult cases to the first user, so that the first user confirms whether the candidates screened out by the AI platform are indeed difficult cases. The specific processing is as follows:
The AI platform provides a confirmation interface for the first user and presents the candidate difficult cases to the first user in the confirmation interface, where a candidate difficult case is at least one image among the plurality of images. The AI platform determines the difficult cases among the candidates according to the first user's operations on the confirmation interface.
In this embodiment, the AI platform determines the candidate difficult cases in the plurality of unlabeled images based on the labeling results (a candidate difficult case refers to one or more images among the plurality of unlabeled images determined from the labeling results alone, not yet confirmed by the first user). The AI platform may provide the candidates to the first user through a confirmation interface, in which the candidate difficult cases among the plurality of unlabeled images are shown to the first user. The first user can open any candidate, subjectively judge whether its labeling result is correct and, if so, perform a confirmation operation; the AI platform receives the confirmation operation and determines that the candidate is a difficult case. Having the first user confirm in this way makes the determination of difficult cases more accurate.
In addition, when the first user finds that the labeling results of some difficult cases are incorrect, the labeling results may be corrected. This processing may be performed after step 503, or after the user confirms the difficult cases, as follows:
The AI platform receives the user's corrected annotations of the difficult cases. In this case, training the initial AI model using the difficult cases to obtain the optimized AI model includes: training the initial AI model using the difficult cases and the corresponding corrected annotations to obtain the optimized AI model.
In this embodiment, after the AI platform determines the difficult cases in step 503, it may provide them to the first user through a confirmation interface, showing the difficult cases among the plurality of unlabeled images. The first user can open any difficult case and subjectively judge whether its labeling result is correct. If not, the first user can correct the labeling result and, after the correction is complete, confirm the correction of the difficult case; the AI platform receives the confirmation operation, confirms the difficult case, and takes its labeling result to be the corrected annotation submitted by the first user.
Subsequently, in step 504, the AI platform may train the initial AI model using the difficult cases determined in step 503 and their corresponding corrected annotations to obtain the optimized AI model; or the AI platform may train the initial AI model using the difficult cases confirmed by the first user and their corresponding corrected annotations. In this way, the difficult cases' labeling results are corrected by the first user so that they are correct, and the trained optimized AI model has stronger inference capability. Alternatively, the AI platform may train the initial AI model using the difficult cases' corrected annotations to obtain the optimized AI model.
Optionally, after step 504, the optimized AI model may further be provided in the present application for a second user to use. Specifically, there are two providing manners, an offline manner (manner one) and an online manner (manner two):
the first method is as follows: the optimized AI model is provided to the AI device of the second user such that the AI device performs the task objective with the optimized AI model.
The AI device refers to a device for running an AI model, such as a vehicle event data recorder.
In this embodiment, after the second user obtains the usage right of the optimized AI model in some manner (e.g., by purchasing it), the AI platform may send the optimized AI model to the AI device; after receiving it, the AI device may run the optimized AI model so as to execute the task target with it. For example, the AI device is a vehicle event data recorder, and the optimized AI model can be used to detect lane lines and the like.
Or the second user may download the optimized AI model from the AI platform to a device and then install the optimized AI model onto the AI device so that the AI device can use the optimized AI model to perform the task goal.
Manner two: the AI platform receives an inference image sent by the second user's device, performs inference on the inference image using the optimized AI model, and provides the inference result to the second user's device.
In this embodiment, when the second user wants to use the optimized AI model, the second user may open the AI platform through his or her own device, register an account on the AI platform, and then log in using the registered account. The second user can then find the optimized AI model among the AI models provided by the AI platform and upload an inference image to the AI platform following the operation guidance the platform provides. After receiving the inference image, the AI platform inputs it into the optimized AI model to obtain the inference result and sends the result to the second user's device. If the optimized AI model is used for image classification, the inference result is the category to which the inference image belongs. If it is used for object detection, the inference result is the positions of the bounding boxes of the targets included in the inference image and the categories to which those targets belong. If it is used for both object detection and image classification, the inference result is the category to which the inference image belongs together with the positions of the bounding boxes of the targets in the image and the categories to which those targets belong.
Because difficult cases are used in the training process, the trained optimized AI model has stronger inference capability.
Optionally, after the difficult cases are determined in step 503 and fed back to the first user, a one-click online option is also provided in the confirmation interface; by operating this option, the user can trigger the AI platform to automatically train the initial AI model using the difficult cases to obtain the optimized AI model.
Optionally, after the training of the optimized AI model is completed, the optimized AI model may be used to perform inference on inference images, as shown in fig. 9. The specific processing is as follows:
in step 901, the AI platform receives a plurality of inference images uploaded by a user.
In this embodiment, after the training of the optimized AI model is completed, a user who wants to use the optimized AI model for inference can upload the inference images in the inference image upload interface, where a plurality of inference images are uploaded (inference images are also unlabeled images). The process of uploading the plurality of inference images is the same as the process of uploading the plurality of unlabeled images described above, and is not repeated here.
Step 902, the AI platform provides a difficult case screening selection interface to the user, where the interface includes difficult case screening parameters selectable by the user.
In this embodiment, after uploading the plurality of inference images, if the user further wants to optimize the optimized AI model, the user may trigger display of the difficult case screening selection interface, which includes difficult case screening parameters selectable by the user. The user can select the screening parameters according to the inference images and actual requirements. As shown in fig. 10, the difficult case screening parameters may include one or more of a difficult case screening manner, an inference image type, a task target type, and difficult case output path information. The screening manner may include confidence-based screening and algorithm-based screening. The inference image type may be continuous (indicating that the plurality of inference images are continuous in time) or discontinuous (indicating that they are not). The task target type may include object detection and image classification. The difficult case output path information indicates the storage location where the difficult cases mined from the inference images are to be stored. If the plurality of inference images are continuous in time sequence (i.e., they are video clips), the inference image type "continuous" is selected; if they are discontinuous in time sequence (not video clips), "discontinuous" is selected. If the user wants to perform image classification on the plurality of inference images, the task target type "image classification" is selected; if object detection, "object detection" is selected.
It should be noted here that when the inference image type is discontinuous, the difficult case screening parameters further include the storage location information of the labeled training images.
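For illustration, the difficult case screening parameters described above might be gathered into a configuration object as sketched below; the field names and string values are assumptions, since the application describes the options but not a concrete schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HardExampleFilterConfig:
    """Difficult case screening parameters from the selection interface.

    All names are illustrative assumptions, not a platform-defined schema.
    """
    screening_method: str   # "confidence" or "algorithm"
    image_type: str         # "continuous" (video clips) or "discontinuous"
    task_type: str          # "object_detection" or "image_classification"
    output_path: str        # where mined difficult cases are stored
    # Required only when image_type == "discontinuous":
    labeled_training_image_path: Optional[str] = None

config = HardExampleFilterConfig(
    screening_method="confidence",
    image_type="discontinuous",
    task_type="image_classification",
    output_path="obs://bucket/hard-examples/",
    labeled_training_image_path="obs://bucket/train/",
)
```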
Step 903, the AI platform performs inference on the plurality of inference images according to the optimized AI model to obtain inference results.
In this embodiment, the AI platform may input the plurality of inference images into the optimized AI model, which outputs their inference results. If the optimized AI model is used for image classification, the inference result output for the plurality of inference images is the category to which each image belongs. If the optimized AI model is used for object detection, the inference result output for the plurality of inference images is, for each frame of inference image, the category of the target in each bounding box and the position of the bounding box in the image.
Step 904, the AI platform determines the difficult cases in the plurality of inference images according to the inference results and the difficult case screening parameters selected by the user.
In this embodiment, the AI platform may screen out the difficult cases in the plurality of inference images using the inference results together with the task target type and screening manner in the difficult case screening parameters selected by the user, and then store the difficult cases according to the difficult case output path in the screening parameters.
Step 905, the AI platform trains the optimized AI model using the difficult cases to obtain a re-optimized AI model.
In this embodiment, in the inference process of the optimized AI model, difficult cases can be continuously mined, and the optimized AI model is trained to obtain a re-optimized AI model.
Optionally, after the difficult cases are output in step 904, the AI platform may further provide them to the user so that the user can confirm whether the candidates are indeed difficult cases. The specific processing is as follows:
The AI platform provides a confirmation interface for the user and presents the candidate difficult cases in the confirmation interface, where a candidate difficult case is at least one image among the plurality of inference images. The AI platform determines the difficult cases among the candidates according to the user's operations on the confirmation interface.
In this embodiment, the AI platform may determine at least one candidate difficult case included in the plurality of inference images according to the inference results and the difficult case screening parameters selected by the user. The AI platform then provides the candidate(s) to the user I/O module, and the user I/O module provides a confirmation interface that shows the candidate difficult cases among the plurality of inference images to the user. The user can open any candidate and subjectively judge whether its annotation information is accurate. If not, the user can modify the annotation information and, after the modification is complete, confirm the modification; the AI platform receives the confirmation operation, confirms the difficult case, and takes its annotation information to be the corrected annotation after the user's modification. Alternatively, if the user subjectively judges that the candidate's annotation has no problem, the user can directly confirm it; the AI platform receives the confirmation operation, confirms the difficult case, and keeps the annotation originally provided by the AI platform.
It should be noted that, in step 503, the process of determining difficult cases using the initial AI model is specifically: extracting the features of the unlabeled images using the initial AI model, determining the labeling results of the unlabeled images based on those features, and then finding the difficult cases among the unlabeled images based on the labeling results. In step 504, training of the initial AI model continues based on the difficult cases in the unlabeled images to obtain the optimized AI model. In step 904, the process of determining difficult cases using the optimized AI model is specifically: extracting the features of the inference images using the optimized AI model, determining the inference results based on those features, and then finding the difficult cases among the inference images based on the inference results. In step 905, the optimized AI model continues to be trained based on the difficult cases in the inference images to obtain a re-optimized AI model. It can be seen that the processing principles of step 503 and step 904 are similar: both use an AI model to determine the difficult cases in unlabeled images, differing only in the AI model used, with the optimized AI model having higher inference capability than the initial AI model. The processing principles of step 504 and step 905 are likewise similar: an existing AI model is trained based on difficult cases so that the resulting AI model's inference capability surpasses that of the current one. The flows of fig. 5 and fig. 9 are thus both, in essence, finding difficult cases and using them to optimize the current AI model. By this method, the AI platform can provide an AI model developer with an optimized AI model of stronger inference capability, so that the developer can deploy the AI model with one click without being concerned with the development process.
In step 503, the implementation process of determining the difficult cases may be as follows:
The AI platform annotates the unlabeled images using the initial AI model to obtain the annotation information of each unlabeled image, and judges whether the unlabeled images constitute a video segment. If the unlabeled images are a video segment, the difficult cases among them are determined according to the annotation result of each image. If the unlabeled images are not a video segment, the difficult cases among them are determined according to the annotation result of each image and the training image set.
In this embodiment, the AI platform may judge whether the unlabeled images are temporally continuous using one or both of an optical flow method and a Hamming distance. For example, the AI platform may compute, for each frame, the Hamming distance between the image and the temporally adjacent next frame. If the Hamming distance is less than a certain value, the two frames are judged continuous in time sequence; if it is greater than or equal to that value, they are judged discontinuous. When the two frames are judged continuous by the Hamming distance, the optical flow method can be used to judge continuity again: if the optical flow method also judges them continuous, they are finally determined to be continuous in time sequence; if the optical flow method judges them discontinuous, they are finally determined to be discontinuous. Traversing every image in this way determines whether the unlabeled images are consecutive images or not. If they are consecutive, the unlabeled images are determined to be a video segment; otherwise they are not. Since multiple methods are combined to judge temporal continuity, the accuracy of the determination is high.
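A minimal sketch of this two-stage continuity check follows, using an average-hash Hamming distance confirmed by Farneback dense optical flow (OpenCV). The hash size, thresholds, and the choice of Farneback flow are assumptions; the application names the methods but not their parameters.

```python
import cv2
import numpy as np

def average_hash(img, size=8):
    """64-bit average hash: grayscale, downsample, threshold at the mean."""
    small = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (size, size))
    return (small > small.mean()).flatten()

def hamming_distance(h1, h2):
    return int(np.count_nonzero(h1 != h2))

def frames_continuous(img_a, img_b, hash_threshold=10, flow_threshold=5.0):
    """Judge whether two frames are temporally continuous (sketch).

    Stage 1: cheap Hamming-distance test on perceptual hashes.
    Stage 2: confirmation via dense optical flow. Both thresholds are
    assumed values, not specified by the application.
    """
    if hamming_distance(average_hash(img_a), average_hash(img_b)) >= hash_threshold:
        return False  # hashes too different: discontinuous
    g_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    g_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g_a, g_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mean_motion = float(np.linalg.norm(flow, axis=2).mean())
    return mean_motion < flow_threshold  # small coherent motion: continuous
```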
When the plurality of unlabeled images are continuous in time sequence, the AI platform can determine the difficult cases among them using the annotation results of the images. When they are not continuous in time sequence, the AI platform can determine the difficult cases using the annotation result of each image together with the training image set. The training image set here refers to the set of training images used to train the initial AI model.
Here, two images being adjacent in time sequence may mean that their numbers are adjacent; for example, if one frame is numbered 1 and another is numbered 2, the two frames are adjacent. It may also mean that their upload order is adjacent; for example, if one frame is uploaded first and another second, the two frames are adjacent in time sequence, whereas if one frame is uploaded first and another third, they are not adjacent in time sequence.
The following describes, for the image classification scenario and the object detection scenario respectively, how the AI platform determines the difficult cases:
when the AI platform is used to determine that an AI model is used in a scene of image classification, for a plurality of unlabelled images as video segments, the difficult determination method is as follows:
the AI platform determines a target image in a plurality of unmarked images, wherein the marking result of the target image is different from the marking result of the image adjacent to the target image in time sequence. The target image is determined to be a difficult case among the plurality of images that are not annotated.
In this embodiment, the labeling result of each frame output in step 502 may include the category to which the image belongs. For any frame, the AI platform may judge whether the category to which the image belongs is the same as the category of an adjacent frame, where an adjacent frame is one that is temporally adjacent to the image. If they are the same, the image can be determined not to be a difficult case; if they differ, the AI model is more likely to have misrecognized the image, and the image can be determined to be a difficult case. This image is the target image.
Here, among the consecutive images, the first frame only has a temporally adjacent next frame, and the last frame only has a temporally adjacent previous frame.
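The rule above for video segments reduces to flagging frames whose predicted category disagrees with a temporal neighbor; a minimal sketch (assuming per-frame classification labels in time order) follows.

```python
def mine_hard_examples_video(labels):
    """Flag frames whose predicted category differs from a temporal neighbor.

    `labels` is the per-frame classification result in time order.
    Sketch of the rule above; the first and last frames only have one
    neighbor to compare against.
    """
    hard_indices = []
    for i, label in enumerate(labels):
        neighbors = []
        if i > 0:
            neighbors.append(labels[i - 1])      # previous frame
        if i < len(labels) - 1:
            neighbors.append(labels[i + 1])      # next frame
        if any(label != n for n in neighbors):   # disagreement => candidate
            hard_indices.append(i)
    return hard_indices
```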
When the AI model is to be used in an image classification scenario and the plurality of unlabeled images do not form a video segment, the difficult cases are determined through the process shown in fig. 11:
Step 1101, the AI platform acquires the confidence of each of the plurality of unlabeled images under each category, and determines a first difficult example value for each image according to the two highest confidences of that image.
The difficult example value measures how likely an image is to be a difficult case: the larger the value, the higher the probability that the image is a difficult case; conversely, the smaller the value, the lower the probability.
In this embodiment, in step 502, the output of the AI model may include the confidence of each unlabeled image under each category, which indicates the possibility that the labeling result inferred by the model for the input belongs to each category. For any one of the plurality of unlabeled images, the two highest confidences for that image are obtained, and the smaller is subtracted from the larger to obtain their difference. The stored correspondence between confidence-difference ranges and difficult example values is then obtained from the data storage module, and the first difficult example value corresponding to the range into which the computed difference falls is determined from this correspondence. In this way, the first difficult example value of each of the plurality of unlabeled images can be determined.
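A sketch of computing the first difficult example value from the top-2 confidence margin follows; the interval-to-value table is an illustrative assumption, since the application refers to a stored correspondence without giving its contents.

```python
def first_hard_value(confidences, ranges):
    """Map the top-2 confidence margin of one image to a difficult example value.

    `confidences` is the per-category confidence vector from the model;
    `ranges` is the stored correspondence from margin intervals to values,
    e.g. [((0.0, 0.1), 1.0), ((0.1, 0.3), 0.5), ((0.3, 1.0), 0.0)]
    (illustrative numbers; the actual table is not given in this application).
    """
    top2 = sorted(confidences, reverse=True)[:2]
    margin = top2[0] - top2[1]          # small margin => ambiguous prediction
    for (low, high), value in ranges:
        if low <= margin <= high:
            return value
    return 0.0
```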
Step 1102, the AI platform obtains the surface feature distribution information of the training images in the training image set, and determines a second difficult example value for each of the plurality of unlabeled images according to the distribution information and the surface features of each image.
In this embodiment, for each frame of the plurality of unlabeled images, its surface features may be determined; the surface features may include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of red, green and blue (RGB) of the image, the brightness of the image, the saturation of the image, and the sharpness of the image.
Specifically, the AI platform may obtain from the image's attributes its resolution, which refers to the number of pixels per unit inch, and its brightness, which reflects how bright the colors are in the color space.
The AI platform may use the length divided by the width of the image to obtain the aspect ratio of the image.
The AI platform may use the R, G, and B values of each pixel in the image to determine the mean of R, the mean of G, and the mean of B, i.e., the RGB mean of the image. For the variance of R, the AI platform computes, for each pixel, the square of the difference between the pixel's R value and the mean of R, and averages these squared differences over all pixels in the image; the variances of G and B are obtained in the same way.
The AI platform can also calculate the saturation of the image; saturation refers to the vividness of the color, also called the purity of the color. For any of the plurality of unlabeled images, the saturation is calculated as (max(R, G, B) - min(R, G, B)) / max(R, G, B), where max(R, G, B) is the maximum of R, G, and B in the image and min(R, G, B) is the minimum.
The AI platform can also calculate the sharpness of the image; sharpness is an index for measuring image quality, and can be determined by, for example, a Brenner gradient function or a Laplacian gradient function.
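A sketch computing the surface features listed above for one image follows (OpenCV/NumPy). Standard definitions are assumed where the application names a feature without a formula; the Laplacian-variance sharpness measure is one common realization of the Laplacian gradient function mentioned above.

```python
import cv2
import numpy as np

def surface_features(img):
    """Compute the surface features of one BGR image (sketch).

    Standard definitions are assumed; resolution is omitted since it comes
    from the image's attributes rather than its pixels.
    """
    h, w = img.shape[:2]
    b, g, r = cv2.split(img.astype(np.float64))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cmax = np.maximum(np.maximum(r, g), b)
    cmin = np.minimum(np.minimum(r, g), b)
    # Saturation per pixel: (max - min) / max, guarding against max == 0.
    saturation = np.divide(cmax - cmin, cmax,
                           out=np.zeros_like(cmax), where=cmax > 0)
    return {
        "aspect_ratio": w / h,                    # length divided by width
        "rgb_mean": (r.mean(), g.mean(), b.mean()),
        "rgb_var": (r.var(), g.var(), b.var()),   # per-channel variance
        "brightness": float(gray.mean()),
        "saturation": float(saturation.mean()),
        # Variance of the Laplacian as a sharpness proxy.
        "sharpness": float(cv2.Laplacian(gray, cv2.CV_64F).var()),
    }
```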
The AI platform then acquires the surface features of all images in the training image set and determines the distribution of the images over each surface feature. In particular, the distribution over each surface feature may be represented using a histogram. As shown in fig. 12, fig. 12(a) is a histogram of the mean value of R of the images, with the mean of R on the horizontal axis and the number of images on the vertical axis. Among the 1000 images in the training image set, 52 have a mean R value from 10 to 20, 204 from 20 to 30, 320 from 30 to 40, 215 from 40 to 50, 99 from 50 to 60, 69 from 60 to 70, 22 from 70 to 80, 13 from 80 to 90, and so on for the remaining ranges. Fig. 12(b) is a histogram of the saturation of the images, with saturation on the horizontal axis and the number of images on the vertical axis; the 1000 images in the training image set are not enumerated one by one.
The AI platform then acquires the stored preset value and, for the distribution of the images over any one surface feature, multiplies the preset value by the number of training images to obtain a target value. The AI platform arranges the values of that surface feature of the training images in ascending order and takes the value at the target-value position in this order as the limit value of the surface feature. Among the plurality of unlabeled images, images whose surface feature is greater than the limit value are assigned a difficult example value of a, and images whose surface feature is less than or equal to the limit value are assigned a difficult example value of b. For example, if the surface feature is the brightness of the image, the number of images is 1000, and the preset value is 90%, the target value is 1000 × 90% = 900; if the 900th value among the brightness values arranged in ascending order in the brightness histogram is 202.5, the difficult example value of an image with brightness greater than 202.5 is determined to be 1, and the difficult example value of an image with brightness less than or equal to 202.5 is determined to be 0. Following this brightness-based procedure, the difficult example value of each image under each surface feature can be determined. The above is only an alternative implementation; other ways of determining the limit value may be used.
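A sketch of this percentile-style limit value under the reading above (the function name and array layout are illustrative, not from the patent); it derives the limit from the training-set feature values and assigns a = 1, b = 0 to the unlabeled images, as in the brightness example:

```python
import numpy as np

def feature_difficulty(train_values, unlabeled_values, preset=0.9, a=1.0, b=0.0):
    train_values = np.sort(np.asarray(train_values))
    target = int(len(train_values) * preset)   # e.g. 1000 * 90% = 900
    limit = train_values[target - 1]           # value at the target position
    return np.where(np.asarray(unlabeled_values) > limit, a, b)
```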
In this way, a difficult example value can be determined for each surface feature of each image in the plurality of unlabeled images, and the weight corresponding to each surface feature can then be obtained. For each image in the plurality of unlabeled images, the difficult example value of each surface feature is multiplied by the weight corresponding to that surface feature to obtain a value corresponding to each surface feature. The AI platform then adds the values corresponding to all surface features to obtain the second difficult example value of the image.
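A one-line sketch of this weighted aggregation, assuming the per-feature difficult example values and the weights (summing to 1) are kept in dicts keyed by surface-feature name, a hypothetical structure chosen for illustration:

```python
def second_difficulty(hard_values: dict, weights: dict) -> float:
    # Weighted sum of the per-feature difficult example values of one image
    return sum(hard_values[name] * weights[name] for name in hard_values)
```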
It should be noted here that the weights may be different for different surface features, and the sum of the weights of all surface features is equal to 1. For example, the brightness of the image and the sharpness of the image are weighted more heavily than the aspect ratio of the image.
It should also be noted that the preset values corresponding to different surface features may differ. In the above step 1102, the AI platform determines the surface features of each image; in actual processing, the surface features of each image in the training image set and in the plurality of unlabeled images may instead be uploaded directly and stored in the data storage module. When they are needed, the AI platform obtains the surface features of each image in the plurality of unlabeled images from the data storage module.
Step 1103, the AI platform uses the first feature extraction model to extract the deep features of the images in the training image set and the deep features of the images in the plurality of unlabeled images, and clusters the images in the training image set according to their deep features to obtain an image clustering result; it then determines a third difficult example value of each image in the plurality of unlabeled images according to the deep features of each unlabeled image, the image clustering result, and the annotation result of each unlabeled image.
In this embodiment, as shown in fig. 13, the AI platform may obtain a first feature extraction model, which may be a CNN. The AI platform then inputs each image in the training image set into the first feature extraction model to determine the deep features of each image, and likewise inputs each image in the plurality of unlabeled images into the first feature extraction model to determine its deep features. The deep features of each image can be represented by a one-dimensional array, and the arrays of all images have the same number of dimensions.
Then, the AI platform may input the deep features of each image in the training image set into a clustering algorithm (any clustering algorithm may be used, such as the K-means clustering algorithm) to obtain an image clustering result. The image clustering result comprises a plurality of image groups, and each image group comprises one or more images.
For each image group, the AI platform may determine the average of the values of the i-th dimension over the images in the group. For example, if an image group includes 3 images whose deep features are the three-dimensional arrays (1, 2, 5), (4, 2, 4), and (4, 8, 9), the average of the 1st dimension is 3, the average of the 2nd dimension is 4, and the average of the 3rd dimension is 6, so the center of the image group is (3, 4, 6). The center of every image group can be determined in this way.
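The per-dimension averaging is simply a column mean; this sketch reproduces the (3, 4, 6) center from the example:

```python
import numpy as np

group_features = np.array([[1, 2, 5], [4, 2, 4], [4, 8, 9]])
center = group_features.mean(axis=0)  # -> array([3., 4., 6.])
```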
For any image in the plurality of unlabeled images, the AI platform may determine the distance between the deep features of the image and the center of each image group in the image clustering result; specifically, it calculates the Euclidean distance between the image and the center, which may be written as:

distance = sqrt( Σ_{i=1}^{N} (x_{1i} − x_{2i})² )

where i is any dimension of the deep features, N is the total number of dimensions, x_{1i} is the i-th dimension of the deep features of the image, and x_{2i} is the i-th dimension of the deep features of the center. The image group with the minimum distance is determined as the image group to which the image belongs (this process can be regarded as a clustering result of the plurality of unlabeled images). The AI platform then judges whether the annotation result of the image is the same as that of the images in this image group. If it is the same, the difficult example value is determined to be a, and the third difficult example value of the image is a; if not, the difficult example value is determined to be b, and the third difficult example value of the image is b. Alternatively, when clustering any image in the plurality of unlabeled images into an existing image group, the K-means clustering method can also be used to determine the image group to which the image belongs; other clustering methods may likewise be used.
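A sketch of this assignment and the resulting third difficult example value, assuming each group carries a representative annotation (e.g., the common label of its training images); the variable names are illustrative:

```python
import numpy as np

def third_difficulty(feat, centers, group_labels, label, a=1.0, b=0.0):
    # Euclidean distance from the image's deep features to every group center
    dists = [np.sqrt(((np.asarray(feat) - np.asarray(c)) ** 2).sum()) for c in centers]
    nearest = int(np.argmin(dists))        # group with the minimum distance
    return a if label == group_labels[nearest] else b
```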
Step 1104, the AI platform determines a target difficult example value of each image of the plurality of unlabeled images according to one or more of the first difficult example value, the second difficult example value, and the third difficult example value.
In this embodiment, for any image in the plurality of unlabeled images, the AI platform may determine the target difficult example value of the image using one or more of its first, second, and third difficult example values. Specifically, the AI platform may take the first difficult example value or the second difficult example value directly as the target difficult example value, or may obtain the target difficult example value by weighting the first and second difficult example values, the first and third difficult example values, the second and third difficult example values, or all three difficult example values.
When the first, second, and third difficult example values are used together, the difficult example values of all three layers are considered at once, so the determined target difficult example value is more accurate.
Step 1105, the AI platform determines the first number of images with the largest target difficult example values among the plurality of unlabeled images as the difficult cases among the plurality of unlabeled images.
The first number may be preset and stored in a data storage module of the AI platform.
In this embodiment, the AI platform may sort the plurality of unlabeled images in descending order of target difficult example value, select the first number of top-ranked images, and determine them as the difficult cases among the plurality of unlabeled images.
When the AI platform is applied to a target detection scenario, for a plurality of unlabeled images that form a video clip, as shown in fig. 14, the difficult cases are determined as follows:
Step 1401, for a first target box of a first image in the plurality of unlabeled images, the AI platform determines the tracking box with the highest similarity to the first target box among the images whose time-series interval from the first image is less than or equal to a second number.
Wherein the second number may be preset, such as 2.
In this embodiment, any image in the plurality of unlabeled images may be referred to as a first image, and any bounding box in the first image may be referred to as a first target box. The AI platform can determine the images whose time-series interval from the first image is less than or equal to the second number. For example, if the first image is the 5th frame and the second number is 2, those images are the 3rd, 4th, 6th, and 7th frames. In this way, when the second number is greater than or equal to 2, not only the adjacent frame but also several nearby frames are considered, which improves the accuracy of determining false detections and missed detections.
The AI platform may acquire the bounding boxes in the images whose time-series interval from the first image is less than or equal to the second number, and then determine the similarity of each of these bounding boxes to the first target box. Specifically, for each bounding box: calculate a first absolute value, i.e., the absolute difference between the area of the bounding box and the area of the first target box; a second absolute value, the absolute difference between their lengths; and a third absolute value, the absolute difference between their widths. Multiply the first absolute value by the weight corresponding to area to obtain a first product, the second absolute value by the weight corresponding to length to obtain a second product, and the third absolute value by the weight corresponding to width to obtain a third product. Add the three products to obtain the similarity score of the first target box and the bounding box; the smaller this weighted sum, the more similar the two boxes. It should be noted here that the sum of the area, length, and width weights equals 1, and the length and width weights may be equal.
Among these bounding boxes, the AI platform may determine the bounding box with the highest similarity to the first target box (i.e., the smallest similarity score) and take it as the tracking box corresponding to the first target box. Because the most similar bounding box over several adjacent frames is considered, losing the target due to motion can be avoided.
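A sketch of the similarity score and tracking-box selection, assuming boxes are given as (length, width) pairs and the three weights sum to 1 (the 0.5/0.25/0.25 split is illustrative, not from the patent):

```python
def box_score(target, candidate, w_area=0.5, w_len=0.25, w_wid=0.25):
    t_l, t_w = target
    c_l, c_w = candidate
    # Weighted sum of absolute differences; smaller means more similar
    return (w_area * abs(t_l * t_w - c_l * c_w)
            + w_len * abs(t_l - c_l)
            + w_wid * abs(t_w - c_w))

def tracking_box(target, candidates):
    # The candidate with the smallest score is the most similar box
    return min(candidates, key=lambda c: box_score(target, c))
```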
Step 1402, the AI platform determines the overlapping rate of the first target box with each bounding box according to the tracking box, all bounding boxes in the images whose time-series interval from the first image is less than or equal to the second number, and the first target box.
In this embodiment, the AI platform may determine the overlapping rate of the first target box with each bounding box using the following formula:
overlap= max(iou(curbox,bbox),iou(trackedbox,bbox)) (1)
where overlap is the overlapping rate of the first target box and the bounding box; the first target box is denoted curbox, the bounding box is denoted bbox, and iou(curbox, bbox) is the intersection-over-union (IoU) of the first target box and the bounding box. The tracking box of the first target box is denoted trackedbox, and iou(trackedbox, bbox) is the IoU of the tracking box and the bounding box. overlap equals the maximum of the two IoU values. The IoU of the first target box and the bounding box equals the ratio of the area of their intersection to the area of their union; likewise, the IoU of the tracking box and the bounding box equals the ratio of the area of their intersection to the area of their union.
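A sketch of formula (1), assuming boxes are (x1, y1, x2, y2) tuples with the top-left and bottom-right corners in image coordinates:

```python
def iou(box_a, box_b):
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def overlap(curbox, trackedbox, bbox):
    # Formula (1): the larger of the two intersection-over-union values
    return max(iou(curbox, bbox), iou(trackedbox, bbox))
```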
Step 1403, if a bounding box with an overlapping rate greater than a second value exists, the AI platform determines each such bounding box as a similar box corresponding to the first target box; if no bounding box with an overlapping rate greater than the second value exists, it is determined that no similar box corresponding to the first target box exists.
The second value can be preset and stored in the data storage module, e.g., 0.5.
In this embodiment, after determining the overlapping rate of the first target box with each bounding box, the AI platform may compare each overlapping rate with the second value. If the overlapping rate of the first target box and some bounding box is greater than the second value, that bounding box is determined to be a similar box corresponding to the first target box. There may be multiple bounding boxes whose overlapping rate exceeds the second value, so the first target box may correspond to multiple similar boxes.
If the overlapping rate of the first target box with every bounding box is less than or equal to the second value, it is determined that no similar box corresponding to the first target box exists in the images whose time-series interval from the first image is less than or equal to the second number.
Step 1404, if no similar box corresponding to the first target box exists, the AI platform determines the first target box as a difficult example box.
In this embodiment, if it is determined in step 1403 that no similar box corresponding to the first target box exists, this indicates that the first target box appears suddenly and may be regarded as a false-detection box. The AI platform may therefore determine the first target box as a difficult example box.
Step 1405, if a similar box corresponding to the first target box exists and the first image to which the first target box belongs and the second image to which the similar box belongs are not adjacent in time sequence, the AI platform determines the difficult example boxes in the images between the first image and the second image according to the first target box and the similar box.
In this embodiment, if it is determined in step 1403 that a similar box corresponding to the first target box exists, the AI platform may judge whether the image containing the similar box and the image containing the first target box are adjacent in time sequence. If they are adjacent, there is no missed-detection box. If they are not adjacent, boxes have suddenly disappeared and missed-detection boxes exist; the AI platform may then use the similar box and the first target box to perform a sliding average and mark the missed-detection boxes in the images between the first image and the second image, and these missed-detection boxes are the difficult example boxes in those images. In this way, the missed-detection boxes and false-detection boxes in consecutive frames are marked following a minority-obeys-majority principle.
The sliding average in step 1405 may proceed as follows. Typically a bounding box is rectangular, and the position coordinates of its top-left and bottom-right corners mark its position in the image to which it belongs; the position coordinates are coordinates in the image. The AI platform may subtract the abscissa of the top-left corner of the similar box from the abscissa of the top-left corner of the first target box to obtain an abscissa difference, and multiply this difference by x/(n + 1), where n is the number of images between the image containing the first target box and the image containing the similar box, and x indicates the x-th image between them. Adding this product to the abscissa of the top-left corner of whichever of the two boxes belongs to the earlier image gives the abscissa of the top-left corner of the difficult example box in the x-th image between the two images. The ordinate of the top-left corner, and the position coordinates of the bottom-right corner, of the difficult example box in the x-th image are obtained in the same way.
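A sketch of the interpolation just described, assuming the earlier and later boxes are (x1, y1, x2, y2) tuples and n images lie between them:

```python
def interpolated_boxes(earlier, later, n):
    boxes = []
    for x in range(1, n + 1):           # the x-th image between the two
        t = x / (n + 1)
        boxes.append(tuple(e + (l - e) * t for e, l in zip(earlier, later)))
    return boxes
```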
Step 1406, the AI platform determines the difficult cases in the plurality of unlabeled images according to the number of difficult example boxes in each of the plurality of unlabeled images.
In this embodiment, based on the processing of steps 1401 to 1405, the number of difficult example boxes in each of the plurality of unlabeled images can be determined; the AI platform may then determine each image whose number of difficult example boxes exceeds a third number as a difficult case among the plurality of unlabeled images.
When the AI platform is applied to a target detection scenario, for a plurality of unlabeled images that do not form a video clip, as shown in fig. 15, the difficult cases are determined as follows:
Step 1501, the AI platform obtains the surface feature distribution information of the images in the training image set, and determines a fourth difficult example value of each image in the plurality of unlabeled images according to this distribution information and the surface features of the plurality of unlabeled images.
In this embodiment, for each image in the plurality of unlabeled images, its surface features may be determined; here the surface features include surface features of the image and surface features of the bounding boxes. The surface features of the image may include one or more of: the resolution of the image, the aspect ratio of the image, the mean and variance of RGB of the image, the brightness of the image, the saturation of the image, the sharpness of the image, the number of boxes in the single-frame image, or the variance of the box areas in the single-frame image. The surface features of the bounding boxes may include one or more of: the aspect ratio of each bounding box in the single-frame image, the proportion of the area of each bounding box to the image area, the marginalization degree of each bounding box, the stacking degree of each bounding box, the brightness of each bounding box, or the blurriness of each bounding box.
Specifically, for the resolution, aspect ratio, RGB mean and variance, brightness, saturation, and sharpness of each image among the plurality of unlabeled images, the AI platform may refer to the processing in step 1102, which is not repeated here.
The AI platform may determine the number of bounding boxes in each image.
The AI platform can determine the area of each bounding box in each image, calculate the mean of the areas of all boxes in the image, subtract the mean from the area of each bounding box and square the result to obtain a value for each bounding box, and average these values to obtain the variance of the box areas in the single-frame image.
The AI platform may calculate the aspect ratio of each bounding box in each frame of the image. The AI platform may calculate the ratio of the area of each bounding box to the image area in each frame of the image.
The AI platform can calculate the marginalization degree of each bounding box in the single-frame image as follows: for any bounding box in an image, calculate the absolute difference between the abscissa of the center of the bounding box and the abscissa of the center of the image (the abscissa difference) and the absolute difference between their ordinates (the ordinate difference); then calculate a first ratio of the abscissa difference to the length of the image and a second ratio of the ordinate difference to the width of the image. The first and second ratios reflect the marginalization degree of the bounding box: in general, the larger the ratios, the more marginalized the box.
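A sketch of the two ratios, with the box center, image center, and image length/width passed in explicitly (the argument names are illustrative):

```python
def marginalization(box_center, img_center, img_len, img_wid):
    dx = abs(box_center[0] - img_center[0])  # abscissa difference
    dy = abs(box_center[1] - img_center[1])  # ordinate difference
    return dx / img_len, dy / img_wid        # first ratio, second ratio
```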
The AI platform can calculate the stacking degree of each bounding box in the single-frame image as follows: for any bounding box in an image, calculate the intersection area of that bounding box with each of the other bounding boxes in the image, divide each intersection area by the area of the bounding box, and sum the resulting ratios to obtain the stacking degree of the bounding box in the image.
The AI platform can calculate the brightness of each bounding box in the single-frame image as follows: for any bounding box in an image, calculate the square of the mean of R, the square of the mean of G, and the square of the mean of B of the pixels inside the bounding box. Multiply the square of the mean of R by 0.241 to obtain product a, the square of the mean of G by 0.691 to obtain product b, and the square of the mean of B by 0.068 to obtain product c. Add products a, b, and c and take the square root to obtain the brightness of the bounding box. Expressed as a formula:
brightness = sqrt( 0.241 × mean(R)² + 0.691 × mean(G)² + 0.068 × mean(B)² )
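A direct sketch of this formula, with the channel means of the pixels inside the bounding box as inputs:

```python
import math

def box_brightness(r_mean, g_mean, b_mean):
    return math.sqrt(0.241 * r_mean ** 2 + 0.691 * g_mean ** 2 + 0.068 * b_mean ** 2)
```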
The AI platform can calculate the blurriness of each bounding box in the single-frame image as follows: for any bounding box in an image, filter the region of the bounding box with the Laplacian operator to obtain edge values, and take the variance of the edge values as the blurriness measure of the bounding box. The larger the variance, the sharper the bounding box. This determination of blurriness is only an example; any method that can determine the blurriness of a bounding box can be applied in this embodiment.
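A sketch of this Laplacian-variance measure using OpenCV (an assumed dependency; the text does not name a library), applied to the grayscale pixels of the bounding box region:

```python
import cv2

def box_sharpness(gray_box_region):
    # Variance of the Laplacian response; larger values indicate a sharper box
    return cv2.Laplacian(gray_box_region, cv2.CV_64F).var()
```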
Then, the AI platform acquires the surface features of the training images in the training image set and determines the distribution of the images over each surface feature (the processing is the same as obtaining the surface features of the plurality of unlabeled images, described above).
The AI platform then acquires the stored preset value and, for the distribution of the images over any one surface feature, multiplies the preset value by the number of training images to obtain a target value. The AI platform arranges the values of that surface feature of the training images in ascending order and takes the value at the target-value position in this order as the limit value of the surface feature. Among the plurality of unlabeled images, images (or boxes) whose surface feature is greater than the limit value are assigned a difficult example value of a, and those whose surface feature is less than or equal to the limit value are assigned a difficult example value of b. In this way, a difficult example value can be determined for each surface feature of each image in the plurality of unlabeled images, and the weight corresponding to each surface feature can then be obtained. For each image in the plurality of unlabeled images, the AI platform multiplies the difficult example value of each surface feature of the bounding boxes by the weight corresponding to that feature to obtain a value for each bounding-box surface feature, and adds these values to obtain the difficult example value of the bounding boxes of the image. Likewise, the AI platform multiplies the difficult example value of each surface feature of the image by the weight corresponding to that feature and adds the resulting values to obtain the difficult example value of the image. The AI platform then weights the difficult example value of the bounding boxes with the difficult example value of the image (the two weights sum to 1) to obtain the fourth difficult example value of the image.
It should be noted here that the weights may be different for different surface features, and the sum of the weights of all surface features is equal to 1. For example, the brightness of the image and the sharpness of the image are weighted more heavily than the aspect ratio of the image.
It should also be noted that the preset values corresponding to different surface features may differ. In the above step 1501, the AI platform determines the surface features of each image; in actual processing, the surface features of each image in the plurality of unlabeled images may instead be uploaded directly and stored in the data storage module, and when they are needed the AI platform obtains them from the data storage module.
Step 1502, the AI platform uses a second feature extraction model to respectively extract deep features of each bounding box in each image in the training image set and deep features of each bounding box in each image in the plurality of images which are not labeled, and performs clustering processing on each bounding box in each image in the training image set according to the deep features of each bounding box in each image in the training image set to obtain a frame clustering result; and determining a fifth difficult example value of each image in the plurality of unlabeled images according to the deep features of each bounding box in each image in the plurality of unlabeled images, the frame clustering result and the inference result of each bounding box in each image in the plurality of unlabeled images.
In this embodiment, as shown in fig. 16, the AI platform may obtain a second feature extraction model, which may be the same as the first feature extraction model mentioned above and may be a CNN. The AI platform inputs the images in the training image set into the second feature extraction model to determine the deep features of each bounding box in each image, and likewise inputs each image in the plurality of unlabeled images into the second feature extraction model to determine the deep features of each of its bounding boxes. The deep features of each bounding box may be represented by a one-dimensional array, and the arrays of all bounding boxes have the same number of dimensions.
Then, the AI platform may input the deep features of each bounding box in each image in the training image set into a clustering algorithm (the clustering algorithm may be any one of the clustering algorithms, such as a K-means clustering algorithm, etc.), so as to obtain a bounding box clustering result. The bounding box clustering result comprises a plurality of bounding box groups, and each bounding box group comprises one or more bounding boxes.
For each bounding box group, the AI platform can determine the average of the values of the i-th dimension over the bounding boxes in the group. For example, if a bounding box group includes 3 bounding boxes whose deep features are the three-dimensional arrays (7, 2, 5), (4, 2, 4), and (4, 14, 9), the average of the 1st dimension is 5, the average of the 2nd dimension is 6, and the average of the 3rd dimension is 6, so the center of the bounding box group is (5, 6, 6). The center of each bounding box group may be determined in this way.
For any bounding box in any image in the plurality of unlabeled images, the AI platform may determine the distance between the deep features of the bounding box and the center of each bounding box group in the box clustering result; specifically, it calculates the Euclidean distance between the bounding box and the center, which may be written as:

distance = sqrt( Σ_{i=1}^{N} (x_{1i} − x_{2i})² )

where i is any dimension of the deep features, N is the total number of dimensions, x_{1i} is the i-th dimension of the deep features of the bounding box, and x_{2i} is the i-th dimension of the deep features of the center. The bounding box group with the minimum distance is determined as the group to which the bounding box belongs. Alternatively, when clustering any bounding box in the plurality of unlabeled images into an existing bounding box group, the K-means clustering method can also be used to determine the group to which the bounding box belongs; other clustering methods may likewise be used. The AI platform then judges whether the inference result of the bounding box is the same as that of the bounding boxes in its group. If it is the same, the difficult example value of the bounding box is determined to be c; if not, it is determined to be d. For each image, the difficult example values of all its bounding boxes are added to obtain the fifth difficult example value of the image.
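A sketch of the per-image aggregation, assuming one boolean per bounding box recording whether its inference result matched its assigned group (the default values of c and d are illustrative, not from the patent):

```python
def fifth_difficulty(box_matches, c=0.0, d=1.0):
    # Sum of the per-box difficult example values of one image
    return sum(c if m else d for m in box_matches)
```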
Step 1503, the AI platform determines a target difficult example value of each image of the plurality of unlabeled images according to one or more of the fourth difficult example value and the fifth difficult example value.
In this embodiment, for any image in the plurality of unlabeled images, the AI platform may determine the target difficult example value of the image using one or more of its fourth and fifth difficult example values. Specifically, the AI platform may take the fourth difficult example value as the target difficult example value, may take the fifth difficult example value as the target difficult example value, or may obtain the target difficult example value by weighting the fourth and fifth difficult example values. When the fourth and fifth difficult example values are used together, the difficult example values of both layers are considered, so the determined target difficult example value is more accurate.
Step 1504, the AI platform determines the first number of images with the largest target difficult example values among the plurality of unlabeled images as the difficult cases among the plurality of unlabeled images.
The first number may be preset and stored in a data storage module of the AI platform.
In this embodiment, the AI platform may sort the plurality of unlabeled images in descending order of target difficult example value, select the first number of top-ranked images, and determine them as the difficult cases among the plurality of unlabeled images.
In the embodiment of the application, the AI platform can obtain a plurality of unlabeled images, input them into the initial AI model to obtain the annotation result of each image, determine the difficult cases among the unlabeled images using these annotation results, and train the initial AI model again based on the difficult cases to obtain the optimized AI model. Since the initial AI model is trained in the AI platform using difficult cases, the inference accuracy of the resulting optimized AI model can be higher.
An embodiment of the present application further provides a method for optimizing an AI model. As shown in fig. 17, the processing may be as follows:
Step 1701, the AI platform trains the initial AI model according to the training image set to obtain an optimized AI model.
In this embodiment, the training image set is the image set provided by the user to the AI platform. The training image set may include only unlabeled images, or it may include both a plurality of unlabeled images and a plurality of labeled images.
When the training image set includes only a plurality of unlabeled images, the process of optimizing the initial AI model may follow the flow shown in fig. 5. When the training image set includes both a plurality of unlabeled images and a plurality of labeled images, the initial AI model may first be trained using the labeled images; the annotation results of the unlabeled images are then determined based on the initial AI model, difficult cases are determined based on these annotation results, and the initial AI model is trained based on the difficult cases to obtain the optimized AI model. This processing can also be seen in the flow shown in fig. 5.
Step 1702, the AI platform receives an inference image set, and infers each inference image in the inference image set according to the optimized AI model to obtain inference results.
In this embodiment, if the user wants to use the optimized AI model for inference, the user may upload the inference image set; the AI platform inputs the images in the inference image set into the optimized AI model to obtain the inference results.
Step 1703, the AI platform determines the difficult cases in the inference image set according to the inference results, where a difficult case indicates an inference image for which the error rate of the inference result obtained through the optimized AI model is higher than a target threshold.
Wherein the reasoning result is equivalent to the labeling result.
In this embodiment, the process may refer to the processing in step 503; the difference is that the images processed here are the inference images in the inference image set, whereas in step 503 they are the plurality of unlabeled images. See the description of step 503 for the detailed procedure.
Step 1704, the AI platform trains the optimized AI model according to the difficult cases to obtain a re-optimized AI model.
In this embodiment, after the difficult cases in the inference image set are determined, the optimized AI model may be further trained to obtain a re-optimized AI model (for the training process, see the foregoing description).
Because the optimized AI model is trained again using difficult cases, the resulting re-optimized AI model has a stronger inference capability.
It should be noted that the above method for providing an AI model may be implemented by one or more modules of the AI platform 100. Specifically, the user I/O module is used to implement step 501 in fig. 5 and steps 901 and 902 in fig. 9. The difficult example mining module is used to implement step 503 in fig. 5, step 904 in fig. 9, the flows shown in fig. 11, fig. 14, and fig. 15, and step 1703 in fig. 17. The model training module is used to implement steps 502 and 504 in fig. 5, step 905 in fig. 9, and step 1704 in fig. 17. The inference module is used to implement step 903 in fig. 9 and step 1702 in fig. 17.
The present application also provides a computing device 400 as shown in fig. 4, wherein the processor 402 in the computing device 400 reads the program and the image sets stored in the memory 401 to execute the method performed by the AI platform described above.
Since the various modules in the AI platform 100 provided herein can be distributively deployed over multiple computers in the same or different environments, the present application also provides a computing device as shown in fig. 18 that includes multiple computers 1800, each computer 1800 including a memory 1801, a processor 1802, a communication interface 1803, and a bus 1804. The memory 1801, the processor 1802, and the communication interface 1803 are communicatively connected to each other via a bus 1804.
The memory 1801 may be a read-only memory, a static storage device, a dynamic storage device, or a random access memory. The memory 1801 may store a program; when the program stored in the memory 1801 is executed by the processor 1802, the processor 1802 and the communication interface 1803 are used to perform part of the method by which the AI platform obtains an AI model. The memory may also store image sets; for example, a part of the storage resources in the memory 1801 is set aside as an image set storage module for storing the image sets required by the AI platform, and another part is set aside as an AI model storage module for storing the AI model library.
The processor 1802 may be a general-purpose central processing unit, a microprocessor, an application specific integrated circuit, a graphics processor, or one or more integrated circuits.
The communication interface 1803 enables communication between the computer 1800 and other devices or a communication network using transceiver modules, such as, but not limited to, transceivers. For example, the set of images may be acquired via the communication interface 1803.
The bus 1804 may include a pathway for transferring information between the components of the computer 1800 (e.g., the memory 1801, the processor 1802, and the communication interface 1803).
A communication path is established between each of the computers 1800 via a communication network. On each computer 1800 runs any one or more of a user I/O module 101, a hard case mining module 102, a model training module 103, an inference module 104, an AI model storage module 105, a data storage module 106, and a data pre-processing module 107. Any of the computers 1800 can be a computer in a cloud data center (e.g., a server), or a computer in an edge data center, or a terminal computing device.
The descriptions of the flows corresponding to the above-mentioned figures have respective emphasis, and for parts not described in detail in a certain flow, reference may be made to the related descriptions of other flows.
The above embodiments may be implemented entirely or partially in software, hardware, or a combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product providing the AI platform comprises one or more computer program instructions for the AI platform; when these instructions are loaded and executed on a computer, they produce, in whole or in part, the processes or functions described in the embodiments of the present application with reference to fig. 5, fig. 11, fig. 14, or fig. 15.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, or twisted pair) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be, for example, optical media (e.g., optical discs) or semiconductor media (e.g., SSDs).

Claims (32)

1. A method of providing an artificial intelligence AI model, the method comprising:
the method comprises the steps that an AI platform receives a plurality of unmarked images of a first user, wherein the first user is an entity registering an account number in the AI platform;
the AI platform marks the plurality of images according to the initial AI model;
the AI platform determines the difficult cases in the plurality of images according to the labeling result;
the AI platform utilizes the difficult case to train the initial AI model to obtain an optimized AI model.
2. The method of claim 1, wherein the AI platform determines the difficult cases in the plurality of images from the annotation result, comprising:
the AI platform provides a confirmation interface for the first user, and a candidate difficulty case is shown to the first user in the confirmation interface, wherein the candidate difficulty case is at least one image in the plurality of images;
and the AI platform determines the difficult cases in the candidate difficult cases according to the operation of the first user on the confirmation interface.
3. The method according to claim 1 or 2,
the method further comprises the following steps: the AI platform receives a correction mark of the first user on the difficult case;
the AI platform training the initial AI model to obtain an optimized AI model using the difficult case comprises: the AI platform trains the initial AI model using the difficult cases and corresponding correction labels to obtain the optimized AI model.
4. The method according to any one of claims 1-3, further comprising:
the AI platform acquires one or more images with labels from the first user;
the AI platform obtains the initial AI model using the annotated one or more images.
5. The method according to any one of claims 1-4, further comprising:
the AI platform providing the optimized AI model to a device of a second user to cause the device to perform a task objective with the optimized AI model;
or
And the AI platform receives the inference image sent by the equipment of the second user, utilizes the optimized AI model to infer the inference image and provides an inference result for the equipment of the second user.
6. The method of any of claims 1-5, wherein the AI platform annotates the plurality of unlabeled images according to an initial AI model, comprising:
the AI platform provides a label selection interface for the first user, wherein the label selection interface comprises at least one label mode which can be selected by the first user;
and the AI platform receives the marking mode selected by the first user and marks the unmarked multiple images according to the initial AI model corresponding to the marking mode selected by the first user.
7. The method of any of claims 1-6, wherein the AI platform annotating the plurality of images according to an initial AI model comprises: classifying the plurality of images according to the initial AI model and/or performing object detection on the plurality of images according to the initial AI model.
8. An Artificial Intelligence (AI) platform, the AI platform comprising:
the user input/output I/O module is used for receiving a plurality of unlabelled images of a first user, wherein the first user is an entity for registering an account number in the AI platform;
the data preprocessing module is used for marking the plurality of images according to the initial AI model;
the difficult example mining module is used for determining difficult examples in the plurality of images according to the labeling result;
and the model training module is used for training the initial AI model by utilizing the difficult cases to obtain an optimized AI model.
9. The AI platform of claim 8,
the user I/O module is further configured to provide a confirmation interface to the first user, and present a candidate difficulty case to the first user in the confirmation interface, where the candidate difficulty case is at least one image in the plurality of images;
the difficult example mining module is further used for determining a difficult example in the candidate difficult examples according to the operation of the first user on the confirmation interface.
10. The AI platform of claim 8 or 9,
the user I/O module is further used for receiving correction marks of the first user on the difficult cases;
the model training module is used for training the initial AI model by using the difficult cases and the corresponding correction labels to obtain the optimized AI model.
11. The AI platform of any of claims 8-10,
the user I/O module is further used for acquiring one or more images with labels from the first user;
the model training module is further configured to obtain the initial AI model using the one or more images with labels.
12. The AI platform of any of claims 8-11,
the user I/O module is further configured to provide the optimized AI model to a device of a second user, such that the device performs a task objective with the optimized AI model;
or
The AI platform also includes an inference module that,
the user I/O module is also used for receiving the inference image sent by the equipment of the second user;
the reasoning module is used for reasoning the reasoning image by utilizing the optimized AI model;
the user I/O module is further configured to provide the inference result to the device of the second user.
13. The AI platform of any of claims 8-12,
the user I/O module is further configured to provide a label selection interface to the first user, where the label selection interface includes at least one label manner selectable by the first user;
the user I/O module is also used for receiving the marking mode selected by the first user;
the data preprocessing module is used for labeling the unmarked multiple images according to the initial AI model corresponding to the labeling mode selected by the first user.
14. The AI platform of any of claims 8-13, wherein the data pre-processing module is configured to classify the plurality of images according to the initial AI model and/or perform object detection on the plurality of images according to the initial AI model.
15. A method of optimizing an artificial intelligence, AI, model, the method comprising:
training the initial AI model according to the training image set to obtain an optimized AI model;
receiving a reasoning image set, and reasoning each reasoning image in the reasoning image set according to the optimized AI model to obtain a reasoning result;
determining a difficult case in the inference image set according to the inference result, wherein the difficult case indicates an inference image of which the error rate of the inference result obtained by the inference through the optimized AI model is higher than a target threshold;
and training the optimized AI model according to the difficult case to obtain a re-optimized AI model.
16. The method of claim 15,
the determining the difficult cases in the inference image set according to the inference result comprises:
determining the inference image set as a video segment;
determining difficult cases in the inference image set according to the inference result of each image in the inference image set;
or
And determining the inference image set as a non-video segment, and determining the difficult cases in the inference image set according to the inference result of each image in the inference image set and the training image set.
17. The method of claim 16,
the determining the difficult cases in the inference image set according to the inference results of the images in the inference image set comprises the following steps:
determining a target image in the inference image set, wherein the inference result of the target image is different from the inference result of an adjacent image of the target image in the video segment;
and determining the target image as a difficult case in the inference image set.
18. The method of claim 16 or 17,
the determining the difficult cases in the inference image set according to the inference results of the images in the inference image set and the training image set comprises:
obtaining confidence degrees of the images in the inference image set under each category, and determining a first difficult example value of each image in the inference image set according to the two highest confidence degrees of each image in the inference image set;
acquiring surface layer feature distribution information of the images in the training image set, and determining a second difficult example value of each image in the inference image set according to the surface layer feature distribution information and the surface layer features of each image in the inference image set;
acquiring deep features of the images in the training image set and deep features of the images in the inference image set, and clustering the images in the training image set according to the deep features of the images in the training image set to obtain an image clustering result; determining a third difficult example value of each image in the inference image set according to the deep features of each image in the inference image set, the image clustering result and the inference result of each image in the inference image set;
determining a target difficult example value of each image in the inference image set according to one or more of the first difficult example value, the second difficult example value and the third difficult example value;
and determining a first number of images with the maximum target difficulty case value in the inference image set as difficulty cases in the inference image set.
19. The method of claim 16,
the determining the difficult cases in the inference image set according to the inference results of the images in the inference image set comprises the following steps:
for a first target frame of a first image in the inference image set, judging whether a similar frame corresponding to the first target frame exists in images, in the video segment, of which the time sequence interval with the first image is less than or equal to a second number;
if the similar frame corresponding to the first target frame does not exist, determining the first target frame as a difficult case frame;
if a similar frame corresponding to the first target frame exists and a first image to which the first target frame belongs and a second image to which the similar frame belongs are not adjacent in the video clip, determining a difficult frame in an image between the first image and the second image according to the first target frame and the similar frame;
and determining the difficult cases in the inference image set according to the number of the difficult case frames of each image in the inference image set.
20. The method of claim 19, wherein the determining whether a similar frame corresponding to the first target frame exists in the images of the video segment whose time sequence interval with the first image is less than or equal to a second number comprises:
determining a tracking frame with the highest similarity to a first target frame in images with the time sequence interval less than or equal to a second number from the first images in the video clip;
determining the overlapping rate of the first target frame and each bounding box according to the tracking frame, all bounding boxes in the images of the video clip, the time sequence interval of which with the first image is less than or equal to the second number, and the first target frame;
if the boundary frame with the overlapping rate larger than the second numerical value exists, determining the boundary frame with the overlapping rate larger than the second numerical value as a similar frame corresponding to the first target frame;
if the boundary box with the overlapping rate larger than the second numerical value does not exist, determining that the similar box corresponding to the first target box does not exist.
21. The method of any one of claims 16, 19 or 20,
the determining the difficult cases in the inference image set according to the inference results of the images in the inference image set and the training image set comprises:
acquiring surface feature distribution information of the images in the training image set, and determining a fourth difficult example value of each image in the inference image set according to the surface feature distribution information of the images in the training image set and the surface features of the images in the inference image set, wherein the surface features comprise surface features of a boundary frame and surface features of the images;
acquiring the deep features of each frame in each image in the training image set and the deep features of each frame in each image in the inference image set, and clustering each frame in each image in the training image set according to the deep features of each frame in each image in the training image set to obtain a frame clustering result; determining a fifth difficult example value of each image in the inference image set according to the deep features of each frame in each image in the inference image set, the frame clustering result and the inference result of each frame in each image in the inference image set;
determining a target difficulty value of each image of the inference image set according to one or more of the fourth difficulty value and the fifth difficulty value;
and determining a first number of images with the maximum target difficulty case value in the inference image set as difficulty cases in the inference image set.
22. An artificial intelligence (AI) platform, comprising:
a model training module, configured to train an initial AI model according to a training image set to obtain an optimized AI model;
an inference module, configured to receive an inference image set and perform inference on each inference image in the inference image set according to the optimized AI model to obtain an inference result;
and a difficult case mining module, configured to determine difficult cases in the inference image set according to the inference result, wherein a difficult case indicates an inference image for which the error rate of the inference result obtained through the optimized AI model is higher than a target threshold,
wherein the model training module is further configured to train the optimized AI model according to the difficult cases to obtain a re-optimized AI model.
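For illustration only: a skeleton of the closed loop described in claim 22 (train, infer, mine difficult cases, retrain). The class and the callables it wires together are placeholders; the patent does not prescribe this interface.

    class AIPlatformSketch:
        """Train -> infer -> mine difficult cases -> retrain, per claim 22."""

        def __init__(self, model_training_module, inference_module, mining_module):
            self.train = model_training_module   # (model, images) -> trained model
            self.infer = inference_module        # (model, image) -> inference result
            self.mine = mining_module            # (images, results) -> difficult cases

        def run_one_round(self, initial_model, training_image_set, inference_image_set):
            optimized = self.train(initial_model, training_image_set)
            results = [self.infer(optimized, img) for img in inference_image_set]
            difficult_cases = self.mine(inference_image_set, results)
            return self.train(optimized, difficult_cases)  # re-optimized AI model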
23. The AI platform of claim 22, wherein
the difficult case mining module is configured to:
determine that the inference image set is a video segment, and
determine the difficult cases in the inference image set according to the inference result of each image in the inference image set;
or
determine that the inference image set is a non-video segment, and determine the difficult cases in the inference image set according to the inference results of the images in the inference image set and the training image set.
24. The AI platform of claim 23, wherein
the difficult case mining module is configured to:
determine a target image in the inference image set, wherein the inference result of the target image differs from the inference result of an adjacent image of the target image in the video segment;
and determine the target image as a difficult case in the inference image set.
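For illustration only: a minimal sketch of claim 24's check, flagging images of a video segment whose inference result differs from that of an adjacent image. Representing per-image results as an ordered list of labels is an assumption of this sketch.

    def temporally_inconsistent(results):
        """Return indices of images whose label differs from at least one
        neighbour in the ordered video segment; `results` is a list of labels."""
        flagged = []
        for i, label in enumerate(results):
            neighbours = results[max(0, i - 1):i] + results[i + 1:i + 2]
            if any(label != n for n in neighbours):
                flagged.append(i)
        return flagged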
25. The AI platform of claim 23 or 24, wherein
the difficult case mining module is configured to:
obtain the confidence of each image in the inference image set under each category, and determine a first difficult-case value of each image in the inference image set according to the two highest confidences of that image;
acquire surface-feature distribution information of the images in the training image set, and determine a second difficult-case value of each image in the inference image set according to the surface-feature distribution information and the surface features of each image in the inference image set;
acquire the deep features of the images in the training image set and the deep features of the images in the inference image set, cluster the images in the training image set according to their deep features to obtain an image clustering result, and determine a third difficult-case value of each image in the inference image set according to the deep features of each image in the inference image set, the image clustering result, and the inference result of each image in the inference image set;
determine a target difficult-case value of each image in the inference image set according to one or more of the first difficult-case value, the second difficult-case value and the third difficult-case value;
and determine the first number of images with the largest target difficult-case values in the inference image set as the difficult cases in the inference image set.
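For illustration only: the first difficult-case value of claim 25 depends on the two highest per-category confidences of an image. One plausible margin-based scoring, not the claimed formula:

    import numpy as np

    def first_difficult_value(confidences):
        """Score one image from its per-category confidences (assumes >= 2
        categories): a small top-1/top-2 gap suggests ambiguity, so the score
        1 - (top1 - top2) is higher for harder images."""
        top2 = np.sort(np.asarray(confidences, dtype=float))[-2:]
        return 1.0 - (top2[1] - top2[0])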
26. The AI platform of claim 23, wherein
the difficult case mining module is configured to:
for a first target box of a first image in the inference image set, determine whether a similar box corresponding to the first target box exists in the images of the video segment whose time-sequence interval from the first image is less than or equal to a second number;
if no similar box corresponding to the first target box exists, determine the first target box as a difficult-case box;
if a similar box corresponding to the first target box exists, and the first image to which the first target box belongs and a second image to which the similar box belongs are not adjacent in the video segment, determine difficult-case boxes in the images between the first image and the second image according to the first target box and the similar box;
and determine the difficult cases in the inference image set according to the number of difficult-case boxes in each image of the inference image set.
27. The AI platform of claim 26, wherein the difficult case mining module is configured to:
determine, among the images of the video segment whose time-sequence interval from the first image is less than or equal to the second number, a tracking box with the highest similarity to the first target box;
determine the overlap rate between the first target box and each bounding box according to the tracking box, the first target box, and all bounding boxes in the images of the video segment whose time-sequence interval from the first image is less than or equal to the second number;
if a bounding box whose overlap rate is greater than a second numerical value exists, determine the bounding box whose overlap rate is greater than the second numerical value as the similar box corresponding to the first target box;
and if no bounding box whose overlap rate is greater than the second numerical value exists, determine that no similar box corresponding to the first target box exists.
28. The AI platform of any one of claims 23, 26 or 27, wherein
the difficult case mining module is configured to:
acquire surface-feature distribution information of the images in the training image set, and determine a fourth difficult-case value of each image in the inference image set according to the surface-feature distribution information of the images in the training image set and the surface features of the images in the inference image set, wherein the surface features comprise surface features of the bounding boxes and surface features of the images;
acquire the deep features of each box in each image of the training image set and the deep features of each box in each image of the inference image set, cluster the boxes in the images of the training image set according to their deep features to obtain a box clustering result, and determine a fifth difficult-case value of each image in the inference image set according to the deep features of each box in each image of the inference image set, the box clustering result, and the inference result of each box in each image of the inference image set;
determine a target difficult-case value of each image in the inference image set according to one or both of the fourth difficult-case value and the fifth difficult-case value;
and determine the first number of images with the largest target difficult-case values in the inference image set as the difficult cases in the inference image set.
29. A computing device, comprising a memory and a processor, the memory configured to store a set of computer instructions,
wherein the processor executes the set of computer instructions stored in the memory to perform the method of any one of claims 1 to 7.
30. A computing device, comprising a memory and a processor, the memory configured to store a set of computer instructions,
wherein the processor executes the set of computer instructions stored in the memory to perform the method of any one of claims 15 to 21.
31. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program code which, when executed by a computing device, causes the computing device to perform the method of any one of claims 1 to 7.
32. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer program code which, when executed by a computing device, causes the computing device to perform the method of any one of claims 15 to 21.
CN201910878323.3A 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium Active CN112529026B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202311697270.8A CN117893845A (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium
CN201910878323.3A CN112529026B (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium
PCT/CN2020/097856 WO2021051918A1 (en) 2019-09-17 2020-06-24 Method for providing ai model, ai platform, computing device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910878323.3A CN112529026B (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202311697270.8A Division CN117893845A (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium

Publications (2)

Publication Number Publication Date
CN112529026A (en) 2021-03-19
CN112529026B CN112529026B (en) 2023-12-19

Family

ID=74883931

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202311697270.8A Pending CN117893845A (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium
CN201910878323.3A Active CN112529026B (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202311697270.8A Pending CN117893845A (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium

Country Status (2)

Country Link
CN (2) CN117893845A (en)
WO (1) WO2021051918A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113935389A (en) * 2020-06-29 2022-01-14 华为技术有限公司 Data annotation method and device, computing equipment and storage medium
CN113435409A (en) * 2021-07-23 2021-09-24 北京地平线信息技术有限公司 Training method and device of image recognition model, storage medium and electronic equipment
CN113505261B (en) * 2021-08-04 2024-02-02 城云科技(中国)有限公司 Data labeling method and device and data labeling model training method and device
CN113705648B (en) * 2021-08-19 2024-03-01 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN113468365B (en) * 2021-09-01 2022-01-25 北京达佳互联信息技术有限公司 Training method of image type recognition model, image retrieval method and device
CN116560857B (en) * 2023-06-29 2023-09-22 北京轻松筹信息技术有限公司 AGI platform call management method and device, storage medium and electronic equipment
CN116894986B (en) * 2023-09-11 2023-11-24 深圳亘存科技有限责任公司 Automatic labeling method, system and computer equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018184195A1 (en) * 2017-04-07 2018-10-11 Intel Corporation Joint training of neural networks using multi-scale hard example mining

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372658A (en) * 2016-08-30 2017-02-01 广东工业大学 Vehicle classifier training method
CN106529424A (en) * 2016-10-20 2017-03-22 中山大学 Vehicle logo recognition method and system based on selective search algorithm
CN107633232A (en) * 2017-09-26 2018-01-26 四川长虹电器股份有限公司 A kind of low-dimensional faceform's training method based on deep learning
CN108647577A (en) * 2018-04-10 2018-10-12 华中科技大学 A kind of pedestrian's weight identification model that adaptive difficult example is excavated, method and system
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN110147709A (en) * 2018-11-02 2019-08-20 腾讯科技(深圳)有限公司 Training method, device, terminal and the storage medium of vehicle attribute model
CN109815988A (en) * 2018-12-27 2019-05-28 北京奇艺世纪科技有限公司 Model generating method, classification method, device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOUJIN SUN, YONG WANG: "Hard negative mining for correlation filters in visual tracking", Machine Vision and Applications *
LI Wei; LI Weixiang; ZHANG; JIE Wei: "Fast mismatch elimination algorithm based on a motion smoothness constraint term", Journal of Computer Applications (计算机应用), no. 09

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052328A (en) * 2021-04-02 2021-06-29 上海商汤科技开发有限公司 Deep learning model production system, electronic device, and storage medium
CN114418021A (en) * 2022-01-25 2022-04-29 腾讯科技(深圳)有限公司 Model optimization method, device and computer program product
CN114418021B (en) * 2022-01-25 2024-03-26 腾讯科技(深圳)有限公司 Model optimization method, device and computer program product
WO2023179038A1 (en) * 2022-03-24 2023-09-28 华为云计算技术有限公司 Data labeling method, ai development platform, computing device cluster, and storage medium
CN114676790A (en) * 2022-04-12 2022-06-28 北京百度网讯科技有限公司 Object labeling method, object labeling device, object labeling model processing method, object labeling model processing device, object labeling model processing equipment and storage medium

Also Published As

Publication number Publication date
CN117893845A (en) 2024-04-16
CN112529026B (en) 2023-12-19
WO2021051918A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN112529026B (en) Method for providing AI model, AI platform, computing device and storage medium
US10936915B2 (en) Machine learning artificial intelligence system for identifying vehicles
JP7185039B2 (en) Image classification model training method, image processing method and apparatus, and computer program
JP6944548B2 (en) Automatic code generation
CN111797893B (en) Neural network training method, image classification system and related equipment
US9978003B2 (en) Utilizing deep learning for automatic digital image segmentation and stylization
US9965719B2 (en) Subcategory-aware convolutional neural networks for object detection
CN108256479B (en) Face tracking method and device
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
EP3267368A1 (en) Machine learning image processing
EP4163822A1 (en) Data annotation method and apparatus, and computer device and storage medium
CN114004328A (en) AI model updating method, device, computing equipment and storage medium
CN109801275B (en) Potato disease detection method and system based on image recognition
CN113657087B (en) Information matching method and device
CN113939791A (en) Image labeling method, device, equipment and medium
US11468571B2 (en) Apparatus and method for generating image
Kapur et al. Mastering opencv android application programming
CN107948721A (en) The method and apparatus of pushed information
US11106942B2 (en) Method and apparatus for generating learning data required to learn animation characters based on deep learning
Kar Mastering Computer Vision with TensorFlow 2. x: Build advanced computer vision applications using machine learning and deep learning techniques
CN113205067B (en) Method and device for monitoring operators, electronic equipment and storage medium
KR102658711B1 (en) Method for annotation using boundary designation
CN116052220B (en) Pedestrian re-identification method, device, equipment and medium
KR102431425B1 (en) Scene classification method for mobile robot and object classification method
US20240153294A1 (en) Automatic template recommendation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220224

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant