WO2021051918A1 - Method for providing AI model, AI platform, computing device, and storage medium - Google Patents


Info

Publication number: WO2021051918A1
Authority: WIPO (PCT)
Prior art keywords: image, reasoning, model, image set, platform
Application number: PCT/CN2020/097856
Other languages: French (fr), Chinese (zh)
Inventors: 杨洁, 黄嘉伟, 孙井花, 陈轶, 李鹏飞, 白小龙
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2021051918A1 (en)


Classifications

    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06V10/82 Image or video recognition or understanding using neural networks

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method for providing an AI model, an AI platform, a computing device, and a storage medium.
  • An AI model is generally obtained by training an initial AI model on training data. Because the initial AI model is trained only on the training data and is not further optimized, the reasoning ability of the resulting AI model tends to be low.
  • This application provides a method for providing an artificial intelligence (AI) model, which can provide an AI model with stronger reasoning ability to developers who register an account on an AI platform.
  • In a first aspect, this application provides a method for providing an artificial intelligence (AI) model, which includes:
  • The AI platform receives a plurality of unlabeled images from a first user, where the first user is an entity that has registered an account on the AI platform; the AI platform annotates the multiple images according to an initial AI model; the AI platform determines hard examples among the multiple images according to the annotation results; and the AI platform trains the initial AI model using the hard examples to obtain an optimized AI model.
  • In this way, the AI platform can provide the first user registered on the platform (for example, an AI model developer) with an optimized AI model that has stronger reasoning ability, so that the first user can obtain the optimized AI model easily and quickly, saving time and labor.
  • The AI platform determining the hard examples among the multiple images according to the annotation results includes: the AI platform provides a confirmation interface to the first user, in which candidate hard examples are displayed to the first user, where a candidate hard example is at least one image among the multiple images; the AI platform determines the hard examples among the candidate hard examples according to the first user's operations on the confirmation interface.
  • In this way, the AI platform obtains hard examples confirmed by the first user, which improves the accuracy of the hard examples and further improves the reasoning ability of the optimized AI model trained with these confirmed hard examples.
  • The method further includes: the AI platform receives corrected annotations of the hard examples from the first user. The AI platform training the initial AI model using the hard examples to obtain the optimized AI model includes: the AI platform trains the initial AI model using the hard examples and the corresponding corrected annotations to obtain the optimized AI model. By interacting with the first user, the AI platform obtains the first user's corrected annotations of the hard examples for training the initial AI model, which further improves the reasoning ability of the optimized AI model after training.
  • the method further includes: the AI platform obtains one or more annotated images from the first user; the AI platform obtains the initial AI model by using the one or more annotated images.
  • The method further includes: the AI platform provides the optimized AI model to a device of a second user, so that the device uses the optimized AI model to accomplish its task objective; or, the AI platform receives an inference image sent by the device of the second user, uses the optimized AI model to perform inference on the inference image, and provides the inference result to the device of the second user.
  • This provides two ways of delivering the optimized AI model: sending it to the second user's device, or using it online to provide a reasoning service to the user. In this way, the optimized AI model can be conveniently used for reasoning and can be adapted to different task objectives.
  • The AI platform annotating the multiple unlabeled images according to the initial AI model includes: the AI platform provides an annotation selection interface to the first user, where the annotation selection interface includes at least one annotation method selectable by the first user; the AI platform receives the annotation method selected by the first user, and annotates the multiple unlabeled images according to the initial AI model corresponding to the selected annotation method.
  • This provides the first user with different annotation methods, so that the first user can decide which annotation method to choose according to the images to be uploaded to the AI platform, which improves the flexibility of the AI platform in handling various users and scenarios.
  • The AI platform annotating the multiple images according to the initial AI model includes: classifying the multiple images according to the initial AI model and/or performing object detection on the multiple images according to the initial AI model.
  • The present application also provides an artificial intelligence (AI) platform.
  • The AI platform includes: a user input/output (I/O) module, configured to receive a plurality of unlabeled images from a first user, where the first user is an entity that has registered an account on the AI platform; a data preprocessing module, configured to annotate the multiple images according to an initial AI model; a hard example mining module, configured to determine hard examples among the multiple images according to the annotation results; and a model training module, configured to train the initial AI model using the hard examples to obtain an optimized AI model.
  • The user I/O module is further configured to provide a confirmation interface to the first user and display candidate hard examples to the first user in the confirmation interface, where a candidate hard example is at least one of the multiple images; the hard example mining module is further configured to determine the hard examples among the candidate hard examples according to the first user's operations on the confirmation interface.
  • The user I/O module is further configured to receive corrected annotations of the hard examples from the first user; the model training module is specifically configured to train the initial AI model using the hard examples and the corresponding corrected annotations to obtain the optimized AI model.
  • The user I/O module is further configured to obtain one or more annotated images from the first user; the model training module is further configured to obtain the initial AI model using the one or more annotated images.
  • The user I/O module is further configured to provide the optimized AI model to a device of a second user, so that the device uses the optimized AI model to accomplish its task objective; or, the AI platform further includes a reasoning module, the user I/O module is further configured to receive an inference image sent by the second user's device, the reasoning module is configured to perform inference on the inference image using the optimized AI model, and the user I/O module is further configured to provide the inference result to the device of the second user.
  • The user I/O module is further configured to provide an annotation selection interface to the first user, where the annotation selection interface includes at least one annotation method selectable by the first user; the user I/O module is further configured to receive the annotation method selected by the first user; and the data preprocessing module is specifically configured to annotate the multiple unlabeled images according to the initial AI model corresponding to the annotation method selected by the first user.
  • the data preprocessing module is specifically configured to classify the multiple images according to the initial AI model and/or perform object detection on the multiple images according to the initial AI model .
  • In a third aspect, this application also provides a method for optimizing an artificial intelligence (AI) model. The method includes: training an initial AI model according to a training image set to obtain an optimized AI model; receiving an inference image set and performing inference on each inference image in the inference image set according to the optimized AI model to obtain inference results; determining hard examples in the inference image set according to the inference results, where a hard example is an inference image for which the error rate of the inference result obtained through the optimized AI model is higher than a target threshold; and training the optimized AI model according to the hard examples to obtain a re-optimized AI model.
  • This method determines the difficult cases based on the inference results, and uses the difficult cases to retrain the optimized AI model, so that the obtained re-optimized AI model has stronger reasoning ability.
  • Determining the hard examples in the inference image set according to the inference results specifically includes: determining that the inference image set is a video clip, and determining the hard examples in the inference image set according to the inference result of each image in the inference image set; or, determining that the inference image set is not a video clip, and determining the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set.
  • In this way, different hard example determination methods are used depending on the characteristics of the inference image set itself, which improves the accuracy of the determined hard examples and further improves the reasoning ability of the re-optimized AI model.
  • Determining a hard example in the inference image set according to the inference result of each image in the inference image set includes: determining a target image in the inference image set, where the inference result of the target image differs from the inference results of the images adjacent to the target image in the video clip; and determining the target image as a hard example in the inference image set.
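  • As an illustration of this neighbor-comparison rule, the sketch below flags frames in a classified video clip whose predicted class disagrees with all of their temporal neighbors. It is a minimal sketch, not the patent's implementation; frame_labels and the window size are assumed inputs.

```python
def find_hard_frames(frame_labels, window=1):
    """Flag frames whose predicted class differs from every neighbor
    within `window` positions; such frames are candidate hard examples."""
    hard = []
    for i, label in enumerate(frame_labels):
        neighbors = frame_labels[max(0, i - window):i] + frame_labels[i + 1:i + 1 + window]
        if neighbors and all(label != n for n in neighbors):
            hard.append(i)
    return hard

# Example: frame 2 disagrees with both of its neighbors, so it is flagged.
print(find_hard_frames(["cat", "cat", "dog", "cat", "cat"]))  # [2]
```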
  • Determining the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set includes: obtaining the confidence of each image in the inference image set for each category, and determining a first hard example value of each image according to its two highest confidences; obtaining surface feature distribution information of the images in the training image set, and determining a second hard example value of each image in the inference image set according to the surface feature distribution information and the surface features of each image; obtaining the deep features of each image in the training image set and of each image in the inference image set, clustering the images in the training image set according to their deep features to obtain an image clustering result, and determining a third hard example value of each image in the inference image set according to its deep features, the image clustering result, and its inference result; and determining a target hard example value of each image according to one or more of the first, second, and third hard example values, and determining the hard examples in the inference image set accordingly.
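  • For the first hard example value, one plausible scoring is the margin between the two highest class confidences: the smaller the margin, the more ambiguous (and the harder) the image. The sketch below assumes softmax-style confidence vectors; the exact formula is an assumption, not the patent's definition.

```python
import numpy as np

def first_hard_value(confidences):
    """Score an image by how close its two highest class confidences are.
    A score near 1 means the top-2 confidences are nearly tied (likely hard)."""
    top2 = np.sort(np.asarray(confidences))[-2:]
    return 1.0 - (top2[1] - top2[0])

print(first_hard_value([0.48, 0.47, 0.05]))  # ~0.99, ambiguous image
print(first_hard_value([0.95, 0.03, 0.02]))  # ~0.08, confident image
```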
  • Determining the hard examples in the inference image set according to the inference result of each image in the inference image set includes: for a first target box of a first image in the inference image set, determining whether a similar box corresponding to the first target box exists among the images in the video clip whose temporal distance from the first image is less than or equal to a second number; if no similar box corresponding to the first target box exists, determining the first target box as a hard example box; if a similar box exists but the first image to which the first target box belongs and the second image to which the similar box belongs are not adjacent in the video clip, determining the hard example boxes in the images between the first image and the second image according to the first target box and the similar box; and determining the hard examples in the inference image set according to the number of hard example boxes in each image in the inference image set.
  • Determining whether a similar box exists includes: determining, among the images in the video clip whose temporal distance from the first image is less than or equal to the second number, the tracking box with the highest similarity to the first target box; determining the overlap rate between the first target box and each bounding box in those images according to the tracking box, all the bounding boxes in those images, and the first target box; if there is a bounding box whose overlap rate is greater than a second value, determining the bounding box whose overlap rate is greater than the second value as the similar box corresponding to the first target box; and if there is no bounding box whose overlap rate is greater than the second value, determining that no similar box corresponding to the first target box exists.
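  • The overlap rate between a target box and candidate bounding boxes is commonly measured with intersection-over-union (IoU). Below is a minimal sketch assuming boxes given as (x1, y1, x2, y2); the 0.5 threshold stands in for the "second value" and is purely illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def find_similar_box(target_box, candidate_boxes, overlap_threshold=0.5):
    """Return the candidate box with the highest overlap above the threshold,
    or None if no candidate overlaps enough (no similar box exists)."""
    best = max(candidate_boxes, key=lambda b: iou(target_box, b), default=None)
    return best if best is not None and iou(target_box, best) > overlap_threshold else None
```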
  • Determining the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set includes: obtaining surface feature distribution information of the images in the training image set, and determining a fourth hard example value of each image in the inference image set according to that distribution information and the surface features of the images in the inference image set, where the surface features include surface features of the bounding boxes and surface features of the image; obtaining the deep features of each box in each image of the training image set and of each box in each image of the inference image set, clustering the boxes in the images of the training image set according to their deep features to obtain a box clustering result, and determining a fifth hard example value of each image in the inference image set according to the deep features of each box in each image of the inference image set, the box clustering result, and the inference result of each box; and determining a target hard example value of each image according to one or more of the fourth and fifth hard example values, and determining the hard examples in the inference image set accordingly.
  • the present application also provides an artificial intelligence AI platform.
  • The AI platform includes: a model training module, configured to train an initial AI model according to a training image set to obtain an optimized AI model; a reasoning module, configured to receive an inference image set and perform inference on each inference image in the inference image set according to the optimized AI model to obtain inference results; and a hard example mining module, configured to determine hard examples in the inference image set according to the inference results, where a hard example is an inference image for which the error rate of the inference result obtained through the optimized AI model is higher than a target threshold. The model training module is further configured to train the optimized AI model according to the hard examples to obtain a re-optimized AI model.
  • The hard example mining module is specifically configured to: determine that the inference image set is a video clip, and determine the hard examples in the inference image set according to the inference result of each image in the inference image set; or, determine that the inference image set is not a video clip, and determine the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set.
  • The hard example mining module is specifically configured to: determine a target image in the inference image set, where the inference result of the target image differs from the inference results of the images adjacent to the target image in the video clip; and determine the target image as a hard example in the inference image set.
  • The hard example mining module is specifically configured to: obtain the confidence of each image in the inference image set for each category, and determine a first hard example value of each image according to its two highest confidences; obtain surface feature distribution information of the images in the training image set, and determine a second hard example value of each image in the inference image set according to the surface feature distribution information and the surface features of each image; obtain the deep features of each image in the training image set and of each image in the inference image set, cluster the images in the training image set according to their deep features to obtain an image clustering result, and determine a third hard example value of each image in the inference image set according to its deep features, the image clustering result, and its inference result; and determine the hard examples in the inference image set according to one or more of the first hard example value, the second hard example value, and the third hard example value of each image.
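  • One plausible reading of the third hard example value: cluster the deep features of the training images, then treat an inference image as harder when its predicted class disagrees with the dominant training class of its nearest cluster. The sketch below uses scikit-learn's KMeans; the binary disagreement score and all names are assumptions, not the patent's formula.

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def third_hard_values(train_feats, train_labels, infer_feats, infer_preds, n_clusters=10):
    """For each inference image, return 1.0 if its predicted class differs from the
    majority training class of its nearest deep-feature cluster, else 0.0."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.asarray(train_feats))
    # Majority training label for each cluster of deep features.
    majority = {}
    for c in range(n_clusters):
        labels_in_c = [l for l, a in zip(train_labels, km.labels_) if a == c]
        majority[c] = Counter(labels_in_c).most_common(1)[0][0] if labels_in_c else None
    assigned = km.predict(np.asarray(infer_feats))
    return [0.0 if majority[c] == p else 1.0 for c, p in zip(assigned, infer_preds)]
```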
  • The hard example mining module is specifically configured to: for a first target box of a first image in the inference image set, determine whether a similar box corresponding to the first target box exists among the images in the video clip whose temporal distance from the first image is less than or equal to a second number; if no similar box exists, determine the first target box as a hard example box; if a similar box exists and the first image to which the first target box belongs and the second image to which the similar box belongs are not adjacent in the video clip, determine the hard example boxes in the images between the first image and the second image according to the first target box and the similar box; and determine the hard examples in the inference image set according to the number of hard example boxes in each image in the inference image set.
  • The hard example mining module is specifically configured to: determine, among the images in the video clip whose temporal distance from the first image is less than or equal to the second number, the tracking box with the highest similarity to the first target box; determine the overlap rate between the first target box and each bounding box in those images according to the tracking box, all the bounding boxes in those images, and the first target box; if there is a bounding box whose overlap rate is greater than a second value, determine that bounding box as the similar box corresponding to the first target box; and if there is no bounding box whose overlap rate is greater than the second value, determine that no similar box corresponding to the first target box exists.
  • The hard example mining module is specifically configured to: obtain surface feature distribution information of the images in the training image set, and determine a fourth hard example value of each image in the inference image set according to that distribution information and the surface features of the images in the inference image set, where the surface features include surface features of the bounding boxes and surface features of the image; obtain the deep features of each box in each image of the training image set and of each box in each image of the inference image set, cluster the boxes in the images of the training image set according to their deep features to obtain a box clustering result, and determine a fifth hard example value of each image in the inference image set according to the deep features of each box in each image of the inference image set, the box clustering result, and the inference result of each box; and determine a target hard example value of each image according to one or more of the fourth hard example value and the fifth hard example value, and determine the hard examples in the inference image set accordingly.
  • The present application also provides a computing device. The computing device includes a memory and a processor; the memory is used to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory to cause the computing device to execute the method provided by the first aspect or any one of the possible implementations of the first aspect.
  • The present application provides a computer-readable storage medium that stores computer program code, and when the computer program code is executed by a computing device, the computing device executes the method provided in the foregoing first aspect or any one of its possible implementations.
  • The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • The present application provides a computer program product. The computer program product includes computer program code, and when the computer program code is executed by a computing device, the computing device executes the method provided in the foregoing first aspect or any one of its possible implementations. The computer program product may be a software installation package that can be downloaded and executed on a computing device.
  • the present application also provides a computing device.
  • the computing device includes a memory and a processor.
  • The memory is used to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory to cause the computing device to execute the method provided by the third aspect or any one of the possible implementations of the third aspect.
  • The present application provides a computer-readable storage medium that stores computer program code, and when the computer program code is executed by a computing device, the computing device executes the method provided in the foregoing third aspect or any one of its possible implementations.
  • The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
  • The present application provides a computer program product. The computer program product includes computer program code, and when the computer program code is executed by a computing device, the computing device executes the method provided in the foregoing third aspect or any one of its possible implementations. The computer program product may be a software installation package that can be downloaded and executed on a computing device.
  • FIG. 1 is a schematic structural diagram of an AI platform 100 provided by an embodiment of the application.
  • FIG. 2 is a schematic diagram of an application scenario of an AI platform 100 provided by this application;
  • FIG. 3 is a schematic diagram of deployment of an AI platform 100 provided by an embodiment of the application.
  • FIG. 4 is a schematic structural diagram of a computing device 400 for deploying an AI platform 100 according to an embodiment of the application;
  • FIG. 5 is a schematic diagram of a process for providing an AI model according to an embodiment of the application.
  • FIG. 6 is a schematic diagram of a data upload interface provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of an interface for starting smart labeling provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a data labeling interface provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of a flow of reasoning using an optimized AI model provided by an embodiment of the application.
  • FIG. 10 is a schematic diagram of an interface for starting difficult case mining provided by an embodiment of this application.
  • FIG. 11 is a schematic flowchart of another method for determining difficult cases according to an embodiment of this application.
  • FIG. 12 is a schematic diagram of a surface layer feature distribution provided by an embodiment of this application.
  • FIG. 13 is a schematic diagram of determining a difficult case value provided by an embodiment of the application.
  • FIG. 14 is a schematic flowchart of another method for determining difficult cases according to an embodiment of the application.
  • FIG. 15 is a schematic flowchart of another method for determining difficult cases provided by an embodiment of the application.
  • FIG. 16 is a schematic diagram of determining a difficult case value provided by an embodiment of the application.
  • FIG. 17 is a schematic flowchart of a method for optimizing an AI model provided by an embodiment of the application.
  • FIG. 18 is a schematic structural diagram of a computing device provided by an embodiment of this application.
  • Machine learning is a core means to realize AI.
  • Machine learning has penetrated into various industries such as medicine, transportation, education, and finance. Not only professional and technical personnel, but also non-AI technology majors in various industries also look forward to using AI and machine learning to complete specific tasks.
  • the AI model is a type of mathematical algorithm model that uses machine learning ideas to solve practical problems.
  • the AI model includes a large number of parameters and calculation formulas (or calculation rules).
  • The parameters in the AI model are values obtained by training the AI model on the training image set; for example, the parameters of the AI model are the weights of the calculation factors in the calculation formulas of the AI model.
  • The AI model also contains some hyperparameters. Hyperparameters are parameters that cannot be obtained by training the AI model on the training image set; instead, they are used to guide the construction of the AI model or its training. There are many kinds of hyperparameters, for example, the number of iterations of AI model training, the learning rate, the batch size, the number of layers of the AI model, and the number of neurons in each layer.
  • The difference between the hyperparameters and the parameters of the AI model is that the values of the hyperparameters cannot be obtained by analyzing the training images in the training image set, whereas the values of the parameters can be modified and determined during training according to the analysis of the training images in the training image set.
  • A neural network model is a type of mathematical algorithm model that imitates the structure and function of biological neural networks (an animal's central nervous system).
  • a neural network model can include a variety of neural network layers with different functions, and each layer includes parameters and calculation formulas. According to different calculation formulas or different functions, different layers in the neural network model have different names. For example, the layer that performs the convolution calculation is called the convolution layer, and the convolution layer is often used for feature extraction of the input signal (such as an image).
  • a neural network model can also be composed of a combination of multiple existing neural network models. Neural network models with different structures can be used in different scenarios (such as classification, recognition, etc.) or provide different effects when used in the same scenario.
  • Different neural network model structures specifically include one or more of the following: the number of network layers in the neural network model is different, the order of each network layer is different, and the weights, parameters or calculation formulas in each network layer are different.
  • Training an AI model refers to using existing images to make the AI model fit the rules underlying those images through a certain method, so as to determine the parameters in the AI model. Training an AI model requires preparing a training image set. Depending on whether the training images in the training image set are annotated (that is, whether each image has a specific type or name), the training of the AI model can be divided into supervised training and unsupervised training. When supervised training is performed on the AI model, the training images in the training image set used for training are annotated; the output of the AI model for a training image is compared with the image's annotation to compute a loss value, and the parameters in the AI model are adjusted according to the loss value.
  • Each training image in the training image set is used to iteratively train the AI model, and the parameters of the AI model are continuously adjusted until the AI model can output, with high accuracy, the output value corresponding to an input training image. When unsupervised training is performed, the training images in the training image set are not annotated; the training images are input to the AI model in turn, and the AI model gradually recognizes the associations and potential rules among the training images until it can be used to judge or recognize the type or characteristics of an input image. For example, an AI model used for clustering can, after receiving a large number of training images, learn the characteristics of each training image and the associations and differences among them, and automatically divide the training images into multiple classes. Different task types can use different AI models: some AI models can only be trained by supervised learning, some can only be trained by unsupervised learning, and some can be trained by either.
  • the trained AI model can be used to complete a specific task.
  • Many AI models in machine learning need to be trained with supervised learning. Training an AI model with supervised learning enables it to learn, from the annotated training image set, the associations between the training images and their corresponding annotations more specifically, so that the trained AI model can predict other input inference images with higher accuracy.
  • For example, to train a neural network model for an image classification task, images are first collected according to the task and a training image set is built. The training image set contains three classes of images: apple, pear, and banana. The collected training images are stored in three folders according to their class, and the folder name is the label of all the images in that folder. After the training image set is constructed, a neural network model that can realize image classification, such as a convolutional neural network (CNN), is selected, and the training images in the training image set are input into the CNN. The convolution kernels of each layer in the CNN extract the image features and classify the image, finally outputting the confidence that the image belongs to each class.
  • A loss function is used to calculate the loss value, and the parameters of each layer in the CNN are updated according to the loss value and the CNN structure. The training process continues until the loss value output by the loss function converges or all the images in the training image set have been used for training, and then the training ends.
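  • A minimal PyTorch-style sketch of the supervised training loop described above for the three-class fruit example. The model choice, folder path, and hyperparameters are illustrative assumptions, not taken from this application.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

# Three folders named "apple", "pear", "banana"; the folder name is the label.
transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("training_images/", transform=transform)  # hypothetical path
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(num_classes=3)           # any CNN classifier would do
criterion = nn.CrossEntropyLoss()                # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(10):                          # iterate until the loss converges
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # difference between prediction and label
        loss.backward()                          # gradients w.r.t. model parameters
        optimizer.step()                         # update the parameters of each layer
```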
  • the loss function is a function used to measure the degree to which the AI model is trained (that is, used to calculate the difference between the predicted result of the AI model and the real target).
  • During training, the loss function is used to judge the difference between the current AI model's predicted value and the real target value, and the parameters of the AI model are updated until the AI model can predict the real desired target value, or a value very close to it; the AI model is then considered trained.
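  • As a concrete example of a loss function, cross-entropy is small when the predicted probability of the true class is high and large otherwise (the choice of cross-entropy here is an assumption for illustration):

```python
import numpy as np

def cross_entropy(predicted_probs, true_index):
    """Loss is small when the predicted probability of the true class is high."""
    return -np.log(predicted_probs[true_index])

print(cross_entropy(np.array([0.7, 0.2, 0.1]), 0))  # ~0.36, prediction close to the target
print(cross_entropy(np.array([0.1, 0.2, 0.7]), 0))  # ~2.30, prediction far from the target
```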
  • the trained AI model can be used to reason about the image and get the reasoning result.
  • The specific inference process is as follows. In an image classification scenario, the image is input into the AI model, the convolution kernels of each layer in the AI model extract the features of the image, and the category to which the image belongs is output based on the extracted features. In a target detection (also called object detection) scenario, the image is input into the AI model, the convolution kernels of each layer extract the features of the image, and the location and category of the bounding box of each target included in the image are output based on the extracted features. In a scenario covering both image classification and target detection, the image is input into the AI model, the convolution kernels of each layer extract the features of the image, and both the category of the image and the location and category of the bounding box of each target included in the image are output based on the extracted features.
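  • A small sketch of the inference step for the image classification scenario, assuming a trained PyTorch model and a single preprocessed image tensor; the helper name and class list are hypothetical.

```python
import torch

def classify(model, image_tensor, class_names):
    """Run one image through the trained model and return (class, confidence)."""
    model.eval()
    with torch.no_grad():
        logits = model(image_tensor.unsqueeze(0))   # add a batch dimension
        probs = torch.softmax(logits, dim=1)[0]     # confidence for each category
        conf, idx = probs.max(dim=0)
    return class_names[idx.item()], conf.item()
```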
  • some AI models have strong reasoning ability, while some AI models have weak reasoning ability.
  • Strong reasoning ability of an AI model means that when the AI model is used to perform inference on an image, the accuracy of the inference result is greater than or equal to a certain value; weak reasoning ability means that the accuracy of the inference result is lower than that value.
  • A hard example is input data for which the output result of the initial AI model or the trained AI model is wrong or has a high error rate, whether during training of the initial AI model or during inference with the trained AI model. For example, during training of the AI model, when unlabeled images are annotated, an image whose annotation result has an error rate higher than the target threshold is a hard example; during inference with the AI model, an image in the inference image set whose inference result output by the AI model has an error rate higher than the target threshold is a hard example.
  • Difficult case mining refers to the method of identifying an image as a difficult case.
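  • At its simplest, hard example mining reduces to scoring every image and keeping the ones whose score crosses a target threshold. The sketch below is a hedged illustration: the scores could come from any of the hard example values discussed in this application, and the threshold value is an assumption.

```python
def mine_hard_examples(images, hard_scores, target_threshold=0.8):
    """Return the images whose hard-example score exceeds the target threshold."""
    return [img for img, score in zip(images, hard_scores) if score > target_threshold]
```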
  • the AI platform is a platform that provides a convenient AI development environment and convenient development tools for AI developers and users. There are various AI models or AI sub-models built in the AI platform to solve different problems.
  • The AI platform can search for and establish an applicable AI model according to the needs of users. A user only needs to determine their needs on the AI platform and, following the prompts, upload a training image set; the AI platform can then train an AI model that can be used to realize the user's needs. Alternatively, the user prepares their own algorithm and training image set according to the prompts and uploads them to the AI platform; based on the user's own algorithm and training image set, the AI platform can train an AI model that can be used to realize the user's needs. The user can then use the trained AI model to complete their own specific task.
  • However, an AI model trained in this way alone may still have weak reasoning ability.
  • The embodiments of this application provide an AI platform that introduces hard example mining technology, so that the AI platform forms a closed-loop process of AI model construction, training, inference, hard example mining, retraining, and re-inference. This satisfies developers' needs while improving the accuracy of the AI model (that is, improving the reasoning ability of the AI model).
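  • The closed loop can be summarized in pseudocode-like form. Every function passed in below is a hypothetical placeholder for the corresponding platform module, not an actual API of the AI platform:

```python
def closed_loop(train_fn, infer_fn, mine_fn, confirm_fn, model, labeled, unlabeled, rounds=2):
    """One possible optimization loop: train, infer, mine hard examples, retrain."""
    model = train_fn(model, labeled)                   # initial training
    for _ in range(rounds):
        preds = infer_fn(model, unlabeled)             # inference on the new images
        hard = mine_fn(unlabeled, preds)               # hard example mining
        model = train_fn(model, confirm_fn(hard))      # retrain on confirmed hard examples
    return model                                       # re-optimized AI model
```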
  • AI models include deep learning models, machine learning models, and so on.
  • FIG. 1 is a schematic structural diagram of the AI platform 100 in an embodiment of this application. It should be understood that FIG. 1 is only an exemplary structural diagram of the AI platform 100, and this application does not limit the division of modules in the AI platform 100.
  • the AI platform 100 includes a user input/output (I/O) module 101, a difficult case mining module 102, a model training module 103, an inference module 104, and a data preprocessing module 105.
  • the AI platform may further include an AI model storage module 106 and a data storage module 107.
  • User I/O module 101: used to receive the task objective input or selected by the user, receive the training image set of the first user, receive the inference image set sent by the device of the second user, and so on, where the training image set includes multiple unlabeled images (which may be referred to as multiple unlabeled training images).
  • The user I/O module 101 is also used to receive corrected annotations of hard examples from the first user, obtain one or more annotated images from the first user, provide the optimized AI model to the device of the second user, and receive the inference images sent by the second user's device.
  • The user I/O module 101 may interact with users through a graphical user interface (GUI) or a command line interface (CLI).
  • the AI platform 100 displayed on the GUI can provide users with multiple AI services (such as image classification services, target detection services, etc.).
  • the user can select a task target on the GUI, for example, the user selects an image classification service, and the user can continue to upload multiple unmarked images in the GUI of the AI platform.
  • After receiving the task objective and the multiple unlabeled images through the GUI, the user I/O module 101 communicates with the model training module 103.
  • the model training module 103 selects or searches for an AI model that can be used to complete the construction of the user's task goal according to the task goal determined by the user.
  • the user I/O module 101 is also used to receive the difficult cases output by the difficult case mining module 102, and provide a GUI for the user to confirm the difficult cases.
  • the user I/O module 101 may also be used to receive the user's input for the expected effect of the AI model for completing the task goal.
  • For example, the user may input or select a requirement that the accuracy of the finally obtained AI model for face recognition be higher than 99%.
  • the user I/O module 101 may also be used to receive an AI model input by the user, etc.
  • users can enter the initial AI model in the GUI based on their mission goals.
  • the user I/O module 101 can also be used to receive the superficial features and deep features of the reasoning image in the reasoning image set input by the user.
  • The surface features include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the red-green-blue (RGB) values of the image, the brightness of the image, the saturation of the image, and the sharpness of the image. Deep features refer to abstract features of the image extracted using the convolution kernels of a feature extraction model (such as a CNN).
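  • Below is a hedged sketch of computing a few of the listed surface features with NumPy and Pillow, and of one way the second hard example value could measure how far an inference image falls from the training-set surface feature distribution (a simple per-feature z-score is assumed; the scoring is illustrative, not the patent's formula):

```python
import numpy as np
from PIL import Image

def surface_features(path):
    """A few of the listed surface features: resolution, aspect ratio,
    RGB mean and variance, and luminance-based brightness."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    h, w, _ = img.shape
    brightness = (img @ np.array([0.299, 0.587, 0.114])).mean()
    return np.array([h * w, w / h, img.mean(), img.var(), brightness])

def second_hard_value(train_features, image_feature):
    """Average deviation of one image's surface features from the training distribution."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0) + 1e-8
    return float(np.abs((image_feature - mean) / std).mean())
```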
  • The surface features can also include surface features of the bounding boxes and surface features of the image. The surface features of the bounding boxes can include one or more of the aspect ratio of each bounding box in a single image and the degree of blur of each bounding box. The surface features of the image can include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the RGB values of the image, the brightness of the image, the saturation of the image, the sharpness of the image, the number of bounding boxes in a single image, and the variance of the areas of the bounding boxes in a single image. Deep features refer to abstract features of the image extracted using the convolution kernels of a feature extraction model (such as a CNN).
  • the user I/O module 101 may also be used to provide a GUI for the user to label the training images in the training image set.
  • the user I/O module 101 may also be used to provide various pre-built initial AI models for the user to choose from. For example, users can select an initial AI model on the GUI according to their mission goals.
  • the user I/O module 101 may also be used to receive various configuration information of the initial AI model and training images in the training image set by the user.
  • the difficult case mining module 102 is used to determine difficult cases in the inference image set received by the user I/O module 101.
  • the hard case mining module 102 can communicate with the inference module 104 and the user I/O module 101.
  • the hard case mining module 102 can obtain the reasoning result of the reasoning module 104 on the reasoning image in the reasoning image set from the reasoning module 104, and mine the hard cases in the reasoning image set based on the reasoning result.
  • the hard case mining module 102 can also provide the user I/O module 101 with the hard cases mined.
  • the hard case mining module 102 can also be used to obtain the surface features and deep features of the reasoning image in the reasoning image set input by the user from the user I/O module 101.
  • Model training module 103: used to train the AI model.
  • the model training module 103 can communicate with the user I/O module 101, the inference module 104, and the AI model storage module 106.
  • The specific processing is as follows:
  • the initial AI model includes an AI model that has not been trained, and an AI model that has been trained but is not optimized based on difficult examples.
  • the untrained AI model means that the constructed AI model has not been trained using the training image set, and the parameters in the constructed AI model are all preset values.
  • AI models that are trained but not optimized based on difficult cases refer to AI models that can be used for reasoning but are not optimized based on difficult cases.
  • the initial AI model is an AI model obtained by training the constructed AI model using only the annotated training images in the training image set.
  • The specific processing is: the AI platform determines, according to the user's task objective, the AI model to be constructed for completing the user's task objective.
  • the model training module 103 can communicate with the user I/O module 101 and the AI model storage module 106.
  • The model training module 103 selects a ready-made AI model from the AI model library stored in the AI model storage module 106 according to the user's task objective as the constructed AI model; or, according to the user's task objective, the expected effect of the task objective, or some configuration parameters input by the user, the model training module 103 searches the AI sub-model structures in the AI model library and specifies some hyperparameters of the AI model (for example, the number of layers of the model and the number of neurons in each layer) to construct the AI model, finally obtaining a constructed AI model.
  • some hyperparameters of the AI model may be hyperparameters determined by the AI platform based on the experience of AI model construction and training.
  • the model training module 103 obtains a training image set from the user I/O module 101.
  • The model training module 103 determines some hyperparameters for training the constructed AI model according to the characteristics of the training image set and the structure of the constructed AI model, for example, the number of iterations, the learning rate, and the batch size.
  • the model training module 103 uses the marked images in the acquired training image set to perform automatic training on the constructed AI model, and continuously updates the internal parameters of the constructed AI model during the training process to obtain the initial AI model. It is worth noting that some hyperparameters during training of the constructed AI model may be hyperparameters determined by the AI platform based on the experience of model training.
  • the model training module 103 inputs the unlabeled images in the training image set to the initial AI model and outputs the inference results of the unlabeled images.
  • The model training module 103 transmits the inference results to the hard example mining module 102, which mines the hard examples among the unlabeled images based on the inference results and feeds them back to the model training module 103.
  • the model training module 103 uses difficult examples to continue to optimize the training of the initial AI model to obtain an optimized AI model.
  • the model training module 103 provides the optimized AI model to the inference module 104 for inference processing. It should be noted here that if the initial AI model is the initial AI model stored in the AI model storage module 106, the training images in the training image set may be all unlabeled images. If the initial AI model is a constructed AI model, the training images in the training image set include part of the unlabeled images and part of the labeled images.
  • The inference module 104 uses the optimized AI model to perform inference on the inference images in the inference image set, and outputs the inference results of the inference images in the inference image set.
  • the hard case mining module 102 obtains the reasoning result from the reasoning module 104, and based on the reasoning result, determines the hard case in the reasoning image set.
  • the model training module 103 continues to train the optimized AI model based on the difficult examples provided by the difficult example mining module 102 to obtain a more optimized AI model.
  • the model training module 103 transmits the more optimized AI model to the AI model storage module 106 for storage, and transmits the more optimized AI model to the inference module 104 for inference processing.
  • When the inference module 104 performs inference on the inference images in the inference image set and hard examples are obtained, and the optimized AI model is then further optimized, this is essentially the same as optimizing the initial AI model with the hard examples among the training images: the hard examples among the inference images are used as training images.
  • the model training module 103 may also be used to determine the AI model selected by the user on the GUI as the initial AI model. Or the AI model input by the user on the GUI is determined as the initial AI model.
  • the initial AI model may also include an AI model after training the AI model in the AI model storage module 106 using images in the training image set.
  • The inference module 104 is used to perform inference on the inference images in the inference image set based on the AI model to obtain the inference results.
  • The inference module 104 can communicate with the hard example mining module 102, the user I/O module 101, and the AI model storage module 106.
  • The inference module 104 obtains the inference images in the inference image set from the user I/O module 101, performs inference processing on them, and obtains the inference results of the inference images in the inference image set.
  • the reasoning module 104 transmits the reasoning result to the hard case mining module 102, so that the hard case mining module 102 mines the hard cases in the reasoning image set based on the reasoning result.
  • the data preprocessing module 105 is configured to perform preprocessing operations on the training images in the training image set and the inference image in the inference image set received by the user I/O module 101.
  • the data preprocessing module 105 can read the training image set or the inference image set received by the user I/O module 101 from the data storage module 107, and then preprocess the inference image in the inference image set or the training image in the training image set.
  • Preprocessing the training images in the training image set or the inference images in the inference image set uploaded by the user can make the images consistent in size and can also remove inappropriate data from the training image set or the inference image set.
  • the preprocessed training image set can be suitable for training the constructed AI model or training the initial AI model, and can also make the training effect better.
  • The preprocessed inference images in the inference image set are suitable for input into the AI model for inference processing.
  • the preprocessed training image set or the inference image set is stored in the data storage module 107.
  • the preprocessed training image set is sent to the model training module 103, and the preprocessed inference image set is sent to the inference module 104.
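  • A minimal preprocessing sketch using Pillow: resizing every image to a common size gives the training or inference images consistent dimensions. The 224x224 target size, the directory names, and the .jpg pattern are assumptions for illustration.

```python
from pathlib import Path
from PIL import Image

def preprocess_images(src_dir, dst_dir, size=(224, 224)):
    """Resize all images in src_dir to a common size and save them to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        Image.open(path).convert("RGB").resize(size).save(dst / path.name)
```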
  • The data storage module 107 can also be a part of the data preprocessing module 105; that is, the data preprocessing module 105 can itself have the function of storing images.
  • AI model storage module 106: used to store the initial AI model, the optimized AI model, the AI sub-model structures, and so on, and can also be used to store the AI model determined and constructed according to the AI sub-model structures.
  • the AI model storage module 106 can communicate with the user I/O module 101 and the model training module 103.
  • the AI model storage module 106 receives and stores the trained initial AI model and the optimized AI model transmitted by the model training module 103.
  • the AI model storage module 106 provides the constructed AI model or the initial AI model for the model training module 103.
  • the AI model storage module 106 stores the initial AI model uploaded by the user and received by the user I/O module 101. It should be understood that, in another embodiment, the AI model storage module 106 may also be used as a part of the model training module 103.
  • Data storage module 107 (for example, the data storage resource corresponding to an Object Storage Service (OBS) provided by a cloud service provider): used to store the training image sets and inference image sets uploaded by users, and also used to store data processed by the data preprocessing module 105.
  • the AI platform in this application can be a system that can interact with users.
  • This system can be a software system, a hardware system, or a combination of software and hardware, which is not limited in this application.
  • the AI platform provided by the embodiments of the present application can provide users with services for training AI models, so that the AI platform can provide optimized AI models after training.
  • the AI platform can dig out difficult cases from unlabeled images, and further train the initial AI model based on the difficult cases to obtain an optimized AI model, so that the reasoning results of the AI model are more accurate.
  • FIG. 2 is a schematic diagram of an application scenario of an AI platform 100 provided by an embodiment of the application.
  • the AI platform 100 may be all deployed in a cloud environment.
  • the cloud environment is an entity that uses basic resources to provide cloud services to users in the cloud computing mode.
  • the cloud environment includes cloud data centers and cloud service platforms.
  • Cloud data centers include a large number of basic resources (including computing resources, storage resources, and network resources) owned by cloud service providers.
  • the computing resources included in cloud data centers can be a large number of computing devices (for example, servers).
  • the AI platform 100 can be independently deployed on a server or a virtual machine in a cloud data center.
  • the AI platform 100 can also be deployed on multiple servers in a cloud data center or distributed in a cloud data center.
  • the AI platform 100 is abstracted by the cloud service provider into an AI cloud service provided to users on the cloud service platform. After the user purchases the cloud service on the cloud service platform (for example, pre-charging and then settling based on the final resource usage), the cloud environment uses the AI platform 100 deployed in the cloud data center to provide users with the AI platform cloud service.
  • the user can determine the task to be completed by the AI model through an application program interface (API) or a GUI, and upload the training image set and the reasoning image set to the cloud environment. The AI platform 100 in the cloud environment receives the user's task information, training image set, and reasoning image set, and performs operations such as data preprocessing, AI model training, using the trained AI model to reason about the reasoning images in the reasoning image set, mining hard cases, and retraining the AI model based on the mined hard cases.
  • AI platform returns content such as difficult cases unearthed to users through API or GUI. The user further chooses whether to retrain the AI model based on difficult cases.
  • the trained AI model can be downloaded by users or used online to complete specific tasks.
  • when the AI platform 100 in a cloud environment is abstracted into an AI cloud service provided to users, it can be divided into two parts, namely: a basic AI cloud service and an AI hard case mining cloud service. Users can purchase only the basic AI cloud service on the cloud service platform, and purchase the AI hard case mining cloud service when they need it. After the purchase, the cloud service provider provides the API of the AI hard case mining cloud service, and the AI hard case mining cloud service is additionally billed according to the number of API calls.
  • the deployment of the AI platform 100 provided in the present application is relatively flexible. As shown in FIG. 3, in another embodiment, the AI platform 100 provided in the present application can also be deployed in different environments in a distributed manner.
  • the AI platform 100 provided in this application can be logically divided into multiple parts, and each part has a different function.
  • the AI platform 100 includes a user I/O module 101, a difficult example mining module 102, a model training module 103, an AI model storage module 105, and a data storage module 106.
  • Each part of the AI platform 100 can be respectively deployed in any two or three environments among the terminal computing device, the edge environment, and the cloud environment.
  • Terminal computing devices include: terminal servers, smart phones, notebook computers, tablet computers, personal desktop computers, smart cameras, etc.
  • the edge environment is an environment that includes a collection of edge computing devices that are closer to the terminal computing device.
  • the edge computing device includes: edge servers, edge small stations with computing capabilities, and so on.
  • the various parts of the AI platform 100 deployed in different environments or devices are collaboratively implemented to provide users with functions such as determining and training the constructed AI model.
  • the user I/O module 101, the data storage module 106, and the data preprocessing module 107 in the AI platform 100 are deployed in the terminal computing device, while the difficult example mining module 102, the model training module 103, the inference module 104, and the AI model storage module 105 in the AI platform 100 are deployed in the edge computing device in the edge environment.
  • the user sends the training image set and the inference image set to the user I/O module 101 in the terminal computing device.
  • the terminal computing device stores the training image set and the inference image set in the data storage module 106, and the data preprocessing module 107 preprocesses the training images in the training image set and the inference images in the inference image set; the preprocessed training image set and inference image set are also stored in the data storage module 106.
  • the model training module 103 in the edge computing device determines the constructed AI model according to the user's task goal, trains the constructed AI model with the training images in the training image set to obtain the initial AI model, and further trains the initial AI model based on the difficult examples in the unlabeled images of the training image set to obtain the optimized AI model.
  • the difficult example mining module 102 may also mine the difficult examples included in the reasoning image set based on the optimized AI model.
  • the model training module 103 trains the optimized AI model based on difficult cases to obtain a more optimized AI model. It should be understood that this application does not restrict which parts of the AI platform 100 are deployed in which environment; in actual applications, adaptive deployment can be carried out according to the computing capability of the terminal computing device, the resource occupancy of the edge environment and the cloud environment, or specific application requirements.
  • the AI platform 100 may also be separately deployed on a computing device in any environment (for example, separately deployed on an edge server in an edge environment).
  • FIG. 4 is a schematic diagram of the hardware structure of a computing device 400 on which the AI platform 100 is deployed.
  • the computing device 400 shown in FIG. 4 includes a memory 401, a processor 402, a communication interface 403, and a bus 404.
  • the memory 401, the processor 402, and the communication interface 403 realize the communication connection between each other through the bus 404.
  • the memory 401 may be a read only memory (ROM), a random access memory (RAM), a hard disk, a flash memory or any combination thereof.
  • the memory 401 can store programs. When the programs stored in the memory 401 are executed by the processor 402, the processor 402 and the communication interface 403 are used to execute the method of the AI platform 100 for training AI models for users, mining difficult examples, and further optimizing the AI model based on the difficult examples.
  • the memory can also store image collections. For example, a part of the storage resources in the memory 401 is divided into a data storage module 106 for storing the data required by the AI platform 100, and a part of the storage resources in the memory 401 is divided into an AI model storage module 105 for storing the AI model library.
  • the processor 402 may adopt a central processing unit (CPU), an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or any combination thereof.
  • the processor 402 may include one or more chips.
  • the processor 402 may include an AI accelerator, such as a neural network processor (neural processing unit, NPU).
  • the communication interface 403 uses a transceiver module such as a transceiver to implement communication between the computing device 400 and other devices or a communication network. For example, data can be obtained through the communication interface 403.
  • the bus 404 may include a path for transferring information between various components of the computing device 400 (for example, the memory 401, the processor 402, and the communication interface 403).
  • AI technology is widely used in many fields.
  • AI technology is used in the field of self-driving and assisted driving of vehicles, specifically for lane line recognition, traffic light recognition, automatic parking spot recognition, sidewalk detection, and other processing.
  • these processes can be considered as using the AI model in the AI platform for image classification and/or target detection.
  • for example, an AI model is used for traffic light recognition.
  • the AI model is used to recognize lane lines.
  • Image classification is mainly used to determine the category of the image (that is, input a frame of image, output the category of the image).
  • Target detection can include two aspects: determining the position of the bounding box of each target included in the image, and determining the category to which each target belongs.
  • This embodiment of the application will take image classification and target detection as examples to illustrate how to provide an AI model in an AI platform.
  • Step 501 The AI platform receives multiple unlabeled images of the first user.
  • the first user is a user of an entity who has registered an account on the AI platform.
  • developers of AI models, etc. For example, developers of AI models, etc.
  • when the first user wants to obtain an AI model on the AI platform, he or she can place multiple unlabeled images in a folder and then open the image upload interface provided by the AI platform.
  • the upload interface includes the input location of the image.
  • the first user can add the storage location of the training image set at the input location of the image, and upload multiple unlabeled images to the AI platform. In this way, the AI platform can receive multiple unlabeled images of the first user.
  • the upload interface also displays the logo (used to mark the image uploaded this time), the annotation type (used to indicate the purpose of the AI model trained with the image, such as target detection or image classification, etc.), Creation time, image input location, image tag set (such as person, car, etc.), name (such as target, object, etc.), description, version name, etc.
  • step 502 the AI platform annotates multiple images according to the initial AI model.
  • the AI platform can obtain the initial AI model, and then input multiple unlabeled images into the initial AI model to obtain labeling results of the multiple unlabeled images.
  • the annotation result of the image is the category to which the image belongs. For example, if the image is an image of an apple, the category to which the image belongs is apple.
  • the initial AI model is used for target detection, the labeling result of the image is the position of the bounding box of the target included in the image and the category to which the target belongs, where the target may be an object included in the image, such as a car, a person, a cat, etc.
  • the initial AI model is used for both image classification and target detection, the labeling result of the image is the category to which the image belongs, the position of the bounding box of the target included in the image, and the category to which the target belongs.
  • the first user can also upload annotated images to the AI platform.
  • Step 503 The AI platform determines difficult cases in multiple images according to the annotation results.
  • after the AI platform obtains the labeling results of the unlabeled images, it can determine the difficult examples included in the unlabeled images according to the labeling results (the concept of a difficult example has been explained above and is not repeated here).
  • step 504 the AI platform trains the initial AI model using difficult examples to obtain an optimized AI model.
  • the AI platform can use the difficult examples to continue training the initial AI model after determining the difficult examples.
  • the specific processing is: input some of the difficult examples into the initial AI model to obtain output results, determine the difference between the output results and the labeling results of the difficult examples, and adjust the parameters in the initial AI model based on the difference; then continue with another part of the difficult examples and repeat the above process until all the difficult examples have been used for training, or until the difference between the results predicted by the optimized AI model and the annotation results is less than a certain threshold, at which point the optimized AI model is considered to be obtained, as sketched below.
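  • The following is a minimal sketch of such a fine-tuning loop, assuming a PyTorch classification model; the batch size, learning rate, stopping threshold, and tensor-based dataset are illustrative assumptions, not the platform's actual implementation.

```python
# Minimal sketch: continue training an initial AI model on mined hard examples.
import torch
from torch.utils.data import DataLoader, TensorDataset

def finetune_on_hard_examples(model, hard_images, hard_labels,
                              epochs=3, lr=1e-4, threshold=0.05):
    # hard_images: float tensor N x C x H x W; hard_labels: long tensor of class ids
    loader = DataLoader(TensorDataset(hard_images, hard_labels),
                        batch_size=16, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for batch_images, batch_labels in loader:
            optimizer.zero_grad()
            outputs = model(batch_images)            # output for part of the hard examples
            loss = criterion(outputs, batch_labels)  # difference from the annotation results
            loss.backward()                          # adjust parameters based on the difference
            optimizer.step()
            if loss.item() < threshold:              # difference small enough: stop early
                return model
    return model
```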
  • this application also provides a method for obtaining the initial AI model in step 502, and the processing of labeling multiple unlabeled images based on the initial AI model, and the processing is as follows:
  • the AI platform provides a label selection interface to the first user, and the label selection interface includes at least one labeling method selectable by the first user. Receive the labeling mode selected by the first user, and label a plurality of unlabeled images according to the initial AI model corresponding to the labeling mode selected by the first user.
  • if the first user has labeled some of the unlabeled images and does not want to label the remaining images, the first user can click the smart labeling option to trigger entry into the label selection interface. Alternatively, after providing multiple unlabeled images, the first user may directly click the smart labeling option without labeling any images to trigger entry into the label selection interface.
  • One or more labeling methods are provided in the label selection interface. If there is only one labeling method, the option of whether to select the labeling method is displayed in the labeling selection interface. If the first user wants to select the labeling method, he can click the "Yes" option to trigger the selection of the labeling method.
  • the first user does not select the labeling method, he can click the "No" option and the labeling method will not be selected.
  • if there are multiple labeling methods, these labeling methods can be displayed in the label selection interface together with the selection option corresponding to each labeling method. The first user selects the desired labeling method through the corresponding selection option and then submits, so that the AI platform receives the labeling method selected by the user.
  • multiple labeling methods provided in the label selection interface may include active learning methods and pre-labeling methods.
  • the processing of the AI platform in the active learning mode is: the AI platform first trains the constructed AI model using the multiple labeled images provided by the first user to obtain the initial AI model; then, based on the initial AI model, the unlabeled images are annotated and the annotation results of the images are obtained.
  • the processing of the AI platform in the pre-labeling mode is: the AI platform directly obtains the existing initial AI model, annotates multiple unlabeled images based on the initial AI model, and obtains the labeling results of multiple images.
  • the label selection interface also displays the number of all images, the number of unlabeled images, the number of labeled images, and the number to be confirmed (to be confirmed refers to the number of difficult cases to be confirmed by the user).
  • in the active learning mode, the AI platform may train the constructed AI model using the multiple labeled images provided by the first user to obtain the initial AI model. Then the AI platform inputs the multiple unlabeled images into the initial AI model and obtains the labeling results of the multiple unlabeled images.
  • the multiple images with annotations may be the first user annotating some of the unlabeled images to obtain multiple images with annotations, or they may be directly provided by the first user. Annotated multiple images.
  • the AI platform can directly obtain the initial AI model (the initial AI model can be the AI model uploaded by the first user, or it can be preset in the AI platform AI model). Then the AI platform inputs multiple unlabeled images into the initial AI model, and obtains the labeling results of the multiple unlabeled images.
  • a process for the first user to label training images on the AI platform is also provided.
  • the specific processing is: when the first user chooses to label the unlabeled images by himself or herself, the first user can determine whether the AI model to be trained is applied to an image classification scene, a target detection scene, or a scene combining image classification and target detection. If the AI model to be trained is applied to the image classification scene, the image annotation interface for image classification provided by the AI platform is started. If the AI model to be trained is applied to the target detection scene, the image annotation interface for target detection provided by the AI platform is started.
  • options such as selecting image, bounding box, return key, zooming in image, and zooming out image are provided.
  • the first user can open a frame of image by selecting the image option. Then in the image, use the bounding box to label the target, and add a label to the target.
  • the label can include the category of the target in the bounding box and the position of the bounding box in the image (because the bounding box is generally rectangular, the position can be marked with the position coordinates of the upper left corner and the lower right corner).
  • the AI platform will obtain the bounding box of the target, and the labeling information column of the image can also be displayed in the image labeling interface.
  • the label information column displays the information of the target that the first user has labelled, including label, bounding box and operation.
  • the label is used to indicate the category of the target, and the bounding box is used to indicate the shape of the used frame.
  • the operations include delete and modify options. The first user can modify the added annotations in the image through these operations.
  • the above-mentioned bounding box is a rectangular box that can completely surround the target.
  • the initial AI model can also be obtained by training with the annotated images, and the specific processing is as follows:
  • when the first user provides multiple unlabeled images, he or she can also provide one or more labeled images. The two can be uploaded together, or the one or more labeled images can be uploaded first, followed by the multiple unlabeled images.
  • the AI platform can obtain the pre-selected AI model.
  • the pre-selected AI model can be the AI model selected by the user (which can include the AI model uploaded by the user or the AI model selected by the user in the AI platform), or an AI model selected by the AI platform based on the task goal.
  • the AI platform uses one or more labeled images to train the pre-selected AI model to obtain the initial AI model (see the process of supervised training for the training process).
  • the AI platform may also provide the difficult cases to the first user, so that the first user can further confirm whether the candidate difficult cases selected by the AI platform are difficult cases.
  • the specific processing is as follows:
  • the AI platform provides a confirmation interface to the first user, and displays the candidate difficult example to the first user in the confirmation interface.
  • the candidate difficult example is at least one image among multiple images. According to the operation of the first user on the confirmation interface, the hard examples among the candidate hard examples are determined.
  • the AI platform determines the candidate difficult examples in the multiple unlabeled images based on the annotation results (candidate difficult examples refer to one or more images among the multiple unlabeled images that are identified only from the annotation results and have not yet been confirmed by the first user).
  • the AI platform can provide the candidate difficult examples to the first user by providing a confirmation interface to the first user, in which the first user is shown the candidate difficult examples in the multiple unlabeled images.
  • the first user can open any candidate difficult case and subjectively judge whether its labeling result is correct. If the labeling result of the candidate difficult case is correct, the confirmation operation can be performed; the AI platform receives the confirmation operation and confirms that the candidate hard case is a hard case. In this way, since confirmation by the first user is provided, the determined difficult cases are more accurate.
  • the labeling results of these difficult cases can also be corrected. This processing can be performed after step 503 or after the user confirms the difficult cases, as follows:
  • the AI platform receives users' correction annotations for difficult cases.
  • Training the initial AI model using difficult examples to obtain an optimized AI model includes: using difficult examples and corresponding correction annotations to train the initial AI model to obtain an optimized AI model.
  • the AI platform can provide the difficult case to the first user by providing a confirmation interface, in which the first user is shown the difficult cases in the multiple unlabeled images. The first user can open any difficult case and subjectively judge whether its labeling result is correct. If it is not correct, the first user can correct the labeling result. After the correction is completed and the first user confirms the correction of the difficult case, the AI platform receives the confirmation operation, confirms that the difficult case is available, and takes the correction annotation submitted by the first user as the labeling result of this difficult case.
  • the AI platform may use the difficult case determined in step 503 and the corrective annotation corresponding to the difficult case to train the initial AI model to obtain an optimized AI model.
  • the AI platform may use the difficult cases confirmed by the first user and the correction annotations corresponding to the difficult cases to train the initial AI model to obtain an optimized AI model. In this way, since the first user corrects the labeling results of the difficult examples, the labeling results of the difficult examples are correct, and the reasoning ability of the trained optimized AI model is stronger.
  • the AI platform can use correction annotations for difficult cases to train the initial AI model to obtain an optimized AI model.
  • this application may also provide the optimized AI model for the second user to use. Specifically, there may be two providing manners: an offline providing manner and an online providing manner. Manner 1 below is the offline providing manner, and Manner 2 is the online providing manner:
  • Manner 1 Provide the optimized AI model to the AI device of the second user, so that the AI device uses the optimized AI model to perform the task goal.
  • AI equipment refers to equipment that runs AI models, such as driving recorders.
  • the AI platform may send the optimized AI model to the AI device, and the AI device receives the optimized AI model.
  • the optimized AI model can be run on the AI device, so that the AI device can use the optimized AI model to perform mission goals.
  • the AI device is a driving recorder, and the optimized AI model can be used to detect lane lines and so on.
  • the second user can download the optimized AI model from the AI platform to a certain device, and then install the optimized AI model on the AI device, so that the AI device can use the optimized AI model to perform mission goals.
  • Manner 2 Receive the reasoning image sent by the second user's device, use the optimized AI model to reason about the reasoning image, and provide the reasoning result to the second user's device.
  • the second user wants to use the optimized AI model, he can open the AI platform through his own device, register an account on the AI platform, and then use the registered account to log in to the AI platform. Then the second user can find the optimized AI model in the AI model provided by the AI platform, and use the operation guidance provided by the AI platform to upload the reasoning image to the AI platform. After receiving the reasoning image, the AI platform can input the reasoning image to the optimized AI model to obtain the reasoning result of the reasoning image, and then send the reasoning result to the device of the second user. Among them, if the optimized AI model is used for image classification, the inference result is the category to which the inferred image belongs.
  • if the optimized AI model is used for target detection, the inference result is the position of the bounding box of the target included in the inference image and the category to which the target belongs. If the optimized AI model is used for target detection and image classification, the inference result is the category to which the inference image belongs, the position of the bounding box of the target included in the inference image, and the category to which the target belongs.
  • a one-click online option is also provided in the confirmation interface, and by operating the one-click online option the user can trigger the AI platform to automatically use the difficult examples to train the initial AI model to obtain an optimized AI model.
  • the optimized AI model can be used to infer the inference image, as shown in Figure 9, the specific processing is:
  • Step 901 The AI platform receives multiple inference images uploaded by the user.
  • the user wants to use the optimized AI model to reason about the reasoning image.
  • the reasoning image set includes multiple reasoning images (a reasoning image is also an unlabeled image).
  • the process of uploading multiple inference images here is the same as the process of uploading multiple unmarked images in the previous article, and will not be repeated here.
  • Step 902: The AI platform provides the user with a hard case screening selection interface, and the selection interface includes hard case screening parameters that can be selected by the user.
  • the hard case screening selection interface may include hard case screening parameters that the user can select. The user can select the hard case screening parameters according to the reasoning images and actual needs. As shown in FIG. 10, the hard case screening parameters may include one or more of the hard case screening method, the reasoning image type, the task target type, and the hard case output path information. Hard cases can be screened by confidence and by algorithm.
  • the inference image types can include continuous (continuous to indicate that multiple inference images are continuous in time series) and discontinuous (non-continuous to indicate that multiple inference images are not continuous in time sequence).
  • the types of task targets can include target detection and image classification.
  • the output path information of difficult examples can be used to indicate the storage location of the difficult examples excavated in the inference image. If multiple inference images are continuous in time sequence (indicating that the inference image is a video segment), the inference image type is selected as continuous. If multiple inference images are not continuous in time sequence (indicating that the inference image is not a video segment), the inference image type is selected as non-continuous. If the user wants to perform image classification on multiple inference images, he can select the task target type as image classification. If the user wants to perform target detection on multiple inference images, he can select the task target type as target detection.
  • the hard-case screening parameters also include the storage location information of the marked training image.
  • Step 903 The AI platform performs inference on multiple inference images according to the optimized AI model, and obtains inference results.
  • the AI platform can input multiple inference images into the optimized AI model, and the optimized AI model will output the inference results of the multiple inference images. If the optimized AI model is used for image classification, for multiple inference images, the output inference result is the category to which the image belongs. If the optimized AI model is used for target detection, for multiple reasoning images, the output reasoning result is the target category in the bounding box included in each frame of reasoning image and the position of the bounding box in the reasoning image.
  • step 904 the AI platform determines the difficult cases in the multiple reasoning images according to the reasoning result and the difficult case screening parameters selected by the user.
  • the AI platform can use the reasoning result and the type and filtering method of the task target in the difficult example filtering parameter selected by the user to filter out the difficult examples in the multiple reasoning images. Then, the difficult cases in the multiple reasoning images are stored through the difficult case output path in the difficult case filtering parameters.
  • step 905 the AI platform trains the optimized AI model according to the difficult cases, and obtains the re-optimized AI model.
  • the AI platform can also provide the difficult case to the user so that the user can further confirm whether it is a difficult case.
  • the specific processing is as follows:
  • the AI platform provides a confirmation interface to the first user, and displays the candidate difficult example to the first user in the confirmation interface.
  • the candidate difficult example is at least one image among multiple images.
  • the AI platform determines the hard cases among the candidate hard cases according to the operations of the first user on the confirmation interface.
  • the AI platform may determine at least one candidate difficult example included in the multiple reasoning images according to the reasoning result and the hard case screening parameters selected by the user. Then the AI platform provides the at least one candidate difficult case to the user I/O module, and the user I/O module provides the user with a confirmation interface, in which the user is shown the candidate difficult cases in the multiple reasoning images.
  • the user can open any candidate difficult case and subjectively judge whether the labeling information of the difficult case is accurate. If it is not accurate, the user can modify the labeling information; after the modification is completed and the modification of the candidate difficult case is confirmed, the AI platform receives the confirmation operation, confirms that the difficult case is available, and takes the user's modified correction annotation as the annotation information of the difficult case.
  • the user may subjectively judge that there is no problem with the labeling of the difficult case, and the modification of the candidate difficult case can be directly confirmed, and the AI platform will receive the confirmation operation.
  • the AI platform can confirm that the difficult case is available, and the label of the difficult case is the label provided by the original AI platform.
  • step 503 is the process of using the initial AI model to determine difficult cases. Specifically, the initial AI model is used to extract the features of the unlabeled images, the annotation results of the unlabeled images are determined based on these features, and then the difficult cases in the unlabeled images are found based on the annotation results.
  • the initial AI model is continuously trained based on the difficult examples in the unlabeled image to obtain an optimized AI model.
  • step 904 is the process of using the optimized AI model to determine difficult cases. Specifically, the optimized AI model is used to extract the features of the inference images, the inference results of the inference images are determined based on these features, and then the difficult cases in the inference images are found based on the inference results.
  • the optimized AI model is continuously trained based on the difficult examples in the reasoning images to obtain the re-optimized AI model. It can be seen that the processing principles of step 503 and step 904 are similar: both use an AI model to identify difficult cases in unlabeled images, and the difference is that the reasoning ability of the optimized AI model used in step 904 is higher than that of the initial AI model used in step 503.
  • step 504 and step 905 are similar, and both are based on difficult cases to train the existing AI model, so that the reasoning ability of the obtained AI model is better than the reasoning ability of the current AI model. Therefore, the above-mentioned process in Fig. 5 and Fig. 9 are actually to find difficult cases and optimize the current AI model.
  • the AI platform can provide AI model developers with optimized AI models with stronger reasoning capabilities, so that developers can deploy AI models with one click and do not need to care about the development process.
  • the implementation process of determining the difficult case may be as follows:
  • the AI platform uses the initial AI model to annotate the unlabeled images, obtains the labeling result of each image among the unlabeled images, and determines whether the multiple unlabeled images form a video segment. If the unlabeled images form a video segment, the difficult cases in the unlabeled images are determined according to the labeling result of each image among the unlabeled images. If the unlabeled images do not form a video segment, the difficult cases in the unlabeled images are determined according to the labeling result of each image among the unlabeled images and the training image set.
  • the AI platform can use any one or more of the optical flow method and the Hamming distance to determine whether the multiple unlabeled images are continuous in time sequence. For example, the AI platform can use the Hamming distance to determine the distance between each frame of image and the next frame of image adjacent to that image in time sequence. If the Hamming distance between the image and the next frame of image is less than a certain value, it is determined that the image and the next frame of image are continuous in time sequence; if the Hamming distance is greater than or equal to the certain value, it is determined that the image and the next frame of image are not continuous in time sequence.
  • the optical flow method can also be used to judge whether the image is continuous with the next frame of image. If the optical flow method determines that the image and the next frame of image are continuous, it is finally determined that the image and the next frame of image are continuous in time sequence; if the optical flow method determines that they are not continuous, it is finally determined that the image and the next frame of image are not continuous in time sequence. In this way, by continuing to traverse each frame of image, it is determined whether the multiple unlabeled images are continuous images or non-continuous images.
  • the multiple unlabeled images are continuous images, it is determined that the multiple unlabeled images are video clips, and if the multiple unlabeled images are not continuous images, it is determined that the multiple unlabeled images are not video clips.
  • a combination of multiple methods is used to determine whether the image is continuous in time sequence, so the accuracy of the determined result is relatively high.
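  • The following is a minimal sketch of the Hamming-distance check for temporal continuity, assuming a simple 8x8 average hash and an illustrative distance threshold; the actual hash and threshold used by the platform are not specified here.

```python
# Minimal sketch: decide whether a sequence of images forms a video clip
# by comparing average hashes of temporally adjacent frames.
import numpy as np
from PIL import Image

def average_hash(path, hash_size=8):
    # 64-bit average hash: resize to 8x8 grayscale and threshold at the mean
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def hamming_distance(h1, h2):
    return int(np.count_nonzero(h1 != h2))

def is_video_clip(image_paths, max_distance=12):
    # continuous only if every adjacent pair of frames is similar enough
    hashes = [average_hash(p) for p in image_paths]
    return all(hamming_distance(a, b) < max_distance
               for a, b in zip(hashes, hashes[1:]))
```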
  • the AI platform can use the labeling results of the unlabeled multiple images to determine the difficult cases in the unlabeled multiple images.
  • the AI platform can use the labeling results of each image in the multiple unlabeled images and the training image set to determine the difficult cases in the unlabeled images.
  • the training image set here refers to the set of training images used to obtain the initial AI model through training.
  • images being adjacent in time sequence can mean that their numbers are adjacent. For example, if the image number of a certain frame is 1 and the image number of another frame is 2, the two frames of images are adjacent. Being adjacent in time sequence can also mean that the upload order is adjacent. For example, if a certain frame of image is uploaded first and another frame of image is uploaded second, the two frames of images are adjacent in time sequence. For another example, if a certain frame of image is uploaded first and another frame of image is uploaded third, the two frames of images are not adjacent in time sequence.
  • the way of determining difficult cases is as follows:
  • the AI platform determines the target image in the multiple unlabeled images, where the labeling result of the target image is different from the labeling result of the image adjacent to the target image in time series. Determine the target image as a difficult example among the unlabeled images.
  • the annotation result of each frame of image output in step 502 may include the category to which the image belongs.
  • the AI platform can determine whether the category to which the image belongs is the same as the category to which the adjacent frame image belongs.
  • the adjacent frame image here refers to the frame image adjacent to the image in time sequence. If they are the same, it can be determined that the image is not a difficult example. If they are not the same, it means that the optimized AI model has a relatively high recognition error rate for the image, and the image can be determined as a difficult example. This image is the target image.
  • the first frame of image has only the next frame of image that is adjacent in time series
  • the last frame of image has only the previous frame of image that is adjacent in time series.
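  • The following is a minimal sketch of this neighbor-comparison rule for image classification, assuming the per-frame categories come from the initial AI model's annotation results in time order.

```python
# Minimal sketch: a frame whose predicted category differs from an adjacent
# frame (previous or next in time sequence) is treated as a target image,
# i.e. a difficult example candidate.
def find_hard_examples_by_neighbors(categories):
    hard_indices = []
    for i, cat in enumerate(categories):
        neighbors = []
        if i > 0:
            neighbors.append(categories[i - 1])   # previous frame
        if i + 1 < len(categories):
            neighbors.append(categories[i + 1])   # next frame
        if any(cat != n for n in neighbors):      # disagreement with a neighbor
            hard_indices.append(i)
    return hard_indices
```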
  • Step 1101: The AI platform obtains the confidence level of each image among the multiple unlabeled images in each category, and determines the first hard case value of each image according to the two highest confidence levels of that image.
  • the hard case value is used to measure the degree of whether the image is a hard case.
  • the larger the hard case value the greater the probability that the image is a hard case.
  • the smaller the hard case value is, the smaller the probability that the image is a hard case.
  • the output of the optimized AI model may include the confidence levels of multiple unlabeled images in each category.
  • the confidence of the output of the optimized AI model in each category indicates the possibility that the labeled result after the optimized AI model infers the input data belongs to each category.
  • for any image, the two largest confidence levels corresponding to the image can be obtained, and the first difficult example value of the image is determined according to these two confidence levels. In this way, the first difficult example value of each image among the multiple unlabeled images can be determined.
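  • The following is a minimal sketch of one way to derive the first hard case value, assuming it is based on the margin between the two highest class confidences; the exact formula is not given above, so this margin-based form is an illustrative assumption.

```python
# Minimal sketch: first hard case value from the two largest confidences.
import numpy as np

def first_hard_case_value(confidences):
    # confidences: 1-D array of per-class confidences for one image
    top2 = np.sort(np.asarray(confidences))[-2:]  # two largest confidences
    margin = top2[1] - top2[0]                    # small margin -> uncertain prediction
    return 1.0 - margin                           # larger value -> more likely a hard case
```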
  • Step 1102: The AI platform obtains the surface feature distribution information of the training images in the training image set, and determines the second difficult example value of each image among the multiple unlabeled images based on the surface feature distribution information and the surface features of each image.
  • the surface features of each frame can be determined.
  • the surface features can include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the red, green, and blue (RGB) values of the image, the brightness of the image, the saturation of the image, or the sharpness of the image.
  • the AI platform can obtain the resolution and brightness of the image from the attributes of the image.
  • the resolution of the image refers to the number of pixels contained in a unit inch, and the brightness of the image determines the brightness of the color in the color space.
  • the AI platform can divide the length of the image by the width to get the aspect ratio of the image.
  • the AI platform can use the R, G, and B of each pixel in the image to determine the average value of R, the average of B, and the average of G respectively, which is the average of RGB of the image.
  • the AI platform determines the average value of R, the average value of G, and the average value of B over all pixels in the image, calculates the square of the difference between the R value of each pixel and the average value of R, and averages these squares over all pixels in the image to obtain the variance of R in the image.
  • the AI platform can determine the variance of G and B.
  • the variance of R, G, and B are the variances of RGB of the image.
  • the AI platform can calculate the saturation of the image.
  • Saturation refers to the vividness of the color, also known as the purity of the color.
  • the saturation of the image is calculated as: (max(R,G,B) - min(R,G,B)) / max(R,G,B), where max(R,G,B) represents the maximum value among R, G, and B in the image, and min(R,G,B) represents the minimum value among R, G, and B in the image.
  • the AI platform can also calculate the sharpness of the image.
  • the sharpness is an index to measure the quality of the image.
  • the sharpness of the image can be determined by the Brenner gradient function or the Laplacian gradient function.
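  • The following is a minimal sketch of computing these surface features for one image, assuming an RGB array input; averaging the per-pixel saturation and using the variance of a Laplacian filter as the sharpness measure are illustrative assumptions.

```python
# Minimal sketch: surface features of a single RGB image (H x W x 3, 0..255).
import numpy as np
from scipy.ndimage import laplace

def surface_features(rgb):
    h, w, _ = rgb.shape
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    mx = rgb.max(axis=2).astype(np.float64)
    mn = rgb.min(axis=2).astype(np.float64)
    # (max(R,G,B) - min(R,G,B)) / max(R,G,B), averaged over all pixels
    saturation = np.divide(mx - mn, mx, out=np.zeros_like(mx), where=mx > 0)
    return {
        "aspect_ratio": h / w,                           # length divided by width
        "rgb_mean": [r.mean(), g.mean(), b.mean()],
        "rgb_variance": [r.var(), g.var(), b.var()],
        "brightness": gray.mean(),
        "saturation": saturation.mean(),
        "sharpness": laplace(gray).var(),                # Laplacian-gradient sharpness
    }
```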
  • the AI platform obtains the surface features of each image in the training image set, and determines the distribution of the images on each surface feature.
  • the distribution of images on each surface feature can be represented by a histogram.
  • Figure 12(a) is a histogram of the mean value of R of the image.
  • the horizontal axis is the mean value of R and the vertical axis is the number of images.
  • the AI platform obtains the stored preset value, and for the distribution of the image on any surface feature, multiplies the preset value by the number of images in the multiple unlabeled images to obtain the target value.
  • the AI platform arranges the value of the surface feature of all images in the multiple unlabeled images in ascending order, finds the value at the target value position in ascending order, and obtains the limit value of the surface feature.
  • the hard case value of an image whose value of the surface feature is greater than the limit value is determined to be a, and the hard case value of an image whose value of the surface feature is less than or equal to the limit value is determined to be b. For example, if the surface feature is the brightness of the image, the number of images is 1000, and the preset value is 90%, the target value is 900; if the 900th value among the brightness values arranged in ascending order is 202.5, the hard case value of an image with a brightness greater than 202.5 is determined to be 1, and the hard case value of an image with a brightness less than or equal to 202.5 is determined to be 0. Similar to this method of determining the hard case value based on brightness, the hard case value of each frame of image under each surface feature can be determined. The above is only an optional implementation, and other methods may also be used to determine the limit value.
  • in this way, for each surface feature of an image, a hard case value can be determined, and then the weight corresponding to each surface feature is obtained.
  • the hard-case value of each surface feature in the image is multiplied by the weight corresponding to the surface feature to obtain a value corresponding to each surface feature.
  • the AI platform adds the values corresponding to all the surface features to obtain the second hard case value of the image.
  • the weights can be different, and the sum of the weights of all surface features is equal to 1.
  • the brightness of the image and the definition of the image are more important than the aspect ratio of the image.
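  • The following is a minimal sketch of the second hard case value: a percentile-style limit value is computed for each surface feature, a per-feature hard case value of a or b is assigned, and the values are combined with weights; the example weights and the values a=1, b=0 are illustrative assumptions.

```python
# Minimal sketch: limit value per surface feature and weighted second hard
# case value for one image.
import numpy as np

def limit_value(feature_values, preset=0.9):
    # value at position preset * number_of_images in ascending order
    ordered = np.sort(np.asarray(feature_values, dtype=np.float64))
    index = max(int(preset * len(ordered)) - 1, 0)
    return ordered[index]

def second_hard_case_value(image_features, all_features, weights, a=1.0, b=0.0):
    # image_features: feature values of one image; all_features: lists of the
    # same features over the whole image set; the weights sum to 1
    total = 0.0
    for name, weight in weights.items():
        limit = limit_value(all_features[name])
        per_feature = a if image_features[name] > limit else b
        total += weight * per_feature
    return total

# hypothetical weights, e.g. brightness and sharpness weighted more heavily
weights = {"brightness": 0.4, "sharpness": 0.4, "aspect_ratio": 0.2}
```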
  • the AI platform determines the surface features of each frame of image.
  • the user can directly upload the training image set and the surface features of each image in the multiple unlabeled images, and store them in the data storage module.
  • when in use, the AI platform obtains the surface features of each of the multiple unlabeled images from the data storage module.
  • Step 1103 the AI platform uses the first feature extraction model to extract the deep features of each image in the training image set and the deep features of each image in the unlabeled multiple images.
  • clustering processing is performed on the deep features of each image in the training image set to obtain the image clustering result; then, according to the deep features of each image among the multiple unlabeled images, the image clustering result, and the labeling result of each image among the multiple unlabeled images, the third difficult example value of each image among the multiple unlabeled images is determined.
  • the AI platform can obtain the first feature extraction model, which can be a CNN, and then the AI platform inputs each image in the training image set into the first feature extraction model to determine the deep features of each image.
  • the AI platform can also input each image of the multiple unlabeled images to the first feature extraction model to determine the deep features of each image.
  • the deep features of each frame of image can be represented by a one-dimensional array, and the dimensions of the one-dimensional array of deep features of each frame of image are equal.
  • the AI platform can input the deep features of each image in the training image set into the clustering algorithm (the clustering algorithm can be any clustering algorithm, such as K-means clustering algorithm, etc.) to obtain the image clustering result.
  • the image clustering result includes multiple image groups, and each image group includes one or more images.
  • for each image group, the AI platform can determine the average value of the i-th dimension of the deep features of the images in the image group, thereby obtaining the center of the image group.
  • the image group includes 3 images, and the deep features of each frame are represented by a three-dimensional array.
  • if the deep features of the 3 images are (1, 2, 5), (4, 2, 4), and (4, 8, 9), the average value of the first dimension is 3, the average value of the second dimension is 4, and the average value of the third dimension is 6, so the center of the image group is (3, 4, 6). In this way, the center of each image group can be determined.
  • for any unlabeled image, the AI platform can determine the distance between the deep features of the image and the center of each image group in the image clustering result, for example a Euclidean distance: distance = sqrt( Σ_{i=1..N} (x_1i - x_2i)^2 ), where i is any dimension in the deep features, N is the total number of dimensions in the deep features, x_1i is the i-th dimension in the deep features of the image, and x_2i is the i-th dimension in the deep features of the center.
  • the image group with the smallest distance is determined as the image group to which the image belongs (the process here can be considered as a clustering result for the multiple unlabeled images). It is then determined whether the category of the image (according to its labeling result) is the same as the category of the images in that image group. If they are the same, the third hard case value of the image is determined to be a; if they are not the same, the third hard case value of the image is determined to be b.
  • the K-means clustering method can also be used to determine the frame image group to which any image belongs. In addition, other methods can also be used for clustering.
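  • The following is a minimal sketch of the third hard case value, assuming K-means clustering of the training deep features, a Euclidean distance to the cluster centers, and that the category of an image group is the dominant label of its training images; the concrete values of a and b are left as parameters since they are not fixed above, and the defaults here are illustrative assumptions.

```python
# Minimal sketch: third hard case value via deep-feature clustering.
import numpy as np
from sklearn.cluster import KMeans

def third_hard_case_values(train_features, train_labels,
                           unlabeled_features, unlabeled_labels,
                           n_clusters=10, a=0.0, b=1.0):
    # labels are assumed to be non-negative integer class ids
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(train_features)
    cluster_category = {}
    for c in range(n_clusters):
        members = np.asarray(train_labels)[kmeans.labels_ == c]
        cluster_category[c] = np.bincount(members).argmax() if len(members) else -1

    values = []
    for feat, label in zip(unlabeled_features, unlabeled_labels):
        distances = np.linalg.norm(kmeans.cluster_centers_ - feat, axis=1)
        nearest = int(distances.argmin())          # image group with the smallest distance
        same_category = (cluster_category[nearest] == label)
        values.append(a if same_category else b)   # a if categories match, b otherwise
    return values
```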
  • Step 1104: The AI platform determines the target hard case value of each image among the multiple unlabeled images according to one or more of the first hard case value, the second hard case value, and the third hard case value.
  • the AI platform can use one or more of the first, second, and third hard case values of an image to determine the target hard case value of the image.
  • for example, the AI platform can determine the first hard case value as the target hard case value, or determine the second hard case value as the target hard case value, or weight the first hard case value and the second hard case value to obtain the target hard case value, or weight the first hard case value and the third hard case value to obtain the target hard case value, or weight the second hard case value and the third hard case value to obtain the target hard case value, or weight the first hard case value, the second hard case value, and the third hard case value to obtain the target hard case value.
  • in this way, since multiple hard case values can be considered, the determined target hard case value is more accurate.
  • Step 1105 The AI platform determines the first number of images with the largest target difficulty value among the unlabeled images as the difficult examples in the unlabeled images.
  • the first number can be preset and stored in the data storage module of the AI platform.
  • the AI platform can sort the unlabeled images in descending order of the target hard case value, select the first number of images ranked first, and determine them as the difficult cases among the unlabeled images, as sketched below.
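  • The following is a minimal sketch of combining the hard case values into a target hard case value and keeping the first number of images with the largest values; the weight values are illustrative assumptions.

```python
# Minimal sketch: weighted target hard case value and top-N selection.
import numpy as np

def select_hard_examples(first_vals, second_vals, third_vals,
                         first_number, weights=(0.4, 0.3, 0.3)):
    target = (weights[0] * np.asarray(first_vals, dtype=np.float64)
              + weights[1] * np.asarray(second_vals, dtype=np.float64)
              + weights[2] * np.asarray(third_vals, dtype=np.float64))
    order = np.argsort(-target)                 # descending target hard case value
    return order[:first_number].tolist()        # indices of the selected hard examples
```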
  • if the multiple unlabeled images are a video clip, as shown in Figure 14, the method of determining difficult cases is:
  • Step 1401: For the first target frame of the first image among the unlabeled images, the AI platform determines, among the images whose time-series interval from the first image is less than or equal to the second number, the tracking frame with the highest similarity to the first target frame.
  • the second number can be preset, such as 2 and so on.
  • any one of the unlabeled images may be referred to as the first image, and any bounding box in the first image may be referred to as the first target frame.
  • the AI platform can determine that the time sequence of the first image is less than or equal to the second number of images. For example, if the first image is the 5th frame image and the second number is 2, then the images whose time sequence interval of the first image is less than or equal to the second number are the 3rd frame image, the 4th frame image, the 6th frame image and The 7th frame image. In this way, when the second number is greater than or equal to 2, not only an adjacent frame of image is considered, but also adjacent frames of images are considered, so the judgment accuracy of false detection and missed detection can be improved.
  • the AI platform may acquire multiple bounding boxes in the images whose time-series interval from the first image is less than or equal to the second number, and then determine the similarity between these bounding boxes and the first target box. Specifically, for each bounding box, the first absolute value of the difference between the area of the bounding box and the area of the first target box is calculated, the second absolute value of the difference between the length of the bounding box and the length of the first target box is calculated, and the third absolute value of the difference between the width of the bounding box and the width of the first target box is calculated.
  • the first absolute value multiplied by the first weight, the second absolute value multiplied by the second weight, and the third absolute value multiplied by the third weight are added to obtain the similarity between the first target box and the bounding box. It should be noted here that the sum of the first weight, the second weight, and the third weight is equal to 1, and the second weight and the third weight may be equal.
  • the AI platform may determine the bounding box with the highest similarity to the first target box in the above multiple bounding boxes. Then the bounding box with the highest similarity is determined as the tracking box corresponding to the first target box. In this way, since the most similar bounding box in the adjacent multi-frame images is considered, it is possible to prevent the target from being lost due to motion.
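  • The following is a minimal sketch of selecting the tracking box, assuming boxes are given as (x1, y1, x2, y2) and that the candidate with the smallest weighted difference of area, length, and width is treated as having the highest similarity; the weight values are illustrative assumptions.

```python
# Minimal sketch: tracking box = candidate bounding box most similar to the
# first target box in area, length and width.
def box_size(box):
    length = box[2] - box[0]
    width = box[3] - box[1]
    return length, width, length * width

def weighted_difference(target_box, candidate_box,
                        w_area=0.5, w_len=0.25, w_wid=0.25):
    t_len, t_wid, t_area = box_size(target_box)
    c_len, c_wid, c_area = box_size(candidate_box)
    return (w_area * abs(c_area - t_area)    # first absolute value (area)
            + w_len * abs(c_len - t_len)     # second absolute value (length)
            + w_wid * abs(c_wid - t_wid))    # third absolute value (width)

def tracking_box(target_box, candidate_boxes):
    # smallest weighted difference is taken as the highest similarity
    return min(candidate_boxes, key=lambda b: weighted_difference(target_box, b))
```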
  • Step 1402: The AI platform determines the overlap rate between the first target frame and each bounding box according to the tracking frame, all the bounding boxes in the images whose time-series interval from the first image is less than or equal to the second number, and the first target frame.
  • the AI platform can use the following formula to determine the overlap rate between the first target box and each bounding box: overlap = max( iou(curbox, bbox), iou(trackedbox, bbox) ), where overlap refers to the overlap rate of the first target box and the bounding box, curbox represents the first target box, bbox represents the bounding box, trackedbox represents the tracking box of the first target box, iou(curbox, bbox) represents the intersection over union (IoU) of the first target box and the bounding box, and iou(trackedbox, bbox) represents the IoU of the tracking box and the bounding box. That is, the overlap rate is equal to the maximum of these two IoU values.
  • the intersection ratio of the first target box and the bounding box is equal to the ratio of the area of the intersection of the first target box and the bounding box to the area of the union of the first target box and the bounding box.
  • the intersection ratio of the tracking box and the bounding box is equal to the ratio of the area of the intersection of the tracking box and the bounding box to the area of the union of the tracking box and the bounding box.
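  • The following is a minimal sketch of the overlap-rate formula above, with boxes given as (x1, y1, x2, y2) in image coordinates.

```python
# Minimal sketch: overlap = max(iou(curbox, bbox), iou(trackedbox, bbox)).
def iou(box_a, box_b):
    # intersection over union of two axis-aligned rectangles
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def overlap_rate(curbox, trackedbox, bbox):
    return max(iou(curbox, bbox), iou(trackedbox, bbox))
```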
  • Step 1403 If there is a bounding box with an overlap rate greater than the second value, the AI platform will determine the bounding box with an overlap rate greater than the second value as a similar box corresponding to the first target frame; if there is no bounding box with an overlap rate greater than the second value Bounding box, it is determined that there is no similar box corresponding to the first target box.
  • the second value can be preset and stored in the data storage module.
  • the second value is 0.5 and so on.
  • the AI platform can determine the magnitude of each overlap rate and the second value. If the overlap ratio between the first target frame and a certain bounding box is greater than the second value, it is determined that the bounding box is a similar frame corresponding to the first target frame.
  • if the overlap ratio between the first target frame and every bounding box is less than or equal to the second value, it is determined that there is no similar frame corresponding to the first target frame in the images whose time-series interval from the first image is less than or equal to the second number.
  • Step 1404 If there is no similar frame corresponding to the first target frame, the AI platform determines the first target frame as a difficult case frame.
  • step 1403 it is determined that there is no similar frame corresponding to the first target frame, indicating that the first target frame is a frame that appears suddenly, which can be regarded as a false detection frame.
  • the AI platform can determine the first target frame as a difficult case frame.
  • Step 1405: If there is a similar frame corresponding to the first target frame, and the first image to which the first target frame belongs and the second image to which the similar frame belongs are not adjacent in time sequence, the AI platform determines, according to the first target frame and the similar frame, the difficult case frames in the images between the first image and the second image.
  • if it is determined in step 1403 that there is a similar box corresponding to the first target box, the AI platform can determine whether the image where the similar box is located and the image where the first target box is located are adjacent in time sequence. If they are adjacent, there is no missed detection. If they are not adjacent, the box disappears suddenly in the intermediate images, which indicates missed detections.
  • the AI platform can use a moving average (interpolation) between the similar box and the first target box to mark the missed boxes in the images between the first image and the second image; these missed boxes are the hard example boxes in the images between the first image and the second image. In this way, following the principle that the minority obeys the majority, the missed and falsely detected boxes in the consecutive images are marked.
  • the moving average processing in step 1405 may be as follows: a bounding box is generally a rectangle, and the position coordinates of its upper left corner and lower right corner are used to mark the position of the bounding box in the image to which it belongs.
  • the position coordinates refer to coordinates in the image.
  • the AI platform can subtract the abscissa of the upper left corner of the similar box from the abscissa of the upper left corner of the first target box to obtain an abscissa difference, and multiply the abscissa difference by x/(n+1), where n is the number of images between the image to which the first target box belongs and the image to which the similar box belongs, and x indicates the x-th image between these two images; the ordinate and the lower right corner coordinates can be processed in the same way.
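  • A minimal Python sketch of this interpolation idea is given below (illustrative only). It assumes the box for the x-th intermediate image is obtained by moving linearly from the similar box toward the first target box by x/(n+1) of the coordinate difference, which is one plausible reading of the moving-average step; boxes are (x1, y1, x2, y2).

```python
def interpolate_boxes(similar_box, target_box, n):
    """Linearly interpolate n boxes between similar_box and target_box.

    The x-th intermediate image (x = 1..n) gets a box shifted by x/(n+1)
    of the coordinate difference between the two boxes.
    """
    boxes = []
    for x in range(1, n + 1):
        t = x / (n + 1)
        boxes.append(tuple(s + (c - s) * t for s, c in zip(similar_box, target_box)))
    return boxes

# Example: two images are missing between the similar box and the first target box.
missed = interpolate_boxes((10, 10, 50, 50), (40, 10, 80, 50), n=2)
```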
  • Step 1406: Determine the hard examples in the multiple unlabeled images according to the number of hard example boxes in each image of the multiple unlabeled images.
  • through the above processing, the number of hard example boxes in each image of the multiple unlabeled images can be determined, and the AI platform can then determine the images in which the number of hard example boxes exceeds the third number as the hard examples among the multiple unlabeled images.
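  • A short Python sketch of this counting step (illustrative; hard_boxes_per_image and third_number are assumed names for the per-image hard example box counts and the preset threshold):

```python
def select_hard_examples(hard_boxes_per_image, third_number):
    """Return the images whose hard-example-box count exceeds the threshold."""
    return [image_id
            for image_id, box_count in hard_boxes_per_image.items()
            if box_count > third_number]

hard_examples = select_hard_examples({"img_001": 5, "img_002": 1, "img_003": 8},
                                     third_number=3)
```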
  • another method for determining the hard examples is as follows:
  • Step 1501: The AI platform obtains the surface feature distribution information of the images in the training image set, and determines the fourth hard example value of each image in the multiple unlabeled images according to the surface feature distribution information of the images in the training image set and the surface features of the multiple unlabeled images.
  • the surface features of each image can be determined, and the surface features may include the surface features of the image and the surface features of the bounding boxes.
  • the surface features of the image may include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the RGB values of the image, the brightness of the image, the saturation of the image, the sharpness of the image, the number of boxes in a single image, and the variance of the areas of the boxes in a single image.
  • the surface features of the bounding boxes may include the aspect ratio of each bounding box in a single image, the ratio of the area of each bounding box in a single image to the image area, the degree of marginalization of each bounding box in a single image, the stacking degree of each bounding box in a single image, the brightness of each bounding box in a single image, and the blur degree of each bounding box in a single image.
  • for the resolution of the image, the aspect ratio of the image, the mean and variance of the RGB values of the image, the brightness of the image, the saturation of the image, and the sharpness of the image among the surface features of each image in the multiple unlabeled images, the processing by which the AI platform determines them can refer to the processing in step 1102, which is not repeated here.
  • the AI platform can determine the number of boxes in each image.
  • the AI platform can determine the area of each box in each image, calculate the average of the areas of all boxes in the image, subtract the average from the area of each bounding box and square the result to obtain a value corresponding to each bounding box, and then add up the values corresponding to the bounding boxes to obtain the variance of the areas of the boxes in a single image.
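  • A minimal Python sketch of this per-image statistic, following the description above of summing the squared deviations of the box areas (variable names are illustrative):

```python
def box_area_variance(boxes):
    """Sum of squared deviations of box areas in one image, as described above."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    mean_area = sum(areas) / len(areas)
    return sum((a - mean_area) ** 2 for a in areas)
```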
  • the AI platform can calculate the aspect ratio of each bounding box in each image.
  • the AI platform can calculate the ratio of the area of each bounding box in each image to the image area.
  • the AI platform can calculate the degree of marginalization of each bounding box in a single image.
  • the specific processing is as follows: for any bounding box in an image, calculate the absolute value of the difference between the abscissa of the center of the bounding box and the abscissa of the center of the image (called the abscissa difference), and the absolute value of the difference between the ordinate of the center of the bounding box and the ordinate of the center of the image (called the ordinate difference); then calculate the first ratio of the abscissa difference to the length of the image and the second ratio of the ordinate difference to the width of the image. The pair (first ratio, second ratio) reflects the degree of marginalization of the bounding box; generally, the larger the first ratio and the second ratio are, the more marginal the bounding box is.
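  • The following Python sketch (illustrative only) computes the degree of marginalization as described; it assumes the image "length" is its horizontal size and its "width" is its vertical size, and that boxes are (x1, y1, x2, y2):

```python
def marginalization_degree(box, image_width, image_height):
    """Return (first_ratio, second_ratio) reflecting how far the box center is from the image center."""
    box_cx = (box[0] + box[2]) / 2.0
    box_cy = (box[1] + box[3]) / 2.0
    dx = abs(box_cx - image_width / 2.0)   # abscissa difference
    dy = abs(box_cy - image_height / 2.0)  # ordinate difference
    return dx / image_width, dy / image_height
```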
  • the AI platform can calculate the stacking degree of each bounding box in a single image.
  • the specific processing is as follows: for any bounding box in an image, calculate the area of the intersection of the bounding box with each of the other bounding boxes in the image, divide each intersection area by the area of the bounding box, and add up the resulting ratios to obtain the stacking degree of the bounding box in the image.
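  • A minimal Python sketch of the stacking degree described above (illustrative; boxes are (x1, y1, x2, y2)):

```python
def stacking_degree(box, other_boxes):
    """Sum of (intersection area with each other box) / (area of this box)."""
    area = (box[2] - box[0]) * (box[3] - box[1])
    degree = 0.0
    for other in other_boxes:
        ix1, iy1 = max(box[0], other[0]), max(box[1], other[1])
        ix2, iy2 = min(box[2], other[2]), min(box[3], other[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        degree += inter / area if area > 0 else 0.0
    return degree
```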
  • the AI platform can calculate the brightness of each bounding box in a single image.
  • the specific processing is as follows: for any bounding box in an image, calculate the square of the mean R value, the square of the mean G value, and the square of the mean B value of the pixels in the bounding box; multiply the square of the mean R value by 0.241 to obtain product a, multiply the square of the mean G value by 0.691 to obtain product b, and multiply the square of the mean B value by 0.068 to obtain product c; then add product a, product b, and product c, and take the square root of the sum to obtain the brightness of the bounding box.
  • the formula can be expressed as follows: brightness = sqrt(0.241 × (mean R)² + 0.691 × (mean G)² + 0.068 × (mean B)²).
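  • A short Python sketch of this brightness computation (illustrative; it assumes the pixels of the box are available as an RGB array, for example via numpy):

```python
import numpy as np

def box_brightness(rgb_pixels):
    """Brightness of a box region given as an (H, W, 3) RGB array."""
    mean_r, mean_g, mean_b = rgb_pixels.reshape(-1, 3).mean(axis=0)
    return float(np.sqrt(0.241 * mean_r ** 2 + 0.691 * mean_g ** 2 + 0.068 * mean_b ** 2))
```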
  • the AI platform can calculate the blur degree of each bounding box in a single image.
  • the specific processing is as follows: for any bounding box in an image, filter the bounding box with the Laplacian operator to obtain edge values, and compute the variance of the edge values to obtain the blur degree of the bounding box. It should be noted that the larger the computed variance is, the clearer the bounding box is.
  • the way of determining the blur degree of a box here is only an example, and any method that can determine the blur degree of a bounding box can be applied to this embodiment.
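  • A minimal Python sketch of this Laplacian-variance blur measure (illustrative; it assumes OpenCV is available and that the box region has already been cropped to a grayscale array):

```python
import cv2

def blur_degree(gray_box):
    """Variance of the Laplacian response; larger values indicate a sharper box."""
    return float(cv2.Laplacian(gray_box, cv2.CV_64F).var())
```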
  • the AI platform obtains the surface features of the training images in the training image set and determines the distribution of the images on each surface feature (the processing is the same as that for obtaining the surface features of the multiple unlabeled images, described above). Specifically, the distribution of images on each surface feature can be represented by a histogram.
  • the AI platform obtains a stored preset value, and for the distribution of the images on any surface feature, multiplies the preset value by the number of images in the multiple unlabeled images to obtain a target value.
  • the AI platform arranges the values of this surface feature of all images in the multiple unlabeled images in ascending order, finds the value at the target-value position in the ascending order, and obtains the limit value of the surface feature.
  • the AI platform then distinguishes the images whose value of this surface feature is greater than the limit value from the images whose value is less than or equal to the limit value, determines the hard example value of the images whose value of this surface feature is greater than the limit value as a, and determines the hard example value of the images whose value is less than or equal to the limit value as b.
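  • A minimal Python sketch of this per-feature limit value and hard example value assignment (illustrative; the preset value and the values a and b are assumed to be given constants):

```python
def feature_hard_values(feature_values, preset_ratio, a=1.0, b=0.0):
    """Assign per-image hard example values for one surface feature.

    feature_values: the feature value of every unlabeled image.
    preset_ratio:   stored preset value; multiplied by the image count to get
                    the target position in the ascending order.
    """
    ordered = sorted(feature_values)
    target_index = min(int(preset_ratio * len(feature_values)), len(ordered) - 1)
    limit = ordered[target_index]
    return [a if v > limit else b for v in feature_values]
```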
  • in this way, a hard example value is determined for each surface feature, and then the weight corresponding to each surface feature is obtained.
  • for the surface features of the bounding boxes of an image, the AI platform multiplies the hard example value of each surface feature of the bounding boxes by the weight corresponding to that surface feature to obtain a value corresponding to each surface feature of the bounding boxes, and then adds up the values corresponding to all the surface features of the bounding boxes to obtain the hard example value of the bounding boxes of the image.
  • for the surface features of the image, the AI platform multiplies the hard example value of each surface feature of the image by the weight corresponding to that surface feature to obtain a value corresponding to each surface feature of the image, and then adds up the values corresponding to all the surface features of the image to obtain the hard example value of the image. The AI platform then weights the hard example value of the bounding boxes and the hard example value of the image (the sum of the two weights is equal to 1) to obtain the fourth hard example value of the image.
  • the weights of different surface features can be different, and the sum of the weights of all surface features is equal to 1.
  • for example, the brightness of the image and the sharpness of the image are more important than the aspect ratio of the image, so they may be given larger weights.
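  • A minimal Python sketch of how the fourth hard example value could be assembled from the per-feature hard example values (illustrative; the feature weights and the box/image weights are assumed inputs):

```python
def fourth_hard_value(box_feature_values, image_feature_values,
                      box_feature_weights, image_feature_weights,
                      box_weight=0.5):
    """Combine per-feature hard example values into the fourth hard example value.

    Each group of feature weights sums to 1; box_weight and (1 - box_weight)
    weight the bounding-box value and the image value respectively.
    """
    box_value = sum(v * w for v, w in zip(box_feature_values, box_feature_weights))
    image_value = sum(v * w for v, w in zip(image_feature_values, image_feature_weights))
    return box_weight * box_value + (1.0 - box_weight) * image_value
```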
  • in the above description, the AI platform determines the surface features of each image.
  • alternatively, the user can directly upload the surface features of each of the multiple unlabeled images, which are stored in the data storage module.
  • when the AI platform needs them, it obtains the surface features of each of the multiple unlabeled images from the data storage module.
  • Step 1502: The AI platform uses the second feature extraction model to extract the deep features of each bounding box in each image in the training image set and the deep features of each bounding box in each image in the multiple unlabeled images; performs clustering on the bounding boxes in the images in the training image set according to their deep features to obtain a box clustering result; and determines the fifth hard example value of each image in the multiple unlabeled images according to the deep features of each bounding box in each image in the multiple unlabeled images, the box clustering result, and the inference result of each bounding box in each image in the multiple unlabeled images.
  • the AI platform can obtain a second feature extraction model, which can be the same as the first feature extraction model mentioned above and can be a CNN. The AI platform then inputs each image in the training image set into the second feature extraction model and determines the deep features of each bounding box in each image. The AI platform can also input each of the multiple unlabeled images into the second feature extraction model to determine the deep features of each bounding box in each image.
  • the deep features of each bounding box can be represented by a one-dimensional array, and the one-dimensional arrays of the deep features of each bounding box have the same dimensions.
  • the AI platform can input the deep features of each bounding box in each image in the training image set into a clustering algorithm (the clustering algorithm can be any clustering algorithm, such as the K-means clustering algorithm) to obtain the bounding box clustering result.
  • the bounding box clustering result includes multiple bounding box groups, and each bounding box group includes one or more bounding boxes.
  • for each bounding box group, the AI platform can determine the average value of the i-th dimension of the deep features of the bounding boxes in the group, which gives the i-th dimension of the center of the group.
  • for example, a bounding box group includes 3 bounding boxes, and the deep features of each bounding box are represented by a three-dimensional array.
  • the deep features of the three bounding boxes are (7, 2, 5), (4, 2, 4) and (4, 14, 9); the average value of the first dimension is 5, the average value of the second dimension is 6, and the average value of the third dimension is 6, so the center of the bounding box group is (5, 6, 6). The center of each bounding box group can be determined in this way.
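  • A minimal Python sketch of this center computation (illustrative; each deep feature is assumed to be a fixed-length sequence of numbers):

```python
def group_center(deep_features):
    """Per-dimension mean of the deep features of the boxes in one group."""
    n = len(deep_features)
    return [sum(f[i] for f in deep_features) / n for i in range(len(deep_features[0]))]

# Example from the text: three 3-dimensional deep features.
center = group_center([(7, 2, 5), (4, 2, 4), (4, 14, 9)])  # -> [5.0, 6.0, 6.0]
```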
  • for any bounding box in the multiple unlabeled images, the AI platform can determine the distance between the deep features of the bounding box and the center of each bounding box group in the box clustering result.
  • the specific processing is to calculate the distance between the deep features of the bounding box and the center of a bounding box group, for example as the Euclidean distance: distance = sqrt( Σ_{i=1..N} (x_1i − x_2i)² ), where i is any dimension of the deep features, N is the total number of dimensions of the deep features, x_1i is the i-th dimension of the deep features of the bounding box, and x_2i is the i-th dimension of the deep features of the center.
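  • A minimal Python sketch of this distance computation, together with assigning a box to its nearest group center (illustrative only):

```python
import math

def euclidean_distance(x1, x2):
    """Distance between a box's deep features x1 and a group center x2."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def nearest_group(deep_feature, centers):
    """Index of the bounding box group whose center is closest to deep_feature."""
    return min(range(len(centers)),
               key=lambda k: euclidean_distance(deep_feature, centers[k]))
```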
  • to cluster any bounding box in the multiple unlabeled images into an existing bounding box group, the K-means clustering method can also be used to determine the bounding box group to which the bounding box belongs.
  • other clustering methods can also be used.
  • Step 1503: The AI platform determines the target hard example value of each image in the multiple unlabeled images according to one or more of the fourth hard example value and the fifth hard example value.
  • for any image in the multiple unlabeled images, the AI platform can use one or more of the fourth and fifth hard example values of the image to determine the target hard example value of the image. Specifically, the AI platform can determine the fourth hard example value as the target hard example value, or determine the fifth hard example value as the target hard example value, or weight the fourth hard example value and the fifth hard example value to obtain the target hard example value. When the fourth and fifth hard example values are used together, because two kinds of hard example values are considered at the same time, the determined target hard example value is more accurate.
  • Step 1504: The AI platform determines the first number of images with the largest target hard example values among the multiple unlabeled images as the hard examples among the multiple unlabeled images.
  • the first number can be preset and stored in the data storage module of the AI platform.
  • the AI platform can sort the multiple unlabeled images in descending order of the target hard example value, select the first number of images ranked first, and determine them as the hard examples among the multiple unlabeled images.
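  • A minimal Python sketch of this selection step (illustrative; target_values is an assumed mapping from image identifiers to target hard example values):

```python
def top_hard_examples(target_values, first_number):
    """Images with the largest target hard example values."""
    ranked = sorted(target_values, key=target_values.get, reverse=True)
    return ranked[:first_number]

hard = top_hard_examples({"img_a": 0.9, "img_b": 0.2, "img_c": 0.7}, first_number=2)
```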
  • in this embodiment of the present application, the AI platform can obtain multiple unlabeled images, input the multiple unlabeled images into the initial AI model to obtain the labeling result of each of the multiple unlabeled images, then determine the hard examples in the multiple unlabeled images based on the labeling results, and retrain the initial AI model based on the hard examples to obtain the optimized AI model. Because the AI platform uses hard examples to train the initial AI model, the optimized AI model obtained after training performs reasoning more accurately.
  • an embodiment of the present application also provides a method for optimizing an AI model. As shown in FIG. 17, the processing may be as follows:
  • Step 1701 The AI platform trains the initial AI model according to the training image set to obtain an optimized AI model.
  • the training image set is the image set provided by the user to the AI platform.
  • the training image set may only include multiple unlabeled images.
  • the training image set may include multiple unlabeled images and multiple labeled images.
  • when the training image set only includes multiple unlabeled images, the process of optimizing the initial AI model can be the process shown in FIG. 5.
  • when the training image set includes multiple unlabeled images and multiple labeled images, the processing here can also refer to the flow shown in FIG. 5.
  • Step 1702: The AI platform receives the reasoning image set, and performs reasoning on each reasoning image in the reasoning image set according to the optimized AI model to obtain the reasoning results.
  • the user can upload the reasoning image set, and the images in the reasoning image set are input into the optimized AI model to obtain the reasoning results.
  • Step 1703: The AI platform determines the hard examples in the reasoning image set according to the reasoning results, where a hard example indicates a reasoning image for which the error rate of the reasoning result obtained by reasoning with the optimized AI model is higher than the target threshold.
  • the inference result is equivalent to the annotation result.
  • for this process, refer to the processing of step 503.
  • the only difference from step 503 is that the reasoning image set is used here instead of the multiple unlabeled images; the reasoning images are in fact also unlabeled images. See the description of step 503 for the detailed process.
  • Step 1704: The optimized AI model is trained according to the hard examples to obtain a re-optimized AI model.
  • based on the hard examples, the optimized AI model can be further trained to obtain the re-optimized AI model (see the foregoing description for the training process).
  • the resulting re-optimized AI model has a stronger reasoning capability.
  • the above method of providing an AI model can be jointly implemented by one or more modules on the AI platform 100.
  • the user I/O module is used to implement step 501 in FIG. 5, and step 901 and step 902 in FIG. 9.
  • the difficult example mining module is used to implement step 503 in FIG. 5, step 904 in FIG. 9, the flow shown in FIG. 11, the flow shown in FIG. 14, the flow shown in FIG. 15, and step 1703 in FIG. 17.
  • the model training module is used to implement step 502, step 504 in FIG. 5, step 905 in FIG. 9, and step 1704 in FIG. 17.
  • the reasoning module is used to implement step 903 shown in FIG. 9 and step 1702 shown in FIG. 17.
  • the present application also provides a computing device 400 as shown in FIG. 4.
  • the processor 402 in the computing device 400 reads the program and image collection stored in the memory 401 to execute the method executed by the aforementioned AI platform.
  • because each module in the AI platform 100 provided in this application can be distributed on multiple computers in the same environment or in different environments, this application also provides a computing device as shown in FIG. 18. The computing device includes a plurality of computers 1800, and each computer 1800 includes a memory 1801, a processor 1802, a communication interface 1803, and a bus 1804. The memory 1801, the processor 1802, and the communication interface 1803 are communicatively connected to each other through the bus 1804.
  • the memory 1801 may be a read-only memory, a static storage device, a dynamic storage device, or a random access memory.
  • the memory 1801 may store a program. When the program stored in the memory 1801 is executed by the processor 1802, the processor 1802 and the communication interface 1803 are used to execute part of the method used by the AI platform to obtain an AI model.
  • the memory can also store image collections. For example, part of the storage resources in the memory 1801 is divided into an image collection storage module used to store the image collections required by the AI platform, and part of the storage resources in the memory 1801 is divided into an AI model storage module used to store the AI model library.
  • the processor 1802 may adopt a general-purpose central processing unit, a microprocessor, an application specific integrated circuit, a graphics processor, or one or more integrated circuits.
  • the communication interface 1803 uses a transceiver module such as but not limited to a transceiver to implement communication between the computer 1800 and other devices or communication networks. For example, the image collection can be acquired through the communication interface 1803.
  • the bus 1804 may include a path for transferring information between various components of the computer 1800 (for example, the memory 1801, the processor 1802, and the communication interface 1803).
  • Each of the above-mentioned computers 1800 establishes a communication path through a communication network.
  • Each computer 1800 runs any one or more of the user I/O module 101, the difficult case mining module 102, the model training module 103, the inference module 104, the AI model storage module 105, the data storage module 106, and the data preprocessing module 107.
  • Any computer 1800 may be a computer in a cloud data center (for example, a server), a computer in an edge data center, or a terminal computing device.
  • all or part of the above embodiments may be implemented by software, hardware, or a combination thereof.
  • when software is used for implementation, all or part may be implemented in the form of a computer program product.
  • the computer program product that provides the AI platform includes one or more computer instructions for providing the AI platform. When these computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present application as shown in FIG. 5, FIG. 11, FIG. 14, or FIG. 15 are generated.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or twisted pair) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium stores the computer program instructions that provide the AI platform.
  • the computer-readable storage medium can be any medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more media.
  • the medium can be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, an SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a method for providing an artificial intelligence (AI) model, an AI platform, a computing device, and a storage medium, and relates to the technical field of AI. The method comprises: the AI platform receives a plurality of unlabeled images from a first user, the first user being an entity that registers an account on the AI platform; the AI platform labels the plurality of images according to an initial AI model; the AI platform determines hard examples in the plurality of images according to the labeling result; and the AI platform trains the initial AI model using the hard examples to obtain an optimized AI model. Because the AI platform uses the hard examples to train the initial AI model, the AI model provided by the AI platform has a strong inference capability.

Description

Method for providing an AI model, AI platform, computing device, and storage medium

Technical field
This application relates to the field of artificial intelligence technology, and in particular to a method for providing an AI model, an AI platform, a computing device, and a storage medium.
Background art
The acquisition process of an AI model generally involves training the AI model based on training data to obtain the final AI model. Because the initial AI model is trained only based on the training data and is not further optimized, the reasoning capability of the AI model is relatively low.
Summary of the invention
This application provides a method for providing an artificial intelligence (AI) model, which can provide an AI model with a stronger reasoning capability for developers who register an account on an AI platform.
In a first aspect, this application provides a method for providing an artificial intelligence (AI) model, the method including:
The AI platform receives a plurality of unlabeled images from a first user, the first user being an entity that registers an account on the AI platform; the AI platform labels the plurality of images according to an initial AI model; the AI platform determines hard examples in the plurality of images according to the labeling results; and the AI platform trains the initial AI model using the hard examples to obtain an optimized AI model.
Through this method, the AI platform can provide the first user registered on the platform (for example, an AI model developer) with an optimized AI model that has a stronger reasoning capability, so that the first user can obtain the optimized AI model conveniently and quickly, saving time and labor.
在一种可能的实现方式中,AI平台根据标注结果确定所述多个图像中的难例,包括:AI平台向第一用户提供确认界面,在确认界面中向第一用户展示候选难例,所述候选难例为所述多个图像中的至少一个图像;AI平台根据所述第一用户在确认界面上的操作,确定候选难例中的难例。AI平台通过与第一用户交互,获得第一用户确认后的难例,提高了难例的准确性,进一步地提高了通过这些确定后的难例训练后的优化AI模型的推理能力。In a possible implementation, the AI platform determines the difficult examples in the multiple images according to the annotation results, including: the AI platform provides a confirmation interface to the first user, and the candidate difficult examples are displayed to the first user in the confirmation interface, The candidate difficult example is at least one image among the multiple images; the AI platform determines the difficult example among the candidate difficult examples according to the operation of the first user on the confirmation interface. By interacting with the first user, the AI platform obtains the hard cases confirmed by the first user, improves the accuracy of the hard cases, and further improves the reasoning ability of the optimized AI model trained through these confirmed hard cases.
在一种可能的实现方式中,该方法还包括:AI平台接收第一用户对所述难例的矫正标注;所述AI平台利用所述难例训练所述初始AI模型以获得优化AI模型包括:所述AI平台利用所述难例和对应的矫正标注训练所述初始AI模型以获得所述优化AI模型。AI平台通过与第一用户交互,获得第一用户对难例的矫正标注用于对初始AI模型进行训练,进一步地提高了训练后的优化AI模型的推理能力。In a possible implementation manner, the method further includes: the AI platform receives correction annotations of the difficult cases from the first user; the AI platform uses the difficult cases to train the initial AI model to obtain the optimized AI model includes : The AI platform trains the initial AI model by using the difficult examples and the corresponding correction annotations to obtain the optimized AI model. By interacting with the first user, the AI platform obtains the first user's corrective annotations of difficult cases for training the initial AI model, which further improves the reasoning ability of the optimized AI model after training.
在一种可能的实现方式中,该方法还包括:AI平台从第一用户获取带标注的一个或多个图像;AI平台利用带标注的一个或多个图像获得初始AI模型。In a possible implementation, the method further includes: the AI platform obtains one or more annotated images from the first user; the AI platform obtains the initial AI model by using the one or more annotated images.
在一种可能的实现方式中,该方法还包括:所述AI平台将所述优化AI模型提供给第二用户的设备,以使得所述设备用所述优化AI模型执行任务目标;或,所述AI平台接收所述第二用户的设备发送的推理图像,利用所述优化AI模型对所述推理图像进行推 理,并向所述第二用户的设备提供推理结果。该方法提供了向第二用户的设备发送优化AI模型或者在线利用优化AI模型向用户提供推理服务两种方法,可以使优化AI模型方便地用于推理,也可以适应不同的任务目标。In a possible implementation manner, the method further includes: the AI platform provides the optimized AI model to the device of the second user, so that the device uses the optimized AI model to perform task goals; or, The AI platform receives the inference image sent by the device of the second user, uses the optimized AI model to perform inference on the inference image, and provides the inference result to the device of the second user. This method provides two methods of sending an optimized AI model to the second user's device or using the optimized AI model online to provide reasoning services to the user. The optimized AI model can be conveniently used for reasoning and can also be adapted to different task goals.
在一种可能的实现方式中,AI平台根据初始AI模型标注所述多个未标注的图像,包括:所述AI平台向所述第一用户提供标注选择界面,所述标注选择界面上包括所述第一用户可选择的至少一种标注方式;所述AI平台接收所述第一用户选择的标注方式,根据所述第一用户选择的标注方式对应的所述初始AI模型标注所述多个未标注的图像。该方法通过给第一用户提供不同的标注选择方式,使第一用户可以根据要上传至AI平台的图像决定选用何种标注方式,提高了AI平台应对各种用户或者各种场景的灵活性。In a possible implementation manner, the AI platform annotating the multiple unlabeled images according to the initial AI model includes: the AI platform provides an annotation selection interface to the first user, and the annotation selection interface includes all The at least one labeling method selectable by the first user; the AI platform receives the labeling method selected by the first user, and labels the plurality of labels according to the initial AI model corresponding to the labeling method selected by the first user Unlabeled image. This method provides the first user with different label selection methods, so that the first user can decide which labeling method to choose according to the image to be uploaded to the AI platform, which improves the flexibility of the AI platform to deal with various users or various scenarios.
在一种可能的实现方式中,所述AI平台根据初始AI模型标注所述多个图像包括:根据所述初始AI模型对所述多个图像分类和/或根据所述初始AI模型对所述多个图像执行物体检测。In a possible implementation, the AI platform annotating the plurality of images according to the initial AI model includes: classifying the plurality of images according to the initial AI model and/or classifying the plurality of images according to the initial AI model Multiple images perform object detection.
第二方面,本申请还提供了一种人工智能AI平台,所述AI平台包括:用户输入输出I/O模块,用于接收第一用户的未标注的多个图像,所述第一用户为在所述AI平台注册账号的实体;数据预处理模块,用于根据初始AI模型标注所述多个图像;难例挖掘模块,用于根据标注结果确定所述多个图像中的难例;模型训练模块,用于利用所述难例训练所述初始AI模型以获得优化AI模型。In the second aspect, the present application also provides an artificial intelligence AI platform, the AI platform includes: a user input and output I/O module, configured to receive a plurality of unlabeled images of a first user, the first user is An entity that registers an account on the AI platform; a data preprocessing module, used to annotate the multiple images according to the initial AI model; a difficult example mining module, used to determine difficult examples in the multiple images according to the annotation result; a model The training module is used to train the initial AI model using the difficult examples to obtain an optimized AI model.
在一种可能的实现方式中,所述用户I/O模块,还用于向所述第一用户提供确认界面,在所述确认界面中向所述第一用户展示候选难例,所述候选难例为所述多个图像中的至少一个图像;所述难例挖掘模块,还用于根据所述第一用户在所述确认界面上的操作,确定所述候选难例中的难例。In a possible implementation manner, the user I/O module is further configured to provide a confirmation interface to the first user, and display candidate difficult examples to the first user in the confirmation interface, and the candidate The difficult example is at least one of the multiple images; the difficult example mining module is further configured to determine the difficult example among the candidate difficult examples according to the operation of the first user on the confirmation interface.
在一种可能的实现方式中,所述用户I/O模块,还用于接收所述用户对所述难例的矫正标注;所述模型训练模块,具体用于利用所述难例和对应的矫正标注训练所述初始AI模型以获得所述优化AI模型。In a possible implementation, the user I/O module is further configured to receive correction annotations of the difficult cases by the user; the model training module is specifically configured to use the difficult cases and corresponding Corrective annotation training the initial AI model to obtain the optimized AI model.
在一种可能的实现方式中,所述用户I/O模块,还用于从所述第一用户获取带标注的一个或多个图像;所述模型训练模块,还用于利用带标注的一个或多个图像获得所述初始AI模型。In a possible implementation, the user I/O module is further configured to obtain one or more tagged images from the first user; the model training module is also configured to use annotated images Or multiple images to obtain the initial AI model.
在一种可能的实现方式中,所述用户I/O模块,还用于将所述优化AI模型提供给第二用户的设备,以使得所述设备用所述优化AI模型执行任务目标;或,所述AI平台还包括推理模块,所述用户I/O模块,还用于接收所述第二用户的设备发送的推理图像;所述推理模块,用于利用所述优化AI模型对所述推理图像进行推理;所述用户I/O模块,还用于向所述第二用户的设备提供推理结果。In a possible implementation manner, the user I/O module is further configured to provide the optimized AI model to a device of a second user, so that the device uses the optimized AI model to perform task goals; or , The AI platform further includes a reasoning module, the user I/O module is also used to receive the reasoning image sent by the second user’s device; the reasoning module is used to use the optimized AI model to The reasoning image is used for reasoning; the user I/O module is also used to provide the reasoning result to the device of the second user.
在一种可能的实现方式中,所述用户I/O模块,还用于向所述第一用户提供标注选择界面,所述标注选择界面上包括所述第一用户可选择的至少一种标注方式;所述用户I/O模块,还用于接收所述第一用户选择的标注方式;所述数据预处理模块,具体用于根据所述第一用户选择的标注方式对应的所述初始AI模型标注所述多个未标注的图像。In a possible implementation manner, the user I/O module is further configured to provide a label selection interface to the first user, and the label selection interface includes at least one label selectable by the first user The user I/O module is also used to receive the labeling method selected by the first user; the data preprocessing module is specifically used for the initial AI corresponding to the labeling method selected by the first user The model labels the multiple unlabeled images.
在一种可能的实现方式中,所述数据预处理模块,具体用于根据所述初始AI模型对所述多个图像分类和/或根据所述初始AI模型对所述多个图像执行物体检测。In a possible implementation manner, the data preprocessing module is specifically configured to classify the multiple images according to the initial AI model and/or perform object detection on the multiple images according to the initial AI model .
第三方面,本申请还提供了一种优化人工智能AI模型的方法,其特征在于,所述方法包括:根据训练图像集对初始AI模型进行训练,获得优化AI模型;接收推理图像集,根据所述优化AI模型对所述推理图像集中的每个推理图像进行推理,获得推理结果;根据所述推理结果,确定所述推理图像集中的难例,其中,所述难例指示通过所述优化AI模型进行推理获得的推理结果的错误率高于目标阈值的推理图像;根据所述难例对所述优化AI模型进行训练,获得再优化AI模型。该方法根据推理结果确定难例,利用难例再训练优化AI模型,使得获得的再优化AI模型的推理能力更强。In a third aspect, this application also provides a method for optimizing an artificial intelligence AI model, characterized in that the method includes: training an initial AI model according to a training image set to obtain an optimized AI model; receiving a reasoning image set, according to The optimized AI model performs reasoning on each reasoning image in the reasoning image set to obtain a reasoning result; according to the reasoning result, determine a difficult example in the reasoning image set, wherein the difficult example indicates that the optimization A reasoning image in which the error rate of the reasoning result obtained by the AI model for reasoning is higher than the target threshold; the optimized AI model is trained according to the difficult case, and the re-optimized AI model is obtained. This method determines the difficult cases based on the inference results, and uses the difficult cases to retrain the optimized AI model, so that the obtained re-optimized AI model has stronger reasoning ability.
在一种可能的实现方式中,所述根据所述推理结果,确定所述推理图像集中的难例,具体包括:确定所述推理图像集为视频片段;根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例;或,确定所述推理图像集为非视频片段,根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例。该方法根据推理图像集的类型,利用不同的难例确定方式确定难例,充分考虑了推理图像集本身的特点,提高了确定的难例的准确率,进一步地提高了再优化AI模型的推理能力。In a possible implementation manner, the determining the difficult cases in the reasoning image set according to the reasoning result specifically includes: determining that the reasoning image set is a video clip; and reasoning according to each image in the reasoning image set As a result, determine the difficult cases in the inference image set; or, determine that the inference image set is a non-video segment, and determine the inference image set in the inference image set based on the inference result of each image in the inference image set and the training image set Hard case. According to the type of the reasoning image set, this method uses different hard case determination methods to determine the hard cases, and fully considers the characteristics of the reasoning image set itself, improves the accuracy of the determined hard cases, and further improves the reasoning of the re-optimized AI model. ability.
在一种可能的实现方式中,所述根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例,包括:确定所述推理图像集中的目标图像,其中,所述目标图像的推理结果与所述目标图像在所述视频片段中的相邻的图像的推理结果不相同;将所述目标图像确定为所述推理图像中的难例。In a possible implementation manner, the determining a difficult case in the reasoning image set according to the reasoning result of each image in the reasoning image set includes: determining a target image in the reasoning image set, wherein the target The reasoning result of the image is different from the reasoning result of the adjacent image of the target image in the video segment; the target image is determined as a difficult case in the reasoning image.
在一种可能的实现方式中,所述根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例,包括:获取所述推理图像集中各图像在各类别下的置信度,根据所述推理图像集中各图像的最高的两个置信度,确定所述推理图像集中各图像的第一难例值;获取所述训练图像集中图像的表层特征分布信息,根据所述表层特征分布信息和所述推理图像集中各图像的表层特征,确定所述推理图像集中各图像的第二难例值;获取所述训练图像集中各图像的深层特征和所述推理图像集中各图像的深层特征,根据所述训练图像集中各图像的深层特征,对所述训练图像集中各图像进行聚类处理,得到图像聚类结果;根据所述推理图像集中各图像的深层特征、所述图像聚类结果和所述推理图像集中各图像的推理结果,确定所述推理图像集中各图像的第三难例值;根据所述第一难例值、第二难例值和所述第三难例值中的一个或多个,确定所述推理图像集中各图像的目标难例值;将所述推理图像集中目标难例值最大的第一数目个图像,确定为所述推理图像集中的难例。In a possible implementation manner, the determining the difficult cases in the inference image set according to the inference result of each image in the inference image set and the training image set includes: obtaining the inference image set in the inference image set. For the confidence levels in each category, determine the first hard case value of each image in the inference image set according to the two highest confidence levels of each image in the inference image set; obtain the surface feature distribution information of the images in the training image set , According to the surface feature distribution information and the surface features of each image in the inference image set, determine the second hard case value of each image in the inference image set; acquire the deep features of each image in the training image set and the inference According to the deep features of each image in the image set, perform clustering processing on each image in the training image set according to the deep features of each image in the training image set to obtain an image clustering result; according to the deep features of each image in the inference image set , The image clustering result and the reasoning result of each image in the reasoning image set, determine the third hard case value of each image in the reasoning image set; according to the first hard case value, the second hard case value and the result One or more of the third difficulty value, determine the target difficulty value of each image in the reasoning image set; determine the first number of images with the largest target difficulty value in the reasoning image set as the reasoning Hard cases in the image collection.
在一种可能的实现方式中,所述根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例,包括:对于所述推理图像集中第一图像的第一目标框,在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中,判断是否存在所述第一目标框对应的相似框;若未存在所述第一目标框对应的相似框,则将所述第一目标框确定为难例框;若存在所述第一目标框对应的相似框,且所述第一目标框所属的第一图像和所述相似框所属的第二图像在所述视频片段中不相邻,则根据所述第一目标框与所述相似框,确定所述第一图像和所述第二图像之间的图像中的难例框;根据所述推理图像集中各图像的难例框的数目,确定所述推理图像集中的难例。In a possible implementation manner, the determining the difficult case in the reasoning image set according to the reasoning result of each image in the reasoning image set includes: for the first target frame of the first image in the reasoning image set, Among the images in the video segment whose time-series interval with the first image is less than or equal to the second number, determine whether there is a similar frame corresponding to the first target frame; if the first target does not exist A similar frame corresponding to the frame, then the first target frame is determined as a difficult case; if there is a similar frame corresponding to the first target frame, and the first image to which the first target frame belongs and the similar frame belong If the second image of is not adjacent in the video segment, determine the difficult frame in the image between the first image and the second image according to the first target frame and the similar frame; According to the number of difficult cases in each image in the reasoning image set, the hard cases in the reasoning image set are determined.
在一种可能的实现方式中,所述在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中,判断是否存在所述第一目标框对应的相似框,包括: 在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中,确定与第一目标框的相似度最高的追踪框;根据所述追踪框、在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中的所有的边界框和所述第一目标框,确定所述第一目标框与各边界框的重叠率;若存在重叠率大于第二数值的边界框,则将重叠率大于第二数值的边界框,确定为所述第一目标框对应的相似框;若不存在重叠率大于第二数值的边界框,则确定未存在所述第一目标框对应的相似框。In a possible implementation manner, among the images in the video clip whose time-series interval with the first image is less than or equal to a second number, it is determined whether there is a corresponding first target frame. The similar frame includes: determining the tracking frame with the highest similarity to the first target frame among the images in the video segment whose time-series interval with the first image is less than or equal to a second number; A tracking frame, all bounding boxes in the image whose time sequence interval from the first image in the video segment is less than or equal to a second number, and the first target frame, and determining the first target frame The overlap rate with each bounding box; if there is a bounding box with an overlap rate greater than the second value, the bounding box with an overlap rate greater than the second value is determined as the similar frame corresponding to the first target frame; if there is no overlap rate If the bounding box is larger than the second value, it is determined that there is no similar box corresponding to the first target box.
在一种可能的实现方式中,所述根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例,包括:获取所述训练图像集中图像的表层特征分布信息,根据所述训练图像集中图像的表层特征分布信息和所述推理图像集中图像的表层特征,确定所述推理图像集中各图像的第四难例值,其中,所述表层特征包括边界框的表层特征和图像的表层特征;获取所述训练图像集中各图像中每个框的深层特征和所述推理图像集中各图像中每个框的深层特征,根据所述训练图像集中各图像中每个框的深层特征,对所述训练图像集中各图像中每个框进行聚类处理,得到框聚类结果;根据所述推理图像集中各图像中每个框的深层特征、所述框聚类结果和所述推理图像集中各图像中每个框的推理结果,确定所述推理图像集中各图像的第五难例值;根据所述第四难例值和所述第五难例值中的一个或多个,确定所述推理图像集各图像的目标难例值;将所述推理图像集中目标难例值最大的第一数目个图像,确定为所述推理图像集中的难例。In a possible implementation manner, the determining the difficult cases in the inference image set according to the inference result of each image in the inference image set and the training image set includes: obtaining the surface layer of the image in the training image set Feature distribution information, based on the surface feature distribution information of the images in the training image set and the surface features of the images in the inference image set, determine the fourth difficult example value of each image in the inference image set, wherein the surface features include boundaries The surface features of the frame and the surface features of the image; the deep features of each frame in each image in the training image set and the deep features of each frame in each image in the inference image set are acquired, according to each image in the training image set For the deep features of each frame, perform clustering processing on each frame in each image in the training image set to obtain a frame clustering result; according to the deep features of each frame in each image in the inference image set, the frame cluster Class result and the reasoning result of each frame in each image in the reasoning image set, determine the fifth hard case value of each image in the reasoning image set; according to the fourth hard case value and the fifth hard case value Determine the target difficulty value of each image in the reasoning image set; determine the first number of images with the largest target difficulty value in the reasoning image set as the hard instance in the reasoning image set.
第四方面,本申请还提供一种人工智能AI平台,所述AI平台包括:模型训练模块,用于根据训练图像集对初始AI模型进行训练,获得优化AI模型;推理模块,用于接收推理图像集,根据所述优化AI模型对所述推理图像集中的每个推理图像进行推理,获得推理结果;难例挖掘模块,用于根据所述推理结果,确定所述推理图像集中的难例,其中,所述难例指示通过所述优化AI模型进行推理获得的推理结果的错误率高于目标阈值的推理图像;所述模型训练模块,还用于根据所述难例对所述优化AI模型进行训练,获得再优化AI模型。In a fourth aspect, the present application also provides an artificial intelligence AI platform. The AI platform includes: a model training module for training an initial AI model according to a set of training images to obtain an optimized AI model; an inference module for receiving inferences Image set, inferring each reasoning image in the reasoning image set according to the optimized AI model to obtain the reasoning result; the hard case mining module is used to determine the hard cases in the reasoning image set according to the reasoning result, Wherein, the hard case indicates an inference image whose error rate of the reasoning result obtained by reasoning through the optimized AI model is higher than a target threshold; the model training module is also used to compare the optimized AI model according to the hard case Carry out training to obtain and then optimize the AI model.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:确定所述推理图像集为视频片段;根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例;或,确定所述推理图像集为非视频片段,根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例。In a possible implementation, the difficult example mining module is specifically used to: determine that the reasoning image set is a video segment; and determine the difficulty of the reasoning image set according to the reasoning result of each image in the reasoning image set. Example; or, determining that the reasoning image set is a non-video segment, and determining difficult examples in the reasoning image set according to the reasoning result of each image in the reasoning image set and the training image set.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:确定所述推理图像集中的目标图像,其中,所述目标图像的推理结果与所述目标图像在所述视频片段中的相邻的图像的推理结果不相同;将所述目标图像确定为所述推理图像中的难例。In a possible implementation manner, the difficult case mining module is specifically configured to: determine a target image in the inference image set, wherein the inference result of the target image and the target image are in the video segment The inference results of adjacent images are different; the target image is determined as a difficult example in the inference image.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:获取所述推理图像集中各图像在各类别下的置信度,根据所述推理图像集中各图像的最高的两个置信度,确定所述推理图像集中各图像的第一难例值;获取所述训练图像集中图像的表层特征分布信息,根据所述表层特征分布信息和所述推理图像集中各图像的表层特征,确定所述推理图像集中各图像的第二难例值;获取所述训练图像集中各图像的深层特征和所述推理图像集中各图像的深层特征,根据所述训练图像集中各图像的深层特征,对所述训练图像集中各图像进行聚类处理,得到图像聚类结果;根据所述推理图像集中各图像的深层特 征、所述图像聚类结果和所述推理图像集中各图像的推理结果,确定所述推理图像集中各图像的第三难例值;根据所述第一难例值、第二难例值和所述第三难例值中的一个或多个,确定所述推理图像集中各图像的目标难例值;将所述推理图像集中目标难例值最大的第一数目个图像,确定为所述推理图像集中的难例。In a possible implementation, the difficult example mining module is specifically used to: obtain the confidence of each image in each category in the reasoning image set, and according to the two highest confidences of each image in the reasoning image set Degree, determine the first hard case value of each image in the inference image set; obtain the surface feature distribution information of the images in the training image set, and determine according to the surface feature distribution information and the surface features of each image in the inference image set The second hard example value of each image in the inference image set; acquire the deep features of each image in the training image set and the deep features of each image in the inference image set, and compare Perform clustering processing on each image in the training image set to obtain an image clustering result; determine the image clustering result according to the deep features of each image in the inference image set, the image clustering result, and the inference result of each image in the inference image set The third difficult example value of each image in the reasoning image set; determining each image in the reasoning image set according to one or more of the first hard example value, the second hard example value, and the third hard example value The target difficulty example value of the target; the first number of images with the largest target difficulty example value in the reasoning image set is determined as the hard example in the reasoning image set.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:对于所述推理图像集中第一图像的第一目标框,在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中,判断是否存在所述第一目标框对应的相似框;若未存在所述第一目标框对应的相似框,则将所述第一目标框确定为难例框;若存在所述第一目标框对应的相似框,且所述第一目标框所属的第一图像和所述相似框所属的第二图像在所述视频片段中不相邻,则根据所述第一目标框与所述相似框,确定所述第一图像和所述第二图像之间的图像中的难例框;根据所述推理图像集中各图像的难例框的数目,确定所述推理图像集中的难例。In a possible implementation manner, the difficult example mining module is specifically used to: for the first target frame of the first image in the inference image set, the first target frame in the video segment is in time sequence with the first image. In the image where the interval above is less than or equal to the second number, it is determined whether there is a similar frame corresponding to the first target frame; if there is no similar frame corresponding to the first target frame, the first target frame is determined A difficult example frame; if there is a similar frame corresponding to the first target frame, and the first image to which the first target frame belongs and the second image to which the similar frame belongs are not adjacent in the video segment, then According to the first target frame and the similar frame, determine the difficult case frame in the image between the first image and the second image; according to the number of difficult case frames in each image in the inference image set, Determine the hard cases in the reasoning image set.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中,确定与第一目标框的相似度最高的追踪框;根据所述追踪框、在所述视频片段中的与所述第一图像在时序上的间隔小于或等于第二数目的图像中的所有的边界框和所述第一目标框,确定所述第一目标框与各边界框的重叠率;若存在重叠率大于第二数值的边界框,则将重叠率大于第二数值的边界框,确定为所述第一目标框对应的相似框;若不存在重叠率大于第二数值的边界框,则确定未存在所述第一目标框对应的相似框。In a possible implementation manner, the difficult case mining module is specifically configured to: determine the difference between the images in the video segment and the first image in the time sequence that is less than or equal to the second number. The tracking frame with the highest degree of similarity to the first target frame; according to the tracking frame, all bounding boxes in the image whose time sequence interval with the first image in the video segment is less than or equal to the second number And the first target frame, determine the overlap rate between the first target frame and each bounding box; if there is a bounding box with an overlap rate greater than a second value, determine the bounding box with an overlap rate greater than the second value as the all The similar frame corresponding to the first target frame; if there is no bounding box with an overlap ratio greater than the second value, it is determined that there is no similar frame corresponding to the first target frame.
在一种可能的实现方式中,所述难例挖掘模块,具体用于:获取所述训练图像集中图像的表层特征分布信息,根据所述训练图像集中图像的表层特征分布信息和所述推理图像集中图像的表层特征,确定所述推理图像集中各图像的第四难例值,其中,所述表层特征包括边界框的表层特征和图像的表层特征;获取所述训练图像集中各图像中每个框的深层特征和所述推理图像集中各图像中每个框的深层特征,根据所述训练图像集中各图像中每个框的深层特征,对所述训练图像集中各图像中每个框进行聚类处理,得到框聚类结果;根据所述推理图像集中各图像中每个框的深层特征、所述框聚类结果和所述推理图像集中各图像中每个框的推理结果,确定所述推理图像集中各图像的第五难例值;根据所述第四难例值和所述第五难例值中的一个或多个,确定所述推理图像集各图像的目标难例值;将所述推理图像集中目标难例值最大的第一数目个图像,确定为所述推理图像集中的难例。In a possible implementation, the difficult example mining module is specifically used to: obtain the surface feature distribution information of the images in the training image set, and according to the surface feature distribution information of the images in the training image set and the inference image The surface features of the concentrated images are determined, and the fourth difficult example value of each image in the inference image set is determined, where the surface features include the surface features of the bounding box and the surface features of the image; each of the images in the training image set is acquired The deep features of the frames and the deep features of each frame in each image in the inference image set, according to the deep features of each frame in each image in the training image set, cluster each frame in each image in the training image set Class processing to obtain a frame clustering result; according to the deep features of each frame in each image in the reasoning image set, the frame clustering result, and the reasoning result of each frame in each image in the reasoning image set, determine the The fifth difficulty example value of each image in the reasoning image set; according to one or more of the fourth difficulty example value and the fifth difficulty example value, the target difficulty example value of each image in the reasoning image set is determined; The first number of images with the largest target hard case value in the reasoning image set is determined as the hard case in the reasoning image set.
In a fifth aspect, this application further provides a computing device. The computing device includes a memory and a processor. The memory is configured to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory, so that the computing device performs the method provided in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores computer program code. When the computer program code is executed by a computing device, the computing device performs the method provided in the first aspect or any possible implementation of the first aspect. The storage medium includes, but is not limited to, a volatile memory such as a random access memory, or a non-volatile memory such as a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a seventh aspect, this application provides a computer program product. The computer program product includes computer program code. When the computer program code is executed by a computing device, the computing device performs the method provided in the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package. When the method provided in the first aspect or any possible implementation of the first aspect needs to be used, the computer program product may be downloaded and executed on the computing device.
In an eighth aspect, this application further provides a computing device. The computing device includes a memory and a processor. The memory is configured to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory, so that the computing device performs the method provided in the third aspect or any possible implementation of the third aspect.
In a ninth aspect, this application provides a computer-readable storage medium. The computer-readable storage medium stores computer program code. When the computer program code is executed by a computing device, the computing device performs the method provided in the third aspect or any possible implementation of the third aspect. The storage medium includes, but is not limited to, a volatile memory such as a random access memory, or a non-volatile memory such as a flash memory, a hard disk drive (HDD), or a solid state drive (SSD).
In a tenth aspect, this application provides a computer program product. The computer program product includes computer program code. When the computer program code is executed by a computing device, the computing device performs the method provided in the third aspect or any possible implementation of the third aspect. The computer program product may be a software installation package. When the method provided in the third aspect or any possible implementation of the third aspect needs to be used, the computer program product may be downloaded and executed on the computing device.
Description of the drawings
To describe the technical methods of the embodiments of this application more clearly, the following briefly introduces the accompanying drawings used in the embodiments.
FIG. 1 is a schematic structural diagram of an AI platform 100 according to an embodiment of this application;
FIG. 2 is a schematic diagram of an application scenario of an AI platform 100 according to this application;
FIG. 3 is a schematic deployment diagram of an AI platform 100 according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a computing device 400 on which an AI platform 100 is deployed according to an embodiment of this application;
FIG. 5 is a schematic flowchart of providing an AI model according to an embodiment of this application;
FIG. 6 is a schematic diagram of a data upload interface according to an embodiment of this application;
FIG. 7 is a schematic diagram of an interface for starting intelligent labeling according to an embodiment of this application;
FIG. 8 is a schematic diagram of a data labeling interface according to an embodiment of this application;
FIG. 9 is a schematic diagram of a procedure for performing inference by using an optimized AI model according to an embodiment of this application;
FIG. 10 is a schematic diagram of an interface for starting hard example mining according to an embodiment of this application;
FIG. 11 is a schematic flowchart of another method for determining a hard example according to an embodiment of this application;
FIG. 12 is a schematic diagram of a surface feature distribution according to an embodiment of this application;
FIG. 13 is a schematic diagram of determining a hard example value according to an embodiment of this application;
FIG. 14 is a schematic flowchart of another method for determining a hard example according to an embodiment of this application;
FIG. 15 is a schematic flowchart of another method for determining a hard example according to an embodiment of this application;
FIG. 16 is a schematic diagram of determining a hard example value according to an embodiment of this application;
FIG. 17 is a schematic flowchart of optimizing an AI model according to an embodiment of this application;
FIG. 18 is a schematic structural diagram of a computing device according to an embodiment of this application.
Detailed description
下面将结合本申请中的附图,对本申请提供的实施例中的方案进行描述。The solutions in the embodiments provided in this application will be described below in conjunction with the drawings in this application.
目前,人工智能热潮不断,机器学习是一种实现AI的核心手段,机器学习渗透至医学、交通、教育、金融等各个行业。不仅仅是专业技术人员,就连各行业的非AI技术专业也期盼用AI、机器学习完成特定任务。At present, the artificial intelligence boom continues. Machine learning is a core means to realize AI. Machine learning has penetrated into various industries such as medicine, transportation, education, and finance. Not only professional and technical personnel, but also non-AI technology majors in various industries also look forward to using AI and machine learning to complete specific tasks.
为了便于理解本申请提供的技术方案和实施例,下面对AI模型、AI模型的训练、难例、难例挖掘、AI平台等概念进行详细说明:In order to facilitate the understanding of the technical solutions and embodiments provided by this application, the concepts of AI model, AI model training, hard cases, hard case mining, and AI platform are described in detail below:
An AI model is a type of mathematical algorithm model that uses machine learning ideas to solve practical problems. The AI model includes a large quantity of parameters and calculation formulas (or calculation rules). The parameters in the AI model are values that can be obtained by training the AI model with a training image set; for example, the parameters of the AI model are the weights of the calculation formulas or calculation factors in the AI model. The AI model also contains some hyperparameters. A hyperparameter is a parameter that cannot be obtained by training the AI model with the training image set; hyperparameters can be used to guide the construction of the AI model or the training of the AI model, and there are many kinds of hyperparameters, for example, the number of iterations of AI model training, the learning rate, the batch size, the number of layers of the AI model, and the number of neurons in each layer. In other words, the difference between the hyperparameters and the parameters of the AI model is that the values of the hyperparameters cannot be obtained by analyzing the training images in the training image set, whereas the values of the parameters can be modified and determined by analyzing the training images in the training image set during the training process.
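As a hedged illustration of the distinction drawn above, the short Python sketch below (all names and values are illustrative, not taken from this application) keeps the hyperparameters in a fixed configuration chosen before training, while the parameters are values updated from the training data:

    import numpy as np

    # Hyperparameters: fixed before training, not learned from the training image set.
    hyperparams = {"iterations": 10, "learning_rate": 0.01, "batch_size": 4}

    # Parameters: initialized, then modified by analyzing the training data during training.
    weights = np.random.randn(3)                              # e.g. weights of one calculation formula
    for _ in range(hyperparams["iterations"]):
        x = np.random.randn(hyperparams["batch_size"], 3)     # stand-in for a batch of image features
        y = x @ np.array([1.0, -2.0, 0.5])                    # stand-in for the corresponding labels
        grad = 2 * x.T @ (x @ weights - y) / len(x)           # gradient of a squared-error loss
        weights -= hyperparams["learning_rate"] * grad        # data-driven parameter update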
There are many kinds of AI models. A widely used class of AI model is the neural network model, which is a class of mathematical algorithm models that imitate the structure and functions of biological neural networks (the central nervous system of animals). A neural network model may include a plurality of neural network layers with different functions, and each layer includes parameters and calculation formulas. Different layers in the neural network model have different names depending on their calculation formulas or functions. For example, a layer that performs convolution calculation is called a convolutional layer, and convolutional layers are often used to extract features from an input signal (such as an image). A neural network model may also be formed by combining a plurality of existing neural network models. Neural network models with different structures may be used in different scenarios (such as classification and recognition) or provide different effects when used in the same scenario. Differences in neural network model structure specifically include one or more of the following: different numbers of network layers, different orders of the network layers, and different weights, parameters, or calculation formulas in each network layer. Many different neural network models with relatively high accuracy for application scenarios such as recognition or classification already exist in the industry; some of these neural network models can be trained with a specific training image set and then used alone to complete a task, or combined with other neural network models (or other functional modules) to complete a task.
除了神经网络模型外,其他大部分AI模型在被用于完成一项任务前都需要被训练。Except for neural network models, most other AI models need to be trained before being used to complete a task.
Training an AI model means using existing images to make the AI model fit the patterns of the existing images through a certain method, so as to determine the parameters in the AI model. Training an AI model requires preparing a training image set. Depending on whether the training images in the training image set are labeled (that is, whether each image has a specific type or name), the training of the AI model can be divided into supervised training and unsupervised training. When supervised training is performed on the AI model, the training images in the training image set used for training carry labels. When the AI model is trained, a training image in the training image set is used as the input of the AI model, the label corresponding to the training image is used as a reference for the output value of the AI model, a loss function is used to calculate a loss value between the output value of the AI model and the label corresponding to the training image, and the parameters in the AI model are adjusted according to the loss value. The AI model is iteratively trained with each training image in the training image set, and the parameters of the AI model are continuously adjusted until the AI model can, with relatively high accuracy, output, based on an input training image, an output value that is the same as the label corresponding to the training image. When unsupervised training is performed on the AI model, the training images in the image set used for training are unlabeled; the training images in the training image set are input to the AI model in sequence, and the AI model gradually identifies associations and potential rules between the training images until the AI model can be used to determine or identify the type or features of an input image. For example, for clustering, after receiving a large number of training images, an AI model used for clustering can learn the features of each training image and the associations and differences between the training images, and automatically divide the training images into a plurality of types. Different task types may use different AI models: some AI models can only be trained by supervised learning, some AI models can only be trained by unsupervised learning, and some AI models can be trained by either supervised or unsupervised learning. A trained AI model can be used to complete a specific task. Generally speaking, AI models in machine learning need to be trained in a supervised manner; supervised training enables the AI model to learn, in a more targeted way from the labeled training image set, the association between the training images and the corresponding labels, so that the trained AI model achieves higher accuracy when used to predict other input inference images.
The following is an example of training a neural network model for an image classification task by supervised learning. To train a neural network model for completing an image classification task, images are first collected according to the task to construct a training image set. The constructed training image set contains three types of images: apple, pear, and banana. The collected training images are stored in three folders according to their types, and the folder name is the label of all the images in that folder. After the training image set is constructed, a neural network model that can implement image classification is selected (for example, a convolutional neural network (CNN)), and the training images in the training image set are input into the CNN. The convolution kernels of the layers in the CNN perform feature extraction and feature classification on the images, and finally the confidence that an image belongs to each type is output. A loss value is calculated with the loss function according to the confidence and the label corresponding to the image, and the parameters of each layer in the CNN are updated according to the loss value and the CNN structure. The foregoing training process continues until the loss value output by the loss function converges or all the images in the training image set have been used for training, at which point the training ends.
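The folder-per-class supervised training just described could be implemented, for example, along the following lines. This is a hedged sketch that assumes PyTorch and torchvision are available; the directory layout, model choice, and hyperparameter values are placeholders rather than details specified by this application.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms, models

    # Assumed layout: train/apple, train/pear, train/banana (folder name = label of its images).
    transform = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    train_set = datasets.ImageFolder("train", transform=transform)
    loader = DataLoader(train_set, batch_size=16, shuffle=True)

    model = models.resnet18(num_classes=3)                    # any CNN producing 3 class scores
    loss_fn = nn.CrossEntropyLoss()                           # compares confidences with labels
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(5):                                    # until the loss converges or data is used up
        for images, labels in loader:
            logits = model(images)                            # per-class confidences
            loss = loss_fn(logits, labels)                    # loss between prediction and label
            optimizer.zero_grad()
            loss.backward()                                   # gradients of the loss value
            optimizer.step()                                  # update the parameters of each layer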
A loss function is a function used to measure the degree to which an AI model has been trained (that is, to calculate the difference between the result predicted by the AI model and the real target). In the process of training the AI model, because it is hoped that the output of the AI model is as close as possible to the value that is really to be predicted, the value predicted by the current AI model for an input image can be compared with the really desired target value (that is, the label of the input image), and the parameters in the AI model are then updated according to the difference between the two (of course, there is usually an initialization process before the first update, that is, initial values are preconfigured for the parameters in the AI model). In each training step, the loss function is used to judge the difference between the value predicted by the current AI model and the real target value, and the parameters of the AI model are updated, until the AI model can predict the really desired target value or a value very close to it, at which point the AI model is considered to have been trained.
After the AI model is trained, the trained AI model can be used to perform inference on images to obtain inference results. The specific inference process is as follows. In an image classification scenario, an image is input into the AI model, the convolution kernels of the layers in the AI model perform feature extraction on the image, and the category to which the image belongs is output based on the extracted features. In a target detection (also called object detection) scenario, an image is input into the AI model, the convolution kernels of the layers in the AI model perform feature extraction on the image, and the position and category of the bounding box of each target included in the image are output based on the extracted features. In a scenario covering both image classification and target detection, an image is input into the AI model, the convolution kernels of the layers in the AI model perform feature extraction on the image, and the category to which the image belongs as well as the position and category of the bounding box of each target included in the image are output based on the extracted features. It should be noted here that some AI models have strong inference capability while others have weak inference capability. That an AI model has strong inference capability means that, when the AI model is used to perform inference on images, the accuracy of the inference results is greater than or equal to a certain value; that an AI model has weak inference capability means that, when the AI model is used to perform inference on images, the accuracy of the inference results is lower than that value.
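The two kinds of inference output described above (a category for classification; box positions and categories for detection) could be exercised roughly as follows. This is a hedged sketch assuming a recent torchvision with pretrained weights; the image path is a placeholder and the preprocessing is deliberately minimal.

    import torch
    from torchvision import models, transforms
    from torchvision.io import read_image

    img = read_image("street.jpg").float() / 255.0                       # placeholder image path

    # Image classification: the model outputs one category for the whole image.
    classifier = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    cls_in = transforms.Resize((224, 224))(img).unsqueeze(0)
    category = classifier(cls_in).argmax(dim=1)                          # predicted class index

    # Target detection: the model outputs a bounding box position and category per target.
    detector = models.detection.fasterrcnn_resnet50_fpn(
        weights=models.detection.FasterRCNN_ResNet50_FPN_Weights.DEFAULT).eval()
    with torch.no_grad():
        det = detector([img])[0]                                         # dict with boxes, labels, scores
    boxes, labels = det["boxes"], det["labels"]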
A hard example is input data of a model for which, in the process of training an initial AI model or in the process of performing inference with a trained AI model, the result output by the initial AI model or the trained AI model is wrong or has a relatively high error rate. For example, during training of the AI model, in the process of labeling unlabeled images, an image whose labeling result has an error rate higher than a target threshold is a hard example. During inference of the AI model, an image in the inference image set whose inference result output by the AI model has an error rate higher than the target threshold is a hard example.
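To make the threshold rule above concrete, here is a minimal, purely illustrative sketch; the threshold value, file names, and per-image error rates are invented for the example and are not taken from this application.

    # Hypothetical per-image error rates, e.g. the fraction of wrongly predicted labels or boxes.
    error_rates = {"img_001.jpg": 0.05, "img_002.jpg": 0.40, "img_003.jpg": 0.75}

    TARGET_THRESHOLD = 0.5          # assumed value; the text only requires "a target threshold"

    # Images whose output error rate exceeds the threshold are treated as hard examples.
    hard_examples = [name for name, err in error_rates.items() if err > TARGET_THRESHOLD]
    print(hard_examples)            # ['img_003.jpg']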
难例挖掘,指确定一个图像为难例的方法。Difficult case mining refers to the method of identifying an image as a difficult case.
An AI platform is a platform that provides AI developers and users with a convenient AI development environment and convenient development tools. Various AI models or AI sub-models for solving different problems are built into the AI platform. The AI platform can search for and build an applicable AI model according to the user's requirements; the user only needs to determine his or her requirements on the AI platform, prepare a training image set according to the prompts, and upload it to the AI platform, and the AI platform can train, for the user, an AI model that can be used to meet the user's needs. Alternatively, the user prepares his or her own algorithm and training image set according to the prompts and uploads them to the AI platform, and the AI platform can train, based on the user's own algorithm and training image set, an AI model that can be used to meet the user's needs. The user can use the trained AI model to complete his or her own specific task.
If the AI platform obtains an AI model by using a conventional AI model training method, the trained AI model has relatively weak inference capability. The embodiments of this application provide an AI platform into which hard example mining technology is introduced, so that the AI platform forms a closed loop of AI model construction, training, inference, hard example mining, retraining, and re-inference. This meets developers' requirements while improving the accuracy of the AI model (that is, improving the inference capability of the AI model).
需要说明的是,上文中提到的AI模型是一种泛指,AI模型包括深度学习模型、机器学习模型等。It should be noted that the AI model mentioned above is a general term, and AI models include deep learning models, machine learning models, and so on.
FIG. 1 is a schematic structural diagram of the AI platform 100 in an embodiment of this application. It should be understood that FIG. 1 only shows one exemplary structural schematic diagram of the AI platform 100, and this application does not limit the division of the modules in the AI platform 100. As shown in FIG. 1, the AI platform 100 includes a user input/output (I/O) module 101, a hard example mining module 102, a model training module 103, an inference module 104, and a data preprocessing module 105. Optionally, the AI platform may further include an AI model storage module 106 and a data storage module 107.
下面简要地描述AI平台100中的各个模块的功能:The following briefly describes the functions of each module in the AI platform 100:
User I/O module 101: configured to receive a task objective input or selected by a user, receive a training image set from a first user, receive an inference image set sent by a device of a second user, and so on, where the training image set includes a plurality of unlabeled images (which may be called a plurality of unlabeled training images). The user I/O module 101 is further configured to receive the first user's corrected labels for hard examples, obtain one or more labeled images from the first user, provide an optimized AI model to the device of the second user, receive inference images sent by the device of the second user, and so on. As an example, the user I/O module 101 may be implemented by a graphical user interface (GUI) or a command line interface (CLI). For example, the GUI shows that the AI platform 100 can provide users with a variety of AI services (such as an image classification service and a target detection service). The user can select a task objective on the GUI; for example, if the user selects the image classification service, the user can continue to upload a plurality of unlabeled images in the GUI of the AI platform, and so on. After receiving the task objective and the plurality of unlabeled images, the GUI communicates with the model training module 103. The model training module 103 selects or searches for, according to the task objective determined by the user, a constructed AI model that can be used to complete the user's task objective. The user I/O module 101 is further configured to receive the hard examples output by the hard example mining module 102 and provide a GUI for the user to confirm the hard examples.
可选的,用户I/O模块101还可用于接收用户输入的对完成任务目标的AI模型的效果期望。例如,输入或选择最终获得的AI模型用于人脸识别的准确率要高于99%。Optionally, the user I/O module 101 may also be used to receive the user's input for the expected effect of the AI model for completing the task goal. For example, the accuracy of inputting or selecting the finally obtained AI model for face recognition is higher than 99%.
可选的,用户I/O模块101还可用于接收用户输入的AI模型等。例如,用户可基于自己的任务目标,在GUI输入初始AI模型。Optionally, the user I/O module 101 may also be used to receive an AI model input by the user, etc. For example, users can enter the initial AI model in the GUI based on their mission goals.
Optionally, the user I/O module 101 may be further configured to receive surface features and deep features, input by the user, of the inference images in the inference image set. In an image classification scenario, the surface features include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the red-green-blue (RGB) values of the image, the brightness of the image, the saturation of the image, or the sharpness of the image, and the deep features are abstract features of the image extracted by using convolution kernels in a feature extraction model (such as a CNN). In a target detection scenario, the surface features include surface features of the bounding boxes and surface features of the image. The surface features of the bounding boxes may include one or more of the aspect ratio of each bounding box in a single image, the proportion of the area of each bounding box in a single image to the image area, the degree of marginalization of each bounding box in a single image, the stacking degree of each bounding box in a single image, the brightness of each bounding box in a single image, or the blurriness of each bounding box in a single image. The surface features of the image may include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the RGB values of the image, the brightness of the image, the saturation of the image, the sharpness of the image, the number of boxes in a single image, or the variance of the areas of the boxes in a single image. The deep features are abstract features of the image extracted by using convolution kernels in a feature extraction model (such as a CNN).
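Because the surface features listed above are simple image statistics, they can be computed directly. The sketch below shows one possible way with NumPy and Pillow; the exact feature set and the mean-absolute-gradient proxy used for "sharpness" are illustrative assumptions rather than definitions from this application.

    import numpy as np
    from PIL import Image

    def surface_features(path):
        img = Image.open(path).convert("RGB")
        arr = np.asarray(img, dtype=np.float32)
        h, w, _ = arr.shape
        gray = arr.mean(axis=2)
        hsv = np.asarray(img.convert("HSV"), dtype=np.float32)
        # Mean absolute gradient, used here as a crude stand-in for image sharpness.
        sharp = np.abs(np.diff(gray, axis=0)).mean() + np.abs(np.diff(gray, axis=1)).mean()
        return {
            "resolution": (w, h),
            "aspect_ratio": w / h,
            "rgb_mean": arr.reshape(-1, 3).mean(axis=0).tolist(),
            "rgb_variance": arr.reshape(-1, 3).var(axis=0).tolist(),
            "brightness": float(gray.mean()),
            "saturation": float(hsv[..., 1].mean()),
            "sharpness": float(sharp),
        }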
可选的,用户I/O模块101还可用于提供GUI,用于用户对训练图像集中训练图像的标注。Optionally, the user I/O module 101 may also be used to provide a GUI for the user to label the training images in the training image set.
可选的,用户I/O模块101还可用于提供各种预先内置的初始AI模型供用户选择。例如,用户可根据自己的任务目标在GUI上选择一个初始AI模型。Optionally, the user I/O module 101 may also be used to provide various pre-built initial AI models for the user to choose from. For example, users can select an initial AI model on the GUI according to their mission goals.
可选的,用户I/O模块101还可用于接收用户对初始AI模型、训练图像集中训练图像的各种配置信息等。Optionally, the user I/O module 101 may also be used to receive various configuration information of the initial AI model and training images in the training image set by the user.
难例挖掘模块102,用于在用户I/O模块101接收到的推理图像集中,确定出难例。难例挖掘模块102与推理模块104、用户I/O模块101均可以进行通信。难例挖掘模块102可以从推理模块104中,获取推理模块104对推理图像集中推理图像进行推理的推理结果,基于推理结果挖掘出推理图像集中的难例。难例挖掘模块102还可以向用户I/O模块101提供挖掘出的难例。The difficult case mining module 102 is used to determine difficult cases in the inference image set received by the user I/O module 101. The hard case mining module 102 can communicate with the inference module 104 and the user I/O module 101. The hard case mining module 102 can obtain the reasoning result of the reasoning module 104 on the reasoning image in the reasoning image set from the reasoning module 104, and mine the hard cases in the reasoning image set based on the reasoning result. The hard case mining module 102 can also provide the user I/O module 101 with the hard cases mined.
可选的,难例挖掘模块102还可用于从用户I/O模块101中,获取用户输入的推理图像集中推理图像的表层特征和深层特征。Optionally, the hard case mining module 102 can also be used to obtain the surface features and deep features of the reasoning image in the reasoning image set input by the user from the user I/O module 101.
模型训练模块103:用于对AI模型进行训练。模型训练模块103与用户I/O模块101、推理模块104、AI模型存储模块106均可以通信。具体处理为:Model training module 103: used to train the AI model. The model training module 103 can communicate with the user I/O module 101, the inference module 104, and the AI model storage module 106. The specific treatment is:
在本实施例中,初始AI模型包括未进行训练的AI模型、进行训练但是未基于难例优化的AI模型。未进行训练的AI模型指构建的AI模型还未使用训练图像集进行训练,构建的AI模型中的参数均是预设的数值。进行训练但是未基于难例优化的AI模型指已经能用于推理但是未基于难例优化的AI模型,可以包括两种,一种是用户直接在AI模型存储模块105中选择的初始AI模型,一种是仅使用训练图像集中带标注的训练图像对构建的AI模型进行训练获得的AI模型。可见AI平台可以从AI模型存储模块106中,获取初始AI模型,或者使用模型训练模块103对训练图像集合进行训练,获得初始AI 模型。In this embodiment, the initial AI model includes an AI model that has not been trained, and an AI model that has been trained but is not optimized based on difficult examples. The untrained AI model means that the constructed AI model has not been trained using the training image set, and the parameters in the constructed AI model are all preset values. AI models that are trained but not optimized based on difficult cases refer to AI models that can be used for reasoning but are not optimized based on difficult cases. There are two types, one is the initial AI model directly selected by the user in the AI model storage module 105, One is the AI model obtained by training the constructed AI model using only the annotated training images in the training image set. It can be seen that the AI platform can obtain the initial AI model from the AI model storage module 106, or use the model training module 103 to train the training image set to obtain the initial AI model.
When the initial AI model is an AI model obtained by training a constructed AI model using only the labeled training images in the training image set, the specific processing is as follows: the AI platform determines, according to the user's task objective, a constructed AI model to be used to complete the user's task objective. The model training module 103 can communicate with both the user I/O module 101 and the AI model storage module 106. The model training module 103 selects, according to the user's task objective, a ready-made AI model from the AI model library stored in the AI model storage module 106 as the constructed AI model; or, according to the user's task objective, the user's expected effect for the task objective, or some configuration parameters input by the user, the model training module 103 searches the AI model library for AI sub-model structures and specifies some hyperparameters of the AI model (for example, the number of layers of the model and the number of neurons in each layer) to perform AI model construction, finally obtaining a constructed AI model. It is worth noting that some hyperparameters of the AI model may be hyperparameters determined by the AI platform based on its experience in AI model construction and training.
The model training module 103 obtains the training image set from the user I/O module 101. The model training module 103 determines some hyperparameters for training the constructed AI model according to the characteristics of the training image set and the structure of the constructed AI model, for example, the number of iterations, the learning rate, and the batch size. After the hyperparameters are set, the model training module 103 uses the labeled images in the obtained training image set to automatically train the constructed AI model, continuously updating the internal parameters of the constructed AI model during the training process to obtain the initial AI model. It is worth noting that some hyperparameters for training the constructed AI model may be hyperparameters determined by the AI platform based on its experience in model training.
The model training module 103 inputs the unlabeled images in the training image set into the initial AI model, and the inference results of the unlabeled images are output. The model training module 103 transmits the inference results to the hard example mining module 102; based on the inference results, the hard example mining module 102 mines the hard examples among the unlabeled images and feeds them back to the model training module 103. The model training module 103 uses the hard examples to continue the optimization training of the initial AI model to obtain an optimized AI model. The model training module 103 provides the optimized AI model to the inference module 104 for inference processing. It should be noted here that, if the initial AI model is the initial AI model stored in the AI model storage module 106, the training images in the training image set may all be unlabeled images; if the initial AI model is a constructed AI model, the training images in the training image set include some unlabeled images and some labeled images.
The inference module 104 uses the optimized AI model to perform inference on the inference images in the inference image set, and outputs the inference results of the inference images in the inference image set. The hard example mining module 102 obtains the inference results from the inference module 104 and, based on the inference results, determines the hard examples in the inference image set. The model training module 103 continues to train the optimized AI model based on the hard examples provided by the hard example mining module 102 to obtain a further optimized AI model. The model training module 103 transmits the further optimized AI model to the AI model storage module 106 for storage, and transmits the further optimized AI model to the inference module 104 for inference processing. It should be noted here that when the inference module 104 performs inference on the inference images in the inference image set to obtain hard examples and the optimized AI model is then optimized again, the process is actually the same as optimizing the initial AI model with the hard examples in the training images; at this time, the hard examples in the inference images serve as training images.
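Putting the modules above together, the closed loop of inference, hard example mining, confirmation, and retraining could be orchestrated along the following lines. This is only a schematic sketch: the callables passed in are hypothetical stand-ins for the inference module 104, the hard example mining module 102, user confirmation through the GUI, and the model training module 103.

    def improvement_loop(model, inference_images, infer, mine_hard_examples,
                         request_corrected_labels, fine_tune, rounds=3):
        # Schematic closed loop: inference -> hard example mining -> retraining.
        for _ in range(rounds):
            results = [infer(model, img) for img in inference_images]   # inference module 104
            hard = mine_hard_examples(inference_images, results)        # hard example mining module 102
            if not hard:
                break                                                   # no hard examples left
            labeled_hard = request_corrected_labels(hard)               # user confirms/corrects labels
            model = fine_tune(model, labeled_hard)                      # model training module 103
        return model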
可选的,模型训练模块103还可用于将用户在GUI上选择的AI模型确定为初始AI模型。或者将用户在GUI上输入的AI模型确定为初始AI模型。Optionally, the model training module 103 may also be used to determine the AI model selected by the user on the GUI as the initial AI model. Or the AI model input by the user on the GUI is determined as the initial AI model.
可选的,初始AI模型还可以包括使用训练图像集中的图像对AI模型存储模块106中的AI模型训练后的AI模型。Optionally, the initial AI model may also include an AI model after training the AI model in the AI model storage module 106 using images in the training image set.
The inference module 104 is configured to perform inference on the inference images in the inference image set according to the AI model to obtain inference results. The inference module 104 can communicate with the hard example mining module 102, the user I/O module 101, and the AI model storage module 105. The inference module 104 obtains the inference images in the inference image set from the user I/O module 101, performs inference processing on them, and obtains the inference results of the inference images in the inference image set. The inference module 104 transmits the inference results to the hard example mining module 102, so that the hard example mining module 102 mines the hard examples in the inference image set based on the inference results.
The data preprocessing module 105 is configured to perform preprocessing operations on the training images in the training image set and the inference images in the inference image set received by the user I/O module 101. The data preprocessing module 105 can read, from the data storage module 107, the training image set or the inference image set received by the user I/O module 101, and then preprocess the inference images in the inference image set or the training images in the training image set. Preprocessing the training images or inference images uploaded by the user can make the training images in the training image set or the inference images in the inference image set consistent in size, and can also remove inappropriate data from them. The preprocessed training image set is suitable for training the constructed AI model or the initial AI model, and can also make the training effect better. The preprocessed inference images in the inference image set are suitable for being input into the second AI model for inference processing. After the data preprocessing module 105 finishes preprocessing the training images in the training image set or the inference images in the inference image set, it stores the preprocessed training image set or inference image set in the data storage module 107, or sends the preprocessed training image set to the model training module 103 and the preprocessed inference image set to the inference module 104. It should be understood that, in another embodiment, the data storage module 107 may also be a part of the data preprocessing module 105, that is, the data preprocessing module 105 has the function of storing images.
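A minimal preprocessing pass of the kind described for the data preprocessing module 105 might look like the sketch below. The target size, the rule for removing "inappropriate" data (unreadable or very small images), and the use of Pillow are assumptions chosen for illustration.

    from pathlib import Path
    from PIL import Image, UnidentifiedImageError

    def preprocess(src_dir, dst_dir, size=(512, 512), min_side=32):
        dst = Path(dst_dir)
        dst.mkdir(parents=True, exist_ok=True)
        for path in Path(src_dir).glob("*"):
            try:
                img = Image.open(path).convert("RGB")
            except (UnidentifiedImageError, OSError):
                continue                                        # drop unreadable ("inappropriate") data
            if min(img.size) < min_side:
                continue                                        # drop images too small to be useful
            img.resize(size).save(dst / (path.stem + ".jpg"))   # make image sizes consistent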
AI模型存储模块106:用于存储初始AI模型、优化AI模型和AI子模型结构等,也可以用于存储根据AI子模型结构确定构建的AI模型。AI模型存储模块106与用户I/O模块101、模型训练模块103均可以进行通信。AI模型存储模块106接收并存储模型训练模块103传输的训练完成的初始AI模型和优化AI模型。AI模型存储模块106为模型训练模块103提供构建的AI模型或者初始AI模型。AI模型存储模块106对用户I/O模块101接收到的用户上传的初始AI模型,进行存储。应理解,在另一个实施例中,AI模型存储模块106也可作为模型训练模块103中的一部分。AI model storage module 106: used to store the initial AI model, optimized AI model, and AI sub-model structure, etc., and can also be used to store the AI model determined and constructed according to the AI sub-model structure. The AI model storage module 106 can communicate with the user I/O module 101 and the model training module 103. The AI model storage module 106 receives and stores the trained initial AI model and the optimized AI model transmitted by the model training module 103. The AI model storage module 106 provides the constructed AI model or the initial AI model for the model training module 103. The AI model storage module 106 stores the initial AI model uploaded by the user and received by the user I/O module 101. It should be understood that, in another embodiment, the AI model storage module 106 may also be used as a part of the model training module 103.
Data storage module 107 (for example, a data storage resource corresponding to the Object Storage Service (OBS) provided by a cloud service provider): configured to store the training image sets and inference image sets uploaded by users, and also to store the data processed by the data preprocessing module 105.
需要说明的是,本申请中的AI平台可以是一个可以与用户交互的***,这个***可以是软件***也可以是硬件***,也可以是软硬结合的***,本申请中不进行限定。It should be noted that the AI platform in this application can be a system that can interact with users. This system can be a software system, a hardware system, or a combination of software and hardware, which is not limited in this application.
由于上述各模块的功能,本申请实施例提供的AI平台可向用户提供训练AI模型的业务,使得AI平台可以提供训练后的优化AI模型。该AI平台可以从未标注的图像中挖掘出难例,进一步基于难例继续训练初始AI模型,得到优化AI模型,使AI模型进行推理的推理结果更准确。Due to the functions of the aforementioned modules, the AI platform provided by the embodiments of the present application can provide users with services for training AI models, so that the AI platform can provide optimized AI models after training. The AI platform can dig out difficult cases from unlabeled images, and further train the initial AI model based on the difficult cases to obtain an optimized AI model, so that the reasoning results of the AI model are more accurate.
FIG. 2 is a schematic diagram of an application scenario of an AI platform 100 according to an embodiment of this application. As shown in FIG. 2, in an embodiment, the AI platform 100 may be entirely deployed in a cloud environment. A cloud environment is an entity that uses basic resources to provide cloud services to users in the cloud computing mode. The cloud environment includes a cloud data center and a cloud service platform. The cloud data center includes a large number of basic resources (including computing resources, storage resources, and network resources) owned by a cloud service provider, and the computing resources included in the cloud data center may be a large number of computing devices (for example, servers). The AI platform 100 may be independently deployed on a server or a virtual machine in the cloud data center, or the AI platform 100 may be deployed in a distributed manner on a plurality of servers in the cloud data center, on a plurality of virtual machines in the cloud data center, or on both servers and virtual machines in the cloud data center. As shown in FIG. 2, the AI platform 100 is abstracted by the cloud service provider into an AI cloud service on the cloud service platform and provided to users. After a user purchases the cloud service on the cloud service platform (the user may pre-recharge and then settle according to the final resource usage), the cloud environment uses the AI platform 100 deployed in the cloud data center to provide the AI platform cloud service to the user. When using the AI platform cloud service, the user can determine, through an application program interface (API) or a GUI, the task to be completed by the AI model, and upload the training image set and the inference image set to the cloud environment. The AI platform 100 in the cloud environment receives the user's task information, training image set, and inference image set, and performs operations such as data preprocessing, AI model training, performing inference on the inference images in the inference image set with the trained AI model, hard example mining, and retraining the AI model based on the mined hard examples. The AI platform returns content such as the mined hard examples to the user through the API or the GUI. The user further chooses whether to retrain the AI model based on the hard examples. The trained AI model can be downloaded by the user or used online to complete a specific task.
In another embodiment of this application, when the AI platform 100 in the cloud environment is abstracted into an AI cloud service and provided to users, it may be divided into two parts: a basic AI cloud service and an AI hard example mining cloud service. On the cloud service platform, a user may first purchase only the basic AI cloud service and purchase the AI hard example mining cloud service only when it is needed. After the purchase, the cloud service provider provides an API for the AI hard example mining cloud service, and the AI hard example mining cloud service is finally billed additionally according to the number of API calls.
The deployment of the AI platform 100 provided in this application is relatively flexible. As shown in FIG. 3, in another embodiment, the AI platform 100 provided in this application may also be deployed in different environments in a distributed manner. The AI platform 100 provided in this application may be logically divided into multiple parts, each part having a different function. For example, in one embodiment the AI platform 100 includes a user I/O module 101, a hard example mining module 102, a model training module 103, an AI model storage module 105, and a data storage module 106. The parts of the AI platform 100 may be respectively deployed in any two or three of the following environments: terminal computing devices, an edge environment, and a cloud environment. Terminal computing devices include terminal servers, smartphones, notebook computers, tablet computers, personal desktop computers, smart cameras, and so on. The edge environment is an environment that includes a collection of edge computing devices relatively close to the terminal computing devices; edge computing devices include edge servers, edge stations with computing capabilities, and so on. The parts of the AI platform 100 deployed in different environments or devices cooperate to provide users with functions such as determining and training the constructed AI model. For example, in one scenario, the user I/O module 101, the data storage module 106, and the data preprocessing module 107 of the AI platform 100 are deployed on a terminal computing device, and the hard example mining module 102, the model training module 103, the inference module 104, and the AI model storage module 105 of the AI platform 100 are deployed on an edge computing device in the edge environment. The user sends the training image set and the inference image set to the user I/O module 101 on the terminal computing device; the terminal computing device stores the training image set and the inference image set in the data storage module 106; the data preprocessing module 107 preprocesses the training images in the training image set and the inference images in the inference image set, and the preprocessed training images and inference images are also stored in the data storage module 106. The model training module 103 on the edge computing device determines the constructed AI model according to the user's task objective, trains the constructed AI model with the training images in the training image set to obtain the initial AI model, and further trains the initial AI model based on the hard examples among the unlabeled images in the training image set to obtain an optimized initial AI model. Optionally, the hard example mining module 102 may also mine the hard examples included in the inference image set based on the optimized AI model. The model training module 103 trains the optimized AI model based on the hard examples to obtain a further optimized AI model. It should be understood that this application does not restrictively divide which parts of the AI platform 100 are specifically deployed in which environment; in actual applications, the deployment may be adapted according to the computing capability of the terminal computing devices, the resource occupancy of the edge environment and the cloud environment, or specific application requirements.
The AI platform 100 may also be separately deployed on one computing device in any environment (for example, separately deployed on one edge server in the edge environment). FIG. 4 is a schematic diagram of the hardware structure of a computing device 400 on which the AI platform 100 is deployed. The computing device 400 shown in FIG. 4 includes a memory 401, a processor 402, a communication interface 403, and a bus 404. The memory 401, the processor 402, and the communication interface 403 are communicatively connected to each other through the bus 404.
The memory 401 may be a read-only memory (ROM), a random access memory (RAM), a hard disk, a flash memory, or any combination thereof. The memory 401 may store a program; when the program stored in the memory 401 is executed by the processor 402, the processor 402 and the communication interface 403 are configured to perform the method in which the AI platform 100 trains an AI model for the user, mines hard examples, and further optimizes the AI model based on the hard examples. The memory may also store image sets. For example, a part of the storage resources in the memory 401 is divided into a data storage module 106 for storing data required by the AI platform 100, and a part of the storage resources in the memory 401 is divided into an AI model storage module 105 for storing the AI model library.
处理器402可以采用中央处理器(CPU),应用专用集成电路(ASIC),图形处理器(GPU)或其任意组合。处理器402可以包括一个或多个芯片。处理器402可以包括AI加速器,例如神经网络处理器(neural processing unit,NPU)。The processor 402 may adopt a central processing unit (CPU), an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or any combination thereof. The processor 402 may include one or more chips. The processor 402 may include an AI accelerator, such as a neural network processor (neural processing unit, NPU).
通信接口403使用例如收发器一类的收发模块,来实现计算设备400与其他设备或通信网络之间的通信。例如,可以通过通信接口403获取数据。The communication interface 403 uses a transceiver module such as a transceiver to implement communication between the computing device 400 and other devices or a communication network. For example, data can be obtained through the communication interface 403.
总线404可包括在计算设备400各个部件(例如,存储器401、处理器402、通信接口403)之间传送信息的通路。The bus 404 may include a path for transferring information between various components of the computing device 400 (for example, the memory 401, the processor 402, and the communication interface 403).
随着AI技术的发展,AI技术被广泛应用于众多领域中,例如,AI技术被应用于车辆的自助驾驶和辅助驾驶领域中,具体是进行车道线识别、红绿灯识别、自动识别停车位、检测人行道等处理。这些处理总结起来可以认为是使用AI平台中的AI模型进行图像分类和/或目标检测。例如,使用AI模型确定红绿灯识别,使用AI模型识别车道线。图像分类主要用于判断图像所属的类别(即输入一帧图像,输出图像所属的类别)。目标检测可以包括两个方面,一方面用于判断属于某个特定类别的目标是否出现在图像中,另一方面是用于对目标进行定位(即确定目标出现在图像中的位置)。本申请实施例将以图像分类和目标检测为例说明在AI平台中如何提供AI模型。With the development of AI technology, AI technology is widely used in many fields. For example, AI technology is used in the field of self-driving and assisted driving of vehicles, specifically for lane line recognition, traffic light recognition, automatic parking spot recognition, and detection Sidewalks and other processing. In summary, these processes can be considered as using the AI model in the AI platform for image classification and/or target detection. For example, the AI model is used to determine traffic light recognition, and the AI model is used to recognize lane lines. Image classification is mainly used to determine the category of the image (that is, input a frame of image, output the category of the image). Target detection can include two aspects. On the one hand, it is used to determine whether a target belonging to a certain category appears in the image, and on the other hand, it is used to locate the target (that is, to determine where the target appears in the image). This embodiment of the application will take image classification and target detection as examples to illustrate how to provide an AI model in an AI platform.
下面结合图5描述在一种实施例中提供AI模型的方法的具体流程,以该方法由AI平台执行为例进行说明:The following describes the specific process of the method of providing an AI model in an embodiment with reference to FIG. 5, taking the method executed by the AI platform as an example for description:
步骤501,AI平台接收第一用户的未标注的多个图像。Step 501: The AI platform receives multiple unlabeled images of the first user.
其中,第一用户为在AI平台注册账号的实体的用户。例如,AI模型的开发者等。Among them, the first user is a user of an entity who has registered an account on the AI platform. For example, developers of AI models, etc.
本实施例中,第一用户想要在AI平台上获得AI模型,可以将未标注的多个图像设置在一个文件夹中,然后打开AI平台提供的图像的上传界面。在上传界面中包括图像的输入位置,第一用户可以在图像的输入位置处添加训练图像集的存储位置,将未标注的多个图像上传至AI平台。这样,AI平台可以接收到第一用户的未标注的多个图像。In this embodiment, if the first user wants to obtain an AI model on the AI platform, he can set multiple unlabeled images in a folder, and then open the image upload interface provided by the AI platform. The upload interface includes the input location of the image. The first user can add the storage location of the training image set at the input location of the image, and upload multiple unlabeled images to the AI platform. In this way, the AI platform can receive multiple unlabeled images of the first user.
As shown in FIG. 6, the upload interface also displays an identifier (used to mark the images uploaded this time), a labeling type (used to indicate the purpose of the AI model to be trained with the images, such as target detection or image classification), a creation time, an image input location, a label set of the images (such as person and car), a name (such as target or object), a description, a version name, and so on.
Step 502: The AI platform annotates the plurality of images according to an initial AI model.
In this embodiment, the AI platform may obtain the initial AI model and then input the plurality of unlabeled images into the initial AI model to obtain annotation results of the plurality of unlabeled images. If the initial AI model is used for image classification, the annotation result of an image is the category to which the image belongs; for example, if the image is an image of an apple, the category of the image is apple. If the initial AI model is used for object detection, the annotation result of an image is the position of the bounding box of each object included in the image and the category of the object, where an object may be an item shown in the image, such as a car, a person, or a cat. If the initial AI model is used for both image classification and object detection, the annotation result of an image is the category of the image, the positions of the bounding boxes of the objects included in the image, and the categories of those objects.
Optionally, the first user may also upload annotated images to the AI platform.
Step 503: The AI platform determines hard examples among the plurality of images according to the annotation results.
In this embodiment, after obtaining the annotation results of the plurality of unlabeled images, the AI platform may determine, based on the annotation results, the hard examples included in the plurality of unlabeled images (the concept of a hard example is explained earlier and is not repeated here).
Step 504: The AI platform trains the initial AI model with the hard examples to obtain an optimized AI model.
In this embodiment, after determining the hard examples, the AI platform may continue to train the initial AI model with them. The specific processing is as follows: input a part of the hard examples into the initial AI model to obtain output results; determine the difference between the output results and the annotation results of those hard examples; adjust the parameters of the initial AI model based on the difference; and then continue with another part of the hard examples, repeating this process until all the hard examples have been used for training or until the difference between the results predicted by the model and the annotation results is smaller than a certain threshold, at which point the optimized AI model is considered to have been obtained.
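The iterative training described above can be pictured with the following Python sketch, which assumes a PyTorch-style classification model and a hypothetical hard_example_loader yielding batches of hard-example images and their annotations; it illustrates only the loop, not the platform's actual implementation.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def optimize_on_hard_examples(model: nn.Module, hard_example_loader,
                              max_epochs: int = 10, loss_threshold: float = 0.05) -> nn.Module:
    """Fine-tune an initial AI model on hard examples (illustrative sketch)."""
    criterion = nn.CrossEntropyLoss()                 # measures the gap to the annotations
    optimizer = optim.SGD(model.parameters(), lr=1e-3)
    model.train()
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in hard_example_loader:    # one subset of hard examples per batch
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)         # difference between output and annotation
            loss.backward()                           # adjust model parameters based on the gap
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(hard_example_loader) < loss_threshold:
            break                                     # predictions close enough to annotations
    return model                                      # the optimized AI model
```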
Optionally, this application further provides a method for obtaining the initial AI model in step 502 and a processing procedure for annotating a plurality of unlabeled images based on the initial AI model, as follows:
The AI platform provides an annotation selection interface to the first user. The annotation selection interface includes at least one annotation mode selectable by the first user. The AI platform receives the annotation mode selected by the first user and annotates the plurality of unlabeled images according to the initial AI model corresponding to the annotation mode selected by the first user.
In this embodiment, after providing the plurality of unlabeled images, the first user may annotate some of them; if the first user does not want to annotate the remaining images, the first user may click a smart annotation option to trigger entry into the annotation selection interface. Alternatively, after providing the plurality of unlabeled images, the first user may click the smart annotation option directly, without annotating any image, to trigger entry into the annotation selection interface. The annotation selection interface provides one or more annotation modes. If only one annotation mode exists, the annotation selection interface displays an option asking whether to select that annotation mode: the first user may click "Yes" to select it, or click "No" to leave it unselected. If the annotation selection interface provides multiple annotation modes, these modes are displayed in the interface with a selection option for each. The first user may select the desired annotation mode through the corresponding selection option and then submit the choice, so that the AI platform receives the annotation mode selected by the user.
As shown in FIG. 7, the annotation modes provided in the annotation selection interface may include an active learning mode and a pre-annotation mode. In the active learning mode, the AI platform first trains a constructed AI model with the annotated images provided by the first user to obtain the initial AI model, and then annotates the unlabeled images based on the initial AI model to obtain their annotation results. In the pre-annotation mode, the AI platform directly obtains an existing initial AI model, annotates the unlabeled images based on it, and obtains their annotation results. In addition, the annotation selection interface also displays the total number of images, the number of unlabeled images, the number of annotated images, and the number of images to be confirmed (that is, the number of hard examples awaiting user confirmation).
If the annotation mode selected by the first user is the active learning mode, the AI platform may train a constructed AI model with the annotated images provided by the first user to obtain the initial AI model, and then input the unlabeled images into the initial AI model to obtain their annotation results. It should be noted here that the annotated images may be obtained by the first user annotating some of the unlabeled images, or may be annotated images directly provided by the first user.
If the annotation mode selected by the first user is the pre-annotation mode, the AI platform may directly obtain the initial AI model (which may be an AI model uploaded by the first user or an AI model preset on the AI platform), and then input the unlabeled images into the initial AI model to obtain their annotation results.
In addition, the embodiments of this application also provide a process in which the first user annotates training images on the AI platform. The specific processing is as follows: when the first user chooses to annotate the unlabeled images personally, the first user may determine whether the AI model to be trained is used in an image classification scenario, an object detection scenario, or a scenario combining image classification and object detection. If the AI model to be trained is applied to an image classification scenario, the image annotation interface provided by the AI platform for image classification scenarios is started. If the AI model to be trained is applied to an object detection scenario, the image annotation interface provided by the AI platform for object detection scenarios is started. As shown in FIG. 8, the image annotation interface for object detection scenarios provides options such as selecting an image, drawing a bounding box, a return key, zooming in, and zooming out. The first user may open a frame of image through the image selection option, then mark an object in the image with a bounding box and add an annotation to it. The annotation may include the category of the object inside the bounding box and the position of the bounding box in the image (since the bounding box is generally a rectangle, the position can be identified by the coordinates of its top-left and bottom-right corners). After the first user marks the object with a bounding box, the AI platform obtains the bounding box of the object, and an annotation information bar of the image may also be displayed in the image annotation interface. The annotation information bar displays information about the objects that the first user has already annotated, including the annotation, the bounding box, and operations: the annotation indicates the category of the object, the bounding box indicates the shape of the box used, and the operations include deletion and modification options. The first user may modify the annotations already added to the image through these operations.
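One plausible way to represent the annotation produced on this interface is sketched below; the class and field names are illustrative and are not defined in this application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BoundingBox:
    # Rectangle identified by its top-left and bottom-right corners, in pixels
    x_min: int
    y_min: int
    x_max: int
    y_max: int
    label: str            # category of the object inside the box, e.g. "car"

@dataclass
class ImageAnnotation:
    image_id: str
    boxes: List[BoundingBox] = field(default_factory=list)

# Example: the user draws one box around a car in a hypothetical image "000123.jpg"
annotation = ImageAnnotation("000123.jpg", [BoundingBox(34, 58, 210, 190, "car")])
```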
It should be noted that the above bounding box is a rectangular box that can completely enclose the object.
In addition, in this application, the initial AI model may also be trained with annotated images. The specific processing is as follows:
One or more annotated images are obtained from the first user, and the initial AI model is obtained by using the one or more annotated images.
In this embodiment, when providing the plurality of unlabeled images, the first user may also provide one or more annotated images. The two may be uploaded together, or the annotated images may be uploaded first and the unlabeled images afterwards. The AI platform may obtain a pre-selected AI model, which may be an AI model chosen by the user (including an AI model uploaded by the user or an AI model selected by the user on the AI platform), or an AI model selected by the AI platform based on the current task objective.
The AI platform then trains the pre-selected AI model with the one or more annotated images to obtain the initial AI model (for the training process, refer to the supervised training process).
Optionally, after step 503, the AI platform may also provide the hard examples to the first user so that the first user can further confirm whether the candidate hard examples screened out by the AI platform are indeed hard examples. The specific processing is as follows:
The AI platform provides a confirmation interface to the first user and displays the candidate hard examples to the first user in the confirmation interface, where a candidate hard example is at least one image among the plurality of images. According to the operations of the first user on the confirmation interface, the hard examples among the candidate hard examples are determined.
In this embodiment, the AI platform determines the candidate hard examples among the plurality of unlabeled images based on the annotation results (a candidate hard example is one or more images among the plurality of unlabeled images that have been identified only from the annotation results and have not yet been confirmed by the first user). The AI platform may provide the candidate hard examples to the first user by presenting a confirmation interface in which the candidate hard examples among the plurality of unlabeled images are shown. The first user may open any candidate hard example and subjectively judge whether its annotation result is correct. If the annotation result of the candidate hard example is correct, the first user may perform a confirmation operation, and upon receiving the confirmation operation the AI platform determines that the candidate hard example is a hard example. Because the candidate hard examples are provided to the first user for confirmation, the determined hard examples are more accurate.
In addition, when the first user determines that the annotation results of certain hard examples are incorrect, the annotation results of these hard examples can also be corrected. This processing may take place after step 503 or after the user confirms the hard examples, as follows:
The AI platform receives the user's corrected annotations for the hard examples. Training the initial AI model with the hard examples to obtain the optimized AI model then includes: training the initial AI model with the hard examples and the corresponding corrected annotations to obtain the optimized AI model.
In this embodiment, after the AI platform determines the hard examples in step 503, it may provide them to the first user through a confirmation interface in which the hard examples among the plurality of unlabeled images are shown. The first user may open any hard example and subjectively judge whether its annotation result is correct. If it is not correct, the first user may correct the annotation result; after the correction is completed, the first user confirms the correction of the hard example, the AI platform receives the confirmation operation, and the AI platform can confirm that the hard example is usable and that its annotation result is the corrected annotation submitted by the first user.
Subsequently, in step 504, the AI platform may train the initial AI model with the hard examples determined in step 503 and their corresponding corrected annotations to obtain the optimized AI model. Alternatively, the AI platform may train the initial AI model with the hard examples confirmed by the first user and their corresponding corrected annotations to obtain the optimized AI model. In this way, because the first user corrects the annotation results of the hard examples, the annotation results of the hard examples are correct, and the optimized AI model obtained by training therefore has a stronger reasoning capability. Alternatively, the AI platform may train the initial AI model with the corrected annotations of the hard examples to obtain the optimized AI model.
Optionally, after step 504, this application may further provide the optimized AI model for use by a second user. Two providing manners may be included: an offline providing manner and an online providing manner. Manner 1 below is the offline providing manner and manner 2 is the online providing manner:
Manner 1: Provide the optimized AI model to an AI device of the second user, so that the AI device executes the task objective with the optimized AI model.
An AI device is a device that runs an AI model, such as a dashboard camera (driving recorder).
In this embodiment, after the second user obtains the right to use the optimized AI model in a certain way (for example, by purchasing the right to use it), the AI platform may send the optimized AI model to the AI device. After receiving the optimized AI model, the AI device can run it, so that the AI device uses the optimized AI model to execute the task objective. For example, if the AI device is a dashboard camera, the optimized AI model may be used to detect lane lines and the like.
Alternatively, the second user may download the optimized AI model from the AI platform to a certain device and then install the optimized AI model on the AI device, so that the AI device can use the optimized AI model to execute the task objective.
Manner 2: Receive an inference image sent by a device of the second user, perform inference on the inference image with the optimized AI model, and provide the inference result to the device of the second user.
In this embodiment, if the second user wants to use the optimized AI model, the second user may open the AI platform through his or her own device, register an account on the AI platform, and then log in to the AI platform with the registered account. The second user may then find the optimized AI model among the AI models provided by the AI platform and, following the operation guidance provided by the AI platform, upload an inference image to the AI platform. After receiving the inference image, the AI platform may input it into the optimized AI model to obtain the inference result of the inference image, and then send the inference result to the device of the second user. If the optimized AI model is used for image classification, the inference result is the category to which the inference image belongs. If the optimized AI model is used for object detection, the inference result is the positions of the bounding boxes of the objects included in the inference image and the categories of those objects. If the optimized AI model is used for both object detection and image classification, the inference result is the category of the inference image, together with the positions of the bounding boxes of the objects included in the inference image and the categories of those objects.
In this way, because hard examples are used in the training process of the optimized AI model, the reasoning capability of the optimized AI model obtained by training is stronger.
Optionally, after the hard examples are determined in step 503 and fed back to the first user, a one-click launch option is also provided in the confirmation interface. By operating the one-click launch option, the user can trigger the AI platform to automatically train the initial AI model with the hard examples to obtain the optimized AI model.
Optionally, after the optimized AI model is obtained through training, the optimized AI model may be used to perform inference on inference images, as shown in FIG. 9. The specific processing is as follows:
Step 901: The AI platform receives a plurality of inference images uploaded by a user.
In this implementation, after the optimized AI model is obtained through training, a user who wants to use the optimized AI model to perform inference on inference images may upload the inference images in an inference image upload interface; the uploaded data includes a plurality of inference images (inference images are also unlabeled images). The process of uploading the plurality of inference images here is the same as the process of uploading the plurality of unlabeled images described above and is not repeated.
Step 902: The AI platform provides a hard-example screening selection interface to the user. The hard-example screening selection interface includes hard-example screening parameters selectable by the user.
In this embodiment, after the user uploads the plurality of inference images, if the user wants to further optimize the optimized AI model, the user may trigger display of the hard-example screening selection interface, which may include hard-example screening parameters selectable by the user. The user may select the hard-example screening parameters according to the inference images and the user's actual needs. As shown in FIG. 10, the hard-example screening parameters may include one or more of a hard-example screening method, an inference image type, a task objective type, and hard-example output path information. The hard-example screening method may include screening by confidence or by algorithm. The inference image type may be continuous (indicating that the plurality of inference images are continuous in time sequence) or non-continuous (indicating that they are not continuous in time sequence). The task objective type may include object detection and image classification. The hard-example output path information may be used to indicate the storage location where the hard examples mined from the inference images are to be stored. If the plurality of inference images are continuous in time sequence (meaning they form a video clip), the inference image type is selected as continuous; if they are not continuous in time sequence (meaning they do not form a video clip), the inference image type is selected as non-continuous. If the user wants to perform image classification on the plurality of inference images, the task objective type may be selected as image classification; if the user wants to perform object detection on them, the task objective type may be selected as object detection.
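The screening parameters listed above could, for example, be collected into a configuration structure such as the following sketch; all field names, option strings, and default values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class HardExampleFilterConfig:
    # How candidate hard examples are screened
    method: Literal["confidence", "algorithm"] = "confidence"
    # Whether the uploaded inference images form a temporally continuous sequence (video clip)
    image_type: Literal["continuous", "non_continuous"] = "non_continuous"
    # Task objective the optimized model performs on the inference images
    task: Literal["image_classification", "object_detection"] = "object_detection"
    # Where the mined hard examples are written
    output_path: str = "/output/hard-examples/"
    # Storage location of the annotated training images; needed only for non-continuous images
    training_set_path: Optional[str] = None

# Example: screening a video clip for object-detection hard examples
config = HardExampleFilterConfig(method="confidence", image_type="continuous",
                                 task="object_detection", output_path="/output/run1/")
```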
It should be noted here that when the inference image type is non-continuous, the hard-example screening parameters further include the storage location information of the annotated training images.
Step 903: The AI platform performs inference on the plurality of inference images according to the optimized AI model to obtain inference results.
In this embodiment, the AI platform may input the plurality of inference images into the optimized AI model, and the optimized AI model outputs the inference results of the plurality of inference images. If the optimized AI model is used for image classification, the inference result output for each inference image is the category to which the image belongs. If the optimized AI model is used for object detection, the inference result output for each frame of inference image is the categories of the objects in the bounding boxes included in the image and the positions of those bounding boxes in the image.
Step 904: The AI platform determines hard examples among the plurality of inference images according to the inference results and the hard-example screening parameters selected by the user.
In this embodiment, the AI platform may screen out the hard examples among the plurality of inference images by using the inference results together with the task objective type and the screening method in the hard-example screening parameters selected by the user, and then store the hard examples at the location indicated by the hard-example output path in the screening parameters.
Step 905: The AI platform trains the optimized AI model according to the hard examples to obtain a re-optimized AI model.
In this embodiment, because hard examples can continue to be mined during the inference process of the optimized AI model, the optimized AI model can be further trained to obtain the re-optimized AI model.
Optionally, after the hard examples are output in step 904, the AI platform may also provide the hard examples to the user so that the user can further confirm whether they are hard examples. The specific processing is as follows:
The AI platform provides a confirmation interface to the user and displays the candidate hard examples in the confirmation interface, where a candidate hard example is at least one image among the plurality of inference images. The AI platform determines the hard examples among the candidate hard examples according to the user's operations on the confirmation interface.
In this embodiment, the AI platform may determine, according to the inference results and the hard-example screening parameters selected by the user, at least one candidate hard example included in the plurality of inference images. The AI platform then provides the at least one candidate hard example to the user I/O module, and the user I/O module presents a confirmation interface to the user in which the candidate hard examples among the plurality of inference images are shown. The user may open any candidate hard example and subjectively judge whether its annotation information is accurate. If it is not accurate, the user may modify the annotation information; after the modification is completed, the user confirms the modification of the candidate hard example, the AI platform receives the confirmation operation, confirms that the hard example is usable, and records the user's corrected annotation as its annotation information. Alternatively, if the user subjectively judges that the annotation of the candidate hard example is correct, the user may directly confirm it; the AI platform receives the confirmation operation and confirms that the hard example is usable with the annotation originally provided by the AI platform.
It should be noted here that in step 503, the process of determining hard examples with the initial AI model specifically consists of using the initial AI model to extract features of the unlabeled images, determining the annotation results of the unlabeled images based on those features, and then finding the hard examples among the unlabeled images based on the annotation results. In step 504, the initial AI model is further trained based on the hard examples among the unlabeled images to obtain the optimized AI model. In step 904, the process of determining hard examples with the optimized AI model specifically consists of using the optimized AI model to extract features of the inference images, determining the inference results of the inference images based on those features, and then finding the hard examples among the inference images based on the inference results. In step 905, the optimized AI model is further trained based on the hard examples among the inference images to obtain the re-optimized AI model. It can be seen that the processing principles of step 503 and step 904 are similar: both use an AI model to determine the hard examples among unlabeled images, and the only difference between the AI models used is that the reasoning capability of the optimized AI model is higher than that of the initial AI model. Likewise, the processing principles of step 504 and step 905 are similar: both train an existing AI model based on hard examples so that the reasoning capability of the obtained AI model is better than that of the current AI model. Therefore, the procedures of FIG. 5 and FIG. 9 essentially consist of finding hard examples and optimizing the current AI model. Through this method, the AI platform can provide AI model developers with optimized AI models that have stronger reasoning capabilities, so that developers can deploy AI models in one click without having to care about the development process.
In step 503 above, the implementation process of determining the hard examples may be as follows:
The initial AI model is used to annotate the unlabeled images and obtain the annotation information of each unlabeled image, and it is determined whether the unlabeled images form a video clip. If the unlabeled images form a video clip, the hard examples among the unlabeled images are determined according to the annotation results of the images. If the unlabeled images do not form a video clip, the hard examples among the unlabeled images are determined according to the annotation results of the images and the training image set.
In this embodiment, the AI platform may use either or both of an optical flow method and the Hamming distance to determine whether the plurality of unlabeled images are continuous in time sequence, that is, whether they form a video clip. For example, the AI platform may use the Hamming distance to measure the distance between each frame of image and the next frame of image adjacent to it in time sequence. If the Hamming distance between an image and the next frame adjacent to it in time sequence is smaller than a certain value, it is determined that the image and the next frame are continuous in time sequence; if the Hamming distance is greater than or equal to the certain value, it is determined that they are not continuous in time sequence. When the image and the next frame are judged to be continuous in time sequence, the optical flow method may additionally be used to judge again whether they are continuous. If the optical flow method also judges them to be continuous, it is finally determined that the image and the next frame are continuous in time sequence; if the optical flow method judges them to be discontinuous, it is finally determined that they are not continuous in time sequence. By continuing to traverse each frame of image in this way, it can be determined whether the plurality of unlabeled images are continuous images or non-continuous images. If the plurality of unlabeled images are continuous images, it is determined that they form a video clip; if they are not continuous images, it is determined that they do not form a video clip. Because multiple methods are combined here to determine whether the images are continuous in time sequence, the accuracy of the determined result is relatively high.
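A minimal sketch of the Hamming-distance check is given below, assuming grayscale frames held as NumPy arrays and a simple average-hash; the hash, block size, and threshold are illustrative assumptions, and the optical-flow cross-check is not shown.

```python
import numpy as np

def average_hash(gray: np.ndarray, size: int = 8) -> np.ndarray:
    """Tiny perceptual hash: block-average the frame to size x size and threshold at the mean.
    Assumes the frame is at least size x size pixels."""
    h, w = gray.shape
    cropped = gray[:h - h % size, :w - w % size]
    small = cropped.reshape(size, h // size, size, w // size).mean(axis=(1, 3))
    return (small > small.mean()).astype(np.uint8).flatten()

def hamming_distance(hash_a: np.ndarray, hash_b: np.ndarray) -> int:
    """Number of positions where the two hashes differ."""
    return int(np.count_nonzero(hash_a != hash_b))

def looks_continuous(frame_a: np.ndarray, frame_b: np.ndarray, threshold: int = 10) -> bool:
    """Treat two frames as continuous in time sequence if their hashes are close enough;
    the platform additionally cross-checks with optical flow (not shown here)."""
    return hamming_distance(average_hash(frame_a), average_hash(frame_b)) < threshold
```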
When the plurality of unlabeled images are continuous in time sequence, the AI platform may determine the hard examples among them by using the annotation results of the plurality of unlabeled images. When the plurality of unlabeled images are not continuous in time sequence, the AI platform may determine the hard examples among them by using the annotation results of each unlabeled image and the training image set. The training image set here refers to the set of training images used to train the initial AI model.
It should be noted here that images being adjacent in time sequence may mean that their numbers are adjacent; for example, if one frame is numbered 1 and another frame is numbered 2, the two frames are adjacent. Images being adjacent in time sequence may also mean that they were uploaded consecutively; for example, if one frame is the first to be uploaded and another frame is the second to be uploaded, the two frames are adjacent in time sequence. As another example, if one frame is the first to be uploaded and another frame is the third to be uploaded, the two frames are not adjacent in time sequence.
The following separately describes how hard examples are determined when the AI platform is applied to an image classification scenario and to an object detection scenario:
When the AI platform is used to provide an AI model for an image classification scenario, and the plurality of unlabeled images form a video clip, the hard examples are determined as follows:
The AI platform determines a target image among the plurality of unlabeled images, where the annotation result of the target image is different from the annotation results of the images adjacent to the target image in time sequence, and determines the target image as a hard example among the plurality of unlabeled images.
In this embodiment, the annotation result of each frame of image output in step 502 may include the category to which the image belongs. For any frame of image, the AI platform may determine whether the category of the image is the same as the category of the adjacent frames, where the adjacent frames are the frames adjacent to the image in time sequence. If they are the same, it can be determined that the image is not a hard example; if they are different, it means that the model has a relatively high recognition error rate on this image, and the image can be determined as a hard example. This image is the target image.
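A literal reading of this rule is sketched below; the function and the example labels are purely illustrative.

```python
from typing import List

def hard_examples_in_clip(labels: List[str]) -> List[int]:
    """Return indices of frames whose predicted category differs from the category of a
    temporally adjacent frame (one reading of the rule described above)."""
    hard = []
    for i, label in enumerate(labels):
        neighbors = []
        if i > 0:
            neighbors.append(labels[i - 1])
        if i < len(labels) - 1:
            neighbors.append(labels[i + 1])
        if any(label != n for n in neighbors):
            hard.append(i)
    return hard

# Example: the frames around the isolated "red_light" prediction are flagged as hard examples
print(hard_examples_in_clip(["green_light", "green_light", "red_light", "green_light"]))
```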
It should be noted here that, for continuous images, the first frame only has a next frame adjacent to it in time sequence, and the last frame only has a previous frame adjacent to it in time sequence.
When the AI platform is applied to an image classification scenario and the plurality of unlabeled images do not form a video clip, the process of determining the hard examples is as shown in FIG. 11:
Step 1101: The AI platform obtains the confidence of each unlabeled image under each category, and determines a first hard-example value of each unlabeled image according to the two highest confidences of that image.
The hard-example value measures the degree to which an image is a hard example: the larger the hard-example value, the higher the probability that the image is a hard example; conversely, the smaller the hard-example value, the lower the probability that the image is a hard example.
In this embodiment, in step 502, the output of the model may include the confidence of each unlabeled image under every category, where the confidence under each category indicates the likelihood that the annotation result inferred by the model for the input data belongs to that category. For any one of the unlabeled images, the two largest confidences corresponding to the image can be obtained, and the smaller of the two is subtracted from the larger to obtain the difference between the two confidences. The correspondence between confidence-difference ranges and hard-example values stored in the data storage module is then obtained, and the first hard-example value corresponding to the confidence-difference range to which the computed difference belongs is determined from this correspondence. By applying this method to every image, the first hard-example value of each unlabeled image can be determined.
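The following sketch illustrates how a first hard-example value could be derived from the gap between the two highest confidences; the margin ranges and the values they map to are assumptions standing in for the correspondence stored in the data storage module.

```python
import numpy as np

# Assumed mapping from the top-2 confidence gap to a hard-example value (illustrative numbers)
MARGIN_TO_VALUE = [(0.0, 0.1, 1.0), (0.1, 0.3, 0.6), (0.3, 1.01, 0.1)]

def first_hard_value(class_confidences: np.ndarray) -> float:
    """The smaller the gap between the two highest confidences, the less certain the model is
    about the image and the larger its first hard-example value."""
    top2 = np.sort(class_confidences)[-2:]
    margin = float(top2[1] - top2[0])
    for low, high, value in MARGIN_TO_VALUE:
        if low <= margin < high:
            return value
    return 0.0

print(first_hard_value(np.array([0.48, 0.46, 0.06])))  # gap 0.02 -> 1.0, likely a hard example
```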
Step 1102: The AI platform obtains surface-feature distribution information of the training images in the training image set, and determines a second hard-example value of each unlabeled image according to the surface-feature distribution information and the surface features of each unlabeled image.
In this embodiment, for each frame of the plurality of unlabeled images, the surface features of the frame can be determined. The surface features may include one or more of the resolution of the image, the aspect ratio of the image, the mean and variance of the red, green, and blue (RGB) channels of the image, the brightness of the image, the saturation of the image, and the sharpness of the image.
Specifically, the AI platform can obtain the resolution and the brightness of an image from the attributes of the image. The resolution of an image refers to the number of pixels contained per inch, and the brightness of an image determines how light or dark the colors in the color space appear.
The AI platform can divide the length of the image by its width to obtain the aspect ratio of the image.
The AI platform can use the R, G, and B values of every pixel in the image to determine the mean of R, the mean of G, and the mean of B of the image, which together form the RGB mean of the image. The AI platform then determines the mean of R, the mean of G, and the mean of B over all pixels in the image, computes for each pixel the square of the difference between its R value and the mean of R, and sums these squares over all pixels to obtain the variance of R of the image. In the same way, the AI platform can determine the variance of G and the variance of B; the variance of R, the variance of G, and the variance of B form the RGB variance of the image.
The AI platform can compute the saturation of the image. Saturation refers to the vividness of colors, also called the purity of colors. For any one of the unlabeled images, the saturation of the image is computed as (max(R, G, B) - min(R, G, B)) / max(R, G, B), where max(R, G, B) is the maximum of R, G, and B in the image and min(R, G, B) is the minimum of R, G, and B in the image.
The AI platform can also compute the sharpness of the image. Sharpness is an index measuring image quality and can be determined, for example, by a Brenner gradient function or a Laplacian gradient function.
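An illustrative computation of several of these surface features is sketched below; the grayscale conversion, the Laplacian approximation, and the exact per-feature definitions are assumptions rather than the platform's specification.

```python
import numpy as np

def surface_features(rgb: np.ndarray) -> dict:
    """Compute a few of the surface features listed above for an H x W x 3 RGB image
    with values in 0..255 (illustrative subset, not the platform's exact definitions)."""
    h, w, _ = rgb.shape
    r, g, b = (rgb[..., c].astype(float) for c in range(3))
    max_c = rgb.max(axis=2).astype(float)
    min_c = rgb.min(axis=2).astype(float)
    saturation = (max_c - min_c) / np.maximum(max_c, 1e-6)         # (max - min) / max per pixel
    gray = 0.299 * r + 0.587 * g + 0.114 * b                       # brightness proxy
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +            # simple Laplacian (edges wrap)
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return {
        "aspect_ratio": w / h,                                     # one reading of length / width
        "rgb_mean": [r.mean(), g.mean(), b.mean()],
        "rgb_variance": [r.var(), g.var(), b.var()],
        "brightness": gray.mean(),
        "saturation": saturation.mean(),
        "sharpness": lap.var(),                                    # Laplacian-gradient style score
    }
```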
The AI platform then obtains the surface features of each image in the training image set and determines the distribution of the images over each surface feature. Specifically, the distribution of images over each surface feature can be represented by a histogram. As shown in FIG. 12, FIG. 12(a) is a histogram of the mean of R of the images, with the mean of R on the horizontal axis and the number of images on the vertical axis. There are 1000 images in the training image set: 52 images have a mean of R between 10 and 20, 204 between 20 and 30, 320 between 30 and 40, 215 between 40 and 50, 99 between 50 and 60, 69 between 60 and 70, 22 between 70 and 80, 13 between 80 and 90, 5 between 90 and 100, and 1 between 100 and 110. FIG. 12(b) is a histogram of the saturation of the images, with the saturation on the horizontal axis and the number of images on the vertical axis; there are likewise 1000 images in the training image set, and the values are not listed one by one here.
The AI platform then obtains a stored preset value. For the distribution of images over any surface feature, the preset value is multiplied by the number of images in the plurality of unlabeled images to obtain a target number. The AI platform arranges the values of that surface feature of all images in the plurality of unlabeled images in ascending order, finds the value at the position given by the target number in this ascending order, and obtains the limit value of that surface feature. In the training image set, the images whose value of that surface feature is greater than the limit value and the images whose value is smaller than or equal to the limit value are determined; the hard-example value of images whose surface-feature value is greater than the limit value is determined to be a, and the hard-example value of images whose surface-feature value is smaller than or equal to the limit value is determined to be b. For example, if the surface feature is the brightness of the image, the number of images is 1000, the preset value is 90%, the target number is 1000 * 90% = 900, and in the brightness histogram the 900th value among the brightness values arranged in ascending order is 202.5, then the hard-example value of images in the training image set whose brightness is greater than 202.5 is determined to be 1, and the hard-example value of images whose brightness is smaller than or equal to 202.5 is determined to be 0. Following this brightness-based way of determining hard-example values, the hard-example value of each frame of image under each surface feature can be determined. The above is only one optional implementation; the limit value may also be determined in other ways.
In this way, one hard-example value can be determined for each surface feature of each unlabeled image, and the weight corresponding to each surface feature is then obtained. For each frame of the plurality of unlabeled images, the hard-example value of each surface feature of the image is multiplied by the weight corresponding to that surface feature to obtain one value per surface feature. The AI platform then adds up the values corresponding to all surface features to obtain the second hard-example value of the image.
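The limit-value thresholding and the weighted combination described above might look roughly as follows; the preset ratio, the per-feature values (1 and 0 standing in for a and b), and the example weights are assumptions.

```python
import numpy as np

def per_feature_hard_values(values: np.ndarray, preset_ratio: float = 0.9,
                            above: float = 1.0, below: float = 0.0) -> np.ndarray:
    """Per-feature hard-example values: images whose feature value exceeds the limit value
    (the value at rank preset_ratio * N in ascending order) get `above`, the rest `below`."""
    ordered = np.sort(values)
    limit = ordered[int(preset_ratio * len(values)) - 1]
    return np.where(values > limit, above, below)

def second_hard_value(feature_matrix: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of the per-feature hard-example values of each image.
    feature_matrix has shape (num_images, num_features); weights sum to 1."""
    per_feature = np.stack([per_feature_hard_values(feature_matrix[:, j])
                            for j in range(feature_matrix.shape[1])], axis=1)
    return per_feature @ weights

# Example: brightness and sharpness weighted more heavily than aspect ratio
features = np.random.rand(1000, 3)          # columns: brightness, sharpness, aspect ratio
print(second_hard_value(features, np.array([0.4, 0.4, 0.2]))[:5])
```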
It should be noted here that the weights may differ for different surface features, and the sum of the weights of all surface features is equal to 1. For example, the weights of the brightness and the sharpness of the image are greater than the weight of the aspect ratio of the image.
It should also be noted that the preset values corresponding to the above surface features may also differ from one another. In step 1102 above, the AI platform determines the surface features of each frame of image; in actual processing, the user may instead directly upload the surface features of each image in the training image set and in the plurality of unlabeled images, which are stored in the data storage module. When needed, the AI platform obtains the surface features of each unlabeled image from the data storage module.
Step 1103: The AI platform uses a first feature extraction model to extract the deep features of each image in the training image set and the deep features of each unlabeled image, performs clustering on the images in the training image set according to their deep features to obtain an image clustering result, and determines a third hard-example value of each unlabeled image according to the deep features of each unlabeled image, the image clustering result, and the annotation results of the unlabeled images.
In this embodiment, as shown in FIG. 13, the AI platform may obtain the first feature extraction model, which may be a CNN. The AI platform then inputs each image in the training image set into the first feature extraction model to determine the deep features of each image. The AI platform may also input each of the plurality of unlabeled images into the first feature extraction model to determine the deep features of each of those images. The deep features of each frame of image can be represented by a one-dimensional array, and the one-dimensional arrays of deep features of all frames have the same dimension.
The AI platform may then input the deep features of each image in the training image set into a clustering algorithm (which may be any clustering algorithm, such as the K-means clustering algorithm) to obtain the image clustering result. The image clustering result includes multiple image groups, and each image group includes one or more images.
For each image group, the AI platform may determine the average of the values of the i-th dimension over the images in the group. For example, if an image group includes 3 images whose deep features are represented by three-dimensional arrays (1, 2, 5), (4, 2, 4), and (4, 8, 9), the average of the first dimension is 3, the average of the second dimension is 4, and the average of the third dimension is 6, so the center of this image group is (3, 4, 6). The center of each image group can be determined in this way.
For any one of the plurality of unlabeled images, the AI platform may determine the distance between the deep features of the image and the center of each image group in the image clustering result. Specifically, the Euclidean distance between the image and a center is computed as d = sqrt( Σ_{i=1..N} (x_{1,i} - x_{2,i})² ), where i is any dimension of the deep feature, N is the total number of dimensions of the deep feature, x_{1,i} is the i-th dimension of the deep features of the image, and x_{2,i} is the i-th dimension of the deep features of the center. The image group with the smallest distance is determined as the image group to which the image belongs (this process can be regarded as the clustering result of the plurality of unlabeled images). It is then judged whether the category of the image is the same as the category of the images in that image group. If it is the same, the hard-example value is determined to be a, and the third hard-example value of the image is a; if it is not the same, the hard-example value is determined to be b, and the third hard-example value of the image is b. Similarly, any one of the unlabeled images may be clustered into an existing image group by the K-means clustering method to determine the image group to which it belongs, and other clustering methods may also be used.
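A rough sketch of step 1103 is given below, using scikit-learn's K-means as the clustering algorithm; taking the dominant category of a cluster as the category of its images, and the values chosen for a and b, are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def third_hard_values(train_feats: np.ndarray, train_labels, new_feats: np.ndarray, new_labels,
                      n_clusters: int = 10, a: float = 0.0, b: float = 1.0) -> np.ndarray:
    """Cluster the training images' deep features, assign each unlabeled image to the nearest
    cluster center by Euclidean distance d = sqrt(sum_i (x1_i - x2_i)^2), and give it value `a`
    if its annotated category matches the dominant category of that cluster, `b` otherwise."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_feats)
    # Dominant category per cluster, taken from the training labels
    dominant = {}
    for c in range(n_clusters):
        members = [lbl for lbl, cl in zip(train_labels, km.labels_) if cl == c]
        dominant[c] = max(set(members), key=members.count) if members else None
    # Euclidean distance from each unlabeled image's deep features to every cluster center
    dists = np.linalg.norm(new_feats[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return np.array([a if new_labels[i] == dominant[nearest[i]] else b
                     for i in range(len(new_feats))])
```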
Step 1104: The AI platform determines a target hard-example value of each unlabeled image according to one or more of the first hard-example value, the second hard-example value, and the third hard-example value.
In this embodiment, for any one of the plurality of unlabeled images, the AI platform may use one or more of the first, second, and third hard-example values of the image to determine its target hard-example value. Specifically, the AI platform may determine the first hard-example value as the target hard-example value, or determine the second hard-example value as the target hard-example value, or weight the first and second hard-example values to obtain the target hard-example value, or weight the first and third hard-example values, or weight the second and third hard-example values, or weight the first, second, and third hard-example values to obtain the target hard-example value.
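For illustration only, the weighted combination of the three values could be written as follows; the weights are arbitrary examples, and as described above the platform may equally use any single value or any weighted pair instead.

```python
def target_hard_value(v1: float, v2: float, v3: float,
                      w1: float = 0.4, w2: float = 0.3, w3: float = 0.3) -> float:
    """Combine the three hard-example values with illustrative weights."""
    return w1 * v1 + w2 * v2 + w3 * v3

print(target_hard_value(1.0, 0.6, 1.0))  # 0.4 + 0.18 + 0.3 = 0.88
```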
在同时使用第一难例值、第二难例值和第三难例值时,由于同时考虑三种层面的难例值,所以确定出的目标难例值更准确。When using the first, second, and third difficulty values at the same time, since the three levels of difficulty values are considered at the same time, the determined target difficulty value is more accurate.
步骤1105,AI平台将未标注的多个图像中目标难例值最大的第一数目个图像,确定为未标注的多个图像中的难例。Step 1105: The AI platform determines the first number of images with the largest target difficulty value among the unlabeled images as the difficult examples in the unlabeled images.
其中,第一数目可以预设,存储在AI平台的数据存储模块中。Among them, the first number can be preset and stored in the data storage module of the AI platform.
In this embodiment, the AI platform can sort the unlabeled images in descending order of their target hard-example values and select the first number of top-ranked images as the hard examples among the unlabeled images.
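A minimal sketch of this selection step, assuming the images and their target hard-example values are held in parallel lists (the function and parameter names are illustrative only):

```python
def select_hard_examples(images, target_values, first_number):
    """Sort the images by target hard-example value in descending order and
    keep the top `first_number` of them as hard examples."""
    order = sorted(range(len(images)), key=lambda i: target_values[i], reverse=True)
    return [images[i] for i in order[:first_number]]
```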
When the AI platform is applied to an object detection scenario and the unlabeled images form a video clip, the hard examples are determined as shown in Figure 14:
In step 1401, for a first target box of a first image among the unlabeled images, the AI platform determines, among the images whose interval in time sequence from the first image is less than or equal to a second number, the tracking box with the highest similarity to the first target box.
其中,第二数目可以预设,如2等。Among them, the second number can be preset, such as 2 and so on.
In this embodiment, any one of the unlabeled images may be referred to as the first image, and any bounding box in the first image may be referred to as the first target box. The AI platform can determine the images whose interval in time sequence from the first image is less than or equal to the second number. For example, if the first image is the 5th frame and the second number is 2, these images are the 3rd, 4th, 6th and 7th frames. Thus, when the second number is greater than or equal to 2, not just one adjacent frame but several adjacent frames are considered, which improves the accuracy of judging false detections and missed detections.
The AI platform can obtain the bounding boxes in the images whose interval in time sequence from the first image is less than or equal to the second number, and then determine the similarity between each of these bounding boxes and the first target box. Specifically, for each bounding box it calculates a first absolute value, i.e. the absolute difference between the area of the bounding box and the area of the first target box; a second absolute value, i.e. the absolute difference between the length of the bounding box and the length of the first target box; and a third absolute value, i.e. the absolute difference between the width of the bounding box and the width of the first target box. The first absolute value is multiplied by the weight corresponding to area to obtain a first weighted value, the second absolute value is multiplied by the weight corresponding to length to obtain a second weighted value, and the third absolute value is multiplied by the weight corresponding to width to obtain a third weighted value. The first, second and third weighted values are added to obtain the similarity between the first target box and that bounding box. It should be noted that the weights for area, length and width sum to 1, and the weights for length and width may be equal.
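The score computation just described can be sketched as follows. Representing each box as a (length, width) pair and using the default weights shown are assumptions for illustration; the score is a literal implementation of the weighted sum of absolute differences described above.

```python
def box_score(box, target, w_area=0.5, w_len=0.25, w_wid=0.25):
    """Weighted sum of the absolute differences in area, length and width between a
    candidate bounding box and the first target box; box and target are (length, width)
    pairs, and the three weights are assumed to sum to 1 with equal length/width weights."""
    (l1, w1), (l2, w2) = box, target
    return (w_area * abs(l1 * w1 - l2 * w2)   # area difference
            + w_len * abs(l1 - l2)            # length difference
            + w_wid * abs(w1 - w2))           # width difference
```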
AI平台可以在上述多个边界框中,确定与第一目标框的相似度最高的边界框。然后将相似度最高的边界框,确定为第一目标框对应的追踪框。这样,由于考虑了相邻的多帧图像中最相似的边界框,所以可以避免目标由于运动而丢失。The AI platform may determine the bounding box with the highest similarity to the first target box in the above multiple bounding boxes. Then the bounding box with the highest similarity is determined as the tracking box corresponding to the first target box. In this way, since the most similar bounding box in the adjacent multi-frame images is considered, it is possible to prevent the target from being lost due to motion.
In step 1402, the AI platform determines the overlap rate between the first target box and each bounding box according to the tracking box, all the bounding boxes in the images whose interval in time sequence from the first image is less than or equal to the second number, and the first target box.
本实施例中,AI平台可以使用如下公式,确定第一目标框与各边界框的重叠率:In this embodiment, the AI platform can use the following formula to determine the overlap ratio of the first target box and each bounding box:
overlap=max(iou(curbox,bbox),iou(trackedbox,bbox))      (1)overlap=max(iou(curbox,bbox),iou(trackedbox,bbox)) (1)
其中,overlap指第一目标框与边界框的重叠率。第一目标框使用curbox表示,边界框使用bbox表示,iou(curbox,bbox)表示第一目标框和边界框的交并比(intersection over union,iou)。第一目标框的追踪框使用trackedbox表示,iou(trackedbox,bbox)表示追踪框与边界框的交并比。overlap等于这两个交并比中的最大值。其中,第一目标框和边界框的交并比等于第一目标框与边界框的交集的面积,与第一目标框与边界框的并集的面积的比值。同理,追踪框和边界框的交并比等于追踪框与边界框的交集的面积,与追踪框与边界框的并集的面积的比值。Wherein, overlap refers to the overlap rate of the first target box and the bounding box. The first target box is represented by curbox, the bounding box is represented by bbox, and iou (curbox, bbox) represents the intersection over union (iou) of the first target box and the bounding box. The tracking box of the first target box is represented by trackedbox, and iou (trackedbox, bbox) represents the intersection ratio of the tracking box and the bounding box. The overlap is equal to the maximum value of these two intersections. Wherein, the intersection ratio of the first target box and the bounding box is equal to the ratio of the area of the intersection of the first target box and the bounding box to the area of the union of the first target box and the bounding box. In the same way, the intersection ratio of the tracking box and the bounding box is equal to the ratio of the area of the intersection of the tracking box and the bounding box to the area of the union of the tracking box and the bounding box.
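A sketch of formula (1) in Python, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; this box encoding is an assumption for illustration and is not prescribed by the application.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def overlap(curbox, trackedbox, bbox):
    """Formula (1): overlap = max(iou(curbox, bbox), iou(trackedbox, bbox))."""
    return max(iou(curbox, bbox), iou(trackedbox, bbox))
```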
In step 1403, if there is a bounding box whose overlap rate is greater than a second value, the AI platform determines the bounding box whose overlap rate is greater than the second value as a similar box corresponding to the first target box; if there is no bounding box whose overlap rate is greater than the second value, it determines that no similar box corresponding to the first target box exists.
其中,第二数值可以预设,且存储在数据存储模块中。如第二数值为0.5等。Among them, the second value can be preset and stored in the data storage module. For example, the second value is 0.5 and so on.
In this embodiment, after determining the overlap rate between the first target box and each bounding box, the AI platform can compare each overlap rate with the second value. If the overlap rate between the first target box and a certain bounding box is greater than the second value, that bounding box is determined to be a similar box corresponding to the first target box. There may be more than one bounding box whose overlap rate is greater than the second value, in which case the first target box may correspond to multiple similar boxes.
若第一目标框与任一边界框的重叠率均小于或等于第二数值,则确定第一图像在时序上的间隔小于或等于第二数目的图像中不存在第一目标框对应的相似框。If the overlap ratio of the first target frame and any bounding box is less than or equal to the second value, it is determined that there is no similar frame corresponding to the first target frame in the images whose time-series interval of the first image is less than or equal to the second number .
步骤1404,若未存在第一目标框对应的相似框,则AI平台将第一目标框确定为难例框。Step 1404: If there is no similar frame corresponding to the first target frame, the AI platform determines the first target frame as a difficult case frame.
本实施例中,在步骤1403中,确定出未存在第一目标框对应的相似框,说明第一目标框为突然出现的框,可以认为是误检框。AI平台可以将第一目标框,确定为难例框。In this embodiment, in step 1403, it is determined that there is no similar frame corresponding to the first target frame, indicating that the first target frame is a frame that appears suddenly, which can be regarded as a false detection frame. The AI platform can determine the first target frame as a difficult case frame.
In step 1405, if a similar box corresponding to the first target box exists, and the first image to which the first target box belongs and the second image to which the similar box belongs are not adjacent in time sequence, the AI platform determines the hard-example boxes in the images between the first image and the second image according to the first target box and the similar box.
In this embodiment, after it is determined in step 1403 that a similar box corresponding to the first target box exists, the AI platform can judge whether the image containing the similar box and the image containing the first target box are adjacent in time sequence. If they are adjacent, there is no missed box. If they are not adjacent, a box has suddenly disappeared and there are missed boxes; the AI platform can use a sliding average between the similar box and the first target box to mark the missed boxes in the images between the first image and the second image, and these missed boxes are the hard-example boxes in those images. In this way, the missed and falsely detected boxes in consecutive frames are marked according to the principle that the minority follows the majority.
The sliding average in step 1405 can be performed as follows. A bounding box is generally a rectangle whose position in its image is marked by the coordinates of its upper-left and lower-right corners (coordinates in the image). The AI platform subtracts the abscissa of the upper-left corner of the similar box from the abscissa of the upper-left corner of the first target box to obtain an abscissa difference, and multiplies the abscissa difference by x/(n+1), where n is the number of images between the image containing the first target box and the image containing the similar box, and x denotes the x-th image between them. Adding this product to the abscissa of the upper-left corner of whichever of the two boxes belongs to the earlier image in time sequence gives the abscissa of the upper-left corner of the hard-example box in the x-th image between the two images. The ordinate of the upper-left corner of that hard-example box, and the coordinates of its lower-right corner, are obtained in the same way.
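A minimal sketch of this sliding average as a linear interpolation between the two boxes; the corner-coordinate representation (x1, y1, x2, y2) and the function name are assumptions for illustration.

```python
def interpolate_missed_boxes(box_early, box_late, n):
    """Sliding average between two boxes given as (x1, y1, x2, y2) corner coordinates:
    returns the n interpolated hard-example boxes for the n images lying between the
    image of the earlier box and the image of the later box (x = 1 .. n)."""
    return [tuple(e + (l - e) * x / (n + 1) for e, l in zip(box_early, box_late))
            for x in range(1, n + 1)]
```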
步骤1406,根据未标注的多个图像中各图像的难例框的数目,确定未标注的多个图像中的难例。Step 1406: Determine the difficult cases in the multiple unlabeled images according to the number of difficult cases in each image in the multiple unlabeled images.
In this embodiment, based on the processing of steps 1401 to 1405, the number of hard-example boxes in each of the unlabeled images can be determined, and the AI platform can then determine the images whose number of hard-example boxes exceeds a third number as the hard examples among the unlabeled images.
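A sketch of this final filtering step; the names and the pairing of images with their hard-example box counts are illustrative assumptions.

```python
def hard_images_by_box_count(images, hard_box_counts, third_number):
    """Keep the images whose number of hard-example boxes exceeds the third number."""
    return [img for img, count in zip(images, hard_box_counts) if count > third_number]
```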
When the AI platform is applied to an object detection scenario and the unlabeled images are not a video clip, the hard examples are determined as shown in Figure 15:
In step 1501, the AI platform obtains the surface-feature distribution information of the images in the training image set, and determines the fourth hard-example value of each of the unlabeled images according to the surface-feature distribution information of the images in the training image set and the surface features of the unlabeled images.
In this embodiment, for each of the unlabeled images, the surface features of the image can be determined; these may include image-level surface features and bounding-box surface features. The image-level surface features may include one or more of the resolution of the image, the aspect ratio of the image, the RGB mean and variance of the image, the brightness of the image, the saturation of the image, the sharpness of the image, the number of boxes in the image, or the variance of the areas of the boxes in the image. The bounding-box surface features may include one or more of the aspect ratio of each bounding box in the image, the ratio of the area of each bounding box to the image area, the degree of marginalization of each bounding box, the stacking degree of each bounding box, the brightness of each bounding box, or the blur degree of each bounding box.
Specifically, for the way in which the AI platform determines the resolution, aspect ratio, RGB mean and variance, brightness, saturation or sharpness of each of the unlabeled images, reference may be made to the processing in step 1102, which is not repeated here.
The AI platform can determine the number of boxes in each image.
The AI platform can determine the area of each box in an image, compute the mean of the areas of all boxes in that image, subtract the mean from the area of each bounding box and square the result to obtain one value per bounding box, and then add the values of all bounding boxes to obtain the variance of the box areas in the single image.
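A sketch of this box-area spread computation; it follows the description above literally (sum of squared deviations from the mean), and the function name is illustrative.

```python
def box_area_spread(areas):
    """'Variance' of the box areas within one image as described above:
    subtract the mean area from each box area, square, and sum over the boxes."""
    mean = sum(areas) / len(areas)
    return sum((a - mean) ** 2 for a in areas)
```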
AI平台可以计算每帧图像中每个边界框的长宽比。AI平台可以计算每帧图像中每个边界框的面积占图像面积的比例。The AI platform can calculate the aspect ratio of each bounding box in each frame of image. The AI platform can calculate the ratio of the area of each bounding box to the image area in each frame of image.
The AI platform can calculate the degree of marginalization of each bounding box in a single image. Specifically, for any bounding box in an image, it calculates the absolute value of the difference between the abscissa of the center of the bounding box and the abscissa of the center of the image (the abscissa difference) and the absolute value of the difference between the ordinate of the center of the bounding box and the ordinate of the center of the image (the ordinate difference), then calculates a first ratio of the abscissa difference to the length of the image and a second ratio of the ordinate difference to the width of the image. The pair (first ratio, second ratio) reflects the degree of marginalization of the bounding box; in general, the larger the first and second ratios, the more marginalized the box.
The AI platform can calculate the stacking degree of each bounding box in a single image. Specifically, for any bounding box in an image, it calculates the area of the intersection of that bounding box with each of the other bounding boxes in the image, divides each intersection area by the area of that bounding box, and adds the results to obtain the stacking degree of the bounding box in the image.
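The two box-level features just described can be sketched as follows; the (x1, y1, x2, y2) box representation and the interpretation of image "length" as its horizontal extent are assumptions made for illustration.

```python
def marginalization(box, img_length, img_width):
    """(first ratio, second ratio): normalized distance between the box center and the
    image center along the horizontal and vertical axes, per the description above."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return abs(cx - img_length / 2) / img_length, abs(cy - img_width / 2) / img_width

def stacking_degree(box, other_boxes):
    """Sum over the other boxes of (intersection area with `box`) / (area of `box`)."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    total = 0.0
    for ox1, oy1, ox2, oy2 in other_boxes:
        iw = max(0.0, min(x2, ox2) - max(x1, ox1))
        ih = max(0.0, min(y2, oy2) - max(y1, oy1))
        total += (iw * ih) / area
    return total
```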
The AI platform can calculate the brightness of each bounding box in a single image. Specifically, for any bounding box in an image, it calculates the squares of the mean R value, the mean G value and the mean B value of the pixels in the bounding box, multiplies the square of the mean R value by 0.241 to obtain a product a, multiplies the square of the mean G value by 0.691 to obtain a product b, and multiplies the square of the mean B value by 0.068 to obtain a product c. The products a, b and c are added, and the square root of the sum is the brightness of the bounding box. Expressed as a formula:

brightness = √(0.241·R̄² + 0.691·Ḡ² + 0.068·B̄²)      (2)

where R̄, Ḡ and B̄ are the mean R, G and B values of the pixels in the bounding box.
The AI platform can calculate the blur degree of each bounding box in a single image. Specifically, for any bounding box in an image, it filters the bounding box with a Laplacian operator to obtain edge values, and takes the variance of the edge values as the blur degree of the bounding box. It should be noted that the larger the variance, the sharper the bounding box. In addition, this way of determining the blur degree of a box is only an example; any method that can be used to determine the blur degree of a bounding box can be applied in this embodiment.
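The brightness of formula (2) and the Laplacian-variance blur measure can be sketched as follows; the use of NumPy and OpenCV (cv2.Laplacian) is an assumption made for illustration, not something prescribed by this application.

```python
import cv2
import numpy as np

def box_brightness(patch_rgb):
    """Formula (2): sqrt(0.241*R̄² + 0.691*Ḡ² + 0.068*B̄²), where R̄, Ḡ, B̄ are the mean
    channel values of the pixels inside the bounding box (patch_rgb is an H x W x 3 array)."""
    r, g, b = (patch_rgb[..., c].mean() for c in range(3))
    return float(np.sqrt(0.241 * r**2 + 0.691 * g**2 + 0.068 * b**2))

def box_blur(patch_gray):
    """Filter the box region with a Laplacian and take the variance of the response;
    a larger value indicates a sharper (less blurred) box."""
    return float(cv2.Laplacian(patch_gray, cv2.CV_64F).var())
```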
The AI platform then obtains the surface features of the training images in the training image set and determines the distribution of the images over each surface feature (the processing is the same as that used to obtain the surface features of the unlabeled images, described above). Specifically, the distribution of images over each surface feature can be represented by a histogram.
The AI platform then obtains a stored preset value. For the distribution of images over any one surface feature, it multiplies the preset value by the number of unlabeled images to obtain a target value. The AI platform arranges the values of this surface feature of all the unlabeled images in ascending order and takes the value at the position indicated by the target value, which gives the limit value of this surface feature. In the training image set, the images whose value of this surface feature is greater than the limit value and those whose value is less than or equal to the limit value are identified; an image whose surface-feature value is greater than the limit value is given hard-example value a, and an image whose surface-feature value is less than or equal to the limit value is given hard-example value b. In this way, a hard-example value can be determined for every surface feature of every unlabeled image, and the weight corresponding to each surface feature is then obtained. For each of the unlabeled images, the AI platform multiplies the hard-example value of each bounding-box surface feature of the image by the weight corresponding to that surface feature to obtain one value per bounding-box surface feature, and adds the values of all bounding-box surface features to obtain the bounding-box hard-example value of the image. Likewise, the AI platform multiplies the hard-example value of each image-level surface feature by the weight corresponding to that surface feature to obtain one value per image-level surface feature, and adds the values of all image-level surface features to obtain the image hard-example value. The AI platform then weights the bounding-box hard-example value and the image hard-example value (their weights sum to 1) to obtain the fourth hard-example value of the image.
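A sketch of the weighted combination described above, assuming the per-feature hard-example values and per-feature weights are given as dictionaries; all names, the dictionary representation and the default weights are illustrative assumptions.

```python
def fourth_hard_value(box_feature_values, box_feature_weights,
                      img_feature_values, img_feature_weights,
                      w_box=0.5, w_img=0.5):
    """Combine the per-feature hard-example values of the bounding-box surface features
    and of the image-level surface features into the fourth hard-example value
    (w_box + w_img is assumed to equal 1)."""
    box_value = sum(box_feature_values[f] * box_feature_weights[f] for f in box_feature_values)
    img_value = sum(img_feature_values[f] * img_feature_weights[f] for f in img_feature_values)
    return w_box * box_value + w_img * img_value
```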
It should be noted that the weights may differ between surface features, and the weights of all surface features sum to 1. For example, the weights of the image brightness and the image sharpness are greater than the weight of the image aspect ratio.
另外需要说明的是,上述每种表层特征对应的预设数值,也可以不相同。上述步骤1010中是AI平台确定每帧图像的表层特征,在实际处理时,也可以是用户直接上传未标注的多个图像中各图像的表层特征,存储在数据存储模块。AI平台在使用时,从数据存储模块中,获取未标注的多个图像中各图像的表层特征。In addition, it should be noted that the preset values corresponding to each of the above-mentioned surface features may also be different. In the above step 1010, the AI platform determines the surface features of each frame of image. In actual processing, the user can directly upload the surface features of each of the multiple unlabeled images and store them in the data storage module. When the AI platform is in use, it obtains the surface features of each of the multiple unlabeled images from the data storage module.
In step 1502, the AI platform uses a second feature extraction model to extract the deep features of each bounding box in each image of the training image set and the deep features of each bounding box in each of the unlabeled images. According to the deep features of each bounding box in each image of the training image set, it clusters the bounding boxes of the training image set to obtain a box clustering result. According to the deep features of each bounding box in each of the unlabeled images, the box clustering result, and the inference result of each bounding box in each of the unlabeled images, it determines the fifth hard-example value of each of the unlabeled images.
本实施例中,如图16所示,AI平台可以获取第二特征提取模型,第二特征提取模型可以与上述提到的第一特征提取模块相同,可以是CNN。然后AI平台将训练图像集中各图像输入至第二特征提取模型,确定出各图像中每个边界框的深层特征。AI平台还可以将未标注的多个图像中各图像也输入至第二特征提取模型,确定出各图像中每个边界框的深层特征。每个边界框的深层特征可以使用一个一维数组表示,且每个边界框的深层特征的一维数组的维度相等。In this embodiment, as shown in FIG. 16, the AI platform can obtain a second feature extraction model, and the second feature extraction model can be the same as the first feature extraction module mentioned above, and can be a CNN. Then the AI platform inputs each image in the training image set to the second feature extraction model, and determines the deep features of each bounding box in each image. The AI platform can also input each of the multiple unlabeled images to the second feature extraction model to determine the deep features of each bounding box in each image. The deep features of each bounding box can be represented by a one-dimensional array, and the one-dimensional arrays of the deep features of each bounding box have the same dimensions.
然后AI平台可以将训练图像集中各图像中每个边界框的深层特征,输入至聚类算法(聚类算法可以是任何一种聚类算法,如K-means聚类算法等)中,得到边界框聚类结果。边界框聚类结果中包括多个边界框组,每个边界框组中包括一个或多个边界框。Then the AI platform can input the deep features of each bounding box in each image in the training image set into the clustering algorithm (the clustering algorithm can be any clustering algorithm, such as K-means clustering algorithm, etc.) to obtain the boundary Box clustering results. The bounding box clustering result includes multiple bounding box groups, and each bounding box group includes one or more bounding boxes.
For each bounding box group, the AI platform can determine, for each dimension i, the average of the i-th dimension values of the deep features of the bounding boxes in that group. For example, suppose a bounding box group contains 3 bounding boxes whose deep features are three-dimensional vectors (7, 2, 5), (4, 2, 4) and (4, 14, 9). The average of the first dimension is 5, the average of the second dimension is 6, and the average of the third dimension is 6, so the center of this bounding box group is (5, 6, 6). The center of every bounding box group can be determined in this way.
For any bounding box in any one of the unlabeled images, the AI platform can determine the distance between the deep feature of that bounding box and the center of each bounding box group in the box clustering result. Specifically, it computes the Euclidean distance between the bounding box and each center, for example d = √(∑_(i=1)^N (x_1i - x_2i)²), where i is a dimension of the deep feature, N is the total number of dimensions, x_1i is the i-th dimension of the bounding box's deep feature, and x_2i is the i-th dimension of the center. The bounding box group whose center is closest is determined as the group to which the bounding box belongs. Alternatively, a K-means clustering method (or another clustering method) can be used to assign any bounding box of the unlabeled images to one of the existing bounding box groups. The AI platform then judges whether the category of the bounding box is the same as the category of the bounding boxes in that group: if they are the same, the hard-example value is c; if they are different, the hard-example value is d. For each image, the hard-example values of all its bounding boxes are added to obtain the fifth hard-example value of the image.
步骤1503,AI平台根据第四难例值和第五难例值中的一个或多个,确定未标注的多个图像各图像的目标难例值。In step 1503, the AI platform determines the target difficulty value of each image of the multiple unlabeled images according to one or more of the fourth difficulty value and the fifth difficulty value.
In this embodiment, for any one of the unlabeled images, the AI platform can use one or both of the fourth and fifth hard-example values of the image to determine its target hard-example value. Specifically, the AI platform may take the fourth hard-example value as the target hard-example value, or take the fifth hard-example value as the target hard-example value, or obtain the target hard-example value by weighting the fourth and fifth hard-example values. When the fourth and fifth hard-example values are used together, the determined target hard-example value is more accurate because two levels of hard-example values are considered at the same time.
步骤1504,AI平台将未标注的多个图像中目标难例值最大的第一数目个图像,确定为未标注的多个图像中的难例。In step 1504, the AI platform determines the first number of images with the largest target difficulty value among the unlabeled images as the difficult examples among the unlabeled images.
其中,第一数目可以预设,存储在AI平台的数据存储模块中。Among them, the first number can be preset and stored in the data storage module of the AI platform.
In this embodiment, the AI platform can sort the unlabeled images in descending order of their target hard-example values and select the first number of top-ranked images as the hard examples among the unlabeled images.
In this embodiment of the application, the AI platform can obtain a plurality of unlabeled images, input the unlabeled images into an initial AI model to obtain a labeling result for each of the unlabeled images, use the labeling results to determine the hard examples among the unlabeled images, and retrain the initial AI model based on the hard examples to obtain an optimized AI model. Because hard examples are used in the AI platform to train the initial AI model, the inference accuracy of the trained optimized AI model is higher.
以下本申请实施例中还提供了优化AI模型的方法,如图17所示,处理可以为:The following embodiments of the present application also provide a method for optimizing the AI model. As shown in FIG. 17, the processing may be:
步骤1701,AI平台根据训练图像集对初始AI模型进行训练,获得优化AI模型。Step 1701: The AI platform trains the initial AI model according to the training image set to obtain an optimized AI model.
本实施例中,训练图像集是用户提供给AI平台的图像集。训练图像集可以仅包括未标注的多个图像。训练图像集可包括未标注的多个图像和带标注的多个图像。In this embodiment, the training image set is the image set provided by the user to the AI platform. The training image set may only include multiple unlabeled images. The training image set may include multiple unlabeled images and multiple labeled images.
在训练图像集可以仅包括未标注的多个图像时,优化初始AI模型的过程可以见图5所示的流程。在训练图像集包括未标注的多个图像和带标注的多个图像时,可以首先使用带标注的多个图像训练出初始AI模型,基于初始AI模型,确定未标注的多个图像的标注结果。基于标注结果确定难例,基于难例训练初始AI模型,获得优化AI模型。此 处的处理可以参见图5所示额流程。When the training image set may only include multiple unlabeled images, the process of optimizing the initial AI model can be shown in the process shown in FIG. 5. When the training image set includes multiple unlabeled images and multiple labeled images, you can first use the multiple labeled images to train the initial AI model, and based on the initial AI model, determine the labeling results of the multiple unlabeled images . Determine difficult cases based on the labeling results, train the initial AI model based on the difficult cases, and obtain an optimized AI model. The processing here can refer to the flow shown in Figure 5.
步骤1702,AI平台接收推理图像集,根据优化AI模型对推理图像集中的每个推理图像进行推理,获得推理结果。In step 1702, the AI platform receives the reasoning image set, and performs reasoning on each reasoning image in the reasoning image set according to the optimized AI model to obtain the reasoning result.
In this embodiment, if the optimized AI model is to be used for inference, the user can upload a reasoning image set, and the images in the reasoning image set are input into the optimized AI model to obtain the reasoning results.
步骤1703,AI平台根据推理结果,确定推理图像集中的难例,其中,难例指示通过优化AI模型进行推理获得的推理结果的错误率高于目标阈值的推理图像。In step 1703, the AI platform determines the hard cases in the reasoning image set according to the reasoning results, where the hard cases indicate the reasoning images whose error rate of the reasoning results obtained by optimizing the AI model for reasoning is higher than the target threshold.
其中,推理结果与标注结果等同。Among them, the inference result is equivalent to the annotation result.
本实施例中,此过程可以参见步骤503的处理,与步骤503的区别仅在于此处是推理图像集,步骤503中是未标注的多个图像,实际上均是未标注的多个图像。详细过程参见步骤503的描述。In this embodiment, this process can be referred to the processing of step 503. The difference from step 503 is only that this is the inference image set. In step 503, there are multiple unlabeled images, which are actually multiple unlabeled images. See the description of step 503 for the detailed process.
步骤1704,根据难例对优化AI模型进行训练,获得再优化AI模型。In step 1704, the optimized AI model is trained according to the difficult cases, and the re-optimized AI model is obtained.
本实施例中,在确定出推理图像集中的难例之后,可以对优化AI模型进行继续训练,得到再优化AI模型(训练过程可参见前面的描述)。In this embodiment, after the difficult cases in the reasoning image set are determined, the optimized AI model can be continuously trained to obtain the re-optimized AI model (see the previous description for the training process).
由于又使用难例对优化AI模型进行训练,所以得到的再优化AI模型的推理能力更强。Since difficult examples are used to train the optimized AI model, the resulting re-optimized AI model has stronger reasoning ability.
It should be noted that the above method of providing an AI model can be jointly implemented by one or more modules on the AI platform 100. Specifically, the user I/O module is used to implement step 501 in FIG. 5 and steps 901 and 902 in FIG. 9. The hard-example mining module is used to implement step 503 in FIG. 5, step 904 in FIG. 9, the flow shown in FIG. 11, the flow shown in FIG. 14, the flow shown in FIG. 15 and step 1703 in FIG. 17. The model training module is used to implement steps 502 and 504 in FIG. 5, step 905 in FIG. 9 and step 1704 in FIG. 17. The inference module is used to implement step 903 shown in FIG. 9 and step 1702 shown in FIG. 17.
本申请还提供一种如图4所示的计算设备400,计算设备400中的处理器402读取存储器401存储的程序和图像集合以执行前述AI平台执行的方法。The present application also provides a computing device 400 as shown in FIG. 4. The processor 402 in the computing device 400 reads the program and image collection stored in the memory 401 to execute the method executed by the aforementioned AI platform.
Since the modules of the AI platform 100 provided in this application can be deployed in a distributed manner on multiple computers in the same environment or in different environments, this application further provides a computing device as shown in FIG. 18. The computing device includes multiple computers 1800, and each computer 1800 includes a memory 1801, a processor 1802, a communication interface 1803 and a bus 1804. The memory 1801, the processor 1802 and the communication interface 1803 are communicatively connected to each other through the bus 1804.
The memory 1801 may be a read-only memory, a static storage device, a dynamic storage device or a random access memory. The memory 1801 may store a program; when the program stored in the memory 1801 is executed by the processor 1802, the processor 1802 and the communication interface 1803 are used to execute part of the method used by the AI platform to obtain an AI model. The memory may also store image collections; for example, part of the storage resources in the memory 1801 is divided into an image set storage module for storing the image sets required by the AI platform, and part of the storage resources in the memory 1801 is divided into an AI model storage module for storing the AI model library.
处理器1802可以采用通用的中央处理器,微处理器,应用专用集成电路,图形处理器或者一个或多个集成电路。The processor 1802 may adopt a general-purpose central processing unit, a microprocessor, an application specific integrated circuit, a graphics processor, or one or more integrated circuits.
通信接口1803使用例如但不限于收发器一类的收发模块,来实现计算机1800与其 他设备或通信网络之间的通信。例如,可以通过通信接口1803获取图像集。The communication interface 1803 uses a transceiver module such as but not limited to a transceiver to implement communication between the computer 1800 and other devices or communication networks. For example, the image collection can be acquired through the communication interface 1803.
The bus 1804 may include a path for transferring information between the components of the computer 1800 (for example, the memory 1801, the processor 1802 and the communication interface 1803).
Each of the above computers 1800 establishes a communication path with the others through a communication network. Each computer 1800 runs any one or more of the user I/O module 101, the hard-example mining module 102, the model training module 103, the inference module 104, the AI model storage module 105, the data storage module 106 and the data preprocessing module 107. Any computer 1800 may be a computer (for example, a server) in a cloud data center, a computer in an edge data center, or a terminal computing device.
上述各个附图对应的流程的描述各有侧重,某个流程中没有详述的部分,可以参见其他流程的相关描述。The descriptions of the processes corresponding to each of the above drawings have their respective focuses. For parts that are not described in detail in a certain process, please refer to the related descriptions of other processes.
The above embodiments may be implemented in whole or in part by software, hardware or a combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product providing the AI platform includes one or more computer instructions for the AI platform; when these computer program instructions are loaded and executed on a computer, the processes or functions described in FIG. 5, FIG. 11, FIG. 14 or FIG. 15 of the embodiments of the present application are generated in whole or in part.
The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber or twisted pair) or a wireless manner (for example, infrared, radio or microwave). The computer-readable storage medium stores the computer program instructions that provide the AI platform. The computer-readable storage medium may be any medium that a computer can access, or a data storage device such as a server or a data center that integrates one or more media. The medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, an SSD).

Claims (32)

  1. 一种提供人工智能AI模型的方法,其特征在于,所述方法包括:A method for providing an artificial intelligence AI model, characterized in that the method includes:
    AI平台接收第一用户的未标注的多个图像,所述第一用户为在所述AI平台注册账号的实体;The AI platform receives a plurality of unlabeled images of a first user, and the first user is an entity that has registered an account on the AI platform;
    所述AI平台根据初始AI模型标注所述多个图像;The AI platform annotates the multiple images according to the initial AI model;
    所述AI平台根据标注结果确定所述多个图像中的难例;The AI platform determines the difficult examples in the multiple images according to the annotation result;
    所述AI平台利用所述难例训练所述初始AI模型以获得优化AI模型。The AI platform uses the difficult examples to train the initial AI model to obtain an optimized AI model.
  2. 根据权利要求1所述的方法,其特征在于,所述AI平台根据标注结果确定所述多个图像中的难例,包括:The method according to claim 1, wherein the AI platform determines the difficult examples in the plurality of images according to the annotation result, comprising:
    所述AI平台向所述第一用户提供确认界面,在所述确认界面中向所述第一用户展示候选难例,所述候选难例为所述多个图像中的至少一个图像;The AI platform provides a confirmation interface to the first user, and displays a candidate difficult example to the first user in the confirmation interface, where the candidate difficult example is at least one image of the plurality of images;
    所述AI平台根据所述第一用户在所述确认界面上的操作,确定所述候选难例中的难例。The AI platform determines the hard examples among the candidate hard examples according to the operation of the first user on the confirmation interface.
  3. 根据权利要求1或2所述的方法,其特征在于,The method according to claim 1 or 2, characterized in that:
    所述方法还包括:所述AI平台接收所述第一用户对所述难例的矫正标注;The method further includes: the AI platform receives correction annotations of the first user for the difficult case;
    所述AI平台利用所述难例训练所述初始AI模型以获得优化AI模型包括:所述AI平台利用所述难例和对应的矫正标注训练所述初始AI模型以获得所述优化AI模型。The AI platform training the initial AI model by using the difficult example to obtain the optimized AI model includes: the AI platform training the initial AI model by using the difficult example and the corresponding correction annotation to obtain the optimized AI model.
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-3, wherein the method further comprises:
    所述AI平台从所述第一用户获取带标注的一个或多个图像;The AI platform obtains one or more tagged images from the first user;
    所述AI平台利用带标注的一个或多个图像获得所述初始AI模型。The AI platform obtains the initial AI model by using one or more labeled images.
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 1-4, wherein the method further comprises:
    所述AI平台将所述优化AI模型提供给第二用户的设备,以使得所述设备用所述优化AI模型执行任务目标;The AI platform provides the optimized AI model to the device of the second user, so that the device uses the optimized AI model to perform task goals;
    or
    所述AI平台接收所述第二用户的设备发送的推理图像,利用所述优化AI模型对所述推理图像进行推理,并向所述第二用户的设备提供推理结果。The AI platform receives the inference image sent by the device of the second user, uses the optimized AI model to perform inference on the inference image, and provides the inference result to the device of the second user.
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述AI平台根据初始AI模型标注所述多个未标注的图像,包括:The method according to any one of claims 1-5, wherein the AI platform annotating the plurality of unlabeled images according to an initial AI model comprises:
    所述AI平台向所述第一用户提供标注选择界面,所述标注选择界面上包括所述第一用户可选择的至少一种标注方式;The AI platform provides a label selection interface to the first user, and the label selection interface includes at least one labeling method selectable by the first user;
    所述AI平台接收所述第一用户选择的标注方式,根据所述第一用户选择的标注方式对应的所述初始AI模型标注所述未标注的多个图像。The AI platform receives the labeling method selected by the first user, and labels the unlabeled multiple images according to the initial AI model corresponding to the labeling method selected by the first user.
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述AI平台根据初始AI模型标注所述多个图像包括:根据所述初始AI模型对所述多个图像分类和/或根据所述初始AI模型对所述多个图像执行物体检测。The method according to any one of claims 1-6, wherein the AI platform annotating the plurality of images according to an initial AI model comprises: classifying the plurality of images according to the initial AI model and/or Perform object detection on the plurality of images according to the initial AI model.
  8. 一种人工智能AI平台,其特征在于,所述AI平台包括:An artificial intelligence AI platform, characterized in that the AI platform includes:
    用户输入输出I/O模块,用于接收第一用户的未标注的多个图像,所述第一用户为在所述AI平台注册账号的实体;A user input and output I/O module, configured to receive a plurality of unlabeled images of a first user, the first user being an entity that has registered an account on the AI platform;
    数据预处理模块,用于根据初始AI模型标注所述多个图像;A data preprocessing module, configured to annotate the multiple images according to the initial AI model;
    难例挖掘模块,用于根据标注结果确定所述多个图像中的难例;A difficult case mining module, which is used to determine difficult cases in the multiple images according to the labeling results;
    模型训练模块,用于利用所述难例训练所述初始AI模型以获得优化AI模型。The model training module is used to train the initial AI model using the difficult examples to obtain an optimized AI model.
  9. 根据权利要求8所述的AI平台,其特征在于,The AI platform according to claim 8, wherein:
The user I/O module is further configured to provide a confirmation interface to the first user and to display candidate difficult examples to the first user in the confirmation interface, where the candidate difficult examples are at least one image of the plurality of images;
    所述难例挖掘模块,还用于根据所述第一用户在所述确认界面上的操作,确定所述候选难例中的难例。The difficult case mining module is further configured to determine the difficult case among the candidate difficult cases according to the operation of the first user on the confirmation interface.
  10. 根据权利要求8或9所述的AI平台,其特征在于,The AI platform according to claim 8 or 9, characterized in that:
    所述用户I/O模块,还用于接收所述第一用户对所述难例的矫正标注;The user I/O module is further configured to receive correction annotations of the first user for the difficult case;
    所述模型训练模块,用于利用所述难例和对应的矫正标注训练所述初始AI模型以获得所述优化AI模型。The model training module is configured to train the initial AI model by using the difficult examples and corresponding correction annotations to obtain the optimized AI model.
  11. 根据权利要求8-10任一项所述的AI平台,其特征在于,The AI platform according to any one of claims 8-10, wherein:
    所述用户I/O模块,还用于从所述第一用户获取带标注的一个或多个图像;The user I/O module is further configured to obtain one or more tagged images from the first user;
    所述模型训练模块,还用于利用带标注的一个或多个图像获得所述初始AI模型。The model training module is also used to obtain the initial AI model by using one or more labeled images.
  12. 根据权利要求8-11任一项所述的AI平台,其特征在于,The AI platform according to any one of claims 8-11, wherein:
    所述用户I/O模块,还用于将所述优化AI模型提供给第二用户的设备,以使得所述设备用所述优化AI模型执行任务目标;The user I/O module is further configured to provide the optimized AI model to a device of a second user, so that the device uses the optimized AI model to perform task goals;
    or
    所述AI平台还包括推理模块,The AI platform also includes an inference module,
    所述用户I/O模块,还用于接收所述第二用户的设备发送的推理图像;The user I/O module is further configured to receive the inference image sent by the second user's device;
    所述推理模块,用于利用所述优化AI模型对所述推理图像进行推理;The reasoning module is configured to use the optimized AI model to reason about the reasoning image;
    所述用户I/O模块,还用于向所述第二用户的设备提供推理结果。The user I/O module is also used to provide a reasoning result to the device of the second user.
  13. 根据权利要求8-12任一项所述的AI平台,其特征在于,The AI platform according to any one of claims 8-12, wherein:
    所述用户I/O模块,还用于向所述第一用户提供标注选择界面,所述标注选择界面上包括所述第一用户可选择的至少一种标注方式;The user I/O module is further configured to provide a label selection interface to the first user, and the label selection interface includes at least one labeling method selectable by the first user;
    所述用户I/O模块,还用于接收所述第一用户选择的标注方式;The user I/O module is further configured to receive the marking method selected by the first user;
    所述数据预处理模块,用于根据所述第一用户选择的标注方式对应的所述初始AI模型标注所述未标注的多个图像。The data preprocessing module is configured to annotate the plurality of unlabeled images according to the initial AI model corresponding to the annotation mode selected by the first user.
  14. 根据权利要求8-13任一项所述的AI平台,其特征在于,所述数据预处理模块,用于根据所述初始AI模型对所述多个图像分类和/或根据所述初始AI模型对所述多个图像执行物体检测。The AI platform according to any one of claims 8-13, wherein the data preprocessing module is configured to classify the plurality of images according to the initial AI model and/or according to the initial AI model Object detection is performed on the plurality of images.
  15. 一种优化人工智能AI模型的方法,其特征在于,所述方法包括:A method for optimizing an artificial intelligence AI model, characterized in that the method includes:
    根据训练图像集对初始AI模型进行训练,获得优化AI模型;Train the initial AI model according to the training image set to obtain an optimized AI model;
    接收推理图像集,根据所述优化AI模型对所述推理图像集中的每个推理图像进行推理,获得推理结果;Receiving a reasoning image set, and performing reasoning on each reasoning image in the reasoning image set according to the optimized AI model to obtain a reasoning result;
    根据所述推理结果,确定所述推理图像集中的难例,其中,所述难例指示通过所述优化AI模型进行推理获得的推理结果的错误率高于目标阈值的推理图像;According to the reasoning result, determine the hard cases in the reasoning image set, where the hard cases indicate reasoning images in which the error rate of the reasoning result obtained by reasoning through the optimized AI model is higher than the target threshold;
    根据所述难例对所述优化AI模型进行训练,获得再优化AI模型。Training the optimized AI model according to the difficult examples to obtain a re-optimized AI model.
  16. 根据权利要求15所述的方法,其特征在于,The method of claim 15, wherein:
    所述根据所述推理结果,确定所述推理图像集中的难例,包括:The determining the difficult cases in the reasoning image set according to the reasoning result includes:
    确定所述推理图像集为视频片段;Determining that the inference image set is a video segment;
    根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例;Determine the difficult cases in the reasoning image set according to the reasoning result of each image in the reasoning image set;
    or
    确定所述推理图像集为非视频片段,根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例。It is determined that the reasoning image set is a non-video segment, and the hard cases in the reasoning image set are determined according to the reasoning result of each image in the reasoning image set and the training image set.
  17. 根据权利要求16所述的方法,其特征在于,The method of claim 16, wherein:
    所述根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例,包括:The determining the difficult cases in the reasoning image set according to the reasoning result of each image in the reasoning image set includes:
    确定所述推理图像集中的目标图像,其中,所述目标图像的推理结果与所述目标图像在所述视频片段中的相邻的图像的推理结果不相同;Determining a target image in the inference image set, wherein the inference result of the target image is different from the inference result of the adjacent image of the target image in the video segment;
    将所述目标图像确定为所述推理图像中的难例。The target image is determined as a difficult example in the reasoning image.
  18. 根据权利要求16或17所述的方法,其特征在于,The method according to claim 16 or 17, characterized in that:
    所述根据所述推理图像集中各图像的推理结果和所述训练图像集,确定所述推理图像集中的难例,包括:The determining the difficult cases in the reasoning image set according to the reasoning result of each image in the reasoning image set and the training image set includes:
    获取所述推理图像集中各图像在各类别下的置信度,根据所述推理图像集中各图像的最高的两个置信度,确定所述推理图像集中各图像的第一难例值;Acquiring the confidence of each image in the reasoning image set in each category, and determining the first hard case value of each image in the reasoning image set according to the two highest confidences of each image in the reasoning image set;
    获取所述训练图像集中图像的表层特征分布信息,根据所述表层特征分布信息和所述推理图像集中各图像的表层特征,确定所述推理图像集中各图像的第二难例值;Acquiring the surface feature distribution information of the images in the training image set, and determining the second hard case value of each image in the inference image set according to the surface feature distribution information and the surface features of each image in the inference image set;
Acquiring the deep features of each image in the training image set and the deep features of each image in the reasoning image set, and performing clustering processing on each image in the training image set according to the deep features of each image in the training image set to obtain an image clustering result; determining the third hard case value of each image in the reasoning image set according to the deep features of each image in the reasoning image set, the image clustering result, and the reasoning result of each image in the reasoning image set;
    根据所述第一难例值、第二难例值和所述第三难例值中的一个或多个,确定所述推理图像集中各图像的目标难例值;Determine the target hard case value of each image in the reasoning image set according to one or more of the first hard case value, the second hard case value, and the third hard case value;
    将所述推理图像集中目标难例值最大的第一数目个图像,确定为所述推理图像集中的难例。The first number of images with the largest target difficulty example value in the reasoning image set is determined as the hard example in the reasoning image set.
  19. 根据权利要求16所述的方法,其特征在于,The method of claim 16, wherein:
    所述根据所述推理图像集中各图像的推理结果,确定所述推理图像集中的难例,包括:The determining the difficult cases in the reasoning image set according to the reasoning result of each image in the reasoning image set includes:
For a first target frame of a first image in the reasoning image set, among the images in the video segment whose interval in time sequence from the first image is less than or equal to a second number, judging whether there is a similar frame corresponding to the first target frame;
    若未存在所述第一目标框对应的相似框,则将所述第一目标框确定为难例框;If there is no similar frame corresponding to the first target frame, determine the first target frame as a difficult case frame;
If there is a similar frame corresponding to the first target frame, and the first image to which the first target frame belongs and the second image to which the similar frame belongs are not adjacent in the video segment, determining the difficult case frames in the images between the first image and the second image according to the first target frame and the similar frame;
    根据所述推理图像集中各图像的难例框的数目,确定所述推理图像集中的难例。According to the number of difficult cases in each image in the reasoning image set, the hard cases in the reasoning image set are determined.
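Illustrative note (not part of the claims): a simplified sketch of the per-box flow in claim 19. Each detected box in a frame is compared against the boxes of frames within a temporal window; boxes with no match in the window are counted as hard-example boxes, and frames with many such boxes are treated as hard examples. The window size, the `boxes_match` placeholder predicate, and the ranking step are assumptions.

```python
def count_hard_example_boxes(frames, window, boxes_match):
    """frames: list of per-frame box lists (each box may be any object).
    window: maximum temporal interval (the "second number") searched for a similar box.
    boxes_match: predicate deciding whether two boxes are similar (e.g. IoU-based).

    Returns the number of hard-example boxes found in each frame.
    """
    counts = [0] * len(frames)
    for i, boxes in enumerate(frames):
        # Candidate frames within the temporal window, excluding the frame itself.
        neighbor_ids = [j for j in range(max(0, i - window),
                                         min(len(frames), i + window + 1)) if j != i]
        for box in boxes:
            has_similar = any(boxes_match(box, other)
                              for j in neighbor_ids for other in frames[j])
            if not has_similar:
                counts[i] += 1          # no similar box anywhere in the window
    return counts


def rank_frames_by_hard_boxes(counts, first_number):
    """Indices of the frames containing the most hard-example boxes."""
    return sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:first_number]
```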
  20. The method according to claim 19, wherein determining whether a similar box corresponding to the first target box exists in the images of the video segment whose temporal interval from the first image is less than or equal to the second number comprises:
    determining, among the images of the video segment whose temporal interval from the first image is less than or equal to the second number, a tracking box with the highest similarity to the first target box;
    determining an overlap ratio between the first target box and each bounding box according to the tracking box, all bounding boxes in the images of the video segment whose temporal interval from the first image is less than or equal to the second number, and the first target box;
    if a bounding box with an overlap ratio greater than a second value exists, determining the bounding box with the overlap ratio greater than the second value as the similar box corresponding to the first target box; and
    if no bounding box with an overlap ratio greater than the second value exists, determining that no similar box corresponding to the first target box exists.
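Illustrative note (not part of the claims): a worked example of the overlap-ratio test in claim 20, with boxes given as (x1, y1, x2, y2) corner coordinates. Reading the overlap ratio as intersection-over-union and naming the threshold `second_value` are assumptions for illustration.

```python
def overlap_ratio(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_similar_boxes(target_box, candidate_boxes, second_value):
    """Return the candidate boxes whose overlap ratio with the target box
    exceeds the threshold; an empty list means no similar box exists."""
    return [b for b in candidate_boxes if overlap_ratio(target_box, b) > second_value]


target = (10, 10, 50, 50)
candidates = [(12, 12, 52, 52), (200, 200, 240, 240)]
print(find_similar_boxes(target, candidates, second_value=0.5))  # [(12, 12, 52, 52)]
```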
  21. The method according to any one of claims 16, 19, or 20, wherein
    determining the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set comprises:
    obtaining surface-feature distribution information of the images in the training image set, and determining a fourth hard-example value of each image in the inference image set according to the surface-feature distribution information of the images in the training image set and the surface features of the images in the inference image set, wherein the surface features comprise surface features of bounding boxes and surface features of the images;
    obtaining deep features of each box in each image of the training image set and deep features of each box in each image of the inference image set, clustering the boxes in each image of the training image set according to their deep features to obtain a box clustering result, and determining a fifth hard-example value of each image in the inference image set according to the deep features of each box in each image of the inference image set, the box clustering result, and the inference result of each box in each image of the inference image set;
    determining a target hard-example value of each image in the inference image set according to one or more of the fourth hard-example value and the fifth hard-example value; and
    determining a first number of images with the largest target hard-example values in the inference image set as the hard examples in the inference image set.
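Illustrative note (not part of the claims): one possible reading of the surface-feature step in claim 21 is to measure how far an inference image's surface features deviate from the distribution observed in the training set. The sketch below uses a mean-absolute-z-score deviation and two arbitrarily chosen surface features (mean brightness and box count); both the deviation measure and the feature choice are assumptions, not the claimed computation.

```python
import numpy as np

def surface_feature_distribution(train_features):
    """train_features: (num_train_images, num_surface_features) array.
    Returns per-feature mean and standard deviation over the training set."""
    return train_features.mean(axis=0), train_features.std(axis=0) + 1e-8


def fourth_hard_example_values(infer_features, mean, std):
    """Mean absolute z-score of each inference image's surface features:
    the further an image lies from the training distribution, the larger
    its hard-example value."""
    return np.abs((infer_features - mean) / std).mean(axis=1)


# Surface features per image: [mean brightness, number of detected boxes]
train = np.array([[120.0, 3], [130.0, 4], [125.0, 3], [128.0, 5]])
infer = np.array([[127.0, 4],     # close to the training distribution
                  [30.0, 12]])    # far from it -> likely hard example
mean, std = surface_feature_distribution(train)
print(fourth_hard_example_values(infer, mean, std))
```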
  22. An artificial intelligence (AI) platform, wherein the AI platform comprises:
    a model training module, configured to train an initial AI model according to a training image set to obtain an optimized AI model;
    an inference module, configured to receive an inference image set and perform inference on each inference image in the inference image set according to the optimized AI model to obtain an inference result; and
    a hard-example mining module, configured to determine hard examples in the inference image set according to the inference result, wherein a hard example indicates an inference image for which the error rate of the inference result obtained through inference by the optimized AI model is higher than a target threshold;
    wherein the model training module is further configured to train the optimized AI model according to the hard examples to obtain a re-optimized AI model.
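Illustrative note (not part of the claims): a skeletal sketch of how the three modules in claim 22 could be wired into a train → infer → mine-hard-examples → retrain loop. The class name, the `train`/`infer`/`mine` callables, and their signatures are assumptions, since the claim does not prescribe an implementation.

```python
from typing import Any, Callable, List


class AIPlatform:
    """Skeletal sketch of the claimed modules: model training, inference,
    and hard-example mining, wired into a retraining loop."""

    def __init__(self,
                 train_fn: Callable[[Any, List[Any]], Any],
                 infer_fn: Callable[[Any, Any], Any],
                 mine_fn: Callable[[List[Any], List[Any]], List[Any]]):
        self.train_fn = train_fn   # model training module
        self.infer_fn = infer_fn   # inference module
        self.mine_fn = mine_fn     # hard-example mining module

    def run(self, initial_model, training_images, inference_images):
        # 1. Train the initial AI model on the training image set.
        optimized = self.train_fn(initial_model, training_images)
        # 2. Run inference on every image in the inference image set.
        results = [self.infer_fn(optimized, img) for img in inference_images]
        # 3. Mine hard examples from the inference results.
        hard_examples = self.mine_fn(inference_images, results)
        # 4. Retrain the optimized model on the hard examples.
        reoptimized = self.train_fn(optimized, hard_examples)
        return reoptimized, hard_examples
```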
  23. The AI platform according to claim 22, wherein the hard-example mining module is configured to:
    determine that the inference image set is a video segment, and determine the hard examples in the inference image set according to the inference result of each image in the inference image set; or
    determine that the inference image set is not a video segment, and determine the hard examples in the inference image set according to the inference result of each image in the inference image set and the training image set.
  24. The AI platform according to claim 23, wherein the hard-example mining module is configured to:
    determine a target image in the inference image set, wherein the inference result of the target image is different from the inference results of the images adjacent to the target image in the video segment; and
    determine the target image as a hard example among the inference images.
  25. The AI platform according to claim 23 or 24, wherein the hard-example mining module is configured to:
    obtain a confidence of each image in the inference image set under each category, and determine a first hard-example value of each image in the inference image set according to the two highest confidences of each image in the inference image set;
    obtain surface-feature distribution information of the images in the training image set, and determine a second hard-example value of each image in the inference image set according to the surface-feature distribution information and the surface features of each image in the inference image set;
    obtain deep features of each image in the training image set and deep features of each image in the inference image set, cluster the images in the training image set according to their deep features to obtain an image clustering result, and determine a third hard-example value of each image in the inference image set according to the deep features of each image in the inference image set, the image clustering result, and the inference result of each image in the inference image set;
    determine a target hard-example value of each image in the inference image set according to one or more of the first hard-example value, the second hard-example value, and the third hard-example value; and
    determine a first number of images with the largest target hard-example values in the inference image set as the hard examples in the inference image set.
  26. The AI platform according to claim 23, wherein the hard-example mining module is configured to:
    for a first target box of a first image in the inference image set, determine whether a similar box corresponding to the first target box exists in the images of the video segment whose temporal interval from the first image is less than or equal to a second number;
    if no similar box corresponding to the first target box exists, determine the first target box as a hard-example box;
    if a similar box corresponding to the first target box exists, and the first image to which the first target box belongs and a second image to which the similar box belongs are not adjacent in the video segment, determine, according to the first target box and the similar box, hard-example boxes in the images between the first image and the second image; and
    determine the hard examples in the inference image set according to the number of hard-example boxes in each image of the inference image set.
  27. The AI platform according to claim 26, wherein the hard-example mining module is configured to:
    determine, among the images of the video segment whose temporal interval from the first image is less than or equal to the second number, a tracking box with the highest similarity to the first target box;
    determine an overlap ratio between the first target box and each bounding box according to the tracking box, all bounding boxes in the images of the video segment whose temporal interval from the first image is less than or equal to the second number, and the first target box;
    if a bounding box with an overlap ratio greater than a second value exists, determine the bounding box with the overlap ratio greater than the second value as the similar box corresponding to the first target box; and
    if no bounding box with an overlap ratio greater than the second value exists, determine that no similar box corresponding to the first target box exists.
  28. The AI platform according to any one of claims 23, 26, or 27, wherein the hard-example mining module is configured to:
    obtain surface-feature distribution information of the images in the training image set, and determine a fourth hard-example value of each image in the inference image set according to the surface-feature distribution information of the images in the training image set and the surface features of the images in the inference image set, wherein the surface features comprise surface features of bounding boxes and surface features of the images;
    obtain deep features of each box in each image of the training image set and deep features of each box in each image of the inference image set, cluster the boxes in each image of the training image set according to their deep features to obtain a box clustering result, and determine a fifth hard-example value of each image in the inference image set according to the deep features of each box in each image of the inference image set, the box clustering result, and the inference result of each box in each image of the inference image set;
    determine a target hard-example value of each image in the inference image set according to one or more of the fourth hard-example value and the fifth hard-example value; and
    determine a first number of images with the largest target hard-example values in the inference image set as the hard examples in the inference image set.
  29. A computing device, wherein the computing device comprises a memory and a processor, the memory is configured to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory to perform the method according to any one of claims 1 to 7.
  30. A computing device, wherein the computing device comprises a memory and a processor, the memory is configured to store a set of computer instructions, and the processor executes the set of computer instructions stored in the memory to perform the method according to any one of claims 15 to 21.
  31. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device performs the method according to any one of claims 1 to 7.
  32. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device performs the method according to any one of claims 15 to 21.
PCT/CN2020/097856 2019-09-17 2020-06-24 Method for providing ai model, ai platform, computing device, and storage medium WO2021051918A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910878323.3 2019-09-17
CN201910878323.3A CN112529026B (en) 2019-09-17 2019-09-17 Method for providing AI model, AI platform, computing device and storage medium

Publications (1)

Publication Number Publication Date
WO2021051918A1

Family

ID=74883931

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097856 WO2021051918A1 (en) 2019-09-17 2020-06-24 Method for providing ai model, ai platform, computing device, and storage medium

Country Status (2)

Country Link
CN (2) CN112529026B (en)
WO (1) WO2021051918A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052328B (en) * 2021-04-02 2023-05-12 上海商汤科技开发有限公司 Deep learning model production system, electronic device, and storage medium
CN114418021B (en) * 2022-01-25 2024-03-26 腾讯科技(深圳)有限公司 Model optimization method, device and computer program product
WO2023179038A1 (en) * 2022-03-24 2023-09-28 华为云计算技术有限公司 Data labeling method, ai development platform, computing device cluster, and storage medium
CN114676790A (en) * 2022-04-12 2022-06-28 北京百度网讯科技有限公司 Object labeling method, object labeling device, object labeling model processing method, object labeling model processing device, object labeling model processing equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372658A (en) * 2016-08-30 2017-02-01 广东工业大学 Vehicle classifier training method
CN106529424B (en) * 2016-10-20 2019-01-04 中山大学 A kind of logo detection recognition method and system based on selective search algorithm
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN110147709A (en) * 2018-11-02 2019-08-20 腾讯科技(深圳)有限公司 Training method, device, terminal and the storage medium of vehicle attribute model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018184195A1 (en) * 2017-04-07 2018-10-11 Intel Corporation Joint training of neural networks using multi-scale hard example mining
CN107633232A (en) * 2017-09-26 2018-01-26 四川长虹电器股份有限公司 A kind of low-dimensional faceform's training method based on deep learning
CN108647577A (en) * 2018-04-10 2018-10-12 华中科技大学 A kind of pedestrian's weight identification model that adaptive difficult example is excavated, method and system
CN109815988A (en) * 2018-12-27 2019-05-28 北京奇艺世纪科技有限公司 Model generating method, classification method, device and computer readable storage medium

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4163822A4 (en) * 2020-06-29 2023-12-20 Huawei Cloud Computing Technologies Co., Ltd. Data annotation method and apparatus, and computer device and storage medium
CN113435409A (en) * 2021-07-23 2021-09-24 北京地平线信息技术有限公司 Training method and device of image recognition model, storage medium and electronic equipment
CN113505261A (en) * 2021-08-04 2021-10-15 城云科技(中国)有限公司 Data annotation method and device and data annotation model training method and device
CN113505261B (en) * 2021-08-04 2024-02-02 城云科技(中国)有限公司 Data labeling method and device and data labeling model training method and device
CN113705648A (en) * 2021-08-19 2021-11-26 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN113705648B (en) * 2021-08-19 2024-03-01 杭州海康威视数字技术股份有限公司 Data processing method, device and equipment
CN113468365A (en) * 2021-09-01 2021-10-01 北京达佳互联信息技术有限公司 Training method of image type recognition model, image retrieval method and device
CN116560857A (en) * 2023-06-29 2023-08-08 北京轻松筹信息技术有限公司 AGI platform call management method and device, storage medium and electronic equipment
CN116560857B (en) * 2023-06-29 2023-09-22 北京轻松筹信息技术有限公司 AGI platform call management method and device, storage medium and electronic equipment
CN116894986A (en) * 2023-09-11 2023-10-17 深圳亘存科技有限责任公司 Automatic labeling method, system and computer equipment
CN116894986B (en) * 2023-09-11 2023-11-24 深圳亘存科技有限责任公司 Automatic labeling method, system and computer equipment

Also Published As

Publication number Publication date
CN112529026A (en) 2021-03-19
CN117893845A (en) 2024-04-16
CN112529026B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
WO2021051918A1 (en) Method for providing ai model, ai platform, computing device, and storage medium
JP6944548B2 (en) Automatic code generation
Gollapudi Learn computer vision using OpenCV
US11423076B2 (en) Image similarity-based group browsing
US10354159B2 (en) Methods and software for detecting objects in an image using a contextual multiscale fast region-based convolutional neural network
US10437878B2 (en) Identification of a salient portion of an image
US10503775B1 (en) Composition aware image querying
WO2021227726A1 (en) Methods and apparatuses for training face detection and image detection neural networks, and device
US9940577B2 (en) Finding semantic parts in images
US20180096457A1 (en) Methods and Software For Detecting Objects in Images Using a Multiscale Fast Region-Based Convolutional Neural Network
US9449026B2 (en) Sketch-based image search
WO2022001501A1 (en) Data annotation method and apparatus, and computer device and storage medium
US20150363660A1 (en) System for automated segmentation of images through layout classification
US10380461B1 (en) Object recognition
US11620330B2 (en) Classifying image styles of images based on image style embeddings
WO2023010758A1 (en) Action detection method and apparatus, and terminal device and storage medium
US10963700B2 (en) Character recognition
WO2021217543A1 (en) Image annotation method, apparatus, device and medium
JP6787831B2 (en) Target detection device, detection model generation device, program and method that can be learned by search results
US11468571B2 (en) Apparatus and method for generating image
Shete et al. TasselGAN: An application of the generative adversarial model for creating field-based maize tassel data
Zhao et al. Learning best views of 3D shapes from sketch contour
Shi et al. Weakly supervised deep learning for objects detection from images
Kapur et al. Mastering opencv android application programming
CN107948721A (en) The method and apparatus of pushed information

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 20864459
     Country of ref document: EP
     Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 20864459
     Country of ref document: EP
     Kind code of ref document: A1